The Two Data Quality Definitions

A Google search for “data quality” turns up the ever-recurring discussion of how data quality should be defined.

This is also true for the top-ranked non-sponsored results, such as the Wikipedia page on data quality and an article from Profisee called Data Quality – What, Why, How, 10 Best Practices & More!

The two predominant definitions are that data is of high quality if the data:

  • Is fit for the intended purpose of use.
  • Correctly represents the real-world construct that the data describes.

Personally, I think it is a balance.

Data Quality Definition

In theory I am on the right side, where data correctly represents the real world. This is probably because I most often work with master data, where the same data serve multiple purposes.

However, as a consultant helping organizations get the funding in place and complete the data quality improvement on time and within budget, I do end up on the other side, fitness for purpose.

What about you? Where do you stand on this question?

2 thoughts on “The Two Data Quality Definitions”

  1. Gani Hamiti 5th December 2019 / 09:36

    Interesting post!

    I tend to think that the correct representation of reality (or, actually, the best possible representation of reality at time t; as we know, in most contexts it may not be possible to verify the full correctness of data against the real world) is a subset of the fitness for use of the data. Take a very basic case where you have to use the date of birth of your clients in a given system to check if they are over 18. If the date per se is not the real one, your data can definitely not be of good enough quality, nor can it be fit for your use. But even if it is the real DOB of your clients, it may still not be fully fit for use, e.g. because of formatting issues (in one case you may have “1980/01/01”, in the other “1 january 1980”, etc.).

    As such, data could hardly get fitter for use by losing adequacy to what it represents in the real world, imo.

    • Henrik Gabs Liliendahl 11th December 2019 / 09:43

      Thanks for commenting, Gani. I agree about “the best possible representation of reality”. In the over 18 example, one could also imagine that only a yes or no to over 18 is recorded. That is fit for purpose for that scenario. However, other scenarios would need over 21, over 26 and so on, and an “under” answer would only be valid at the time of recording. Therefore, a DOB is better suited to multiple purposes. As you rightly state, the formatting of a date is a challenge. Is 12/11/2019 the 11th December 2019 (as it is in the US) or the 12th November 2019 (as it is in the rest of the world)? Is 1980/01/01 the 1st January 1980 or just 1980?
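The two points in this exchange can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the helper names (parse_date, age_on, is_at_least) and the check date are made up for this example and come from no particular system. It shows that the string 12/11/2019 yields two different dates depending on the assumed convention, and that a stored DOB can answer any age threshold where a recorded yes/no answers only one.

```python
from datetime import date, datetime

# Hypothetical helpers for illustration only; the names are assumptions.

def parse_date(text: str, fmt: str) -> date:
    """Parse a date string under an explicitly stated format."""
    return datetime.strptime(text, fmt).date()

def age_on(dob: date, as_of: date) -> int:
    """Full years elapsed between a date of birth and a reference date."""
    years = as_of.year - dob.year
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years

def is_at_least(dob: date, threshold: int, as_of: date) -> bool:
    """Derive an 'over N' answer from the DOB for any threshold N."""
    return age_on(dob, as_of) >= threshold

# The same string gives two different dates under the two conventions.
raw = "12/11/2019"
us_reading = parse_date(raw, "%m/%d/%Y")   # month first
dmy_reading = parse_date(raw, "%d/%m/%Y")  # day first

# One stored DOB serves several purposes; a stored flag would serve one.
dob = parse_date("1980/01/01", "%Y/%m/%d")
check_day = date(2019, 12, 11)  # assumed reference date
answers = {n: is_at_least(dob, n, check_day) for n in (18, 21, 26)}
```

The format is passed explicitly on purpose: guessing it from the string alone is exactly the ambiguity described above.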
