Gartner (the analyst firm), represented by Saul Judah, takes data quality back to basics in a recent post called Data Quality Improvement.
While I agree with the sentiment about measuring the facts as expressed in the post, I am cautious about assuming that everything is good once data are fit for the purpose of business operations.
Some clues lie in the data quality dimensions mentioned in the post:
Accuracy (for now):
As said in the Gartner post, data are indeed temporal. The real world changes, and so do business operations. By the time you have made your data fit for the purpose of use, the business operations have changed. And by the time you have re-fitted your data for the new purpose of use, the business operations have changed again.
Furthermore, most organizations can't take all business operations into account at the same time. If you go down the fit-for-purpose track, you will typically address a single business objective and make data fit for that purpose. Not least when dealing with master data, there are many business objectives and derived purposes of use. In my experience that leads to this conclusion:
“While we value that data are of high quality if they are fit for the intended use, we value more that data correctly represent the real-world construct to which they refer, in order to be fit for current and future multiple purposes.”
Existence – an aspect of completeness:
The Gartner post mentions a data quality dimension called existence. I tend to see this as an aspect of the more broadly used term completeness.
For example, achieving fit-for-purpose completeness of product master data has been a huge challenge for many organizations within retail and distribution in recent years, as explained in the post Customer Friendly Product Master Data.
I do agree! The question “Fit for purpose – but which purpose? – for whom? – and when?” is a real challenge. This is also relevant when considering the quality of public sector information, including within the present Danish Basic Data program. But I believe, alas, that a single simple answer to the question exists.
Sorry: should be “… does not exist.”
The belief that data are temporal and degrade over time is one of the major fallacies that trips up the data quality world (the fact that it is obsessed with data quality as opposed to information quality is another!).
In a recent post, DQGlobal (http://www.dqglobal.com/the-longer-you-delay-the-more-the-data-decay/) talked about ‘data decay’. Here is my reply:
Some great points made in this article.
However, there is a major error – data in a database does not decay! Data in databases is unique in that it has ‘inverse entropy’. This means that, unlike other systems, it does not require energy to be applied to maintain its integrity. In fact, most data errors are created by the application of energy from outside.
So, any value that you enter into a database, if true at the time of entry, will remain true for as long as the database exists. For example, if you entered an address for John Smith as ‘127 High St’ on 25 Jan 2014, and that was the correct address for John at that time, then this entry will always be correct, even if John later moves, because it was his address on 25 Jan 2014.
What will be affected if John moves is the currency of the INFORMATION that the database represents.
In the case of John’s move, the database is still able to provide the correct information on where he lived on 25 Jan 2014, but it cannot provide information on where he lives now. Data correct, information incomplete.
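This data-versus-information distinction can be sketched in code. The model below is my own hypothetical illustration (the record structure and function names are not from any of the posts): each timestamped record stays true forever, while the information we derive from the records may be incomplete.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class AddressRecord:
    """A fact captured at entry time: if true then, it stays true forever."""
    party: str
    address: str
    recorded_on: date

# The data: one record, correct as a statement about its entry date.
records = [
    AddressRecord("John Smith", "127 High St", date(2014, 1, 25)),
]

def address_on(party: str, when: date) -> Optional[str]:
    """Information derived from data: where did the party live on a given date?"""
    candidates = [r for r in records
                  if r.party == party and r.recorded_on <= when]
    if not candidates:
        return None  # data correct, information incomplete
    return max(candidates, key=lambda r: r.recorded_on).address

print(address_on("John Smith", date(2014, 1, 25)))  # → 127 High St
print(address_on("John Smith", date(2013, 1, 1)))   # → None (no information for that date)
```

If John moves and no new record is entered, the 2014 record remains correct data; it is only the answer to "where does John live now?" that becomes unavailable.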
Enterprises work on information, not data. Digital data are merely the building blocks for real world information.
So, collect data, provide information.
Nearly all of the ‘data’ quality paradoxes are solved by concentrating on information quality rather than data quality.
Thanks a lot Morten and John for commenting.
About information quality: I have noticed that this term hasn’t had much success in the data quality world outside academic circles, but I think it may have legs in the business world if we add newer disciplines such as Master Data Management (MDM) and data governance.
The example with the address leads to the distinction between transaction data and master data. What you describe, John, looks more like a transaction: when John Smith moved to that address, or at what address John Smith lived when we on-boarded John Smith’s data as a customer, or rather as a party with a customer or other role.
MDM must also evolve beyond the storage and federation themes and embrace practical maintenance of master data, as told, staying with addresses, in the post The Relocation Event.
Party and Location are both Master Entities. However, the association between them is actually a many-to-many relationship on two separate counts. The first is that a Party (say a person) might reside at more than one Location over time. In this case the relationship with the Location is qualified by a start and end date.
The next relationship is defined by Location use. For example, one use could be a billing address, another a delivery address, another a residential address, etc. These uses would also be date dependent.
So, the relationship between any Party and any Location is qualified by both date and usage type.
Any effective MDM solution must be able to handle these structures. The problem is that many have not evolved to this level and actually represent Location (naming it ‘address’) as an attribute of Party, which crazily merges two Master Entities!
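The structure John describes can be sketched as an association entity between the two Master Entities, qualified by both a validity period and a usage type. This is a minimal illustration under my own assumed names and types, not any particular MDM product's schema:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional

class LocationUse(Enum):
    RESIDENTIAL = "residential"
    BILLING = "billing"
    DELIVERY = "delivery"

@dataclass(frozen=True)
class Party:
    party_id: str
    name: str

@dataclass(frozen=True)
class Location:
    location_id: str
    address: str

@dataclass(frozen=True)
class PartyLocation:
    """Association entity resolving the Party-Location many-to-many
    relationship, qualified by usage type and a validity period."""
    party_id: str
    location_id: str
    use: LocationUse
    valid_from: date
    valid_to: Optional[date] = None  # None means still current

def current_location(links: List[PartyLocation], party_id: str,
                     use: LocationUse, as_of: date) -> Optional[str]:
    """Find the Location a Party uses for a given purpose on a given date."""
    for link in links:
        if (link.party_id == party_id and link.use == use
                and link.valid_from <= as_of
                and (link.valid_to is None or as_of < link.valid_to)):
            return link.location_id
    return None
```

Modeling ‘address’ as an attribute of Party, by contrast, can only hold one value with no dates and no usage type, which is exactly the merging of two Master Entities criticized above.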
John, I agree. I remember working with a wannabe Multi-Domain MDM vendor on exactly that theme some years ago. This principle is also baked into the iDQ™ MDM Edition that I am involved with these days.
By the way, in the iDQ™ MDM concept we handle relocation as an event, which is controlled within the Data Steward area of the service. The event can be triggered by external sources or by an internal initiation. A fully or semi-automated process fulfills the event as a transaction.
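To make the event idea concrete: a relocation can be applied as one atomic change set that end-dates the current residential link and opens a new one. This is purely an illustrative sketch of the general pattern, not the iDQ™ implementation; the dictionary record shape is my own assumption.

```python
from datetime import date
from typing import List, Optional

# A link record is assumed to look like:
# {"party_id": ..., "location_id": ..., "use": ...,
#  "valid_from": date, "valid_to": date or None}

def apply_relocation(links: List[dict], party_id: str,
                     new_location_id: str, moved_on: date) -> List[dict]:
    """Handle a relocation event as one change set: close the party's
    current residential link and open a new one starting on the move date."""
    updated = []
    for link in links:
        if (link["party_id"] == party_id and link["use"] == "residential"
                and link["valid_to"] is None):
            link = {**link, "valid_to": moved_on}  # end-date the old link
        updated.append(link)
    updated.append({"party_id": party_id, "location_id": new_location_id,
                    "use": "residential", "valid_from": moved_on,
                    "valid_to": None})  # open the new, current link
    return updated
```

Whether the event is triggered externally or internally, and whether a data steward confirms it, is a workflow concern layered on top; the key point is that history is preserved rather than overwritten.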