Today I noticed this tweet by Malcolm Chisholm:
I agree.
The problem with the “fitness for use” or “fit for the purpose of use” definition of data quality has been a recurring subject on this blog, starting with the post Fit for What Purpose? and most recently the post Inaccurately Accurate, which discussed the data quality of the British electoral roll seen from either a strict electoral point of view or the point of view of external uses of the electoral roll.
The problem with “fitness for use” becomes clear when data quality has to be addressed within master data management. Master data has, by definition so to speak, many uses.
My thesis is that, as more and more purposes are included, there is a break-even point beyond which it is less cumbersome to reflect the real-world object than to try to align the data with every known purpose.
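To make that break-even point concrete, here is a minimal sketch in Python. The party attributes, purpose names and derivation rules are invented for illustration, not taken from any particular MDM product; it simply contrasts keeping a purpose-aligned copy per use with keeping one record that reflects the real-world entity and deriving each purpose’s view from it.

```python
# Hypothetical sketch: two ways to serve many purposes from master data.
# All attribute names and values are illustrative assumptions.

# Approach 1: one purpose-aligned copy per use.
# Every new purpose adds another copy to create, align and keep in sync.
purpose_aligned_copies = {
    "billing":   {"name": "ACME Corp.",       "country": "DK"},
    "marketing": {"name": "Acme Corporation", "segment": "Enterprise"},
    "logistics": {"name": "ACME CORP",        "delivery_country": "Denmark"},
}

# Approach 2: one record that reflects the real-world entity,
# with thin purpose-specific views derived from it on demand.
real_world_record = {
    "legal_name": "Acme Corporation A/S",
    "country_code": "DK",
    "employee_count": 1200,
}

def billing_view(record):
    # Derive what billing needs from the real-world record.
    return {"name": record["legal_name"], "country": record["country_code"]}

def marketing_view(record):
    # Derive a marketing segment instead of storing a separate aligned copy.
    segment = "Enterprise" if record["employee_count"] >= 1000 else "SMB"
    return {"name": record["legal_name"], "segment": segment}

# With few purposes the first approach is manageable; as purposes multiply,
# deriving views from one real-world record becomes the cheaper option.
print(billing_view(real_world_record))
print(marketing_view(real_world_record))
```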
Today Jim Harris made an (as ever) excellent post related to how data actually represents what it purports to represent – now and tomorrow too. Find the post called Syncing versus Streaming on the Data Roundtable.
I am pleased to see others recognize the shortcomings of “fit for purpose” as a definition for data quality. I only wish the definition, if there even is an adequate one, could be reduced to a tweet-sized discussion. I believe it’s a tad more difficult, because you really have to settle on definitions for data and quality first.
Quality has been an unresolved topic among philosophers for much of our history. Personally, I like Robert Pirsig’s perspective on quality. In short, he defines quality as “…the first slice of undivided experience.” “Quality” he states “is not a thing. It is an event. It is the event at which the subject becomes aware of the object…The Quality event is the cause of the subjects and objects, which are then mistakenly presumed to be the cause of the Quality.”
Data, on the other hand, has largely been described as the first rung in some contrived hierarchy culminating in wisdom. Yet wisdom is hardly so easy to acquire as gaining knowledge through the assemblage of information from data. A little high-quality inference would certainly be helpful in that attempt.
When speaking of data quality, most are really referring to data “qualities” – like accuracy, completeness, relevance and such. That is okay. But until someone creates a device that can tell us about the quality of anything, which is not likely, I am attracted to Martin Eppler’s work. Assuming I am not misinterpreting…or misrepresenting him, he suggests some things that should give us pause. For example, do we even need a definition? He cites Karl Popper, who said, “I do not say that definitions may not have a role to play in connection with certain problems, but I do say it is for most problems quite irrelevant whether a term can be defined (or not). All that is necessary is that we make ourselves understood.” I also believe Eppler said something to the effect that part of experiencing data quality is the result of the importance of the person consuming the data and doing something sufficiently meaningful with it to give it quality in the first place (i.e. the importance of the purpose…or purposes, if you prefer). So, even if data were fit for purpose…or purposes, is the purpose…or, for that matter, the person having the purpose…important enough to make the data high quality?
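To put the “qualities” point in code, here is a hypothetical sketch with an invented record and invented rules; it scores separate dimensions such as completeness and accuracy rather than pretending to measure a single, overall quality.

```python
# Hypothetical sketch: measuring data "qualities" as separate dimensions.
# The record and the rules are invented for illustration only.
import re

record = {"name": "Jane Doe", "email": "jane.doe@example", "postcode": ""}

def completeness(rec):
    # Share of fields that are populated.
    filled = sum(1 for value in rec.values() if value)
    return filled / len(rec)

def email_accuracy(rec):
    # Crude syntactic check standing in for an accuracy rule.
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", rec["email"]))

print(f"completeness: {completeness(record):.2f}")    # 0.67
print(f"email looks valid: {email_accuracy(record)}")  # False
```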
I will say this, though: data quality – whatever anyone wants it to be – is the reason to do anything related to data, such as data governance, master data management, data integration, and the like. Right now most have this backwards, seeing data quality not as the end but as the means to get these other things. This is a consequence of mistakenly equating data quality with data cleaning, a mechanical operation at best. It’s a consequence of mistakenly seeing data quality as something you do to data and not as something you experience.
In any case, I see where Malcolm Chisholm is going with his definition…whether I agree with it or not. One thing is for sure: we need to rethink data quality. I started flirting with this definition for data quality, “the best usable, valued data,” in an attempt to: 1. balance objective and subjective by using “best” (as squishy as that is, and making it personal), 2. avoid saying data has value while recognizing that it can be valued by someone, and 3. acknowledge that data has to be in a usable form to satisfy those who prefer to ignore subjectivity altogether. It also leaves it up to the person to define data however they want.
This all said, I’d like to pose this question: would data and quality exist if we did not have a mind to fabricate such concepts? In the end, data and quality are only “real” because we make them so; they are not naturally occurring. After all, we do have a profound need to describe what is real.
Thanks Peter. Your comment has left me thinking. Better get back to fixing some data 🙂