In a blog post yesterday on the Melissa Data blog, Elliot King wrote about Classifying Data Quality Problems. The post suggests that there are three different kinds of data quality issues: operational, conceptual and organizational.
This classification revolves around the root cause of bad data.
As examined in my post yesterday, sometimes bad data quality isn’t bad data. A good deal of the problems don’t relate to the raw data itself, but are linked to how data are structured, for example in data models, and how data are categorized, for example by (not) using metadata.
Flaws in data structure seem to have root causes similar to those in the suggested categorization, for example:
- Operational: Data are structured and labeled to fit the capturing systems, which may not suit purposes of use further downstream.
- Conceptual: The term conceptual data model (or similar approaches) pops up here. We miss such models, not least the enterprise-wide ones, very much in IT landscapes made up of popular off-the-shelf software.
- Organizational: We are usually not very good at speaking the same language about the same data.
By the way: One good book about overcoming these challenges I read recently is by Thomas Frisendal and is called Design Thinking Business Analysis.
What do you recommend an organization should do in order to design structures that make better use of the data available, or to collect the right data to begin with?
Thanks for asking, Peter. It’s a big subject. Some of my “special” recommendations are:
• Try to reflect the real world – that is, the part of the real world that makes sense for you
• Look for industry conceptual data models and the like – most times prudent people have done some of the thinking for you
• Build in identifiers that relate to external sources that can be used for enrichment and updates – now or later
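To make the last recommendation a bit more concrete, here is a minimal Python sketch of a record that carries identifiers pointing to external sources. Everything in it is an assumption made for illustration: the `Organization` class, the `external_ids` field, the `enrich` helper, the `business_register` source name and the sample values are all hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Organization:
    """A real-world organization rather than a row shaped by one capturing system."""
    name: str
    # Hypothetical mapping: external source name -> identifier used by that source.
    external_ids: Dict[str, str] = field(default_factory=dict)


def enrich(org: Organization, source: str,
           reference: Dict[str, dict]) -> Optional[dict]:
    """Look up the reference record for org in one external source.

    Returns None when no identifier for that source is stored on the record
    or when the source has no entry for the stored identifier.
    """
    key = org.external_ids.get(source)
    if key is None:
        return None
    return reference.get(key)


# Made-up sample data standing in for an external business register.
acme = Organization("Acme Ltd", {"business_register": "12345678"})
register = {"12345678": {"legal_name": "Acme Limited", "status": "active"}}

match = enrich(acme, "business_register", register)
```

Because the identifier is stored with the record, the lookup can be repeated later for updates without having to match on name and address again.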
Another good post, and one that highlights the key role of the Logical Data Model (LDM) in the areas of Data Quality and Master Data Management.
The LDM shows the structure of the data that is required across the enterprise, not just by the application that is capturing the data but, more importantly, by those Functions that are going to have to use the data.
The LDM also gives names to Data Entities that are consistent across the enterprise. It does not, for example, call an entity ‘Customer’ in one place and ‘Supplier’, ‘Guarantor’, etc. in others. It shows that these are all the same thing and will call them by a meaningful generic name, such as ‘Party’.
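As a rough illustration of the ‘Party’ idea, the following Python sketch models one generic entity and treats ‘customer’, ‘supplier’ and ‘guarantor’ as roles that a party can hold. The class name, field names and sample data are assumptions made for this example only; a real LDM would be expressed in a modeling notation and would carry far more detail.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Party:
    """One generic entity for any person or organization the business deals with."""
    party_id: int
    name: str
    # Roles are attributes of the party, not separate entities (hypothetical labels).
    roles: Set[str] = field(default_factory=set)


def parties_in_role(parties: List[Party], role: str) -> List[Party]:
    """Return the parties that currently hold the given role."""
    return [p for p in parties if role in p.roles]


# Made-up sample data: the same party can be both customer and supplier.
parties = [
    Party(1, "Acme Ltd", {"customer", "supplier"}),
    Party(2, "Jane Doe", {"guarantor"}),
]

customers = parties_in_role(parties, "customer")
suppliers = parties_in_role(parties, "supplier")
```

Here the customer list and the supplier list both refer to the very same `Party` object for Acme Ltd, so a change to its name or address is made once instead of in several differently named tables.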
Thanks for commenting, John. I agree. The oh-so-common customer table with name and address is a data modeling disaster, because it doesn’t reflect the real world but only a snapshot of a state in a business process.