Completeness is one of the most frequently mentioned data quality dimensions. The different data quality dimensions (such as completeness, timeliness, consistency, conformity, accuracy and uniqueness) stick together, and not least completeness is an aim in itself as well as something that helps improve the other data quality dimensions.
“You can’t control what you can’t measure” is a famous saying. That also applies to data quality dimensions. As pondered in the post Hierarchical Completeness, measuring completeness is usually not something you can do at the data model level, but something that requires drilling down into hierarchies and other segmentations of the data.
Party Master Data
A common example is a form where you have to fill in a name and address. You may have a field called state/province. The problem is that for some countries (like the USA, Canada, Australia and India) this field should be mandatory (and conform to a value list), but for most other countries it does not make sense. If you keep the field mandatory for everyone, you will not get data quality but rubbish instead.
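To make the state/province rule concrete, here is a minimal Python sketch of a completeness check that depends on the country, plus a completeness measure segmented by country. The field names, the use of two-letter country codes and the set of countries are assumptions for illustration only; the four countries are the examples mentioned above, not an exhaustive list.

```python
# Countries where state/province is treated as mandatory.
# Assumption: two-letter country codes; the set only mirrors the examples in the text.
STATE_REQUIRED = {"US", "CA", "AU", "IN"}


def missing_fields(record):
    """Return the names of required fields that are empty in one address record."""
    required = ["name", "address", "country"]
    # The state/province field is only required for selected countries.
    if record.get("country") in STATE_REQUIRED:
        required.append("state")
    return [f for f in required if not (record.get(f) or "").strip()]


def completeness_by_country(records):
    """Return the share of complete records for each country."""
    totals, complete = {}, {}
    for r in records:
        country = r.get("country") or "(unknown)"
        totals[country] = totals.get(country, 0) + 1
        if not missing_fields(r):
            complete[country] = complete.get(country, 0) + 1
    return {c: complete.get(c, 0) / totals[c] for c in totals}


records = [
    {"name": "Ann", "address": "1 Main St", "country": "US", "state": "NY"},
    {"name": "Bob", "address": "2 Oak Ave", "country": "US"},
    {"name": "Carl", "address": "Nygade 3", "country": "DK"},
]
print(completeness_by_country(records))
```

The Danish record counts as complete without a state, while the second US record does not. A single model-level rule (always mandatory or never mandatory) would get one of the two wrong.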
Customer and other party master data have plenty of other completeness challenges. In my experience, the best approach to controlling completeness is to involve third-party reference data wherever possible and as early in the data capture as feasible. There is no reason to type something in, probably in a wrong and incomplete way, if it is already digitally available in a more correct and more complete form.
Product Master Data
With product master data the variations are even more challenging than with party master data. Which product information attributes are needed for a product varies across different types of products.
Some help is available in a few of the product information standards, as told in the post Five Product Classification Standards. A few of these standards actually set requirements for which attributes (also called features and properties) are needed for a product of a certain classification within that standard. The problem is then that not everyone uses the same standard (not to mention the same version) at the same time. But it is a good starting point.
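The idea of required attributes per classification can be sketched as a lookup from product class to the attributes that class needs. The class names and attribute names below are hypothetical and not taken from any real standard; an actual standard defines its own classes, features and versions.

```python
# Hypothetical requirement sets keyed by product classification.
# A real classification standard would supply these, per version.
REQUIRED_ATTRIBUTES = {
    "power_drill": {"voltage", "chuck_size", "weight"},
    "t_shirt": {"size", "colour", "material"},
}


def attribute_completeness(product):
    """Return (score, missing attribute names), or None if the class is unknown."""
    required = REQUIRED_ATTRIBUTES.get(product["classification"])
    if not required:
        # No requirement set for this class, so completeness cannot be measured.
        return None
    attrs = product.get("attributes", {})
    # An attribute counts as filled when it is present and not None or empty.
    filled = {a for a in required if attrs.get(a) not in (None, "")}
    return len(filled) / len(required), sorted(required - filled)


drill = {
    "classification": "power_drill",
    "attributes": {"voltage": "18 V", "weight": "1.6 kg", "colour": "blue"},
}
print(attribute_completeness(drill))
```

Here the drill is measured against the drill requirements only: the extra colour attribute neither helps nor hurts, and the missing chuck size is reported by name. Returning None for an unknown class keeps "not measurable" apart from "zero percent complete".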
Product data flows between trading partners. In my experience, the key to getting more complete product data within the whole supply chain is to improve the flow of product data between trading partners, supported by those who deliver solutions and services for Product Information Management (PIM).
Making that happen is the vision and mission for Product Data Lake.
This is a great post (as usual). I sincerely wish there was MORE reference data available. One of the problems with reference data is that ‘clever’ marketing people are continuously working on ways to make the reference data obsolete. A prime example is color where “sea foam” and “sinus blue” pass for the description. Fortunately, there are many, many areas where the reference data is relevant and available.
Thanks a lot for your colourful addition, Gino 🙂