The most frequently mentioned domains within Master Data Management (MDM) are customer, product and location. Data quality is a core discipline when working with MDM. In data quality we talk about different dimensions such as uniqueness, relevance, completeness, timeliness, precision, conformity and consistency.
While these data quality dimensions apply to all domains of MDM, some dimensions apply a bit more to one of the domains or to the intersections between the domains.
Below is a figure with an attempt to illustrate where the dimensions belong the most:
Uniqueness is the most addressed data quality dimension when it comes to customer master data. Customer master data are often marred by duplicates, meaning two or more database rows describing the same real-world entity. There are several remedies around to cure that pain. These remedies are explored in the post The Good, Better and Best Way of Avoiding Duplicates.
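To make the duplicate problem concrete, here is a minimal sketch of one simple remedy: grouping rows by a normalised match key. The field names (`id`, `name`, `postal_code`) and the sample rows are made up for illustration, and real data matching tools use far more forgiving fuzzy logic than this exact-key approach.

```python
import re
from collections import defaultdict


def match_key(name, postal_code):
    """Build a crude match key by stripping case, punctuation and spaces."""
    clean_name = re.sub(r"[^a-z0-9]", "", name.lower())
    clean_postal = re.sub(r"\s+", "", postal_code.upper())
    return f"{clean_name}|{clean_postal}"


def find_duplicates(rows):
    """Return lists of row ids that share the same match key."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row["name"], row["postal_code"])].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical customer rows: the first two describe the same real-world entity.
customers = [
    {"id": 1, "name": "Acme Corp.", "postal_code": "SW1A 1AA"},
    {"id": 2, "name": "ACME corp", "postal_code": "sw1a1aa"},
    {"id": 3, "name": "Beta Ltd", "postal_code": "2100"},
]

duplicate_groups = find_duplicates(customers)
```

An exact key like this only catches differences in case, punctuation and spacing. Misspellings, nicknames and moved addresses need similarity scoring on top.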
With product master data, uniqueness is a less frequent issue. However, completeness is often a big pain. One reason is that completeness means different requirements for different categories of products as explained in the post Hierarchical Completeness within Product Information Management.
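The idea that completeness depends on the product category can be sketched as below. This is only an illustration under my own assumptions: the category paths, the attribute names and the rule that a category inherits the requirements of its parents are all hypothetical, not taken from any particular product information management tool.

```python
# Hypothetical required attributes per category level in a product hierarchy.
REQUIRED = {
    "product": {"name", "brand"},
    "product/power tools": {"voltage"},
    "product/clothing": {"size", "colour"},
}


def required_attributes(category):
    """Collect the requirements of a category and all of its parent levels."""
    parts = category.split("/")
    required = set()
    for depth in range(1, len(parts) + 1):
        required |= REQUIRED.get("/".join(parts[:depth]), set())
    return required


def completeness(product):
    """Share of required attributes that hold a non-empty value."""
    required = required_attributes(product["category"])
    if not required:
        return 1.0
    attributes = product.get("attributes", {})
    filled = {a for a in required if attributes.get(a) not in (None, "")}
    return len(filled) / len(required)


# A made-up power tool that is missing its voltage.
drill = {
    "category": "product/power tools",
    "attributes": {"name": "Drill X", "brand": "Acme", "voltage": None},
}

drill_score = completeness(drill)
```

The same product record would score differently if it were filed under clothing, which is exactly why a single flat completeness rule for all products falls short.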
When working with location master data, consistency can be a challenge. Addressing, so to speak, the different postal address formats around the world is certainly not a walkover. Even Google Maps does not have all the right answers, as told in the post Sometimes Big Brother is Confused.
In the intersection between the location domain and the customer domain, the data quality dimension called precision can be hard to manage, as reported in the post A Universal Challenge. What is relevant to know about your customers and what is relevant to tell about your products are essential questions in the intersection of the customer and product master data domains.
Conformity of product data is related to locations. Take unit measurement. In the United States the length of a small thing will be in inches. In most of the rest of the world it will be in centimetres. In the UK you can never know.
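As a small sketch of location-dependent conformity, the function below stores a length once in centimetres and presents it in the unit expected for a given country. The function name and the use of two-letter country codes are my own assumptions, only the United States is treated as an inch market here, and the UK case is deliberately left out.

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch


def length_for_market(length_cm, country_code):
    """Return (value, unit) for a length stored in centimetres."""
    if country_code == "US":
        return round(length_cm / CM_PER_INCH, 2), "in"
    return round(length_cm, 2), "cm"


us_length = length_for_market(25.4, "US")
dk_length = length_for_market(25.4, "DK")
```

Keeping one stored unit and converting on presentation avoids having the same product described with two slightly different rounded values.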
Timeliness is the everlasting data quality dimension all over.
Henrik. An interesting take but I am not sure I agree with your positioning.
Completeness and consistency are big issues for customer data as well.
For example, in one environment I have found in excess of 2800 patterns for telephone number – which is a key communication field and also useful (potentially) for uniqueness.
Reducing the number of patterns (for example by removing invalid values) reduces the level of completeness and also increases the level of duplication.
Similarly, by increasing consistency of product data we find more duplicates…
I am not sure that one can prioritise data quality dimensions based on
However, the concept is intriguing – good start
Thanks Gary for kicking the ball. Indeed, all dimensions matter within every domain, and I agree with your point that solving one dimension helps with achieving goals related to other dimensions.
If we look at the remedies we use within the different domains, I have seen heaps of data matching going on with customer master data, but not so much data matching with product master data, though I have come across some successful examples. Surely there are also remedies for completeness of customer master data, including the iDQ(tm) (instant Data Quality) service for sharing big reference data for customer master data that I am working with.
I haven’t seen many good solutions for the completeness pain within product master data until now, but I hope that my new adventure with the Product Data Lake for sharing product master data will help out there.
A perspective indeed. Perhaps you would want to stress that completeness as a dimension for a product master is the need of the hour. Achieving exclusiveness is far-fetched but may be possible at some stage, depending on the approach to maturing master data quality.
I always use the view point of data quality dimensions across the life cycle of data. Thanks for sharing.
Hi Henrik, I think the most useful observation here is that, contrary to MDM manuals which often sound like there is just the one useful approach, the emphasis does vary by data set, and deploying the same mix each time is likely to lead to bitter disappointment.
Others are correct that all aspects appear in all domains but that rather misses the point – which is how resource and time should be allocated between them.
Thanks for adding in, John. Yes, what every organization needs to do is to figure out the impact of solving issues related to the various data quality dimensions and balance this against the effort needed to apply the remedies. The best remedies can be very different.