The other day Joy Medved aka @ParaDataGeek made this tweet:
Indeed, preventing bad data from entering our databases upstream is surely better than cleaning it downstream. Likewise, real-time enrichment is better than enriching data long after it has been put to work.
That said, there are situations where data cleaning has to be done. The reasons were examined in the post Top 5 Reasons for Downstream Cleansing. But I can’t think of many situations where a downstream cleaning and/or enrichment operation will be worth much if it isn’t followed up by an approach to getting it first time right in the future.
If we go a level deeper into data quality challenges, different data quality dimensions carry different importance for the various data domains, as explored in the post Multi-Domain MDM and Data Quality Dimensions.
With customer master data we most often have issues with uniqueness and location precision. While I have spent many happy years with data cleansing, data enrichment and data matching tools, I have over the last couple of years been focusing on a tool for getting that first time right.
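To make the uniqueness point concrete, here is a minimal sketch of what a first time right check at the point of entry could look like. It is not the tool mentioned above. The names `match_key` and `CustomerRegistry` are hypothetical, and the matching rule (normalized name plus postal code) is a deliberately crude assumption for illustration; real data matching is far more involved.

```python
import re
import unicodedata


def match_key(name: str, postal_code: str) -> str:
    """Build a crude comparison key (illustrative assumption, not a real matching rule).

    Lowercases the name, strips accents and punctuation, and joins it
    with the postal code in upper case with spaces removed.
    """
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return f"{text}|{postal_code.replace(' ', '').upper()}"


class CustomerRegistry:
    """Hypothetical in-memory store that rejects likely duplicates on entry."""

    def __init__(self):
        self._by_key = {}

    def add(self, name: str, postal_code: str):
        """Return (True, new_record) or (False, existing_record) for a likely duplicate."""
        key = match_key(name, postal_code)
        if key in self._by_key:
            # Upstream prevention: surface the existing record instead of
            # creating a duplicate that needs cleaning later.
            return False, self._by_key[key]
        record = {"name": name, "postal_code": postal_code}
        self._by_key[key] = record
        return True, record


registry = CustomerRegistry()
created_first, first = registry.add("Müller & Sons Ltd.", "SW1A 1AA")
created_second, second = registry.add("MULLER & SONS LTD", "sw1a1aa")
```

In this sketch the second entry is not stored. The caller gets the existing record back and can decide whether it really is the same customer, which is the decision a downstream cleansing job would otherwise have to reconstruct much later.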
Product master data are often marred by issues with completeness and (location) conformity. The situation here is that tools and platforms for mastering product data are focused on what goes on inside a given organization and not so much on what goes on between trading partners. Standardization seems to be the only hope. But that path is too long to wait for and may in some ways contradict the end purpose, as discussed in the post Image Coming Soon.
So in order to have a first time right solution for product master data sharing, I have embarked on a journey with a service called the Product Data Lake. If you want to join, you are most welcome.
PS: The Product Data Lake also has the capability of catching up with the sins of the past.