Getting data entry right at the root is important, and most (if not all) data quality professionals agree that this is a superior approach compared to doing cleansing operations downstream.
The problem, however, is that most data erodes as time passes. What was right at the time of capture will at some point no longer be right.
Therefore data entry should ideally capture not only a snapshot of correct information but also the raw data elements that make the data easily maintainable.
An obvious example: if I tell you that I am 49 years old, that may be exactly the piece of information you need to complete a business process. But if you ask for my birth date instead, you get the age with a bit of calculation, plus that raw data tells you when I turn 50 (all too soon), and your organization will still know my age if we do business again later.
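The point above can be sketched in a few lines of code: store the raw birth date and derive the age (or any future milestone) on demand, instead of storing an age value that silently goes stale. The birth date below is purely a hypothetical illustration.

```python
from datetime import date

def age_on(birth_date: date, on: date) -> int:
    """Age in whole years on a given date, derived from the raw birth date."""
    # Subtract one if the birthday has not yet occurred this year.
    return on.year - birth_date.year - (
        (on.month, on.day) < (birth_date.month, birth_date.day)
    )

def nth_birthday(birth_date: date, n: int) -> date:
    """Date on which the person turns n years old."""
    # Note: a February 29 birth date would need special handling here.
    return birth_date.replace(year=birth_date.year + n)

birth = date(1960, 5, 17)          # hypothetical raw data captured at entry
print(age_on(birth, date(2010, 3, 1)))   # 49 at this point in time
print(nth_birthday(birth, 50))           # 2010-05-17
```

The derived age is correct on any date it is computed, which is the essence of keeping data maintainable rather than merely correct at capture time.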
Birth dates are stable personal data. Gender pretty much is too. But most other data changes over time. Names change in many cultures upon marriage and perhaps divorce, and people may change names when discovering bad numerology. People move, or a street name may be changed.
There is a great deal of privacy concern around identifying individual persons, and the norms differ between countries. In Scandinavia we are used to being identified by our unique citizen ID, though even here within debatable limitations. But solutions are on offer for maintaining raw data that will yield valid and timely B2C information at whatever precision is asked for, when needed.
Identifying a business entity, on the other hand, is broadly accepted everywhere. Public sector registrations are a basic source of identifying IDs, with varying uniqueness and completeness around the world. Private providers have developed proprietary ID systems such as the DUNS Number from D&B. All in all, such solutions are good sources for ongoing maintenance of your B2B master data assets.
Addresses belonging to business or consumer/citizen entities – or simply addresses in their own right – are available as external reference data covering more and more spots on the Earth. Ongoing development in open government data helps with availability and completeness, and these data are often deployed in the cloud. Right now the focus is mostly on visual presentation on maps, but no doubt more services will follow.
Getting data right at entry and being able to maintain real-world alignment is the challenge, unless you look at your data asset as a throw-away commodity.
Figure 1: one year old prime information
PS: If you forgot to maintain your data, data cleansing might be a sustainable alternative before dumping it.
A great, detailed and clear post on a critical consideration for data management practitioners. Well explained as usual. For me, ensuring data is as current as possible is key to MDM, and specifically CDI, success. You aren’t nearly 50, are you?
PS: Love the graphic 🙂
Thanks Charles. In Scandinavia we are constantly reminded of our age, as the citizen ID we have to use almost everywhere contains our birth date. So, no way around it.
Good and interesting post!
Great post Henrik,
You are absolutely right that what is sometimes lost in the “cleanse vs. prevent” debate about data quality best practices is that however you “achieve quality” for your data, your job isn’t done – you must also provide ongoing maintenance.
Data decay is the enemy of accuracy, relevancy, and many other dimensions of data quality.
P.S. I guess that we should expect such wisdom from a man of your age 😉
Hi Jim. Nice to see you in here too. You learn all through life – today I discovered the reply threading in WordPress.
Interesting post, Henrik. So many of these concepts are very simple when they’re broken down, but organizational layers, bureaucracy and politics make them difficult.
The gatherers of the data are traditionally far apart from the users of the data and the managers/stewards of the data. Bringing them closer together so that they appreciate each others’ jobs can help get back to that simplicity.
Thanks Alan, I agree. Communication across silos and borders is essential when improving data quality.