Data Quality is an Ingredient, not an Entrée

9th July 2010, Henrik Gabs Liliendahl

Fortunately, it is increasingly recognized that you won’t succeed with Business Intelligence, Customer Relationship Management, Master Data Management, Service Oriented Architecture and many other disciplines without first improving your data quality.

But it would be a big mistake to see Data Quality improvement as an entrée served before the main course of BI, CRM, MDM, SOA or whatever is on the menu. You need ongoing prevention to keep your data from being polluted again over time.

Improving and maintaining data quality involves people, processes and technology. Now, I am not neglecting the people and process side, but as my expertise is in the technology part, I would like to mention some of the technological ingredients that help keep data quality at a tasty level in your IT implementations.

Mashups

Many data quality flaws are (not surprisingly) introduced at data entry. Enterprise data mashups with external reference data may help during data entry, like:

An address may be suggested from an external source.
A business entity may be picked from an external business directory.
Various rules exist in different countries for using consumer/citizen directories. Why not use the best available where you do business?
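As a rough illustration of suggesting an address during data entry, here is a minimal Python sketch. The function name `suggest_addresses` and the in-memory `reference` list are assumptions made for the example; a real mashup would call whatever external address service is available where you do business.

```python
def suggest_addresses(typed, reference, limit=5):
    """Return reference addresses that start with what the user has typed.

    `reference` is a hypothetical stand-in for an external address source.
    """
    needle = typed.casefold().strip()
    if not needle:
        return []
    matches = [addr for addr in reference if addr.casefold().startswith(needle)]
    return matches[:limit]


if __name__ == "__main__":
    reference = [
        "1 Main Street, Springfield",
        "10 Main Street, Springfield",
        "2 Oak Avenue, Shelbyville",
    ]
    print(suggest_addresses("1 main", reference))
```

Picking a suggestion instead of typing the full address means the stored value is spelled the way the reference source spells it.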

External IDs

Getting data entry right at the root is important, and most (if not all) data quality professionals agree that this is superior to doing cleansing operations downstream.

The problem, however, is that most data erodes over time. What was right at the time of capture will at some point no longer be right.

Therefore data entry ideally must not only be a snapshot of correct information but should also include raw data elements that make the data easily maintainable.
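One way to picture this is a record that stores the external identifier next to the captured values, so the values can be refreshed later. The sketch below is only an illustration: the `Customer` fields, the `refresh` function and the dictionary standing in for an external directory are all assumptions, not a description of any particular product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    name: str
    address: str
    external_id: str  # hypothetical key into an external directory, captured at entry


def refresh(customer, directory):
    """Return a copy of the customer updated from the directory, if it is listed there."""
    current = directory.get(customer.external_id)
    if current is None:
        return customer  # nothing known externally, keep the snapshot
    return replace(customer, name=current["name"], address=current["address"])


if __name__ == "__main__":
    directory = {"DK-123": {"name": "Acme A/S", "address": "New Street 2"}}
    stored = Customer("Acme A/S", "Old Street 1", "DK-123")
    print(refresh(stored, directory))
```

Without the stored identifier, the same refresh would have to rely on matching names and addresses, which is exactly the data that has eroded.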

Error tolerant search

A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to first search the database for a match. If the entity is not found, you create a new entity. When the search fails to find a match that actually exists, we have a classic and frequent cause of duplicates.

An error tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
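To make the idea concrete, here is a minimal sketch of an error tolerant search in Python, built on the standard library's `difflib.SequenceMatcher`. The function names and the 0.8 threshold are assumptions chosen for the example; commercial matching tools use considerably more refined techniques.

```python
import difflib
import re


def words(text):
    """Split text into case-folded words, dropping punctuation."""
    return re.findall(r"\w+", text.casefold())


def similarity(a, b):
    """Score two strings between 0 and 1, tolerating the differences named above."""
    wa, wb = words(a), words(b)
    # Sorting the words makes "Liliendahl, Henrik" equal to "Henrik Liliendahl".
    by_words = difflib.SequenceMatcher(
        None, " ".join(sorted(wa)), " ".join(sorted(wb))
    ).ratio()
    # Joining the words makes "DataQuality Ltd" equal to "Data Quality Ltd".
    joined = difflib.SequenceMatcher(None, "".join(wa), "".join(wb)).ratio()
    # Spelling differences lower either ratio only slightly.
    return max(by_words, joined)


def search(query, records, threshold=0.8):
    """Return the records scoring at least `threshold`, best match first."""
    scored = [(similarity(query, record), record) for record in records]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return [record for _, record in sorted(hits, key=lambda p: p[0], reverse=True)]


if __name__ == "__main__":
    customers = ["Henrik Liliendahl", "Data Quality Ltd", "Acme Corp"]
    print(search("Liliendahl, Henrik", customers))
    print(search("DataQuality Ltd", customers))
```

Showing these candidates before a new entity is created gives the person entering data a chance to pick the existing record instead of adding a duplicate.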

Anna Glushkovsky 9th July 2010 / 14:11

Great Blog post! Unfortunately, most companies face the problem of other people accessing the database and changing records and adding new entries without checking as their understanding of data management and their actions is limited. My suggestion would be to educate everyone in the organization who has access to your database and hope for the best! (Although, people are not computers and mistakes will continue to happen)

Henrik Liliendahl Sørensen 9th July 2010 / 19:28

Thanks Anna. It’s true, people are not computers. People and computers have different strengths and weaknesses, and therefore a key to improving and maintaining sufficient data quality is letting people do what they do best and letting computers do what they do best. Computers should help people make the right decisions in the easiest possible way when entering data, including by presenting what is already known inside the computer.

Guy Pardon 10th July 2010 / 19:16

Right-on!

Another aspect of data quality is the corruption caused by system crashes or bugs. SOA is about distributed applications by definition, with processing affecting multiple back-end systems and hence multiple data sources.

In such a setting, the only reliable way to have data quality guarantees is by doing something like TCC (see http://www.atomikos.com/Publications/TryCancelConfirm ) or reliable messaging (see http://www.atomikos.com/Publications/ReliableJmsWithTransactions )

Best,
Guy

Henrik Liliendahl Sørensen 11th July 2010 / 17:11

Thanks for the comment Guy. Exciting stuff on the Atomikos site.

QVT 13th July 2010 / 18:18

Great post Henrik! I just bookmarked for our members at MIKE2.0. Would be great to have you contribute a post with us sometime.

Henrik Liliendahl Sørensen 13th July 2010 / 18:25

Thanks Brenda. Say when.

Christophe 9th July 2011 / 08:32

I absolutely agree with your great post Henrik.

Data quality is not a “one shot” effort. Many people think that a database cleaning operation (deleting duplicates, completing missing fields, correcting fields) is the solution for having good data quality.

Of course you can do that, but you also need to put in place processes, tools like MDM solutions, … and educate people in order to avoid running into the same issues again and again in the future.

Henrik Liliendahl Sørensen 9th July 2011 / 08:50

Thanks for adding to this ever timely issue, Christophe.


Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
