Data Quality is an Ingredient, not an Entrée

Fortunately, it is increasingly recognized that you won't succeed with Business Intelligence, Customer Relationship Management, Master Data Management, Service Oriented Architecture and many other disciplines without first improving your data quality.

But it would be a big mistake to see Data Quality improvement as an entrée served before the main course, whether that is BI, CRM, MDM, SOA or whatever else is on the menu. You need ongoing prevention to keep your data from being polluted again over time.

Improving and maintaining data quality involves people, processes and technology. I am not neglecting the people and process side, but as my expertise lies in technology, I would like to mention some of the technological ingredients that help keep data quality at a tasty level in your IT implementations.

Mashups

Not surprisingly, many data quality flaws are introduced at data entry. Enterprise data mashups with external reference data can help during data entry, for example:

  • An address may be suggested from an external source.
  • A business entity may be picked from an external business directory.
  • Different countries have different rules for using consumer/citizen directories, so why not use the best one available wherever you do business.
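To make the address suggestion idea concrete, here is a minimal Python sketch of type-ahead suggestions during data entry. It assumes a small in-memory list standing in for an external address source; `ReferenceAddress`, `suggest_addresses` and the sample addresses are hypothetical, and a real mashup would call the reference provider's own lookup service instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceAddress:
    """One entry from a hypothetical external address reference source."""
    address_id: str  # ID assigned by the (assumed) reference source
    street: str
    postal_code: str
    city: str

    def label(self) -> str:
        return f"{self.street}, {self.postal_code} {self.city}"


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so typing variations still match."""
    return " ".join(text.lower().split())


def suggest_addresses(typed: str, reference: list[ReferenceAddress],
                      limit: int = 5) -> list[ReferenceAddress]:
    """Return reference addresses whose label starts with what the user has typed so far."""
    prefix = normalize(typed)
    if not prefix:
        return []
    hits = [a for a in reference if normalize(a.label()).startswith(prefix)]
    return hits[:limit]


# Hypothetical stand-in for the external reference data.
REFERENCE = [
    ReferenceAddress("A-1001", "Main Street 12", "2100", "Copenhagen"),
    ReferenceAddress("A-1002", "Main Street 14", "2100", "Copenhagen"),
    ReferenceAddress("A-2001", "Harbour Road 3", "8000", "Aarhus"),
]

for suggestion in suggest_addresses("main  street 1", REFERENCE):
    print(suggestion.address_id, suggestion.label())
```

Because the user picks a suggestion instead of typing the full address, the stored address is spelled the way the reference source spells it, and the source's ID can be kept alongside it.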

External IDs

Getting data entry right at the root is important, and most (if not all) data quality professionals agree that this approach is superior to doing cleansing operations downstream.

The problem, however, is that most data erodes over time. What was right at the time of capture will at some point no longer be right.

Therefore, data entry should ideally capture not only a snapshot of correct information but also raw data elements, such as identifiers from external reference sources, that make the data easy to maintain.
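As a minimal sketch of why storing an external ID pays off, assume a hypothetical business directory exposed as a plain dictionary keyed by registry ID. `CustomerRecord`, `refresh` and the `REG-42` key are illustrative names only; a real directory would have its own API and data model.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CustomerRecord:
    name: str
    address: str
    # Hypothetical external ID, e.g. a key from an external business directory.
    # Without it, later maintenance has to re-match on name and address.
    external_id: Optional[str] = None


def refresh(record: CustomerRecord,
            directory: dict[str, dict[str, str]]) -> CustomerRecord:
    """Bring a captured record up to date from the external source, keyed by its external ID."""
    if record.external_id is None:
        return record  # nothing to key on; would need a (fuzzy) re-match instead
    current = directory.get(record.external_id)
    if current is None:
        return record  # unknown to the source; leave for manual review
    return replace(record, name=current["name"], address=current["address"])


# Snapshot captured at data entry, with the external ID stored alongside it.
captured = CustomerRecord("Acme Ltd", "Old Road 1, Springfield", external_id="REG-42")
no_id = CustomerRecord("Acme Ltd", "Old Road 1, Springfield")

# Later, the hypothetical directory shows the company has been renamed and has moved.
directory = {"REG-42": {"name": "Acme Holding Ltd", "address": "New Street 7, Springfield"}}

print(refresh(captured, directory))  # updated via the external ID
print(refresh(no_id, directory))     # unchanged: no key to look it up by
```

The snapshot erodes either way; the difference is that the record with an external ID can be refreshed by a simple key lookup, while the one without it has to be matched again.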

Error tolerant search

A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to first search the database for a match. If the entity is not found, a new entity is created. When the search fails to find a match that actually exists, we have a classic and frequent cause of duplicates.

An error-tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
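Below is a minimal sketch of the search-before-create workflow with an error-tolerant match, using Python's standard `difflib.SequenceMatcher` on lower-cased, word-sorted strings. This is just one simple technique, not a specific product's matching engine. The 0.8 threshold and the helper names are assumptions, and production matching usually adds phonetic keys, abbreviation tables and address parsing.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and sort the words so word order doesn't matter."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(sorted(words))


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between the normalized strings."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidates(query: str, existing: list[str],
                    threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return existing entries that are similar enough to the query, best first."""
    scored = [(name, similarity(query, name)) for name in existing]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def search_then_create(query: str, existing: list[str],
                       threshold: float = 0.8) -> str:
    """Reuse a close existing entry if there is one; otherwise create a new one."""
    candidates = find_candidates(query, existing, threshold)
    if candidates:
        # In an entry screen the candidates would be shown to the user to pick
        # from; this sketch simply takes the best one.
        return candidates[0][0]
    existing.append(query)
    return query


records = ["Henrik Liliendahl Sørensen", "Acme Trading Ltd", "Nordic Data Solutions"]
print(search_then_create("Sorensen, Henrik Liliendahl", records))  # reuses existing entry
print(search_then_create("Blue Ocean Shipping", records))          # creates a new entry
print(len(records))
```

Sorting the words handles alternatively arranged words, while the character-level ratio absorbs small spelling differences and concatenations such as "Mainstreet" versus "Main Street". An exact-match search would have missed both and created a duplicate.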


8 thoughts on “Data Quality is an Ingredient, not an Entrée”

  1. Anna Glushkovsky 9th July 2010 / 14:11

    Great blog post! Unfortunately, most companies face the problem of other people accessing the database, changing records and adding new entries without checking, as their understanding of data management and of the consequences of their actions is limited. My suggestion would be to educate everyone in the organization who has access to your database and hope for the best! (Although people are not computers, and mistakes will continue to happen.)

  2. Henrik Liliendahl Sørensen 9th July 2010 / 19:28

    Thanks Anna. It’s true, people are not computers. People and computers have different strengths and weaknesses, and therefore a key to improving and maintaining sufficient data quality is letting people do what they do best and letting computers do what they do best. Computers should assist people in making the right decisions in the easiest possible way when entering data, including presenting what is already known inside the computer.

  3. Guy Pardon 10th July 2010 / 19:16

    Right-on!

    Another aspect of data quality is the corruption caused by system crashes or bugs. SOA is about distributed applications by definition, with processing affecting multiple back-end systems and hence multiple data sources.

    In such a setting, the only reliable way to get data quality guarantees is by doing something like TCC, Try-Cancel/Confirm (see http://www.atomikos.com/Publications/TryCancelConfirm), or reliable messaging (see http://www.atomikos.com/Publications/ReliableJmsWithTransactions).

    Best,
    Guy

  4. Henrik Liliendahl Sørensen 11th July 2010 / 17:11

    Thanks for the comment Guy. Exciting stuff on the Atomikos site.

  5. QVT 13th July 2010 / 18:18

    Great post, Henrik! I just bookmarked it for our members at MIKE2.0. It would be great to have you contribute a post with us sometime.

  6. Henrik Liliendahl Sørensen 13th July 2010 / 18:25

    Thanks Brenda. Say when.

  7. Christophe 9th July 2011 / 08:32

    I absolutely agree with your great post Henrik.

    Data quality is not a “one-shot” effort. Many people think that a database cleaning operation (deleting duplicates, completing missing fields, correcting fields) is the solution for achieving good data quality.

    Of course you can do that, but you also need to put in place processes, tools like MDM solutions, … and educate people in order to avoid the same issues recurring again and again in the future.

  8. Henrik Liliendahl Sørensen 9th July 2011 / 08:50

    Thanks for adding to this ever-timely issue, Christophe.
