Relational Data Quality

Most of the work related to data quality improvement I do is done with data in relational databases and is aimed at creating new relations between data. Examples (from party master data) are:

  • Make a relation between a postal address in a customer table and a real world address (represented in an official address dictionary).
  • Make a relation between a business entity in a vendor table and a real world business (represented in a business directory most often derived from an official business register).
  • Make a relation between a consumer in one prospect table and a consumer in another prospect table because they are considered to represent the same real world person.

When striving for multi-purpose data quality it is often necessary to reflect further relations from the real world like:

  • Make a relation in a database reflecting that two (or more) persons belongs to the same household (on the same real world address)
  • Make a relation in the database reflecting that two (or more) companies have the same (ultimate) mother.

Having these relations done right is fundamental for any further data quality improvement endeavors and all the exciting business intelligence stuff. In doing that you may continue to have more or less fruitful discussions on say the classic question: What is a customer?

But in my eyes, in relation to data quality, it doesn’t matter if that discussion ends with that a given row in your database is a customer, an old customer, a prospect or something else. Building the relations may even help you realize what that someone really is. Could be a sporadic lead is recognized as belonging to the same household as a good customer. Could be a vendor is recognized as being a daughter company of a hot prospect. Could be someone is recognized as being fake. And you may even have some business intelligence that based on the relations may report a given row as a customer role in one context and another role in another context.

2 thoughts on “Relational Data Quality

  1. Stu Mitchell 20th May 2010 / 20:20

    Great article. It’s the old chestnut isn’t it…. as you increase the quality (and therefore value) of data, you increase other users’ confidence in that data. When other users look at it, they often get more from the data than it originally appeared to revealed.

    That is, you get additional value from improved data once other people gain confidence in it.

    All part of the golden rule that, cost-benefit of data improvement says, it’s generally worth it.

  2. Henrik Liliendahl Sørensen 22nd May 2010 / 08:24

    Thanks Stu. Nice golden rule for a golden copy 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s