Superb Bad Data

When working with data and information quality, we often use words such as rubbish, poor and bad, along with other negative terms, to describe data that needs to be enhanced in order to achieve better data quality. However, what is bad may have been good in the context where a particular set of data originated.

Right now I am having some fun with author names.

An example of good and bad could be an author I have mentioned several times on this blog, namely the late fairy tale writer whose full name is:

Hans Christian Andersen

When gazing through data you will meet his name represented this way:

Andersen, Hans Christian

This representation is fit for purpose, for example, when looking for a book by this author at a library, where fiction is sorted by the author's surname.

The question is then: Do you want to have the one representation, the other representation or both?

You may also meet his name in yet another form, in a field other than the name field. For example, there is a main street in Copenhagen called:

H. C. Andersens Boulevard

This is the real-world name of the street, which holds a common form of the author's name with only initials.
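To make the point concrete, here is a minimal Python sketch of keeping one entity and deriving each representation from it, rather than treating any single form as the bad one. The class name PersonName and its method names are hypothetical, made up for this illustration, and the sketch assumes a simple split into given names and a surname, which will not hold for every name. Note also that the street name uses the Danish genitive "Andersens", which this sketch does not attempt to produce.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PersonName:
    """Hypothetical structured name: one entity, several valid renderings."""
    given_names: Tuple[str, ...]
    surname: str

    def full(self) -> str:
        # Natural reading order, e.g. on a book cover.
        return " ".join((*self.given_names, self.surname))

    def sort_form(self) -> str:
        # Surname first, as used when sorting books in a library.
        return f"{self.surname}, {' '.join(self.given_names)}"

    def initials_form(self) -> str:
        # Given names reduced to initials, as seen in the street name.
        initials = " ".join(f"{name[0]}." for name in self.given_names)
        return f"{initials} {self.surname}"


author = PersonName(("Hans", "Christian"), "Andersen")
print(author.full())           # Hans Christian Andersen
print(author.sort_form())      # Andersen, Hans Christian
print(author.initials_form())  # H. C. Andersen
```

Storing the parts once and rendering the forms on demand is only one possible design; another is to store every observed form and link them to the same entity.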


2 thoughts on “Superb Bad Data”

  1. William Sharp 29th December 2010 / 16:20

    The interesting topic of multiple, valid versions of the truth. Master data management’s nightmare. It’s something that has come up in my address validation series.
    It’s a good topic for discussion in the dq and mdm space.

  2. Henrik Liliendahl Sørensen 29th December 2010 / 16:39

    Thanks for commenting William. Yes, we have to handle these different valid versions of entity representations in our MDM solutions and keep the links as well where relevant.
