When working with data and information quality, we often use words such as rubbish, poor and bad, along with other negative terms, to describe data that needs to be enhanced in order to achieve better data quality. However, what is bad in one context may have been good in the context where a particular set of data originated.
Right now I am having some fun with author names.
An example of good and bad involves an author I have mentioned several times on this blog, namely the late fairy tale writer whose full name is:
Hans Christian Andersen
When looking through data you will also meet his name represented this way:
Andersen, Hans Christian
This representation is fit for purpose when, for example, you look for a book by this author at a library where fiction is sorted by the author's surname.
The question is then: Do you want to have the one representation, the other representation or both?
You may also meet his name in yet another form, in a field other than the name field. For example, there is a main street in Copenhagen called:
H. C. Andersens Boulevard
This is the real-world name of the street, and it contains a common form of the author's name that uses only initials.
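One way to think about keeping all of these representations is to store the name once in a structured form and derive each representation when it is needed. The following is only a minimal sketch under my own assumptions: the `PersonName` class, its field names and its method names are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    """Hypothetical structured name record; all names here are illustrative."""
    given_names: tuple[str, ...]
    surname: str

    def natural_order(self) -> str:
        # Full name as it is normally written.
        return " ".join((*self.given_names, self.surname))

    def sort_order(self) -> str:
        # Surname first, as used when sorting by author surname.
        return f"{self.surname}, {' '.join(self.given_names)}"

    def initials_form(self) -> str:
        # Given names reduced to initials, as in the street name.
        initials = " ".join(f"{name[0]}." for name in self.given_names)
        return f"{initials} {self.surname}"


author = PersonName(given_names=("Hans", "Christian"), surname="Andersen")

print(author.natural_order())   # Hans Christian Andersen
print(author.sort_order())      # Andersen, Hans Christian
print(author.initials_form())   # H. C. Andersen
```

The sketch only covers the simple case where a name splits cleanly into given names and a surname. Real name data is messier, so a derived form will not always match what is actually stored in a source system, and the original representations may still need to be kept and linked.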
The interesting topic of multiple, valid versions of the truth. Master data management’s nightmare. It’s something that has come up in my address validation series.
It’s a good topic for discussion in the dq and mdm space.
Thanks for commenting, William. Yes, we have to handle these different valid versions of entity representations in our MDM solutions and keep the links between them where relevant.