In master data management the two most prominent domains are:
- Parties and
- Products
To find representations of parties that are actually the same real-world party, and representations of products that are actually the same real-world product, we typically execute fuzzy data matching of:
- Party names, meaning person names and company names
- Product descriptions
However, I have often seen party names be an integral part of matching products.
Some examples:
Manufacturer Names:
A product is most often regarded as distinct based not only on the description but also on the manufacturer. So besides being sharp on matching product descriptions for light bulbs, you must also consider whether, for example, the following manufacturer company names are the same:
- Koninklijke Philips Electronics N.V.
- Phillips
- Philips Electronic
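As a rough illustration of what such a comparison could look like, here is a minimal Python sketch using only the standard library. The legal-form list, the function names and the idea of scoring the best-matching pair of tokens are my assumptions for the example, not a recommended matching method; a real solution would need far richer rules and reference data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny list of legal-form tokens to ignore.
LEGAL_FORMS = {"nv", "bv", "inc", "ltd", "gmbh", "ag", "sa", "llc", "corp"}


def name_tokens(name):
    """Lowercase, drop punctuation ("N.V." becomes "nv"), remove legal forms."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return [t for t in cleaned.split() if t not in LEGAL_FORMS]


def best_token_score(name_a, name_b):
    """Highest similarity ratio between any token of name_a and any token of name_b."""
    return max(
        SequenceMatcher(None, x, y).ratio()
        for x in name_tokens(name_a)
        for y in name_tokens(name_b)
    )


full_name = "Koninklijke Philips Electronics N.V."
for candidate in ("Phillips", "Philips Electronic"):
    print(candidate, round(best_token_score(full_name, candidate), 2))
```

A score like this only proposes candidates. Two unrelated companies sharing one common word would also score high, so the result still has to be combined with the product description match or reviewed.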
Author Names:
A book is a product. The title of the book is the description. But the author’s person name also counts. So how do we collect the entire works of the author:
- Hans Christian Andersen
- Andersen, Hans Christian
- H. C. Andersen
as all three representations are superb bad data?
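One simple way to bring such variants together is to reduce each name to a comparison key. The sketch below is an assumption-laden illustration: the function name `author_key` is made up for this example, and it assumes that a comma means "surname, given names" and that otherwise the last word is the surname. That holds for the three forms above but not for person names in general.

```python
def author_key(name):
    """Reduce a person name to (surname, initials of given names)."""
    name = name.strip()
    if "," in name:
        # Assumed convention: "Surname, Given Names"
        surname, given = [part.strip() for part in name.split(",", 1)]
        name = f"{given} {surname}"
    tokens = name.replace(".", " ").split()
    surname = tokens[-1].lower()
    initials = "".join(t[0].lower() for t in tokens[:-1])
    return surname, initials


variants = [
    "Hans Christian Andersen",
    "Andersen, Hans Christian",
    "H. C. Andersen",
]
for v in variants:
    print(v, "->", author_key(v))
```

Such a key is better suited to grouping candidates than to deciding a match on its own, since different authors can share a surname and initials.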
Bear Names:
A certain kind of teddy bear has a product description like “Plush magenta teddy bear”. But each bear may have a pet name like “Lots-O’-Huggin’ Bear” or just “Lotso” for short, as seen in the film “Toy Story 3”. And seriously: in real business I have worked on building a bear data model and the related data matching.
PS: For those who have seen Toy Story 3: Is that Lotso one or two real world entities?
Identification keys can be used for specific areas such as BtoB, BtoC or products (UNSPSC for instance, and other dedicated industrial classification). That helps.
For the rest, sets of rules, ontologies and dictionaries may help, but this requires expertise and a huge amount of time!
Regarding Lotso, it’s definitely another world.
Merci Stephane. I am also a big fan of identification keys. Agreed about the huge amount of time. And yes, Lotso is a different world, but he makes a nice image in a blog post.