Data matching is a subdiscipline within data quality management. It is about establishing a link between data elements and entities that do not have the same value but refer to the same real-world construct. The most common example is linking two different data records that probably describe the same person, for example:
- Bob Smith at 1 Main Str in Anytown
- Robert Smith at One Main Street in Any Town
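To make the example concrete, here is a minimal sketch of a deterministic, rule-based approach. It assumes a tiny hand-made set of standardization rules: the `NICKNAMES` and `ABBREVIATIONS` tables and the `match_key` function are hypothetical illustrations, not taken from any product. A plain string comparison treats the two records as different. After expanding nicknames and abbreviations, turning "One" into "1" and ignoring spacing, both records reduce to the same match key.

```python
import re

# Hypothetical, deliberately tiny standardization tables; real solutions rely on
# much larger curated reference data.
NICKNAMES = {"bob": "robert", "rob": "robert", "bobby": "robert"}
ABBREVIATIONS = {"str": "street", "st": "street", "one": "1"}

def match_key(record: str) -> str:
    """Build a deterministic match key: lowercase, keep alphanumeric tokens,
    replace known variants, then drop spaces so 'Any Town' equals 'Anytown'."""
    tokens = re.findall(r"[a-z0-9]+", record.lower())
    tokens = [NICKNAMES.get(t, ABBREVIATIONS.get(t, t)) for t in tokens]
    return "".join(tokens)

a = "Bob Smith at 1 Main Str in Anytown"
b = "Robert Smith at One Main Street in Any Town"
print(a == b)                        # False
print(match_key(a) == match_key(b))  # True
```

The weakness is that the rules only cover variants someone thought of in advance: a typo such as "Smyth" for "Smith" still yields a different key, so a real match goes unfound.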
Data matching can be applied to other master data entity types such as companies, locations, products and more.
In the data matching world there have always been attempts to apply machine learning (or artificial intelligence, if you like). This is because deterministic approaches usually result in too many false negatives, that is, entities that actually match but are not found by the computer. Probabilistic / fuzzy logic approaches usually work better, but often not well enough.
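The difference between the two families of approaches can be illustrated with a small sketch. The deterministic rule demands that every field is equal after trivial cleaning. The fuzzy rule scores each field with the standard library's `difflib.SequenceMatcher.ratio()`, averages the scores and accepts the pair above a threshold. The field names, the equal weighting and the 0.85 threshold are illustrative assumptions, not values from any particular product, and the records are invented.

```python
from difflib import SequenceMatcher

def deterministic_match(a: dict, b: dict) -> bool:
    """Match only if every field is identical after trimming and lowercasing."""
    return all(a[f].strip().lower() == b[f].strip().lower() for f in a)

def fuzzy_score(a: dict, b: dict) -> float:
    """Average character-level similarity (0.0 to 1.0) across all fields."""
    ratios = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in a]
    return sum(ratios) / len(ratios)

def fuzzy_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Accept the pair as a match when the averaged similarity reaches the threshold."""
    return fuzzy_score(a, b) >= threshold

rec1 = {"name": "Robert Smith", "street": "1 Main Street", "town": "Anytown"}
rec2 = {"name": "Robert Smyth", "street": "1 Main Street", "town": "Any Town"}
rec3 = {"name": "Mary Jones", "street": "22 High Road", "town": "Othertown"}

print(deterministic_match(rec1, rec2))  # False: a false negative
print(round(fuzzy_score(rec1, rec2), 2))  # 0.95
print(fuzzy_match(rec1, rec2))  # True
print(fuzzy_match(rec1, rec3))  # False
```

The fuzzy rule tolerates the "Smyth" typo and the "Any Town" spacing without any explicit rule. Choosing the weights and the threshold is where the trade-off between false negatives and false positives sits, and this is the kind of setting a machine learning approach would try to learn from labelled example pairs instead of having it set by hand.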
One of my own attempts with machine learning was made within a solution at Dun & Bradstreet Nordic called GlobalMatchBox. One happy result of the machine learning capability was described in the post The Art in Data Matching.
In recent years I have embraced product master data and product data quality within my business activities. The pain points in handling product information do in some cases include matching product entities, but even more they are about matching the different taxonomies in use for product data, not least between trading partners in business ecosystems.
So, machine learning leading to artificial intelligence is on my agenda again, in a quest for matching metadata, as told in the post It is time to apply AI to MDM and PIM.
How about you? Do you see a future with machine learning in data matching? Have you seen any happy results?