The Wikipedia article on Identity Resolution has this take on the difference between good old data matching and Entity Resolution:
“Here are four factors that distinguish entity resolution from data matching, according to John Talburt, director of the UALR Laboratory for Advanced Research in Entity Resolution and Information Quality:
- Works with both structured and unstructured records, and it entails the process of extracting references when the sources are unstructured or semi-structured
- Uses elaborate business rules and concept models to deal with missing, conflicting, and corrupted information
- Utilizes non-matching, asserted linking (associate) information in addition to direct matching
- Uncovers non-obvious relationships and association networks (i.e. who’s associated with whom)”
I have a gut feeling that Data Matching and Entity (or Identity) Resolution will merge in the future, as expressed in the post Deduplication vs Identity Resolution.
If you look at the above-mentioned factors that distinguish data matching from identity resolution, some of the often-mentioned features of the new big data technology shine through:
- Working with unstructured and semi-structured data is probably the most mentioned difference between working with small data versus working with big data.
- Working with associations is a feature of graph databases and similar technologies, as mentioned in the post Will Graph Databases become Common in MDM?
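To make the third and fourth factors a bit more concrete, here is a minimal Python sketch of how asserted links can be combined with direct matching so that records form a small association graph. Everything in it is an assumption made for illustration: the sample records, the `match_key` rule (normalized name plus email), the `asserted_links` list and the `resolve` function are hypothetical and do not come from Talburt's work or from any particular product.

```python
from collections import defaultdict

# Hypothetical sample records: (record_id, name, email).
records = [
    ("r1", "John Smith", "john.smith@example.com"),
    ("r2", "JOHN  SMITH", "john.smith@example.com"),
    ("r3", "J. Smith", "jsmith@example.org"),
    ("r4", "Mary Jones", "mary.jones@example.com"),
]

# Hypothetical asserted link: someone has stated that r2 and r3 are the
# same person, although their attributes do not match directly.
asserted_links = [("r2", "r3")]


def match_key(name, email):
    """Assumed direct-matching rule: case- and whitespace-normalized name plus email."""
    return (" ".join(name.lower().split()), email.lower())


def resolve(records, asserted_links):
    """Group record ids into entities using direct matches and asserted links."""
    parent = {rid: rid for rid, _, _ in records}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Direct matching: records sharing a match key are linked.
    first_seen = {}
    for rid, name, email in records:
        key = match_key(name, email)
        if key in first_seen:
            union(first_seen[key], rid)
        else:
            first_seen[key] = rid

    # Asserted linking: links that no attribute comparison would find.
    for a, b in asserted_links:
        union(a, b)

    # Each connected component of the link graph is one resolved entity.
    clusters = defaultdict(set)
    for rid in parent:
        clusters[find(rid)].add(rid)
    return sorted(sorted(cluster) for cluster in clusters.values())


entities = resolve(records, asserted_links)
print(entities)  # [['r1', 'r2', 'r3'], ['r4']]
```

With direct matching alone, r3 would stay a separate entity; the asserted link is what pulls it into the same cluster as r1 and r2. A graph database would typically store such links as edges rather than recomputing them in memory as this sketch does.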
So, in the quest to expand small-data matching into Entity (or Identity) Resolution, we will be helped by general developments in working with big data.
Interesting contributions to a series of posts that are always thought-provoking. The DQ world has come a long way since all we talked about was address standardization, dedupe and merge/purge 🙂 I am sure that Identity Resolution using “big contact data” represents a huge opportunity for businesses to expand their customer knowledge and their ability to target effectively.