We continuously struggle to define what it is we are doing, with questions like: What is data quality? What is Master Data? Lately I’ve been involved in discussions around: What is Identity Resolution? A current discussion on this topic is rolling in the Data Matching LinkedIn group.
This discussion has roots in one of my blog posts called Entity Revolution vs Entity Evolution. Jeffrey Huth of IBM Initiate followed up with the post Entity Resolution & MDM: Interchangeable? In January Phillip Howard of Bloor made a post called There’s identity resolution and then there’s identity resolution (followed up by a correction post the other day called My bad).
It is a “same same but different” discussion. Traditional data matching (or record linkage) as seen in a data quality tool and master data management solution is the bright view: Being about finding duplicates and making a “single business partner view” (or “single party view” or “single customer view”). Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.
The Gartner Hype Cycle describes the dark view as “Entity Resolution and Analysis”. This discipline is approaching the expectation peak and will, according to Gartner, be absorbed by other disciplines, as no one can tell the difference, I guess.
Certainly there are poles. In an article from 2006 called Identity Resolution and Data Integration David Loshin said: There is a big difference between trying to determine if the same person is being mailed two catalogs instead of one and determining if the individual boarding the plane is on the terrorist list.
But there is also a grey zone.
From a business perspective, for example, preventing misuse of a restricted campaign offer has a bit of both sides. Here you want to prevent an existing customer from using an offer meant only for new customers. How does that apply to members of the same household or the same company family tree? Or you want to prevent someone from using an introduction offer twice by typing her name and address a bit differently.
From a technical perspective I have an example from working with a newspaper on a big fraud scam, described in the post Big Time ROI in Identity Resolution. Here I had no trouble using a traditional deduplication tool to discover non-obvious relationships. Also, the relationships discovered in traditional data matching end up quite nicely in hierarchy management as part of master data management, as described in the post Fuzzy Hierarchy Management.
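The introduction offer case can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's `difflib`, not how any particular data quality or identity resolution tool works. The function names, the record layout and the 0.85 threshold are assumptions made up for the example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def looks_like_repeat(new_signup: dict, existing: dict, threshold: float = 0.85) -> bool:
    """Flag a signup whose name AND address both resemble an existing customer.

    The threshold is a hypothetical value; a real one would be tuned on real data.
    """
    name_score = similarity(new_signup["name"], existing["name"])
    address_score = similarity(new_signup["address"], existing["address"])
    return min(name_score, address_score) >= threshold


# Hypothetical records: the second signup is the first one typed a bit differently.
first = {"name": "Anna Jensen", "address": "12 Main Street, Springfield"}
second = {"name": "Ana Jensen", "address": "12 Main Str., Springfield"}
other = {"name": "Peter Hansen", "address": "48 Oak Avenue, Shelbyville"}

print(looks_like_repeat(second, first))
print(looks_like_repeat(other, first))
```

The same comparison serves the bright view (two catalogs to one person) and the dark view (someone deliberately varying her details). What differs is what you do with the result.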
And then there is the use of the words identity (resolution) versus entity (resolution).
My feeling is that we could use identity resolution for describing all kinds of matching and linking with party master data, and entity resolution could be used for describing all kinds of matching and linking with all master data entity types as seen in multi-domain master data management. But that’s just my words.
Excellent blog post, Henrik.
You did a great job of examining the similarities and differences among duplicate identification, entity resolution, and identity resolution, and I like your analogy of the bright and dark views differentiating these applications of data matching technology and methodology.
It is this last point that I want to dwell on for a moment because I get frustrated by vendors that claim they offer solutions for identity resolution, which are more sophisticated than their competitors that only offer solutions for entity resolution.
But when you scratch beneath the surface, most of these vendors are simply offering basic data matching technology and methodology, and not customized solutions for a particular application (i.e., out-of-the-box, but customizable, data matching templates for entity resolution versus identity resolution).
In a way, this reminds me of all the data quality tool vendors that re-branded themselves as master data management (MDM) solution providers when MDM started getting more industry buzz.
Although I definitely agree that entity resolution and identity resolution are different business problems requiring different applications of data matching technology and methodology, most of the hype I have heard in the marketplace is about marketing not matching.
Great post Henrik.
This is an area I’ve been thinking about quite a bit. One of the problems I think we have is we use generic technology-centric terms like identity resolution, MDM, or entity resolution to describe use cases. On top of that, as Jim points out, technology providers jump onto whatever term is hot (data quality branding as MDM) and whatever a competitor might be doing. We then find ourselves in a situation where a product that brands itself as one type of technology is used somewhere else, as you have experienced.
It reminds me of my earlier career, when a product that was basically ETL called itself Enterprise Application Integration (EAI), later BPM, until web services standards pretty much took it out altogether. It is worth reminding ourselves that customers don’t buy technology (usually), they buy solutions, but we leave it up to the customer or someone trying to solve the customer’s problem to figure it out.
Easier said than done, but it would be nice if we could stop trying to fit into a select group of technology descriptions and instead describe the technology in more specific terms that make sense for the business problem. Kind of like the bright view and dark view. Another way I describe MDM and “resolution” is to say that in the former people want to be found, in the latter they don’t want to be found. As a result the underlying approach to data and matching is and must be different.
I have a feeling we’ll be discussing this for quite some time.
Great post, Henrik, and good follow up from Jim and Jeff. Having spent some time on both sides of the IR and DQ fence, I always find this discussion interesting. At the end of the day, I’m a firm believer that it’s the level of risk tolerance that drives a decision to use one over the other. As was suggested before, you can definitely argue the case that DQ can do the job of IR and the other way around. In the cases where there’s little room for error, though (à la the reference to David L’s note about terrorists), you need a solution with a core competency in solving zero or low tolerance problems.
I liken it to using a wrench to drive a nail into a piece of wood. While you really should use a hammer, you technically can drive a nail into wood with a wrench (I’ve actually done this, I’m afraid to admit). Using the wrong tool to do the job might be good enough in certain circumstances; however, it’s not the right solution long term.
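The risk tolerance point can be illustrated with a small Python sketch in which the same match score leads to different decisions depending on how much error the use case tolerates. The profile names and threshold values are hypothetical and chosen only for illustration; they are not taken from any product.

```python
def decide(score: float, auto_match_at: float, review_from: float) -> str:
    """Turn a match score (0.0 to 1.0) into a decision for a given risk profile."""
    if review_from > auto_match_at:
        raise ValueError("review_from must not exceed auto_match_at")
    if score >= auto_match_at:
        return "match"
    if score >= review_from:
        return "review"
    return "no match"


# Hypothetical profiles. A mailing list tolerates a wrong merge or a missed
# duplicate; a watchlist check tolerates neither, so far more candidates are
# sent to a human and far fewer are accepted automatically.
MAILING = {"auto_match_at": 0.85, "review_from": 0.85}
WATCHLIST = {"auto_match_at": 0.98, "review_from": 0.60}

score = 0.90
print(decide(score, **MAILING))
print(decide(score, **WATCHLIST))
```

The matching itself can be identical in both cases. What the low tolerance problem adds is the cost of everything that lands in the review band.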
Great post, Henrik.
It highlights a problem that pervades any area that individuals or enterprises want to exploit financially by appearing to be “experts”, namely, the use of meaningless jargon.
The major reason that much discussion can surround terms like “identity resolution” and “entity resolution” is the fact that these terms are, in reality, quite meaningless.
Whenever I enter a dialogue where such terms are being used, instead of immediately joining in, thinking that I should know what they mean, I ask those using the term, “What exactly do you mean by that term?” I often get a disdainful look that says, “Call yourself an expert?”
However, when I persist and ask again, “What do you mean by that term?”, the users of the term(s) are often (most times) unable to come up with a meaningful explanation.
There are two simple steps to take that can remove ambiguity from such terms:
1) Ask “What exactly do I mean by this term?”
2) Ask “What are the fewest words that can unambiguously convey this meaning?”
Let’s look at “Identity Resolution”.
Step 1: Ask what exactly do I mean by “Identity Resolution”?
Suppose I mean, “The ability to be certain that all transactions relating to Third Parties can be unambiguously assigned to a single, unique occurrence of a Third Party. Also, the ability to prevent any Third Party from wrongly representing themselves in a transaction as any other Third Party.”
Clearly, I would then be addressing the identity of Third Parties, so this area of activity could be meaningfully termed “Party Identity Resolution”.
In the same way, when I am talking about the Unique Identifiers of Master Data and the removal of duplicates, etc, I might arrive at the term “Entity Identity Resolution”.
Thanks a lot Jim, Jeff, Clarke and John for adding tremendously to the discussion.
Thanks for explaining Identity Resolution through this post, it’s really necessary for us! http://www.infoglide.com/