Entity resolution is the discipline of uniquely identifying your master data records, typically those holding data about customers, products and locations. Entity resolution is closely related to the concept of a single version of the truth.
Typical questions to ask during entity resolution include:
- Does a given customer master data record represent a real-world person or organization?
- Will a person acting both as a private customer and as a small business owner be seen as the same entity?
- Will a product coming from supplier A be identified as the same as that product coming from supplier B?
- Does the geocode for the center of a parcel denote the same place as the geocode for where the parcel borders a public road?
We may go a long way in automating entity resolution by using advanced data matching and exploiting rich sources of external reference data, and we may be able to handle the complex structures of the real world by using sophisticated hierarchy management, thereby making an entity revolution in our databases.
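The post does not prescribe a particular matching technique. As a minimal sketch of what simple data matching can look like, assuming customer records are compared on name alone with Python's standard-library `difflib`, and assuming a hypothetical similarity threshold of 0.85 that stands in for a business-specific match policy (not a recommended value):

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical match rule: treat two names as one entity at or above a threshold.

    The default threshold is an assumption; in practice it reflects a business policy.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(same_entity("Acme Corp.", "ACME Corp"))                    # True
    print(same_entity("Henrik Sorensen", "Henrik Sørensen"))         # True (score ≈ 0.93)
    print(same_entity("Henrik Sorensen", "Henrik Sørensen", 0.95))   # False
    print(same_entity("Acme Corp", "Globex Inc"))                    # False
```

Raising the threshold to 0.95 turns the same pair into a non-match, which illustrates that the answer to questions like those above depends on a policy choice as much as on the data. Real matching tools typically compare several attributes and draw on external reference data rather than a single name string.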
But I am often faced with the fact that most organizations don’t want an entity revolution. There are always plenty of good reasons why various frequent business processes don’t require full entity resolution and would only be complicated by it (unless drastically reengineered). The tangible, immediate negative business impact of an entity revolution trumps the softer improvement in business insight that such a revolution would bring.
Therefore we are mostly making entity evolutions, balancing the current business requirements against the distant ideal of a single version of the truth.
Interesting question(s). Before one takes any real steps, an organisation has to come up with a specific policy/answer for each of these points (& a good deal more) and stick to it. I don’t think there are any wrong answers here, provided the policy choice is followed religiously.
Thanks, Tony. I agree about the policies. What you sometimes encounter is a strong policy/business rule for a given perception in a core business area, while requirements in other contexts are only loosely formulated. Initially the most prominent business rule will usually win, but over time we will, in the best case, see an ongoing entity evolution and, in the worst (but common) case, data silos with different entity resolutions.
Victor Hugo would be proud of your picture 🙂
Way to go, Henrik, stating that “small steps” are more appropriate. There are times I have to remind myself that we as “data professionals” see many customers and situations, and we sometimes forget that rushing into DQ, MDM or DG is a daunting task for those who don’t do this every day. No matter how much they trust us, the business needs to discover the benefits gradually, because an organization can’t absorb mass change that quickly.
Thanks, Garnie. Speaking of Victor Hugo, I have had Les Misérables as a working title for a data quality blog post, but it seems too unhappy and negative for the positive spin I really like to give.
Les Misérables is one of my favorite stage shows… you have me singing:
From the intro: Work Song – “Look down, look down, don’t look them in the eye…”, for when we know there are issues on a project (like DQ) but we are not allowed to say anything.
Jean Valjean after being arrested again for stealing from the Bishop – we have another chance, a chance for redemption; new life can be brought back to the project.
Do You Hear the People Sing? – singing about the rightness, the need, the justice of the project and what needs to be done…
One Day More (end of Act 1) – there is hope, there is need, struggle still exists, but we fight on, not for a year from now, but One Day More…
Oh Henrik, I could go on and on about the struggles of projects as Victor portrayed them, but in the story there is victory… there is a positive spin. We know, we have heard, we are supposed to remember 🙂 not to make the mistakes of the past, not to allow blind projects, and to… well, you get the point… find a happy ending 🙂
Thanks again, Garnie. Yes, despite the “miserable” title there is hope, and there are heroes, in Hugo’s stories.
Good post, Henrik. It encouraged me to write something about the differences between MDM and ER (http://bit.ly/falkn9). As an industry, if we use the concepts interchangeably, we should also be able to articulate the differences. I think it actually helps to make your point when you draw a line between them.
Jeff, thanks a lot for following up on the subject.
It’s a good question: do we have to differentiate between master data management and entity resolution? I sense a close relation to the two ways of defining good data quality: either your data are fit for their intended use or they reflect the real world. Or both. Probably best – now or later – if both.