A year ago I wrote a blog post about data matching published on the Informatica Perspective blog. The post was called Five Future Data Matching Trends.
One of the trends mentioned is hierarchical data matching.
The reason we need what may be called hierarchical data matching is that more and more organizations are looking into master data management, and they then realize that the classic name and address matching rules do not necessarily fit when party master data are going to be used for multiple purposes. What constitutes a duplicate in one context, like sending a direct mail, doesn’t necessarily make a duplicate in another business function, and vice versa. Duplicates come in hierarchies.
One example is a household. You probably don’t want to send two sets of the same material to a household, but you might want to engage in a 1-to-1 dialogue with the individual members. Another example is that you might do some very different kinds of business with the same legal entity. Financial risk management is the same, but different sales or purchase processes may require very different views.
I usually divide a data matching process into three main steps:
- Candidate selection
- Match scoring
- Match destination
(More information on the page: The Art of Data Matching)
Hierarchical data matching is mostly about the last step where we apply survivorship rules and execute business rules on whether to purge, merge, split or link records.
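The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular tool works: the sample records, the field names, the postcode blocking key, the use of `difflib.SequenceMatcher` as the scoring function and both thresholds are assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "name": "John Smith", "postcode": "2100"},
    {"id": 2, "name": "Jon Smith", "postcode": "2100"},
    {"id": 3, "name": "Mary Smith", "postcode": "2100"},
    {"id": 4, "name": "John Smith", "postcode": "8000"},
]


def candidate_pairs(records):
    """Step 1: candidate selection, here by blocking on postcode."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["postcode"]].append(rec)
    for block in blocks.values():
        # Only records sharing a block are ever compared.
        yield from combinations(block, 2)


def match_score(a, b):
    """Step 2: match scoring, here a plain string similarity on name."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def match_destination(score, merge_at=0.9, link_at=0.5):
    """Step 3: match destination, here two illustrative thresholds."""
    if score >= merge_at:
        return "merge"
    if score >= link_at:
        return "link"
    return "keep separate"


results = {
    (a["id"], b["id"]): match_destination(match_score(a, b))
    for a, b in candidate_pairs(RECORDS)
}
```

In a real process the destination step would be driven by survivorship and business rules rather than by a score threshold alone; the thresholds here only stand in for those rules.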
In my experience there are a lot of data matching tools out there capable of handling candidate selection, match scoring, purging records and to some degree merging records. But solutions are sparse when it comes to more sophisticated things like splitting an original entity into two or more entities, for example by Splitting Names, or linking records in hierarchies in order to build a Hierarchical Single Source of Truth.
Hi Henrik – it’s always fun when someone actually knows what they are talking about. Of course, splitting names is a parser function – reasonably elegantly handled by Trillium Software’s parser http://www.masterdata.co.za/index.php/solutions-3/trillium-software-system/, as is householding and similar matching.
But matching on hierarchies is much more complex – mainly because parent-child relationships are not always defined in a similar way. So my parent may be much further up in another view of the same hierarchy – which records we are comparing to which becomes the challenge.
Thanks for commenting, Gary. Sometimes a thing like splitting names isn’t easy to place as a parsing step ahead of the comparison, because the right decision may depend on a match, typically with external reference data. External reference data also plays a role when building the hierarchies.
Hi Henrik, an excellent presentation to be sure, but I have an alternative for you to consider.
Let me know what you think, and I am serious. I need as much feedback as possible.
JM.
Jean Michel, thanks a lot for sharing. I haven’t seen the full video at this point, but I think I know what you are getting at.
Indeed, hierarchies have their limitations in reflecting a real world that is more like a neural network, and eventually our matching processes should end up making resolutions beyond hierarchies.
So yes, there is still more land to cover.
Hi Henrik, I think you need to see the rest of the video. Cut to the 30-minute point and you will understand completely what I am showing you 😉 and you will have fun.
Would it be accurate to paraphrase your comments as: There is sometimes a need to logically link matched records instead of merging them? That linkage will generally be some sort of hierarchy. Whether they are merged or linked will depend on a company’s business rules and the intended uses of the data.
Or did I miss your point?
Gino, that is indeed the short version. What we need from data matching processes is help in building the said hierarchies by identifying the relations in the hierarchies, which goes beyond identifying duplicates, and in facilitating a purge, a merge or a link.
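To make the merge-versus-link point concrete, here is a minimal sketch in which matched records are linked under a household rather than merged, and the business purpose decides which level of the hierarchy is used. The records, the `household_id` field (assumed to come from an earlier matching step) and the two purpose names are hypothetical.

```python
from collections import defaultdict

# Hypothetical matched records; "household_id" is assumed to have been
# assigned by an earlier matching step.
PEOPLE = [
    {"id": 1, "name": "John Smith", "household_id": "H1"},
    {"id": 2, "name": "Mary Smith", "household_id": "H1"},
    {"id": 3, "name": "Ann Jones", "household_id": "H2"},
]


def link_households(people):
    """Link records under a household node without merging them."""
    households = defaultdict(list)
    for person in people:
        households[person["household_id"]].append(person)
    return dict(households)


def recipients(people, purpose):
    """Pick the level of the hierarchy that fits the business purpose."""
    if purpose == "direct_mail":
        # One piece of mail per household: address the first member found.
        households = link_households(people)
        return [members[0]["name"] for members in households.values()]
    if purpose == "one_to_one":
        # Individual dialogue: every linked member stays addressable.
        return [person["name"] for person in people]
    raise ValueError(f"unknown purpose: {purpose}")
```

Because nothing is merged, the same linked data can serve both uses: `recipients(PEOPLE, "direct_mail")` picks one name per household, while `recipients(PEOPLE, "one_to_one")` returns every individual.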
Right. A data matching tool should be able to support multiple match groups, which may or may not be related. For example, we may be interested in looking at matches/duplicates for the groups Address, Household, Individual, Organization etc. We may use a hierarchy like:
Level1: Address Group & Organization Group (OR relation)
and then define further hierarchies under Address Group like
Level2: Household Group
Level3: Individual Group.
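The levels above could be sketched roughly as follows. This is only an assumption-laden illustration: the records and field names are made up, group keys are compared exactly rather than fuzzily, and a household is crudely taken to be "same address and same last name".

```python
from collections import defaultdict

# Hypothetical records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "first": "John", "last": "Smith", "address": "1 Main St", "org": None},
    {"id": 2, "first": "Mary", "last": "Smith", "address": "1 Main St", "org": None},
    {"id": 3, "first": "Ann", "last": "Jones", "address": "1 Main St", "org": None},
    {"id": 4, "first": "John", "last": "Smith", "address": "9 Dock Rd", "org": "Acme"},
    {"id": 5, "first": "Bo", "last": "Lee", "address": "7 Hill Ave", "org": "Acme"},
]

# One key function per match group. Each lower level extends the key of the
# level above it, which is what nests Household and Individual under Address.
LEVELS = {
    "address": lambda r: r["address"],
    "organization": lambda r: r["org"],
    "household": lambda r: (r["address"], r["last"]),
    "individual": lambda r: (r["address"], r["last"], r["first"]),
}


def group_by(records, key_fn):
    """Collect record ids per group key, skipping records without a key."""
    groups = defaultdict(list)
    for rec in records:
        key = key_fn(rec)
        if key is not None:
            groups[key].append(rec["id"])
    return dict(groups)


def related_at_level1(a, b):
    """Level 1 is an OR relation: same address group OR same organization group."""
    same_address = a["address"] == b["address"]
    same_org = a["org"] is not None and a["org"] == b["org"]
    return same_address or same_org
```

With this data, records 1, 2 and 3 share an address group, only 1 and 2 share a household group, and records 4 and 5 are related at level 1 through the organization group even though their addresses differ.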
Interesting thread. In my mind, matching is in some regards the easy bit. I agree there always needs to be a multiple-hierarchies approach.
Firstly – there are entity hierarchies. In B2B: Ultimate parent, Parent, Subsidiary, Location etc. In B2C: Household, Family, Individual etc.
Secondly – for processing of matched entities there are hierarchies of trust. Which is the best record of those that match, and what attributes can be merged to create the best SCV or MDM record?
Finally – and this is the harder part – we move beyond record matching to hierarchical Data Quality Rules orchestration, which is another way of saying it’s really a set of hierarchical business or workflow rules.
When you’re able to do this, records are linked in context of need and – in my opinion – only then will you have a fighting chance of delivering an SCV or MDM project.
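The "hierarchies of trust" idea mentioned in this comment can be illustrated with a small survivorship sketch: for each attribute, the value survives from the most trusted source that actually has one. The source names, their ranking and the sample records are hypothetical, and real survivorship rules are usually richer (recency, completeness, per-attribute rankings).

```python
# Hypothetical source ranking: a lower number means a more trusted source.
SOURCE_TRUST = {"crm": 0, "billing": 1, "web_form": 2}

# Hypothetical records that an earlier step has matched as the same person.
MATCHED = [
    {"source": "web_form", "name": "J. Smith", "phone": "555-0101", "email": "js@example.com"},
    {"source": "crm", "name": "John Smith", "phone": None, "email": None},
    {"source": "billing", "name": "John Smyth", "phone": "555-0199", "email": None},
]


def golden_record(matched, trust=SOURCE_TRUST):
    """Build one surviving record, attribute by attribute."""
    ranked = sorted(matched, key=lambda rec: trust[rec["source"]])
    attributes = [key for key in ranked[0] if key != "source"]
    golden = {}
    for attr in attributes:
        # Take the value from the most trusted source that has one.
        golden[attr] = next((rec[attr] for rec in ranked if rec[attr]), None)
    return golden
```

Here the name survives from the CRM record, the phone number from billing (the CRM has none) and the email from the web form, so the merged record draws on all three sources.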