Fuzzy Hierarchy Management

2nd March 2011 Henrik Gabs Liliendahl

When evaluating results from automated data matching, your goal is typically to find false positives and false negatives: entities that are matched but shouldn’t be (false positives) and entities that are not matched but should have been (false negatives).

However, the fuzziness often used in the data matching process also applies to the evaluation of the results. For many dubious results the question isn’t whether the matched database rows reflect the same real world entity, but whether the matched (or not matched) database rows reflect different members of a real world hierarchy.

Example 1:

John Smith on 1 Main Street in Anytown

Mary & John Smith on 1 Main Str in Anytown

Example 2:

Anytown Municipality, Technical Dept

Municipality of Anytown

Example 3:

Acme Corporation, Anytown

Acme Corporation, Anywhere

Each of the three examples above may be considered a false positive if matched and a false negative if not matched.

You may say that it depends on the purpose of use, which is true.

But if we are talking master data management, we will probably have to encompass multiple requirements where we simultaneously need the match and don’t want the match, which is why we need to be able to resolve and store the results from fuzzy data matching in hierarchies.
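As a rough illustration of that idea, here is a minimal Python sketch using Example 1. The `Node` class, its field names and the two helper functions are hypothetical, made up for this illustration and not taken from any particular MDM product. Each matched row is attached to a member of a hierarchy, so one consumer can ask "same entity?" while another asks "same hierarchy?".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One member of a real world hierarchy (hypothetical model)."""
    name: str
    level: str                          # e.g. "household" or "individual"
    parent: Optional["Node"] = None
    source_rows: List[str] = field(default_factory=list)

    def root(self) -> "Node":
        """Walk up the parent links to the top of the hierarchy."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def same_entity(a: Node, b: Node) -> bool:
    """Strict match: both rows resolved to the very same member."""
    return a is b


def same_hierarchy(a: Node, b: Node) -> bool:
    """Loose match: the members belong to the same hierarchy."""
    return a.root() is b.root()


# Example 1: the second row describes the household, the first one a member.
household = Node("Smith household", "household",
                 source_rows=["Mary & John Smith on 1 Main Str in Anytown"])
john = Node("John Smith", "individual", parent=household,
            source_rows=["John Smith on 1 Main Street in Anytown"])

print(same_entity(john, household))     # strict view: not a match
print(same_hierarchy(john, household))  # hierarchy view: a match
```

With the result stored this way, a process that must not merge the two rows and a process that wants one mailing per household can both be served from the same data.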

Wayne Colless 2nd March 2011 / 23:51

Interesting topic Henrik, in more ways than one.

Firstly, it shows the risks of building automatic processes on the back end of fuzzy matching solutions. If a false positive result leads to an automatic action then someone could pay a price they needn’t have paid. For example, we’ve all heard stories about people who have missed a flight or been taken in for questioning because their name is similar to one on a terrorist alert or other type of watch list. By the same token, people are sometimes missed in the watch list scenario when the matching isn’t ‘fuzzy’ enough. (The Delta/Northwest Airlines Flight 253 ‘underpants bomber’ in December 2009 comes to mind.)

Secondly, it shows the importance of having a person have the final say when it comes to fuzzy matching outcomes. Sometimes it’s an easy choice for the analyst, sometimes not. And when it’s not, a person knows where to check for the additional evidence to go one way or the other with the match.

Although fuzzy matching solutions can always be fine-tuned to maximise outcomes, the best fuzzy matching outcomes will likely always involve the ‘human element’ in some way.

- Henrik Liliendahl Sørensen 3rd March 2011 / 09:32
  
  Thanks for commenting, Wayne. Indeed it is a balance between:
  
  – Automating as much as possible and thereby saving money, and
  – Doing as much manually and correctly as necessary, but thereby spending more money (or, oftentimes, therefore not doing it at all)
  
John Owens Dunedin 3rd March 2011 / 01:05

Thanks for the post, Henrik.

It highlights a common and, sadly, widespread problem. That is, people confusing “fuzzy logic”, which can be a powerful technique in data cleansing, with “fuzzy thinking”, which is probably the single greatest barrier to achieving data quality.

The “fuzzy thinking” manifests itself in several ways.

One of these is the belief that you can create high quality data from low quality data. The most common example of this is when people take two, or more, pieces of incomplete data relating to an entity, that by themselves do not uniquely identify that entity, and combine them into something that they think will.

The second most common form is when people look at two records and say, “yes, they represent a unique occurrence of the entity”, without the business having first defined what the Unique Identifier for that entity is.

Until the enterprise has asked and answered the question, “What is it, with respect to this enterprise, that makes one occurrence of Entity X uniquely different from every other occurrence of Entity X?”, not even a human can say that two records represent a single occurrence of Entity X.

Unless, of course, they use “fuzzy thinking”!!

Regards
John

- Henrik Liliendahl Sørensen 3rd March 2011 / 09:49
  
  Thanks John.
  
  I have a little thesis saying that there is a breakeven point when including more and more purposes (internal business rules) where it will be less cumbersome to reflect the real world object rather than trying to align all known purposes.
  
  This means that if you try to reflect the real world in your model of individuals and companies and their hierarchies then you have a good chance of meeting current discovered, current undiscovered and future business requirements.
  
  But it’s a balance too. I don’t preach that you have to model the entire world in every system. But you may take out pieces of that model rather than building a fuzzy, fit-for-a-snapshot-purpose model. Having the big picture in mind helps, you might say.
  
Garnie Bolling 3rd March 2011 / 15:19

Henrik, another great article.

Boils down to one thing: what business problem are you trying to solve here? Matching for the sake of matching is a vicious circle, but if you are trying to match addresses to reduce your costs of duplicate mailings / prevent excess postage charges, then we have a means to an end.

Thanks.

- Henrik Liliendahl Sørensen 3rd March 2011 / 17:15
  
  Thanks Garnie. I agree, if we have only one business problem to solve, it’s fuzzy, but we may see an end.
  
  When populating a master data hub we may have several different business problems to solve at the same time, all to be encompassed in a single customer view or, as I like to widen it, in a single business partner view (leaving master data entities other than parties for a later view).
  
  We have to be able to present solutions for this conundrum sooner or later as master data management solutions will be used in more and more business processes.
  


Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
