Upstream prevention by error tolerant search

Fuzzy matching techniques were originally developed for batch processing, in order to find duplicates and consolidate database rows that have no unique identifier linking them to the real-world entity they describe.

These processes have traditionally been implemented for downstream data cleansing.

Because upstream prevention is much more effective than tidying up downstream, real-time checking at data entry is becoming more common.

But we can go even further upstream by introducing error-tolerant search capabilities.

A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to first search the database for a match. If the entity is not found, you create a new entity. When the search fails to find a match that actually exists, we have a classic and frequent cause of either introducing duplicates or putting the burden on the real-time checking.
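The search-then-create workflow can be illustrated with a minimal Python sketch. The names `find_or_create` and `exact_search`, and the use of a plain list as the "database", are assumptions made for illustration only; they do not come from any particular CRM or ERP system.

```python
def find_or_create(name, database, search):
    """Search first; create a new record only when the search finds nothing."""
    matches = search(name, database)
    if matches:
        return matches[0], False  # reuse the existing entity
    database.append(name)
    return name, True  # a new entity was created


def exact_search(name, database):
    """Naive search: only identical names count as a match."""
    return [record for record in database if record == name]


# Hypothetical data: the customer already exists under a slightly different spelling.
customers = ["John Smith"]
record, created = find_or_create("Jon Smith", customers, exact_search)
print(created, customers)
```

Because the exact search misses "John Smith", the same real-world customer ends up in the list twice. Passing a more tolerant `search` function into the same workflow is what would prevent the duplicate at this point.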

An error-tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and the many other challenges we face when searching for names, addresses and descriptions.
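As a rough illustration, the following Python sketch scores two strings in a way that tolerates the three problems just mentioned. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure; the function names and the 0.8 threshold are assumptions for this example, and real products typically use more refined techniques.

```python
from difflib import SequenceMatcher


def tokens(text):
    """Lower-case the text, treat punctuation as spaces, and split into words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def similarity(query, candidate):
    """Return a score between 0.0 and 1.0; higher means more alike."""
    q, c = tokens(query), tokens(candidate)
    # Compare with words sorted, so that word order does not matter.
    sorted_score = SequenceMatcher(
        None, " ".join(sorted(q)), " ".join(sorted(c))
    ).ratio()
    # Compare with spaces removed, so that concatenations do not matter.
    joined_score = SequenceMatcher(None, "".join(q), "".join(c)).ratio()
    return max(sorted_score, joined_score)


def search(query, names, threshold=0.8):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = [(similarity(query, name), name) for name in names]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical records, for illustration only.
records = ["Smith, John", "Data Quality Ltd", "Acme Corp"]
print(search("Jon Smith", records))
print(search("DataQuality Ltd", records))
```

The character-level ratio absorbs small spelling differences, the sorted comparison handles rearranged words, and the space-free comparison handles concatenations. The threshold is a trade-off: set it lower and more candidates are shown to the person searching, set it higher and more true matches are missed.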

Such features may be implemented as embedded functionality in CRM and ERP systems or, to use my favourite term, as SOA components. So besides the classic data quality elements for monitoring and checking, we can add error-tolerant search to the component catalogue needed for a good MDM solution.

Tirthankar Ghosh 3rd November 2011 / 06:29

Checking for duplicates upstream is good but if there are multiple source systems (each having its own data capturing system), de-duplication needs to be done at the time of data consolidation/integration.

- Henrik Liliendahl Sørensen 3rd November 2011 / 08:31
  
  Thanks for commenting Tirthankar. That is true – unless you are able to make a mash-up of the source systems. I’m working with such a solution right now, where we also include several external registries in the upfront search.
  
  - Tirthankar Ghosh 7th November 2011 / 10:47
    
    I am interested to know the solution architecture in brief after your assignment is over.

Steve Tootill 9th February 2012 / 21:20

Henrik, Really good post. The solution you mention is interesting to me too. My company has a contact data matching product that connects to multiple data sources at data capture – currently limited to Windows web services, SQL Server and Oracle databases, so I would be interested to know the parameters of the solution you’re working with.

- Henrik Liliendahl Sørensen 10th February 2012 / 07:42
  
  Hi Steve, thanks for commenting. I have written a post about it called “Reference Data at Work in the Cloud”.

Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
