Upstream prevention by error tolerant search

Fuzzy matching techniques were originally developed for batch processing, in order to find duplicates and consolidate database rows that have no unique identifier tying them to the real-world entity they describe.

These processes have traditionally been implemented for downstream data cleansing.

Since upstream prevention is much more effective than tidying up downstream, real time data entry checking is becoming more common.

But we are able to go further upstream by introducing error tolerant search capabilities.

A common workflow when in-house personnel enter new customers, suppliers, purchased products and other master data is to first search the database for a match. If the entity is not found, you create a new entity. When the search fails to find an actual match, we have a classic and frequent cause of either introducing duplicates or challenging the real time checking.
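To make the failure concrete, here is a minimal sketch of the search-then-create workflow with a plain exact-match lookup. The customer list and the function names `exact_search` and `search_then_create` are made up for illustration and do not come from any particular CRM or ERP system.

```python
# Hypothetical in-memory customer table; the names are illustrative only.
customers = ["Acme Corporation", "Jon Smith"]


def exact_search(query, records):
    """Return records equal to the query, ignoring only case and outer spaces."""
    key = query.strip().lower()
    return [r for r in records if r.strip().lower() == key]


def search_then_create(query, records):
    """Search first; create a new entity only when nothing is found."""
    matches = exact_search(query, records)
    if matches:
        return matches[0], False  # existing entity reused
    records.append(query)  # no match found, so a new entity is created
    return query, True


# "John Smith" is the same person as the stored "Jon Smith",
# but the exact search misses him and a duplicate is created.
name, created = search_then_create("John Smith", customers)
```

After this call the list holds both "Jon Smith" and "John Smith", which is exactly the duplicate that downstream cleansing would later have to find.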

An error tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.
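As a rough illustration of the idea, the sketch below scores candidates with Python's standard `difflib.SequenceMatcher`. It is only one simple way to tolerate the three challenges named above, not how any specific product does it; the helper names and the 0.8 threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Score two strings between 0 and 1, tolerating typos, order and spacing."""
    ta, tb = tokens(a), tokens(b)
    # Sorting the words makes "Smith, John" comparable with "John Smith".
    by_order = SequenceMatcher(None, " ".join(sorted(ta)), " ".join(sorted(tb))).ratio()
    # Joining the words makes "DataQuality" comparable with "Data Quality".
    by_joining = SequenceMatcher(None, "".join(ta), "".join(tb)).ratio()
    return max(by_order, by_joining)


def tolerant_search(query, records, threshold=0.8):
    """Return records scoring at least the (assumed) threshold, best first."""
    scored = [(similarity(query, r), r) for r in records]
    return [r for score, r in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical master data used only for this example.
records = ["John Smith", "Acme Corporation", "Data Quality Ltd"]
hits = tolerant_search("Jon Smith", records)
```

Here the misspelled "Jon Smith" still finds "John Smith", so the user sees the existing entity before creating a new one. A real solution would need a proper index rather than scoring every row, and a threshold tuned to the data.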

Implementation of such features may be as embedded functionality in CRM and ERP systems or, to use my favourite term, as SOA components. So besides classic data quality elements for monitoring and checking, we can add error tolerant search to the component catalogue needed for a good MDM solution.


5 thoughts on “Upstream prevention by error tolerant search”

  1. Tirthankar Ghosh 3rd November 2011 / 06:29

    Checking for duplicates upstream is good but if there are multiple source systems (each having its own data capturing system), de-duplication needs to be done at the time of data consolidation/integration.

    • Henrik Liliendahl Sørensen 3rd November 2011 / 08:31

      Thanks for commenting Tirthankar. That is true – unless you are able to make a mash-up of the source systems. I’m working with such a solution right now, where we also include several external registries in the upfront search.

      • Tirthankar Ghosh 7th November 2011 / 10:47

        I am interested to know the solution architecture in brief after your assignment is over.

  2. Steve Tootill 9th February 2012 / 21:20

    Henrik, Really good post. The solution you mention is interesting to me too. My company has a contact data matching product that connects to multiple data sources at data capture – currently limited to Windows web services, SQL Server and Oracle databases, so I would be interested to know the parameters of the solution you’re working with.
