The Overlooked MDM Feature

In the social media community around master data management, a frequently seen subject is a list of important capabilities for the technical side of master data management. On some occasions I have commented on such posts by adding a feature I often see omitted from these lists, namely error tolerant search functionality. Examples are on the DataFlux CoE blog here and in the LinkedIn Master Data Management Interest Group here.

Error tolerant search (also called fuzzy search) technology is closely related to data matching technology. But where data matching is basically non-interactive, error tolerant search is highly interactive.

Most people know error tolerant search from googling. You enter something with a typo and Google prompts you back with: Did you mean…? When looking for entities in master data management hubs you certainly need something similar. Spelling names, addresses, product descriptions and so on is not easy, not least in a globalized world.
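A "Did you mean…?" prompt against a hub can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's difflib module, not how Google or any MDM product does it. The function name did_you_mean, the sample names and the cutoff value are all assumptions made for the example.

```python
import difflib


def did_you_mean(query, hub_names, limit=3, cutoff=0.6):
    """Suggest hub entries that look similar to a possibly misspelled query.

    hub_names stands in for the names stored in a master data hub
    (hypothetical sample data). cutoff is a similarity threshold
    between 0 and 1; 0.6 is simply difflib's default.
    """
    # Compare in lower case, but hand back the original spelling.
    by_lower = {}
    for name in hub_names:
        by_lower.setdefault(name.lower(), name)

    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]


# Hypothetical hub content and a query with typos.
hub = ["Liliendahl", "Lilienthal", "Lindahl", "Andersen"]
suggestions = did_you_mean("Lilliendal", hub)
print("Did you mean:", suggestions)
```

A real hub would need an index rather than a comparison against every entry, but the interactive idea is the same: the user types, and the hub answers with its best guesses.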

As in data matching, error tolerant search may use lists of synonyms as the basic technology. But the use of algorithms is also common, ranging from an oldie like the Soundex phonetic algorithm to more sophisticated algorithms.
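To make the two techniques concrete, here is a small Python sketch that combines a synonym list with the classic American Soundex code. The synonym table, the sample names and the function names search_key and search are assumptions for illustration only. Soundex was designed for English surnames, so this version simply ignores letters outside A to Z.

```python
# Soundex digit for each coded consonant.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

# Hypothetical synonym list mapping nicknames to a canonical name.
SYNONYMS = {"bob": "robert", "bill": "william"}


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def search_key(name):
    """Resolve synonyms first, then reduce the name to its phonetic code."""
    return soundex(SYNONYMS.get(name.lower(), name))


def search(query, names):
    """Return all names sharing the query's search key."""
    key = search_key(query)
    return [n for n in names if search_key(n) == key]


hub = ["Robert", "Rupert", "Smith", "Smyth", "William"]
print(search("Bob", hub))    # synonym lookup, then phonetic match
print(search("Smiht", hub))  # transposed letters
```

Note that Soundex is coarse: "Robert" and "Rupert" get the same code, which is welcome when searching interactively but is exactly why more sophisticated algorithms exist.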

The business benefits of having error tolerant search as a capability in your master data management solution are many, including:

  • Better data quality through upstream prevention of duplicate entries, as explained in this post.
  • More efficiency by bringing down the time users spend searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating a lot of the frustration that otherwise comes from not finding what you know must already be inside the hub.

Error tolerant search has been one of the core features in the master data management implementations I have been involved in. What about you?

Picture This

How do people find their way to your blog? I use Twitter and LinkedIn to say: Hey, I made a new post. And then I pretty much rely on people finding my blog when searching with terms such as:

  • Data Quality
  • Master Data Survivorship
  • Fit for purpose

But honestly, the search terms that hit my blog many times more often than the above terms are the little texts I add to the images I usually have on every post. And I am pretty sure that those people were not looking for data quality and master data management.

The top term is pearls, including the same word in Russian (жемчуг), Turkish (inci) and Arabic (لآلئ). This word was the title of the image in the post “Universal Pearls of Wisdom” where I wrote about the new SOA manifesto and how this manifesto might as well be about data quality and a lot of other disciplines and concepts. Probably not very interesting for someone trying to buy pearls. But maybe one or two of the more than 2,000 pearl fishers were captured in the data quality net.

The second most used term is Gorilla. This was used as text for the image in the post “Gorilla Data Quality”. Personally I like this gorilla picture, and it seems that approximately 1,600 other people do too. Whether they also like the philosophical ideas around “Gorilla Data Quality” and “Guerilla Data Quality” I am not so sure.

Other terms hitting big are Brueghel and Tower of Babel, used in a post about international challenges in data quality called “The Tower of Babel”, which was illustrated by a painting by Brueghel. Penny Black, used in a post about “Postal Address Hierarchy, Granularity, Precision and History”, also raised the pageview counter.

But it doesn’t seem that every little common word will do. Once I used the word traffic, but it didn’t generate any traffic at all.

