MDM Tools Revealed

Every organization needs Master Data Management (MDM). But does every organization need an MDM tool?

In many ways the MDM tools we see on the market resemble common database tools. But there are some things MDM tools do better than a common database management tool. The post called The Database versus the Hub outlines three such features:

  • Controlling hierarchical completeness
  • Achieving a Single Business Partner View
  • Exploiting Real World Awareness

Controlling hierarchical completeness and achieving a single business partner view are closely related to the two things data quality tools do better than common database systems, as explained in the post Data Quality Tools Revealed. These two features are:

  • Data profiling and
  • Data matching

Specialized data profiling tools are very good at providing out-of-the-box functionality for statistical summaries and frequency distributions of the unique values and formats found within the fields of your data sources, in order to measure data quality and find critical areas that may harm your business. These capabilities are often better and easier to use than what you find inside an MDM tool. However, in order to measure the improvement in a business context and fix the problems for good rather than as a one-off, you need a solid MDM environment.
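The kind of out-of-the-box profiling described above can be illustrated with a minimal Python sketch: build frequency distributions over the unique values and over their formats by masking letters and digits. The field and sample values are invented for the example; real profiling tools do far more.

```python
import re
from collections import Counter

def format_pattern(value: str) -> str:
    """Mask a value: letters become A, digits become 9, the rest stays."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

# Hypothetical sample of a phone number field
phone_numbers = ["555-1234", "555 9876", "5551234", "555-4321", ""]

value_freq = Counter(phone_numbers)                              # unique values
format_freq = Counter(format_pattern(v) for v in phone_numbers)  # formats

# The dominant mask suggests the expected format; outliers point at
# records that may need cleansing.
print(format_freq.most_common())
```

The dominant mask ("999-9999" in this toy sample) hints at the de facto format rule, and everything else is a candidate for investigation.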

When it comes to data matching we also still see specialized solutions that are more effective and easier to use than what is typically delivered inside MDM solutions. Besides that, we also see business scenarios where it is better to do the data matching outside the MDM platform as examined in the post The Place for Data Matching in and around MDM.

Looking at the single MDM domains we also see alternatives. Customer Relationship Management (CRM) systems are popular as a choice for managing customer master data. But as explained in the post CRM systems and Customer MDM: CRM systems are said to deliver a Single Customer View, but usually they don't. The way CRM systems are built, used and integrated is a sure track to creating duplicates. Some remedies for that are touched upon in the post The Good, Better and Best Way of Avoiding Duplicates.

With product master data we also have Product Information Management (PIM) solutions. From what I have seen, PIM solutions have one key capability that is essentially different from a common database solution and from many MDM solutions built with party master data in mind. That is a flexible and super-user-oriented way of building hierarchies and assigning attributes to entities – in this case particularly products. If you offer customer self-service, as in eCommerce, with products that have varying attributes, you need PIM functionality. If you want to do this smartly, you need a collaboration environment for supplier self-service as well, as pondered in the post Chinese Whispers and Data Quality.

All in all, the necessary components and combinations for a suitable MDM toolbox are plentiful and can be obtained by one-stop shopping or by putting some best-of-breed solutions together.

IDQ vs iDQ™

The previous post on this blog was called Informatica without Data Quality? This post digs into the messaging around the recent takeover of Informatica and the future for the data quality components in the Informatica toolbox.

In the comments Julien Peltier and Richard Branch discuss the cloud emphasis in the messaging from the new Informatica owners and especially the future of Master Data Management (MDM) in the cloud.

My best experience with MDM in the cloud is with a service called iDQ™ – a service that, by the way, shares its TLA (Three Letter Acronym) with Informatica Data Quality. The former stands for instant Data Quality. This is a service that revolves around turning your MDM inside-out, as most recently touched upon on this blog in the post The Pros and Cons of MDM 3.0.

iDQ™ specifically deals with customer (or rather party) master data, how to get this kind of master data right the first time and how to avoid duplicates as explored in the post The Good, Better and Best Way of Avoiding Duplicates.


The Data Matching Institute is Here

Within data management we already have “The MDM Institute”, “The Data Governance Institute” and “The Data Warehouse Institute (TDWI)” and now we also have “The Data Matching Institute”.

The founder of The Matching Institute is Alexandra Duplicado. Aleksandra says: “The reason I founded The Institute of Data Matching is that I am sick and tired of receiving duplicate letters with different spellings of my name and address”. Alex is also pleased that she has now found a nice office within edit distance of her home.

Before founding The Matching of Data Institute Alexander worked at the Universal Postal Union with responsibility for extra-terrestrial partners. When talking about the future of The Match Institute Sasha remarks: “It is a matter of not being too false positive. But it is a unique concept”.

One of the first activities for The Data-Matching Institute will be organizing a conference in Brussels. Many tool vendors such as Statistical Analysis System Inc., Dataflux and SAS Institute will sponsor the Brüssel conference. “I hope to join many record linkage friends in Bruxelles,” says Alexandre.

The Institute of Matching of Data also plans to offer a yearly report on the capabilities of the tool vendors. Asked about when that is going to happen Aleksander says: “Without being too deterministic a probabilistic release date is the next 1st of April”.


The Place for Data Matching in and around MDM

Data matching has increasingly become a component of Master Data Management (MDM) solutions. This has mostly been the case for MDM of customer data solutions, but it is also a component of MDM of product data solutions, not least when these solutions are emerging into the multi-domain MDM space.

The deployment of data matching was discussed nearly 5 years ago in the post Deploying Data Matching.

While MDM solutions have since then been picking up a larger share of the data matching being done, it is still a fairly small proportion of data matching that is performed within MDM solutions. Even if you have an MDM solution with data matching capabilities, you might still consider where data matching should be done. Some considerations I have come across are:

Acquisition and silo consolidation circumstances

A common use case for data matching is as part of an acquisition or internal consolidation of data silos where two or more populations of party master data, product master data and other important entities are to be merged into a single version of truth (or trust) in terms of uniqueness, consistency and other data quality dimensions.

While the MDM hub must be the end goal for storing that truth (or trust) there may be good reasons for doing the data matching before the actual on-boarding of the master data.

These considerations include:

The point of entry

For many good reasons, the MDM solution isn't always the system of entry. Doing the data matching at the stage where data is put into the MDM hub may be too late. Offering the data matching capabilities as a Service Oriented Architecture component may be a better way, as pondered in the post Service Oriented Data Quality.

Avoiding data matching

Even as a long-time data matching practitioner, I'm afraid I have to bring up the subject of avoiding data matching, as further explained in the post The Good, Better and Best Way of Avoiding Duplicates.


CRM systems and Customer MDM

Last week I had some fun making a blog post called The True Leader in Product MDM. This post was about how product Master Data Management is still, in most places, executed by having heaps of MS Excel spreadsheets flowing around within the enterprise and between business partners, as I have seen it.

When it comes to customer Master Data Management, MS Excel may not be so dominant. Instead we have MS CRM and competing offerings such as Salesforce.com and a lot of other similar Customer Relationship Management solutions.

CRM systems are said to deliver a Single Customer View. Usually they don't. One of the reasons is explained in the post Leads, Accounts, Contacts and Data Quality. The way CRM systems are built, used and integrated is a sure track to creating duplicates.

Some remedies out there include periodic duplicate checks within CRM databases or creating a federated Customer Master Data Hub with entities coming from CRM systems and other databases with customer master data. This is good, but not good enough, as told in the post The Good, Better and Best Way of Avoiding Duplicates.

During the last couple of years I have been working with the instant Data Quality service. This MDM service sits within or beside CRM systems and/or Master Data Hubs in order to achieve the only sustainable way of having a Single Customer View, which is an instant Single Customer View.


Using External Data in Data Matching

One of the things that data quality tools do is data matching. Data matching is mostly related to the party master data domain. It is about comparing two or more data records that do not have exactly the same data but are describing the same real world entity.
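As a toy illustration of such a comparison, here is a sketch using `SequenceMatcher` from Python's standard library. The two records and the 0.85 threshold are invented for the example; real matching engines use far more elaborate similarity measures and rules.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two records that are not byte-identical but likely the same person
record_a = {"name": "John Smith", "city": "Copenhagen"}
record_b = {"name": "Jon Smith", "city": "Copenhagen"}

name_score = similarity(record_a["name"], record_b["name"])
city_score = similarity(record_a["city"], record_b["city"])

# Illustrative decision rule: both fields must score above a threshold
is_candidate_match = name_score > 0.85 and city_score > 0.85
```

Here "John Smith" and "Jon Smith" score well above the threshold, so the pair becomes a match candidate even though the raw strings differ.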

A common approach is to compare data records in internal master data repositories within your organization. However, there are great advantages in bringing in external reference data sources to support the data matching.

Some of the ways to do that I have worked with include these kinds of big reference data:

Business directories:

The business-to-business (B2B) world does not have privacy issues to the degree we see in the business-to-consumer (B2C) world. Therefore there are many business directories out there with a quite complete picture of which business entities exist in a given country, and even in regions and the whole world.

A common approach is to first match your internal B2B records against a business directory and obtain a unique key for each business entity. The next step of matching business entities with that unique key is a no-brainer.
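That "no-brainer" step can be sketched as a plain grouping on the obtained key. The records and the `directory_key` field are hypothetical stand-ins for whatever unique identifier the directory supplies.

```python
from collections import defaultdict

# Internal B2B records after a directory match; "directory_key" is a
# placeholder for the unique identifier the directory supplies
internal_records = [
    {"name": "Acme Ltd", "directory_key": "150483782"},
    {"name": "ACME Limited", "directory_key": "150483782"},
    {"name": "Globex", "directory_key": None},  # no directory hit
]

groups = defaultdict(list)
for rec in internal_records:
    if rec["directory_key"]:
        groups[rec["directory_key"]].append(rec["name"])

# groups["150483782"] now holds both spellings of the same legal entity;
# records without a key still need conventional data matching.
```

The grouping instantly unites different spellings of the same legal entity, while records without a directory hit fall through to the harder fuzzy matching path.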

The problem, though, is that an automatic match between internal B2B records and a business directory most often does not yield a 100 % hit rate. Not even close, as examined in the post 3 out of 10.

Address directories:

Address directories are mostly used in order to standardize postal address data, so that two addresses in internal master data that can be standardized to exactly the same written form can be better matched.
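Here is a minimal sketch of that standardization idea, with a tiny made-up abbreviation map standing in for what a real address directory provides.

```python
# A made-up abbreviation map; a real address directory provides far more
ABBREVIATIONS = {"street": "st", "st.": "st", "road": "rd", "rd.": "rd"}

def standardize(address: str) -> str:
    """Lowercase, strip commas and normalize common abbreviations."""
    words = address.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

a = standardize("12 Main Street, London")
b = standardize("12 Main St. London")
# Both raw forms collapse to the same canonical string, so an exact
# comparison now succeeds where it would have failed on the raw strings.
```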

A deeper use of address directories is to exploit related property data. The probability of two records with “John Smith” at the same address being a true positive match is much higher if the address is a single-family house as opposed to a high-rise building, nursing home or university campus.

Relocation services:

A common cause of false negatives in data matching is that you have compared two records where one of the postal addresses is an old one.

Bringing in National Change of Address (NCOA) services for the countries in question will help a lot.

The optimal way of doing that (and utilizing business and address directories) is to make it a continuous element of Master Data Management (MDM) as explored in the post The Relocation Event.


Unique Data = Big Money

In a recent tweet Ted Friedman of Gartner (the analyst firm) said:

[Tweet by Ted Friedman on reference data]

I think he is right.

Duplicates have always been pain number one in most places when it comes to the cost of poor data quality.

Though I have been in the data matching business for many years and have been fighting duplicates with deduplication tools in numerous battles, the war doesn't seem to be winnable by using deduplication tools alone, as told in the post Somehow Deduplication Won't Stick.

Eventually deduplication always comes down to entity resolution, where you have to decide which results are true positives, which results are useless false positives, and wonder how many false negatives you didn't catch, which means how much money you didn't get in return on your deduplication investment.
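The bookkeeping behind that reasoning can be sketched as simple counts over hand-verified match results; the pairs below are invented for the example.

```python
# Hand-verified outcomes for a handful of candidate pairs (made up):
# each tuple is (tool decision, whether the pair really is a duplicate)
pairs = [
    ("matched", True),      # true positive
    ("matched", False),     # false positive: wasted merge effort
    ("not_matched", True),  # false negative: a missed duplicate
    ("matched", True),
]

tp = sum(1 for decision, real in pairs if decision == "matched" and real)
fp = sum(1 for decision, real in pairs if decision == "matched" and not real)
fn = sum(1 for decision, real in pairs if decision == "not_matched" and real)

precision = tp / (tp + fp)  # share of tool matches you can act on
recall = tp / (tp + fn)     # share of real duplicates actually caught
```

Precision tells you how much of the matching effort was well spent, while recall hints at the money left on the table in the form of uncaught duplicates.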

Bringing in new, and be that obscure, reference sources is in my eyes a very good idea, as examined in the post The Good, Better and Best Way of Avoiding Duplicates.


Our Double Trouble

Using the royal we is usually only for majestic people, but as a person with a being in two countries at the same time, I do sometimes feel that I am we.

So, this morning we once again found our way to London Heathrow Airport for one of our many trips between London and Copenhagen as we have lived in the United Kingdom the last couple of years but still have many business and private ties with The Kingdom of Denmark where we (is that was or were?) born, raised and worked and from where we still hold a passport.

Most public sector and private sector business processes and master data management implementations simply don't cope with the fast evolving globalization. Reflecting on this, flying over Doggerland, we recall situations where:

  • We as a prospect or customer in a global brand are stored as a duplicate record for each country as told in the post Hello Leading MDM Vendor.
  • You as an employee in a multi-national firm have a duplicate record for each country you have worked in.

People moving between countries are still treated as an exception not covered by adequate business rules and data capture procedures. Most things are sorted out eventually, but it always takes a whole lot more trouble compared to if you are just born, raised and stay in the same country.

When we landed in Copenhagen this morning we (is that was or were?) able to use the new local smart travel card in order to travel on with public transit. But getting the card wasn't easy, we remember. With a foreign address you can't apply online. So we had to queue up at the Central Station, fill in a form and explain that we don't have an official document with our address in the UK – and we avoided explaining the shocking fact that in the UK your electricity bill is your premier proof of almost anything related to your identity.

What about you? Do you have a being in several countries? Any war stories experienced related to your going back and forth?


Famous False Positives

You should Beware of False Positives in Data Matching. A false positive in the data quality realm is a match of two (or more) identities that actually aren't the same real world entity.

Throughout history and within art we have seen some false positives too. Here are my three favorites:

The Piltdown Man

In 1912 a British amateur archeologist apparently found a fossil claimed to be the missing link between apes and man: the so-called Piltdown Man. Backed up by the British Museum, it counted as a true discovery until 1953, when it was finally revealed as a hoax. It was disputed through all the years, but defended by the British establishment, maybe due to envy of the French having found Cro-Magnon man first and the Germans having a name-giving true discovery in Neandertal.

Eventually the Piltdown Man was exposed as a medieval human upper skull, an orangutan jawbone and chimpanzee teeth.

Barry Nelson as Jimmy Bond in Casino Royale (1954)

James and Jimmy Bond

As told in the post My Name is Bond. Jimmy Bond: James Bond is British intelligence and Jimmy Bond is an American agent. It's always a question whether two identities residing in different countries are the same, as discussed (about me) in the post Hello Leading MDM Vendor.

Dupond et Dupont

In English they are known as Thomson and Thompson. In the original Belgian/French (and in my childhood Danish) comics about the adventures of Tintin they are known as Dupond et Dupont. They are two incompetent detectives who look alike and have names with a low edit distance and the same phonetic sound. For the twin names in a lot of other languages, check the Wikipedia article here.
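The two notions mentioned, low edit distance and same phonetic sound, can be illustrated with a short sketch: a classic Levenshtein distance and a plain American Soundex implementation, both written from scratch for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def soundex(name: str) -> str:
    """American Soundex: similar-sounding names get the same code."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    result, last = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "hw":  # h and w do not reset the previous code
            last = code
    return (result + "000")[:4]

# "Thomson" and "Thompson" are one edit apart ...
assert levenshtein("Thomson", "Thompson") == 1
# ... and "Dupond" and "Dupont" collapse to the same phonetic code
assert soundex("Dupond") == soundex("Dupont") == "D153"
```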

And hey, today I'm going to Belgium, the home country of these two guys' creator, to be at the Belgian Data Quality Association congress tomorrow.


Entity Resolution and Big Data

The Wikipedia article on Identity Resolution has this catch on the difference between good old data matching and Entity Resolution:

”Here are four factors that distinguish entity resolution from data matching, according to John Talburt, director of the UALR Laboratory for Advanced Research in Entity Resolution and Information Quality:

  • Works with both structured and unstructured records, and it entails the process of extracting references when the sources are unstructured or semi-structured
  • Uses elaborate business rules and concept models to deal with missing, conflicting, and corrupted information
  • Utilizes non-matching, asserted linking (associate) information in addition to direct matching
  • Uncovers non-obvious relationships and association networks (i.e. who’s associated with whom)”

I have a gut feeling that Data Matching and Entity (or Identity) Resolution will melt together in the future as expressed in the post Deduplication vs Identity Resolution.

If you look at the above mentioned factors that distinguish data matching from identity resolution, some of the often mentioned features in the new big data technology shine through:

  • Working with unstructured and semi-structured data is probably the most mentioned difference between working with small data versus working with big data.
  • Working with associations is a feature of graph databases or other similar technologies as mentioned in the post Will Graph Databases become Common in MDM?
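As a toy illustration of the association angle, here is a plain adjacency dictionary standing in for a graph database; the entities are invented for the example.

```python
# A plain adjacency dict standing in for a graph database; names invented
associations = {
    "alice": {"acme_ltd", "bob"},
    "bob": {"alice", "globex"},
}

def associated(entity: str) -> set:
    """Direct associations of an entity (empty set if unknown)."""
    return associations.get(entity, set())

# Uncovering a non-obvious relationship two hops out:
# alice -> bob -> globex
two_hops = set().union(*(associated(n) for n in associated("alice"))) - {"alice"}
```

Even this tiny structure surfaces a relationship (alice to globex) that no direct record comparison would find, which is exactly the "who's associated with whom" point above.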

So, in the quest of expanding small data matching to evolve into Entity (or Identity) Resolution, we will be helped by general developments in working with big data.
