Data Matching and Real-World Alignment

Data matching is a subdiscipline within data quality management. Data matching is about establishing links between data elements and entities that do not have the same values but refer to the same real-world construct.

The most common scenario for data matching is deduplication of customer data records held across an enterprise. In this case we often see a gap between what we technically try to do and the desired business outcome from deduplication. In my experience, this misalignment has something to do with real-world alignment.


What we technically do is basically to find similarities between data records that have typically been pre-processed with some form of standardization. This is often not enough.
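
As a minimal sketch of that technical core (the field names, the standardization steps and the use of Python's difflib are assumptions for illustration, not a reference to any particular tool):

```python
from difflib import SequenceMatcher

def standardize(record):
    """Very basic standardization: lowercase, trim, collapse whitespace.
    Real tools also expand abbreviations, parse name and address components, etc."""
    return {key: " ".join(str(value).lower().split()) for key, value in record.items()}

def similarity(a, b, fields=("name", "street", "city")):
    """Average string similarity over the compared fields (0.0 - 1.0)."""
    a, b = standardize(a), standardize(b)
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)

rec1 = {"name": "Jon Smith", "street": "1 Main Street", "city": "Springfield"}
rec2 = {"name": "John Smith", "street": "1 Main St.", "city": "Springfield"}
print(similarity(rec1, rec2))  # a high score suggests a candidate duplicate
```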

Location Intelligence

Deduplication and other forms of data matching with customer master data revolve around names and addresses.

Standardization and verification of addresses is a very common element in data quality / data matching tools. Often such a tool will use a service either from the same brand or from a third party. Unfortunately, no single service is often enough. This is because:

  • Most services are biased towards a certain geography. They may for example be quite good for addresses in the United States but very poor compared to local services for other geographies. This is especially true for geographies with multiple languages in play, as exemplified in the post The Art in Data Matching.
  • There is much more to an address than the postal format. In deduplication it is for example useful to know if the address is a single-family house or a high-rise building, a nursing home, a campus or another building with lots of units, as sketched in the example after this list.
  • Timeliness of address reference data is underestimated. I recently heard from a leader in the Gartner Quadrant for Data Quality Tools that a quarterly refresh is fine. It is not, as told in the post Location Data Quality for MDM.
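
As a hypothetical sketch of the second bullet above (the dwelling types, weights and function are invented for illustration), knowing what kind of building an address points to can change how much a shared address should count as matching evidence:

```python
# Hypothetical sketch: weight a shared address by dwelling type.
ADDRESS_WEIGHT = {
    "single_family_house": 0.9,   # a shared address is strong evidence of the same household
    "high_rise_building": 0.3,    # many units share the same street address
    "nursing_home": 0.2,
    "campus": 0.1,
}

def address_evidence(dwelling_type, unit_matches):
    """Return how much a shared address contributes to an overall match score."""
    weight = ADDRESS_WEIGHT.get(dwelling_type, 0.5)
    # If the unit/apartment also matches, the evidence gets stronger again.
    return min(1.0, weight + (0.5 if unit_matches else 0.0))

print(address_evidence("high_rise_building", unit_matches=True))   # 0.8
print(address_evidence("high_rise_building", unit_matches=False))  # 0.3
```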

Identity Resolution

The overlaps and similarities between data matching and identity resolution were discussed in the post Deduplication vs Identity Resolution.

In summary, the capability to tell if two data records represent the same real-world entity will eventually involve identity resolution. And as this is very poorly supported by the data quality tools around, a lot of manual work will be involved if the business processes that rely on the data matching cannot tolerate too many, or in some cases any, false positives – or false negatives.
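
One common way to keep both error types in check is to split match scores into three bands, where only the middle band goes to manual review. A minimal sketch, assuming arbitrary threshold values:

```python
# Hypothetical thresholds; in practice they are tuned to how many false
# positives and false negatives the consuming business process can tolerate.
AUTO_MATCH_THRESHOLD = 0.92   # above this: merge automatically
REVIEW_THRESHOLD = 0.75       # between the two: route to manual (clerical) review

def classify_pair(score):
    if score >= AUTO_MATCH_THRESHOLD:
        return "auto_match"          # accepted as the same real-world entity
    if score >= REVIEW_THRESHOLD:
        return "manual_review"       # the costly human work mentioned above
    return "no_match"

# Raising AUTO_MATCH_THRESHOLD reduces false positives but sends more pairs to
# review; lowering REVIEW_THRESHOLD reduces false negatives at the same cost.
print(classify_pair(0.95), classify_pair(0.80), classify_pair(0.50))
```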

Hierarchy Management

Even telling whether a true positive match is true in all circumstances is hard. The predominant examples of this challenge are:

  • Is a match between what seems to be an individual person and what seems to be the household where the person lives a true match?
  • Is a match between what seems to be a person in a private role and what seems to be the same person in a business role a true match? This is especially tricky with sole proprietors working from home, like farmers, dentists, freelance consultants and more.
  • Is a match between two sister companies on the same address a true match? Or two departments within the same company?

We often realize that the answers to these questions differ depending on the business processes where the result of the data matching will be used.

The solution is not simple. If we want automated and broadly usable results, the data matching functionality must be quite sophisticated in order to take advantage of what is available in the real world. And the data model where we hold the result of the data matching must be quite complex if we want to reflect the real world.
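
As a sketch of what such a data model could capture, with the relationship types taken from the questions above and the structure itself being just an assumption for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class MatchRelation(Enum):
    SAME_ENTITY = "same real-world entity"
    HOUSEHOLD_MEMBER = "individual within a matched household"
    SAME_PERSON_OTHER_ROLE = "same person in a private vs. business role"
    RELATED_ORGANIZATION = "sister company or department at the same address"

@dataclass
class MatchResult:
    record_a: str
    record_b: str
    relation: MatchRelation
    score: float

# Whether a given relation counts as a "true match" depends on the consuming
# business process: a marketing mailing may treat HOUSEHOLD_MEMBER as a
# duplicate, while a credit check must keep the individuals apart.
link = MatchResult("cust-123", "cust-456", MatchRelation.HOUSEHOLD_MEMBER, 0.88)
print(link.relation.value)
```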

The Recent Coupling on the MDM Market

When it comes to mergers and acquisitions on the Master Data Management (MDM) solution market, there has until recently not been much going on since 2012. Rather, we have seen people leaving the established vendors to form or join new companies.

But three months ago, Tibco was coupled with Orchestra.

Then, on Valentine’s Day 2019, Symphony Technology Group acquired PIM and MDM provider EnterWorks with the aim of coupling their offerings with the ones from WinShuttle. WinShuttle has been more of a data management generalist company with a focus on ERP data – not least in SAP. This merger ties into the trend of extending MDM platforms to other kinds of data than traditional master data. It will also provide an alternative to SAP’s own MDM and data governance offering called MDG.

Fourteen days later there was a new coupling, as reported in the post MDM Market News: Informatica acquires AllSight. This must also be seen as a step in the trend of providing an extended MDM platform with Artificial Intelligence (AI) capabilities. Also, Informatica is here going up against the newer MDM solution provider Reltio, which has been successful in promoting its big data extended MDM platform.

Both Enterworks and AllSight (and Reltio too) are listed on The Disruptive Master Data Management List.


1,000 Blog Posts and More to Come

I just realized that this post will be number 1,000 published on this blog. So, let me not say anything new but just recap a little of what it has all been about in the nearly 10 years of running a blog on some nerdy stuff.

Data quality has been the main theme. When writing about data quality one cannot avoid touching Master Data Management (MDM). In fact, the most applied category on this site, with 464 entries and counting, is Master Data.

The second most applied category on this blog is, with 219 entries, Data Architecture.

The most applied data quality activity around is data matching. As this is also where I started my data quality venture, there have been 192 posts about Data Matching.

The newest category relates to Product Information Management (PIM) and is, with 20 posts at the moment, about Product Data Syndication.

Even though data quality is a serious subject, you must not forget to have fun. 66 posts, including a yearly April Fools post, have been categorized as Supposed to be a Joke.

Thanks to all who are reading this blog and not least to all who from time to time take the time to comment, like and share.

Get Your Hands Dirty with Data

When working with data management – and not least when listening to and reading stuff about data management – there is in my experience too little work going on with the actual data out there.

I know this from my own work. Most often, presentations, studies and other decision support in the data management realm are based on random anecdotes about the data rather than on looking at the data. And don’t get me wrong. I know that data must be seen as information in context, that the processes around data are crucial, that the people working with data are key to achieving better data quality, and much more cleverness that is not about the data as is.

But time and again I realize that you get the best understanding of the data when you get your hands dirty working with the data from various organizations. For me that has been when doing a deduplication of party master data, when calibrating a data matching engine for party master data against third-party reference data, when grouping and linking product information held by trading partners, when relating other master data to location reference data, and all the other activities we do in order to raise data quality and get a grip on Master Data Management (MDM) and Product Information Management (PIM).

Well, perhaps it is just me, because I never liked real dirt and gardening.

The Road from PIM to Multidomain MDM

The previous post on this blog was about the recent Gartner Magic Quadrant for Master Data Management Solutions. The post was called Who Will Make the Next Disruption on the MDM Market?

In a comment to this post Nadim observes that this Gartner quadrant is mixing up pure MDM players and PIM players.

That is true. It has always been a discussion point whether one should combine or separate solutions for Master Data Management (MDM) and Product Information Management (PIM). This is a question to be asked by end-user organizations, and it is certainly a question the vendors on the market(s) ask themselves.

If we look at the vendors included in the 2018 Magic Quadrant, the PIM part is represented in different ways.

I would say that two of the newcomers, Viamedici and Contentserv (the yellow dots in the figure below), are mostly PIM players today. This is also mentioned as a caution by Gartner and is a reason for their current bottom-left’ish placement in the quadrant. But both companies want to become more multidomain MDM’ish.

PIM to MDM vendors

Eight years ago, I was engaged at Stibo Systems as part of their first steps on the route from PIM to multidomain MDM. Enterworks and Riversand (the orange dots in the figure above) are on the same road.

Informatica has taken a different path towards the same destination, as they bought the PIM player Heiler back in 2012. Gartner has some cautions about how well the MDM and PIM components make up a whole in the Informatica offerings, and similar cautions were expressed around the Forrester PIM Wave, as seen in the comments to the post There is no PIM quadrant, but there is a PIM wave.

Tibco, Orchestra and Netrics

Today’s Master Data Management (MDM) news is that Tibco Software has bought Orchestra Networks. So, the 11 vendors in last year’s Gartner Magic Quadrant for Master Data Management Solutions are now down to 10.

If Gartner is still postponing this year’s MDM quadrant, they may even manage to reflect this change. We are of course also waiting to see if newcomers will make it into the quadrant and bring the crowd of vendors in there back above 10. Some of the candidates will be the likes of Reltio and Semarchy.

Anyway, back to the takeover of Orchestra by Tibco: this is not the first time Tibco has bought something in the MDM and Data Quality realm. Back in 2010 Tibco bought the data quality tool and data matching front runner Netrics, as reported in the post What is a best-in-class match engine?

Tibco then didn’t defend Netrics’ position in the Gartner Magic Quadrant for Data Quality Tools. The latest Data Quality Tools quadrant is, like the MDM quadrant, from 2017 and was touched on here on this blog.

So, it will be exciting to see how Tibco will defend both the joint Tibco MDM solution, which in 2017 was a sliding niche player at Gartner, and the Orchestra MDM solution, which in 2017 was a leader in the Gartner MDM quadrant.


It is time to apply AI to MDM and PIM

The intersection between Artificial Intelligence (AI) and Master Data Management (MDM) – and the associated discipline Product Information Management (PIM) – is an emerging topic.

A use case close to me

In my work on setting up a service called Product Data Lake, the inclusion of AI has become an important topic. The aim of this service is to translate between the different taxonomies in use at trading partners, for example when a manufacturer shares its product information with a merchant.

In some cases the manufacturer, the provider of product information, may use the same standard for product information as the merchant. This may be deep standards such as eCl@ss and ETIM or pure product classification standards such as UNSPSC. In this case we can apply deterministic matching of the classifications and the attributes (also called properties or features).
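
As a minimal sketch of the deterministic case (the codes and the simple dictionary lookup are invented for illustration and are not the actual Product Data Lake implementation):

```python
# When both trading partners use e.g. an ETIM-style standard, class and feature
# codes can be matched directly; only the values may need unit or language conversion.
manufacturer_item = {
    "classification": "EC000123",                        # hypothetical class code
    "features": {"EF000008": "230", "EF000040": "IP44"}, # hypothetical feature codes
}

merchant_wanted_features = {"EF000008", "EF000040", "EF000123"}

matched = {code: value
           for code, value in manufacturer_item["features"].items()
           if code in merchant_wanted_features}
uncovered = merchant_wanted_features - matched.keys()

print(matched)     # features resolved deterministically via the shared standard
print(uncovered)   # gaps that still need human or machine-assisted linking
```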


However, most often there are uncovered areas even when two trading partners share the same standard. And then again, the most frequent situation is that the two trading partners are using different standards.

In that case we will initially use human resources to do the linking. Our data governance framework for that includes upstream (manufacturer) responsibility, downstream (merchant) responsibility and our ambassador concept.

As always, too much human interaction is costly, time consuming and error prone. Therefore, we are eagerly training our machines to be able to do this work in a cost-effective way, within a much shorter time frame and with a repeatable and consistent outcome, to the benefit of the participating manufacturers, merchants and other enterprises involved in exchanging products and the related product information.
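
As a hedged sketch of how such machine assistance could start out, suggesting links between attribute labels by text similarity and letting humans confirm them: the library choice (scikit-learn) and the example labels below are assumptions for illustration only, not the actual Product Data Lake implementation.

```python
# Sketch: suggest links between attribute labels from two taxonomies by text
# similarity; human-confirmed links can later serve as training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_attrs = ["rated voltage", "degree of protection", "cable length"]
target_attrs = ["voltage rating", "ip protection class", "length of cord"]

# Character n-grams catch reordered and slightly different wording.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectors = vectorizer.fit_transform(source_attrs + target_attrs)
src, tgt = vectors[: len(source_attrs)], vectors[len(source_attrs):]

scores = cosine_similarity(src, tgt)
for i, attr in enumerate(source_attrs):
    best = scores[i].argmax()
    print(f"{attr!r} -> {target_attrs[best]!r} (score {scores[i][best]:.2f})")
```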

Learning from others

This week I participated in a workshop around exchanging experiences and proving use cases for AI and MDM. The above-mentioned use case was one of several use cases examined here. And for sure, there is a basis for applying AI with substantial benefits for the enterprises that get this. The workshop was arranged by Camelot Management Consultants within their Global Community for Artificial Intelligence in MDM.