Data Matching and Real-World Alignment

Data matching is a sub-discipline within data quality management. Data matching is about establishing a link between data elements and entities that do not have the same value but refer to the same real-world construct.

The most common scenario for data matching is deduplication of customer data records held across an enterprise. In this case we often see a gap between what we technically try to do and the desired business outcome from deduplication. In my experience, this misalignment has something to do with real-world alignment.

Technically, what we do is basically find similarities between data records that typically have been pre-processed with some form of standardization. This is often not enough.
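
A minimal sketch of that "standardize, then compare" step, assuming Python's standard library only. The abbreviation table, the choice of fields and the use of `difflib.SequenceMatcher` are illustrative assumptions, not how any particular data quality tool works. Note what it shows: two spellings collapse to the same string and score 1.0, yet nothing in the score knows whether the records refer to the same real-world person.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real tools use far larger, locale-specific lists.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def standardize(value: str) -> str:
    """Lower-case, strip accents and punctuation, and expand a few abbreviations."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", no_accents.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def record_similarity(a: dict, b: dict, fields=("name", "address")) -> float:
    """Average character-level similarity (0..1) of the standardized fields."""
    scores = [
        SequenceMatcher(None, standardize(a[f]), standardize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


a = {"name": "Jürgen Müller", "address": "12 Main St."}
b = {"name": "Jurgen Muller", "address": "12 Main Street"}
print(record_similarity(a, b))  # 1.0 after standardization
```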

Location Intelligence

Deduplication and other forms of data matching with customer master data revolve around names and addresses.

Standardization and verification of addresses is a very common element in data quality / data matching tools. Often such a tool will use a service either from the same brand or from a third party. Unfortunately, a single service is often not enough. This is because:

  • Most services are biased towards a certain geography. They may for example be quite good for addresses in The United States but very poor compared to local services for other geographies. This is especially true for geographies with multiple languages in play as exemplified in the post The Art in Data Matching.
  • There is much more to an address than the postal format. In deduplication it is, for example, useful to know if the address is a single-family house or a high-rise building, a nursing home, a campus or another building with many units.
  • Timeliness of address reference data is underestimated. I recently heard from a leader in the Gartner Quadrant for Data Quality Tools that a quarterly refresh is fine. It is not, as told in the post Location Data Quality for MDM.
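
To illustrate the second point, here is a minimal sketch of how building type could temper the weight given to a shared address. `BuildingType`, `Address` and the weights are hypothetical stand-ins for what a local address or location service might return; they are not any vendor's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BuildingType(Enum):
    # Hypothetical categories; a real location service may classify differently.
    SINGLE_FAMILY = "single_family"
    MULTI_UNIT = "multi_unit"  # high-rise, nursing home, campus, ...


@dataclass
class Address:
    street_key: str              # standardized and verified street address
    unit: Optional[str]          # apartment or unit designation, if known
    building_type: BuildingType  # as reported by the (hypothetical) address service


def address_evidence(a: Address, b: Address) -> float:
    """How strongly a shared address suggests a duplicate, from 0 to 1.

    The weights are illustrative assumptions, not tuned values.
    """
    if a.street_key != b.street_key:
        return 0.0
    if a.building_type is BuildingType.SINGLE_FAMILY:
        return 0.9  # few unrelated parties share a single-family house
    if a.unit and b.unit:
        return 0.9 if a.unit == b.unit else 0.0  # different flats, different parties
    return 0.3  # same multi-unit building, unit unknown: weak evidence only
```

The point of the design is that the same verified street address means very different things at a detached house and at a nursing home, so the address service has to return more than the postal format for the matching logic to use.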

Identity Resolution

The overlaps and similarities between data matching and identity resolution were discussed in the post Deduplication vs Identity Resolution.

In summary, the capability to tell if two data records represent the same real-world entity will eventually involve identity resolution. And as this is very poorly supported by the data quality tools available, a lot of manual work will be involved if the business processes that rely on the data matching cannot tolerate too many, or in some cases any, false positives – or false negatives.
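
A common way to trade automation against errors is to use two score thresholds: auto-link above the upper one, reject below the lower one, and route everything in between to manual review. The sketch below assumes pair scores between 0 and 1 and a labelled sample of pairs; the threshold values and function names are illustrative assumptions.

```python
from typing import Iterable, Tuple

# Illustrative thresholds; in practice they are tuned per business process.
UPPER = 0.90
LOWER = 0.70


def classify(score: float, upper: float = UPPER, lower: float = LOWER) -> str:
    """Map a pair similarity score to an automated decision or to manual review."""
    if score >= upper:
        return "match"
    if score < lower:
        return "non-match"
    return "review"


def error_counts(pairs: Iterable[Tuple[float, bool]]) -> dict:
    """Count automated false positives/negatives and pairs left for humans.

    Each pair is (score, is_true_match), e.g. from a labelled sample.
    """
    counts = {"false_positive": 0, "false_negative": 0, "review": 0}
    for score, is_true_match in pairs:
        decision = classify(score)
        if decision == "review":
            counts["review"] += 1
        elif decision == "match" and not is_true_match:
            counts["false_positive"] += 1
        elif decision == "non-match" and is_true_match:
            counts["false_negative"] += 1
    return counts
```

For example, `error_counts([(0.95, True), (0.92, False), (0.80, True), (0.75, False), (0.60, True), (0.30, False)])` yields one false positive, one false negative and two pairs for review. Widening the review band lowers the automated errors, but only by growing the manual workload.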

Hierarchy Management

Even telling that a true positive match is true in all circumstances is hard. The predominant examples of this challenge are:

  • Is a match between what seems to be an individual person and what seems to be the household where the person lives a true match?
  • Is a match between what seems to be a person in a private role and what seems to be the same person in a business role a true match? This is especially tricky with sole proprietors working from home, such as farmers, dentists, freelance consultants and more.
  • Is a match between two sister companies on the same address a true match? Or two departments within the same company?

We often realize that the answers to these questions differ depending on the business processes in which the result of the data matching will be used.

The solution is not simple. If we want automated and broadly usable results, the data matching functionality must be quite sophisticated in order to take advantage of what is available in the real world. And the data model where we hold the result of the data matching must be quite complex if we want to reflect the real world.
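
One way to make such a data model concrete is to store typed match links instead of a single "is duplicate" flag, and let each business process decide which link types it treats as the same party. The link types, process names and policies below are hypothetical, chosen only to mirror the questions in the bullets on hierarchy management; this is a sketch, not a reference design.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    # Hypothetical relationship types mirroring the hierarchy questions.
    SAME_ENTITY = "same_entity"
    PERSON_IN_HOUSEHOLD = "person_in_household"
    PRIVATE_AND_BUSINESS_ROLE = "private_and_business_role"
    SISTER_COMPANY = "sister_company"
    DEPARTMENT_OF = "department_of"


@dataclass(frozen=True)
class MatchLink:
    left_id: str
    right_id: str
    link_type: LinkType


# Hypothetical policies: which link types count as "the same" for each process.
POLICIES = {
    "direct_mail": {LinkType.SAME_ENTITY, LinkType.PERSON_IN_HOUSEHOLD},
    "credit_risk": {LinkType.SAME_ENTITY, LinkType.PRIVATE_AND_BUSINESS_ROLE,
                    LinkType.SISTER_COMPANY, LinkType.DEPARTMENT_OF},
    "billing": {LinkType.SAME_ENTITY},
}


def duplicates_for(process: str, links: list[MatchLink]) -> list[tuple[str, str]]:
    """Return the record pairs that the given process should treat as duplicates."""
    accepted = POLICIES[process]
    return [(link.left_id, link.right_id) for link in links if link.link_type in accepted]
```

The matching runs once and records what kind of relationship it found; the "true match or not" decision is deferred to the consuming process, which is what the differing answers to the questions above call for. The built-in generic hints require Python 3.9 or later.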

Data Quality Tools are Vital for Digital Transformation

The Gartner Magic Quadrant for Data Quality Tools 2019 is out. It will take you 43 minutes to read through, so let me provide a short overview.

Gartner says that “data quality tools are vital for digital business transformation, especially now that many have emerging features like automation, machine learning, business-centric workflows and cloud deployment models.”

The data quality software tools market was worth 1.61 billion USD in 2017, an increase of 11.6% compared to 2016.

Gartner sees that end-user demand is shifting toward having broader capabilities spanning data management and information governance. Therefore, the data quality tool market continues to interact closely with the markets for data integration tools and for Master Data Management (MDM) products.

Among the capabilities mentioned is multidomain support, meaning capabilities covering all the specific data subject areas, such as customer, product, asset and location. Interestingly, Gartner continues to single out customer among the several party data domains out there. In my experience, the same data quality challenges exist with vendor and other business partner data as well as with employee data.

According to Gartner, data quality tool vendors are competing to address shifting market requirements by introducing an array of new technologies, such as machine learning, interactive visualization and predictive/prescriptive analytics, all of which they are embedding in data quality tools. They are, according to Gartner, also offering new pricing models, based on open source and subscriptions.

The vendors included in the quadrant are positioned as seen below:

Gartner DQ 2019

If you want a full copy of the report, you can get it from Information Builders here in exchange for your personal data.

Toward the Third Generation of MDM

The Forrester Wave™: Master Data Management, Q1 2019 is out. The subtitle of the report is “Toward the Third Generation of Master Data Management.”

This resonates very well with my view as expressed, for example, in the post Three Stages of MDM Maturity.

The Forrester report says this on that theme: “The internet of things has led to systems of automation and systems of design, which introduce new MDM usage scenarios to support co-design and the exchange of information on customers, products, and assets within ecosystems”.

Beyond that, the report of course ranks the best-selling MDM solutions as seen below:

Forrester MDM Wave 2019

You can get a free copy of the report from Riversand here or from Reltio here.

The Recent Coupling on the MDM Market

When it comes to mergers and acquisitions on the Master Data Management (MDM) solution market, there had, until recently, not been much going on since 2012. Rather, we have seen people leave the established vendors and form or join new companies.

But, three months ago Tibco was coupled with Orchestra.

Then, on Valentine’s Day 2019, Symphony Technology Group acquired the PIM and MDM provider EnterWorks with the aim of coupling its offerings with those from WinShuttle. WinShuttle has been more of a data management generalist with a focus on ERP data, not least in SAP. This merger ties into the trend of extending MDM platforms to kinds of data other than traditional master data. It will also create an alternative to SAP’s own MDM and data governance offering called MDG.

Fourteen days later there was a new coupling, as reported in the post MDM Market News: Informatica acquires AllSight. This must also be seen as a step in the trend of providing an extended MDM platform with Artificial Intelligence (AI) capabilities. With this move, Informatica is also going up against the new MDM solution provider Reltio, which has been successful in promoting its big data extended MDM platform.

Both Enterworks and AllSight (and Reltio too) are listed on The Disruptive Master Data Management List.

Counting MDM Licenses

The Gartner Magic Quadrant for Master Data Management (MDM) Solutions 2018 was published last month.

Among the market numbers revealed in the report were the number and distribution of MDM licenses at the included vendors. These covered each vendor’s top three master data domains and estimated license counts, as well as the number of customers managing multiple domains:

mdm licenses

One should of course be aware of the data quality issues in comparing these numbers, as they are to some degree estimates based on differing perceptions at the included vendors. So, let me just highlight these observations:

  • The overall number of MDM licenses and unique MDM customers (at the included vendors) is not high. Fewer than 10,000 organizations worldwide are running such a solution. The potential new market out there for the sales forces at the MDM vendors is huge.
  • If you come across an organization already using an MDM solution, it probably has a solution from SAP or Informatica – or maybe IBM. For completeness: Oracle has been dropped from the MDM quadrant, as they practically no longer promote their MDM solutions, but there are still existing solutions operating out there.
  • The reign of Customer MDM is over. Product MDM is selling and multidomain is becoming the norm. Several MDM vendors are making their way into the quadrant from a Product Information Management (PIM) base as reported in the post The Road from PIM to Multidomain MDM.

PS: If you, as an end customer organization or an MDM and PIM vendor, want to work with me on the consequences for MDM solutions, here are some Popular Offerings for you.

Flying by Ultima Thule and Data Management

Ultima Thule is a name for a distant place beyond the known world. It is also the nickname of the most distant object in the solar system ever closely observed by a man-made object, which happened today, the 1st of January 2019. Before the flyby, scientists were unsure whether it was two objects, a peanut-shaped object or something else. The images revealing what it is will be downloaded during the next couple of months.

You can make many analogies between exploring space and data management. On this blog, the journey has already passed the similarity between Neutron Star Collision and Data Quality. The Gravitational Waves in the MDM World have been observed, and so has the Gravitational Collapse in the PIM Space. The notion of A Product Information Management (PIM) Solar System has also been suggested.

Happy New Year and wishing you all well in the data management journey beyond Ultima Thule.

Ultima Thule
Source: NASA via BBC

The Road from PIM to Multidomain MDM

The previous post on this blog was about the recent Gartner Magic Quadrant for Master Data Management Solutions. The post was called Who Will Make the Next Disruption on the MDM Market?

In a comment to this post Nadim observes that this Gartner quadrant is mixing up pure MDM players and PIM players.

That is true. It has always been a discussion point if one should combine or separate solutions for Master Data Management (MDM) and Product Information Management (PIM). This is a question to be asked by end user organizations and it is certainly a question the vendors on the market(s) ask themselves.

If we look at the vendors included in the 2018 Magic Quadrant, the PIM part is represented in a few different ways.

I would say that two of the newcomers, Viamedici and Contentserv (yellow dots in the figure below), are mostly PIM players today. This is also mentioned as a caution by Gartner and is a reason for their current left-bottom’ish placement in the quadrant. But both companies want to be more multidomain MDM’ish.

PIM to MDM vendors

Eight years ago, I was engaged at Stibo Systems as part of their first steps on the route from PIM to multidomain MDM. Enterworks and Riversand (the orange dots in the figure above) are on the same road.

Informatica has taken a different path towards the same destination, as they back in 2012 bought the PIM player Heiler. Gartner has some cautions about how well the MDM and PIM components make up a whole in the Informatica offerings, and similar cautions were expressed around the Forrester PIM Wave, as seen in the comments to the post There is no PIM quadrant, but there is a PIM wave.

Who Will Make the Next Disruption on the MDM Market?

As reported in the previous post on this blog, there were some Movements in the Gartner MDM Magic Quadrant 2018.

But there was also a good deal of steadiness. Informatica still holds pole position in the race towards the top-right corner. Orchestra EBX, now disguised as Tibco EBX, is trailing them in the leaders quadrant. Old challengers such as IBM, SAP and Stibo are watching them from the challengers quadrant, now alongside the newcomers, and Riversand is still the only visionary, according to Gartner.

In the niche players quadrant, we also still have Ataccama and Enterworks.

But there is still a lot of free space in the top-right corner. There is still room for disruption. Gartner mentions some traditional forces still on the move: the good old 360-degree view of party data (customer, patient and the somewhat US-biased provider) as well as Product Information Management (PIM), maybe in new wrappings such as PCM or PXM.

Gartner still promotes this stuff as Application Data Management (ADM).

Otherwise, Gartner focuses on these tracks to disruption:

  • Subscription pricing – both for on-premise and cloud
  • The need for professional services – still a lot of money goes into that pit
  • Cloud-based deployment – going up from 19% last year to 24% this year among Gartner’s respondents
  • Machine Learning (ML) and Artificial Intelligence (AI)

Informatica is one of the vendors who – in exchange for your registration – offer a free copy of the 2018 Gartner MDM Quadrant.

If you would like your personal data to be in the hands of Profisee, here is their free copy of the 2018 Gartner MDM Quadrant.

Another option is to get it from Riversand. Here is their free copy of the 2018 Gartner MDM Quadrant.

Tibco, Orchestra and Netrics

Today’s Master Data Management (MDM) news is that Tibco Software has bought Orchestra Networks. So the 11 vendors in last year’s Gartner Magic Quadrant for Master Data Management Solutions are now down to 10.

If Gartner is still postponing this year’s MDM quadrant, they may even manage to reflect this change. We are of course also waiting to see if newcomers will make it into the quadrant and bring the number of vendors back above 10. Candidates could be the likes of Reltio and Semarchy.

Back to the takeover of Orchestra by Tibco: this is not the first time Tibco has bought something in the MDM and Data Quality realm. Back in 2010 Tibco bought the data quality tool and data matching front runner Netrics, as reported in the post What is a best-in-class match engine?

After that, Tibco did not defend Netrics’ position in the Gartner Magic Quadrant for Data Quality Tools. Like the MDM quadrant, the latest Data Quality Tools quadrant is from 2017, and it was touched on this blog here.

So, it will be exciting to see how Tibco will defend both the joint Tibco MDM solution, which in 2017 was a sliding niche player at Gartner, and the Orchestra MDM solution, which in 2017 was a leader in the Gartner MDM quadrant.
