Deduplication as Part of MDM

A core intersection between Data Quality Management (DQM) and Master Data Management (MDM) is deduplication. The process basically involves three steps:

  • Match master data records across the enterprise application landscape, where these records describe the same real-world entity, most frequently a person, organization, product or asset.
  • Link the master data records in the best fit / achievable way, for example as a golden record.
  • Apply the master data records / golden record to a hierarchy.

Data Matching

The classic data matching quest is to identify data records that refer to the same person, whether an existing or a prospective customer. The first solutions for doing that emerged more than 40 years ago. Since then, the more difficult task of identifying the same organization as a customer, prospective customer, vendor/supplier or other business partner has been implemented, and solutions for identifying products as being the same have also been deployed.

Besides detecting internal duplicates within an enterprise, data matching has also been used to match against external registries. Doing this serves as a means to enrich internal records and also helps in identifying internal duplicates.
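As a minimal sketch of what detecting internal duplicates can look like, the Python below compares every pair of records on name and address using the standard library's difflib. The record layout, the sample values and the 0.85 threshold are assumptions made for illustration; real matching engines use far richer techniques such as blocking, phonetic keys and trained models.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_candidates(records, threshold=0.85):
    """Yield ID pairs whose name and address both reach the threshold."""
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            name_score = similarity(left["name"], right["name"])
            address_score = similarity(left["address"], right["address"])
            if min(name_score, address_score) >= threshold:
                yield left["id"], right["id"]


# Hypothetical records from two source applications.
records = [
    {"id": "CRM-1", "name": "John Smith", "address": "12 Main Street"},
    {"id": "ERP-7", "name": "Jon Smith", "address": "12 Main Street"},
    {"id": "CRM-9", "name": "Mary Jones", "address": "4 Oak Avenue"},
]
pairs = list(match_candidates(records))
```

Comparing all pairs grows quadratically with the number of records, so this only illustrates the idea. The pairs it yields are duplicate candidates that still need confirmation.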

Master Data Survivorship

When two or more data records have been confirmed as duplicates there are various ways to deal with the result.

In the registry MDM style, you only store the IDs that link the records, so the linkage can be used for specific operational and analytic purposes in source and target applications.

Further, there are more advanced ways of using the linkage as described in the post Three Master Data Survivorship Approaches.

One relatively simple approach is to choose the best fit record as the survivor in the MDM hub and then keep the IDs of the records purged from the MDM hub as a link back to the sourced application records.

Probably the most used approach is to form a golden record from the best fit data elements, store this compiled record in the MDM hub and keep the IDs of the linked records from the sourced applications.

A third way is to keep the sourced records in the MDM hub and on the fly compile a golden view for a given purpose.
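The golden record approach can be sketched in a few lines of Python. The survivorship rule used here (the newest non-empty value wins per attribute) and the record layout are assumptions for illustration only; actual rules are typically defined per attribute and per source.

```python
from datetime import date


def build_golden_record(linked_records):
    """Compile one record from the best fit data elements of linked records.

    Assumed rule: per attribute, the newest non-empty value survives.
    """
    newest_first = sorted(linked_records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for record in newest_first:
        for field, value in record["data"].items():
            if value and field not in golden:
                golden[field] = value
    # Keep the IDs of the linked records from the sourced applications.
    golden["source_ids"] = [r["id"] for r in linked_records]
    return golden


# Hypothetical confirmed duplicates from two source applications.
linked = [
    {"id": "CRM-1", "updated": date(2020, 5, 1),
     "data": {"name": "John Smith", "phone": "555-0100", "email": ""}},
    {"id": "ERP-7", "updated": date(2021, 2, 1),
     "data": {"name": "John A. Smith", "phone": "", "email": "john@example.com"}},
]
golden = build_golden_record(linked)
```

Calling the same function at query time, instead of storing its result, would correspond to the third approach of compiling a golden view on the fly.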

Hierarchy Management

When you inspect records identified as a duplicate candidate, you will often have to decide if they describe the same real-world entity or if they describe two real-world entities belonging to the same hierarchy.

Instead of throwing away the latter result, this link can be stored in the MDM hub as well as a relation in a hierarchy (or graph) and thus support a broader range of operational and analytic purposes.

The main hierarchies in play here are described in the post Are These Familiar Hierarchies in Your MDM / PIM / DQM Solution?

Family consumer citizen

With persons in private roles a classic challenge is to distinguish between the individual person, a household with a shared economy and people who happen to live at the same postal address. The location hierarchy plays a role in solving this case. This quest includes having precise addresses when identifying units in large buildings and knowing the kind of building. The probability of two John Smith records being the same person differs if it is a single-family house address or the address of a nursing home.

Family company

Organizations can belong to a company family tree. A basic representation, used for example in the Dun & Bradstreet Worldbase, has branches at a postal address. These branches belong to a legal entity with a headquarters at a given postal address, where there may be other individual branches too. Each legal entity in an enterprise may have a national ultimate mother. In multinational enterprises, there is a global ultimate mother. Public organizations have similar, often very complex, trees.
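A company family tree of this kind can be represented as simple child-to-parent links. The sketch below, with made-up entity IDs and a hypothetical parent_of mapping, walks upward to find the global ultimate mother and uses that to decide whether two records belong to the same family. It is not how any particular business directory exposes its data.

```python
def global_ultimate(entity_id, parent_of):
    """Follow parent links upward until an entity without a parent is reached."""
    seen = set()
    current = entity_id
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle in family tree at {current}")
        seen.add(current)
        current = parent_of[current]
    return current


def same_family(a, b, parent_of):
    """Two entities are in the same family if they share a global ultimate."""
    return global_ultimate(a, parent_of) == global_ultimate(b, parent_of)


# Hypothetical tree: branches -> legal entity -> global ultimate mother.
parent_of = {
    "branch-paris": "acme-france",
    "branch-lyon": "acme-france",
    "acme-france": "acme-global",
    "acme-global": None,
    "other-corp": None,
}
ultimate = global_ultimate("branch-paris", parent_of)
```

Two records that match on name but turn out to be different branches can then be stored as related members of one family instead of being merged.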

Product hierarchy

Products are also formed in hierarchies. The challenge is to identify if a given product record points to a certain level in the bottom part of a given product hierarchy. Products can have variants in size, colour and more. A product can be packed in different ways. The most prominent product identifier is the Global Trade Item Number (GTIN), which occurs in various representations such as the Universal Product Code (UPC) popular in North America and the European (now International) Article Number (EAN) popular in Europe. These identifiers are applied by each producer (and in some cases distributor) at the product packing variant level.
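Because the same GTIN can show up as a 12-digit UPC in one system and a 13-digit EAN in another, matching product records usually starts with validating and normalizing the identifier. The sketch below applies the GS1 check digit rule (weights 3 and 1 alternating from the right) and left-pads valid codes to 14 digits. The function names are chosen for this example only.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Accept GTIN-8, UPC-A (12), EAN-13 and GTIN-14 with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


def to_gtin14(code: str) -> str:
    """Left-pad a valid code with zeros so all representations compare equal."""
    if not is_valid_gtin(code):
        raise ValueError(f"not a valid GTIN: {code!r}")
    return code.zfill(14)


upc = "036000291452"    # 12-digit UPC-A representation
ean = "0036000291452"   # the same item written as a 13-digit EAN
same_item = to_gtin14(upc) == to_gtin14(ean)
```

Padding with leading zeros does not change the check digit, since the weights are counted from the right.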

Solutions Available

When looking for a solution to support you in this conundrum the best fit for you may be a best-of-breed Data Quality Management (DQM) tool and/or a capable Master Data Management (MDM) platform.

This Disruptive MDM / PIM / DQM List has the most innovative candidates here.

Analyst MDM / PIM / DQM Solution Reports Update March 2021

Analyst firms occasionally publish market reports with a generic solution overview for Master Data Management (MDM), Product Information Management (PIM) and Data Quality Management (DQM).

Here is an overview of the latest major reports:

MDM PIM DQM solutions analyst firms

3 ways to learn more:

  • You can check out many of the included solutions on The Disruptive MDM / PIM / DQM List.
  • You can get a free ranking that also includes the rising stars on the solution market and is based on your context, scope and requirements here.
  • You can book a free short online meeting with me for further discussion on your business case as part of my engagement at the consultancy firm Astrocytia here.

Constellation Research MDM Shortlist Q1 2021

There is a new MDM market report with vendor assessment out. It is the Constellation ShortList™ Master Data Management Q1 2021.

The report highlights a shortlist of the solutions you have to know. This one has 6 solutions:

Compared to the previous shortlist, Stibo Systems has been dropped. The explanation is: “This Q1 2021 update removes Stibo Systems from this ShortList due to what Constellation sees as slow progress on cloud deployment options.”

I find this a bit peculiar.

While cloud MDM is an important theme and Stibo Systems has not been a front runner in this game, it is far from the only important theme, which strangely is also stated in the report's threshold criteria.

In my work with selecting a longlist/shortlist/PoC candidate for actual MDM considerations at 250 organizations per year via The Disruptive MDM/PIM/DQM List, Stibo Systems is part of many shortlists and is the best fit in some cases.

Also, Stibo Systems is a front runner in some other important MDM themes. One example is Interenterprise MDM through Product Data Syndication.

Interenterprise MDM Will be Hot

Interenterprise Master Data Management is about how organizations can collaborate by sharing master data with business partners in order to optimize own master data and create new data driven revenue models together with business partners.

It is in my eyes one of the most promising trends in the MDM world. However, it is not going to happen tomorrow. The quest of breaking down internal data and knowledge silos within organizations is still not completed in most enterprises. Nevertheless, there is a huge business opportunity to pursue for the enterprises who will be in the first wave of interenterprise data sharing through interenterprise MDM.

A poll in the LinkedIn MDM – Master Data Management group revealed that MDM practitioners are aware that Interenterprise MDM will be hot sooner or later:

For the range of industries that work with tangible products, one of the most obvious places to start with Interenterprise MDM is by excelling – in the meaning of eliminating excel files exchange – in Product Data Syndication (PDS). Learn more in the post The Role of Product Data Syndication in Interenterprise MDM.

Welcome Winpure on The Disruptive MDM / PIM / DQM List

There is a new kid on the block on The Disruptive MDM / PIM / DQM List. Well, Winpure is not a new solution at all. It is a veteran tool in the data matching space.

Recently the folks at Winpure have embarked on a journey to take best-of-breed data matching into the contextual MDM world.

Data matching is often part of Master Data Management implementations, not least when the party domain (customers, suppliers, other business partners) is encompassed.

However, it is not always the best approach to utilize the data matching capabilities in MDM platforms. In some cases, these are not very effective. In other cases, the matching is needed before data is loaded into the MDM platform. And then many MDM initiatives do not include an MDM platform, but rely on capabilities in ERP and CRM applications.

Here, there is a need for a contextual MDM component with strong data matching capabilities, such as Winpure.

Learn more about Winpure here.

Movements in the Gartner MDM MQ 2021

This is the fourth and final blog post on the main takeaways from the freshly published Gartner Magic Quadrant for Master Data Management Solutions 2021.

The first post here touched on the quadrant advancements, meaning the vendors that have moved between the 4 quadrants.

Unfortunately, Gartner has not, as in previous years, stated the revenue for all the vendors, so you cannot determine the growth directly. Gartner does mention, though, that Semarchy, Reltio and Ataccama had 2-digit revenue growth and that IBM had shrinking MDM revenue, again. We may then assume that the other recurring vendors had 1-digit revenue growth. However, it is mentioned that Riversand had a 10m USD revenue growth, which could indicate a 2-digit revenue growth for them too.

Combining quadrant advancements and revenue growth statements results in this movement overview:

Based on statements in the Gartner MDM MQ

Watch Out for Interenterprise MDM

In the recent Gartner Magic Quadrant for Master Data Management Solutions there is a bold statement:

By 2023, organizations with shared ontology, semantics, governance and stewardship processes to enable interenterprise data sharing will outperform those that don’t.

The interenterprise data sharing theme was covered a couple of years ago here on the blog in the post What is Interenterprise Data Sharing?

Interenterprise data sharing must be leveraged through interenterprise MDM, where master data are shared between many companies, for example in supply chains. The evolution of interenterprise MDM and the current state of the discipline were touched on in the post MDM Terms In and Out of The Gartner 2020 Hype Cycle.

In the 00’s the evolution of Master Data Management (MDM) started with single domain / departmental solutions dominated by Customer Data Integration (CDI) and Product Information Management (PIM) implementations. These solutions were in the best cases underpinned by third party data sources such as business directories, for example the Dun & Bradstreet (D&B) Worldbase, and second party product information sources, for example the GS1 Global Data Synchronization Network (GDSN).

In the previous decade multidomain MDM with enterprise-wide coverage became the norm. Here the solution typically encompasses customer-, vendor/supplier-, product- and asset master data. Increasingly GDSN is supplemented by other forms of Product Data Syndication (PDS). Third party and second party sources are delivered in the form of Data as a Service that comes with each MDM solution.

In this decade we will see the rise of interenterprise MDM where the solutions to some extent become business ecosystem wide, meaning that you will increasingly share master data and possibly the MDM solutions with your business partners – or else you will fade in the wake of the overwhelming data load you will have to handle yourself.

So, watch out for not applying interenterprise MDM.

PS: That goes for MDM end user organizations and MDM platform vendors as well.

The Quasimodo Quadrant

The 2021 Magic Quadrant for Master Data Management (MDM) Solutions went public yesterday as reported here.

Quasimodo is the main protagonist of the novel The Hunchback of Notre-Dame. Somehow the plot of vendors in this year’s MDM quadrant looks like (a caricature of) a hunchback. The vendors are in general better in “Ability to Execute” than in “Completeness of Vision”.

So, MDM vendors in general may lack something in market understanding, marketing strategy, product strategy, innovation and more.

This does resonate with me. As also stated in the quadrant, some vendors are too invisible in the market buzz. There are heaps of emerging MDM use cases where it is not that easy to find a suitable solution, let alone one well-fit solution for a range of use cases in a given organization with a given IT landscape.

Gartner Reports 28 % Increase in Client Inquiries for MDM

The new Gartner Magic Quadrant for Master Data Management Solutions 2021 is out.

There are, as usual, two main takeaways:

  • The inclusion and positioning of the vendors
  • The message about where the market is heading

The first one is here:

Some noteworthy movements from the previous quadrant are:

  • Semarchy and Riversand have advanced to being leaders
  • Contentserv, Reltio and Ataccama have moved up as challengers
  • Syniti and PiLog are new inclusions in the report
  • Prospecta MDO has emerged into the honourable mentions part of the report

The market is probably also heading up. As stated in the report: “From March 2020 — when COVID-19 became a pandemic and global crisis — until December 2020, Gartner had a 28% increase in client inquiries compared with the same period in 2019”.

You can, against a small set of your Personally Identifiable Information, get a free copy of the report at the Semarchy site here.

Stay tuned for more takeaways from the quadrant report in the coming days.