Scaling Up The Disruptive MDM / PIM / DQM List

The Disruptive MDM / PIM / DQM List was launched in late 2017.

Here the first innovative Master Data Management (MDM) and Product Information Management (PIM) tool vendors joined the list with a presentation page showcasing the unique capabilities they offer to the market.

The blog was launched at the same time. Since then, a lot of blog posts – including guest blog posts – have been published. The topics covered include the list itself, the analysts and their market reports, as well as the capabilities that are essential in solutions and their implementation.

In 2019 the MDM and PIM tool vendors were joined by some of the forward-looking best-of-breed Data Quality Management (DQM) tool vendors.

The Select Your Solution service was launched at the same time. Here organizations – and their consultants – who are looking for an MDM / PIM / DQM solution can jumpstart the selection process by getting a list of the best solutions based on their individual context, scope and requirements. More than 100 end user organizations or their consultants have received such a list.

MDMlist timeline

Going into the 2020s the list is ready to be scaled up. The new sections being launched are:

  • The Service List: In parallel with the solution providers, it is possible for service providers – like implementation partners – to register on The Service List. This list will run alongside The Solution List. For an organization looking for an MDM / PIM / DQM solution, it is equally important to select the right solution and the right implementation partner.
  • The Resource List: This is a list – going live soon – with white papers, webinars and other content from potentially all the registered tool vendors and service providers, divided into sections by topic. Here end user organizations can get a quick overview of the content available within the themes that matter right now.
  • The Case Study List: The next planned list is a list of case studies from potentially all the registered tool vendors and service providers. The list will be divided into industry sectors. Here end user organizations can get a quick overview of case studies from similar organizations.

If you have questions and/or suggestions for valuable online content on the list, make a comment or get in contact here:

Analyst MDM / PIM / DQM Solution Reports Update March 2020

Analyst firms occasionally publish market reports with solution overviews for Master Data Management (MDM), Product Information Management (PIM) and Data Quality Management (DQM).

The publication schedule from the analyst firms can be unpredictable.

Information Difference is an exception. Every year there has been a Data Quality landscape named Q1, published shortly after that quarter, and an MDM landscape named Q2, published shortly after that quarter. However, these reports rely on participation from the relevant vendors, and not all vendors prioritize this scheme.

Forrester is quite unpredictable, both with timing and with which market segments (MDM, PIM, DQM) are covered.

Gartner is a bit steadier. However, the MDM solution reports, for example, have been published at varying intervals in recent years.

Here is an overview of the latest major reports:

Stay tuned on this blog to get the latest on analyst reports and news on market movements.

MDM PIM DQM analysts and solutions

Take Part in State of Data 2020

KDR Recruitment is a data management recruitment company and one of those rare recruitment agencies that genuinely express an interest in the disciplines covered.

This is manifested in, among other things, a yearly survey and report about the state of data, which was also covered on this blog five years ago in the post Integration Matters.

This year the surveyed topics include, for example, how to use data analysis, new skills needed and the most effective ways to improve data quality. You can contribute your experience and observations here at State of Data 2020.

KDR state of data 2020

The Two Data Quality Definitions

If you search on Google for “data quality” you will find the ever-recurring discussion on how we can define data quality.

This is also true for the top-ranked non-sponsored results, such as the Wikipedia page on data quality and an article from Profisee called Data Quality – What, Why, How, 10 Best Practices & More!

The two predominant definitions are that data is of high quality if the data:

  • Is fit for the intended purpose of use.
  • Correctly represents the real-world construct that the data describes.

Personally, I think it is a balance.

Data Quality Definition

In theory I am on the right side. This is probably because I most often work with master data, where the same data has multiple purposes.

However, as a consultant helping organizations with getting the funding in place and getting the data quality improvement done within time and budget I do end up on the other side.

What about you? Where do you stand on this question?

10 Data Management TLAs You Should Know

TLA stands for Three Letter Acronym. The world is full of TLAs. The IT world is full of TLAs. The Data Management world is full of TLAs. Here are 10 TLAs from the data management world that have been mentioned a lot of times on this blog and the sister blog over at The Disruptive MDM / PIM / DQM List:

MDM = Master Data Management can be defined as a comprehensive method of enabling an enterprise to link all of its critical data to a common point of reference. When properly done, MDM improves data quality, while streamlining data sharing across personnel and departments. In addition, MDM can facilitate computing in multiple system architectures, platforms and applications. You can find the source of this definition and 3 other – somewhat similar – definitions in the post 4 MDM Definitions: Which One is the Best?

PIM = Product Information Management is a discipline that overlaps MDM. In PIM you focus on product master data and a long tail of specific product information related to each given classification of products. This data is used in omni-channel scenarios to ensure that the products you sell are presented with consistent, complete and accurate data. Learn more in the post Five Product Information Management Core Aspects.

DAM = Digital Asset Management is about handling rich media files often related to master data and especially product information. The digital assets can be photos of people and places, product images, line drawings, brochures, videos and much more. You can learn more about how these first 3 mentioned TLAs are connected in the post How MDM, PIM and DAM Stick Together.

DQM = Data Quality Management is dealing with assessing and improving the quality of data in order to make your business more competitive. It is about making data fit for the intended (multiple) purpose(s) of use, which most often is best achieved by real-world alignment. It is about people, processes and technology. When it comes to technology there are different implementations as told in the post DQM Tools In and Around MDM Tools.

RDM = Reference Data Management encompasses those typically smaller lists of data records that are referenced by master data and transaction data. These lists do not change often. They tend to be externally defined but can also be internally defined within each organization. Learn more in the post What is Reference Data Management (RDM)?

10 TLA show

CDI = Customer Data Integration is considered the predecessor to MDM, as the first MDMish solutions focussed on federating customer master data handled in multiple applications across the IT landscape within an enterprise. You may ask: What Happened to CDI?

CDP = Customer Data Platform is an emerging kind of solution that provides a centralized registry of all data related to parties regarded as (prospective) customers at an enterprise. Right now, we see such solutions coming both from MDM solution vendors and CRM vendors as reported in the post CDP: Is that part of CRM or MDM?

ADM = Application Data Management, which is about not just master data, but all critical data, however limited to a single (suite of) application(s) at a time. ADM is an emerging term and we still do not have a well-defined market, as examined in the post Who are the ADM Solution Providers?

PXM = Product eXperience Management is another emerging term that describes a trend to distance some PIM solutions from the MDM flavour and more towards digital experience / customer experience themes. Read more about it in the post What is PxM?

PDS = Product Data Syndication, which connects MDM, PIM (and other) solutions at each trading partner with each other within business ecosystems. As this is an area where we can expect future growth along with the digital transformation theme, you can get the details in the post What is Product Data Syndication (PDS)?

Movements in the Constellation Research MDM Shortlist

One of the not so often mentioned analyst MDM market reports is the Constellation Shortlist™ Master Data Management.

The Q3 2018 version was mentioned here on the blog in the post Making Your MDM Vendor Longlist and Shortlist.

The Q3 2019 version has these changes compared to the shortlist a year ago:

  • Orchestra Networks is renamed to Tibco EBX
  • Oracle CDM Cloud is joined by Oracle Product MDM
  • Stibo Systems is a new entry

Constellation MDM Shortlist

Two observations:

  • At the analyst firms Gartner and Forrester, Oracle is not considered a (major) MDM market player anymore.
  • SAP MDG is the only megavendor solution not reaching this generic shortlist.

PS: If you need a shortlist tailored to your context, scope and requirements, you can get it on The Disruptive MDM list here.

When Vendors Decline to Participate in Analyst Research

In the Master Data Management (MDM) and Product Information Management (PIM) space there are some analyst market reports with vendor rankings used by organizations when doing a tool selection project.

These reports are based on the analysts’ surveys of their customers and perhaps other end user organizations, as well as the analysts’ research in cooperation with the solution vendor. However, sometimes the latter part does not happen.

One example was the Gartner late 2017 Magic Quadrant for MDM solutions where IBM declined to participate as reported in the post Why IBM Declined to Participate in The Gartner MDM Magic Quadrant.

Another example is the dysfunctional relationship between Forrester and Informatica. In the Forrester 2019 MDM Wave it is stated that “Informatica declined to participate in our research”. This was also apparent in the Forrester 2018 PIM Wave, where Forrester’s placement of Informatica as a Germany-based vendor didn’t reflect movements (and perhaps achievements) since 2012, as told in the post MDM Alternative Facts.

Both Gartner and Forrester have, though, positioned IBM and Informatica in their plots with a note that the research did not include interaction with the vendor.

Analyst Relationship

Information Difference has taken another approach and does not include nonparticipating vendors as discussed in the post Movements in the MDM Vendor Landscape 2019.

This challenge is a bit close to me, as I am running a list of MDM / PIM / DQM vendors where there is now also a ranking service based on individual context, scope and requirements. Here I have chosen to include vendor solutions from the three analyst reports mentioned above as well as the list itself, as noted in Select Your Solution step 4.

MDM Vendor Revenues According to Gartner

A recent post on this blog has the title MDM Spending Might be 5 Billion USD per Year.

The 5 B USD figure was a guestimate based on an Information Difference estimate putting the total yearly revenue collected by MDM software vendors at 1.6 B USD.

Prash Chandramohan, who has his daily work at Informatica, made a follow-up blog post with the title The Size of the Global Master Data Management Market. In it, Prash mentions some of the uncertainties involved in making such a guestimate.

In a LinkedIn discussion on that post, Ben Rund, who is at Riversand, asks about other sources – Gartner and others.

The latest Gartner MDM Magic Quadrant mentions the 2017 revenues as estimated by Gartner:

MDM market vendors re Gartner

It is worth noticing that Oracle is no longer a Gartner MDM Magic Quadrant vendor, yet the Gartner report indicates that Oracle still has an MDM (or is it ADM?) revenue from its installed base resembling that of the other mega-vendors: SAP, IBM and Informatica.

Update: The revenues mentioned are assumed to be software license and maintenance. The vendors may then have additional professional services revenue.

The 14 MDM vendors that qualified for inclusion in the latest quadrant constituted, according to Gartner estimates, 84% of the estimated MDM market revenue (software and maintenance) for 2017 – which, according to the Gartner criteria, must exclude Oracle.

The Trouble with Data Quality Dimensions

Data Quality Dimensions

Data quality dimensions are some of the most used terms when explaining why data quality is important, what data quality issues can be and how you can measure data quality. Ironically, we sometimes use the same data quality dimension term for two different things or use two different data quality dimension terms for the same thing. Some of the troubling terms are:

Validity / Conformity – same same but different

Validity is most often used to describe whether data filled in a data field obeys a required format or is among a list of accepted values. Databases are usually good at enforcing this, like ensuring that an entered date has the required day-month-year sequence and is a valid calendar date, or cross-checking data values against another table to see if the value exists there.

The problems arise when data is moved between databases with different rules and when data is captured in textual forms before being loaded into a database.

Conformity is often used to describe if data adheres to a given standard, like an industry or international standard. Due to complexity and other circumstances, this standard may not, or only partly, be implemented as database constraints or by other means. Therefore, a given piece of data may be a valid database value but still not comply with the given standard.

For example, the code value for a colour being “0,255,0” may be in the accepted format, with all elements in the accepted range between 0 and 255 for an RGB colour code. But the standard for a given product colour may only allow the value “Green” and other common colour names, and “0,255,0” will, when translated, end up as “Lime” or “High green”.
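To make the distinction concrete, here is a minimal Python sketch – with an illustrative, assumed list of allowed colour names, not taken from any particular standard – where the value passes the validity check on format and range while the translated colour name fails the conformity check:

```python
import re

# Validity: does the value obey the required format and range for an RGB colour code?
def is_valid_rgb(value: str) -> bool:
    match = re.fullmatch(r"(\d{1,3}),(\d{1,3}),(\d{1,3})", value)
    return bool(match) and all(0 <= int(part) <= 255 for part in match.groups())

# Conformity: is the translated colour name among the names an assumed standard allows?
ALLOWED_COLOUR_NAMES = {"Green", "Red", "Blue", "Black", "White"}

def conforms_to_standard(colour_name: str) -> bool:
    return colour_name in ALLOWED_COLOUR_NAMES

rgb_value = "0,255,0"
print(is_valid_rgb(rgb_value))         # True  - accepted format, all elements in range
print(conforms_to_standard("Lime"))    # False - "0,255,0" translates to "Lime", not an allowed name
print(conforms_to_standard("Green"))   # True  - the only green this assumed standard accepts
```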

Accuracy / Precision – true, false or not sure

The difference between accuracy and precision is a well-known statistical subject.

In the data quality realm accuracy is most often used to describe if the data value corresponds correctly to a real-world entity. If we, for example, have a postal address of the person “Robert Smith” being “123 Main Street in Anytown”, this data value may be accurate because this person (for the moment) lives at that address.

But if “123 Main Street in Anytown” has 3 different apartments each having its own mailbox, the value does not, for a given purpose, have the required precision.

If we work with geocoordinates, we have the same challenge. A given accurate geocode may have sufficient precision to tell where the nearest supermarket is, but not be precise enough to know in which apartment the out-of-milk smart refrigerator is.
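As a rough illustration – a sketch using the common rule of thumb that one degree of latitude covers about 111 km, which is an approximation and not from the post itself – the number of decimals kept in a geocode translates directly into how precise it is:

```python
# Rule of thumb: one degree of latitude is roughly 111 km on the ground.
KM_PER_DEGREE_LATITUDE = 111.0

latitude = 55.676098  # an example coordinate

for decimals in (2, 4, 6):
    rounded = round(latitude, decimals)
    # Maximum rounding error at this number of decimals, converted to metres.
    max_error_m = 0.5 * 10 ** (-decimals) * KM_PER_DEGREE_LATITUDE * 1000
    print(f"{decimals} decimals: {rounded} is accurate to within ~{max_error_m:.2f} m")

# Two decimals are plenty for finding the nearest supermarket,
# while six decimals narrow the location down to a few centimetres.
```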

Timeliness / Currency – when time matters

Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.

Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
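A minimal sketch of how a currency check could look, assuming the address record carries validity dates (the field names here are made up for illustration):

```python
from datetime import date

# An address record with assumed validity dates.
address = {
    "person": "Robert Smith",
    "postal_address": "123 Main Street in Anytown",
    "valid_from": date(2018, 3, 1),
    "valid_to": None,  # None meaning the address is still believed to be current
}

def is_current(record: dict, as_of: date) -> bool:
    """Currency: is the value believed to be accurate at the given point in time?"""
    started = record["valid_from"] <= as_of
    not_ended = record["valid_to"] is None or as_of <= record["valid_to"]
    return started and not_ended

# Timeliness, by contrast, is about the value being available when needed - e.g. before the invoice run.
invoice_run = date(2020, 3, 15)
print(is_current(address, invoice_run))  # True
```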

Uniqueness / Duplication – positive or negative

Uniqueness is the positive term where duplication is the negative term for the same issue.

We strive to have uniqueness by avoiding duplicates. In data quality lingo duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that

  • “Robert Smith at 123 Main Street, Suite 2 in Anytown”

is the same person as

  • “Bob Smith at 123 Main Str in Anytown”
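To illustrate how such a candidate duplicate pair could be flagged, here is a minimal Python sketch using the standard library’s difflib for fuzzy similarity plus a tiny, assumed nickname table – real matching engines use far richer rules and reference data:

```python
from difflib import SequenceMatcher

# A tiny, illustrative nickname table - real solutions use much larger reference data.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}

def normalize(text: str) -> str:
    tokens = text.lower().replace(",", " ").split()
    # Expand nicknames and a crude street abbreviation before comparing.
    tokens = [NICKNAMES.get(token, token) for token in tokens]
    tokens = ["street" if token in ("str", "st") else token for token in tokens]
    return " ".join(tokens)

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

record_a = "Robert Smith at 123 Main Street, Suite 2 in Anytown"
record_b = "Bob Smith at 123 Main Str in Anytown"

score = similarity(record_a, record_b)
print(f"Similarity: {score:.2f}")

# With an assumed threshold of 0.80 this pair would be flagged for review.
if score > 0.80:
    print("Candidate duplicate - route to a data steward for confirmation")
```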

Completeness / Existence – to be, or not to be

Completeness is most often used to tell to what degree all required data elements are populated.

Existence can be used to tell if a given dataset has all the needed data elements for a given purpose defined.

So “Bob Smith at 123 Main Str in Anytown” is complete if we need name, street address and city, but only 75% complete if we need name, street address, city and preferred colour, and preferred colour is an existing data element in the dataset.
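That completeness calculation can be sketched like this – the field names are simply taken from the example above:

```python
# The record from the example - preferred colour exists as a data element but is not populated.
record = {
    "name": "Bob Smith",
    "street_address": "123 Main Str",
    "city": "Anytown",
    "preferred_colour": None,
}

def completeness(record: dict, required_fields: list) -> float:
    """Share of the required data elements that are actually populated."""
    populated = sum(1 for field in required_fields if record.get(field) not in (None, ""))
    return populated / len(required_fields)

print(completeness(record, ["name", "street_address", "city"]))                      # 1.0  -> 100% complete
print(completeness(record, ["name", "street_address", "city", "preferred_colour"]))  # 0.75 -> 75% complete
```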

More on data quality dimensions:

Human Errors and Data Quality

Every time there is a survey about what causes poor data quality the most ticked answer is human error. This is also the case in the Profisee 2019 State of Data Management Report where 58% of the respondents said that human error is among the most prevalent causes of poor data quality within their organization.

This topic was also examined some years ago in the post called The Internet of Things and the Fat-Finger Syndrome.

Errare humanum est

Even the Romans knew this, as Seneca the Younger said “errare humanum est”, which translates to “to err is human”. He also added “but to persist in error is diabolical”.

So, how can we not persist in having human errors in data then? Here are three main approaches:

  • Better humans: There is a whip called Data Governance. In a data governance regime you define data policies and data standards. You build an organizational structure with a data governance council (or any better name), have data stewards and data custodians (or any better title). You set up a business glossary. And then you carry on with a data governance framework.
  • Machines: Robotic Process Automation (RPA) has, besides operational efficiency, the advantage that machines, unlike humans, do not make mistakes when they are tired and bored.
  • Data Sharing: Human errors typically occur when typing in data. However, most data is already typed in somewhere. Instead of retyping data, and thereby potentially introducing your own misspelling or other mistake, you can connect to data that is already digitalized and validated. This is especially doable for master data, as examined in the article about Master Data Share.