Data Quality and the Climate Issue

The similarities between raising awareness of data quality issues and of the climate issue were touched on 10 years ago here on this blog in the post Data Quality and Climate Politics.

The challenges are still the same.

There are many examples published where the results of climate change are pictured. A recent one is the image from Greenland showing huskies pulling sleds not over the usual ice, but through water.

Greenland melting ice sheet

(Image taken by Steffen Malskær Olsen, @SteffenMalskaer, here published on CNN)

We also see statistics showing a development towards melting ice masses with rising sea levels as the foreseeable result. However, statistics can always be questioned. Is the ice thickening somewhere else? Has this happened many times before?

These kinds of questions show the layers we must go through in getting from data quality to information quality, then to decision quality and, on top, to the wisdom in applying the right knowledge, whether that is to achieve business outcomes or to avoid climate change.

DIKW data quality

 

Marathon, Spartathlon and Data Quality

Tomorrow there is a Marathon race in my home city, Copenhagen. Eight years ago, a post on this blog revolved around some data quality issues connected with the Marathon race. The post was called How long is a Marathon?

Marathon
Pheidippides at the end of his Marathon race in a classic painting

However, another information quality issue is whether there ever was a first Marathon race run by Pheidippides. Historians today do not think so. It has something to do with data lineage. The written mention of the 42.192 (or so) kilometre effort from Marathon to Athens by Pheidippides is from Plutarch, whose account was written 500 years after the events. The first written source about the Battle of Marathon is from Herodotus. It was written (from a historian's perspective) only 40 years after the events. He did not mention the Marathon run. However, he wrote that Pheidippides ran from Athens to Sparta. That is 245 kilometres.

By the way: His mission in Sparta was to get help. But the Spartans did not have time. They were in the middle of an SAP roll-out (or something similarly festive).

Some people run the 245-kilometre track in what is called a Spartathlon. In a data and information quality context this reminds me that improving data quality, and thereby information quality, is not a sprint. Not even a Marathon. It is a Spartathlon.

 

Machine Learning, Artificial Intelligence and Data Quality

Using machine learning (ML) and then artificial intelligence (AI) to automate business processes is a hot topic and on the wish list at most organizations. However, many, including yours truly, warn that automating business processes based on data with data quality issues is a risky thing.

In my eyes we need to take a phased approach and use ML and AI twice to ensure the right business outcomes from AI automated business processes. ML and AI can be used to rationalize data and overcome data quality issues as exemplified in the post The Art in Data Matching.

Instead of applying ML and AI to whatever dirty dataset is at hand for a given business process, the right way will be to first use ML and AI to understand and assess the relevant datasets within the organization, and then let machines work on the rationalized data for sustainable automation of business processes.

ML AI DQ

Most of this rationalized data will be master data, where there is a movement among forward-looking vendors to include ML and AI in Master Data Management solutions, as examined in the post Artificial Intelligence (AI) and Master Data Management (MDM).

Looking at The Data Quality Tool World with Different Metrics

The latest market report on data quality tools from Information Difference is out. In the introduction to the data quality landscape Q1 2019 this example of the consequences of a data quality issue is mentioned: “Christopher Columbus accidentally landed in America when he based his route on calculations using the shorter 4,856 foot Roman mile rather than the 7,091 foot Arabic mile of the Persian geographer that he was relying on.”

Information Difference has the vendors on the market plotted this way:

Information Difference DQ Landscape Q1 2019

As reported in the post Data Quality Tools are Vital for Digital Transformation, Gartner also recently published a market report with vendor positions. The two reports are, in terms of evaluating vendors, like Roman and Arabic miles. Same same but different, and they may bring you to a different place depending on which one you choose to use.
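The size of the unit mix-up in the Columbus quote can be made concrete with a little arithmetic. The sketch below is only an illustration: it takes the two mile lengths exactly as given in the quote (they are not verified here), and the function name and the 1,000-mile example distance are made up for the purpose.

```python
# Mile lengths in feet, taken as-is from the Information Difference quote.
FEET_PER_ROMAN_MILE = 4856
FEET_PER_ARABIC_MILE = 7091


def distance_in_feet(miles: int, feet_per_mile: int) -> int:
    """Convert a distance in miles to feet for a given definition of the mile."""
    return miles * feet_per_mile


# Hypothetical example: a source states a distance of 1,000 (Arabic) miles,
# but the reader assumes the figure is in Roman miles.
stated_miles = 1000
assumed_feet = distance_in_feet(stated_miles, FEET_PER_ROMAN_MILE)
actual_feet = distance_in_feet(stated_miles, FEET_PER_ARABIC_MILE)

# Ratio of the assumed distance to the actual distance, and the resulting shortfall.
ratio = assumed_feet / actual_feet
shortfall = 1 - ratio

print(f"Assumed: {assumed_feet:,} ft, actual: {actual_feet:,} ft")
print(f"The journey is underestimated by about {shortfall:.1%}")
```

The same number with a different unit definition gives a journey that is roughly a third shorter on paper than in reality, which is the kind of gap that can open up when two reports measure the same market with different metrics.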

Vendors evaluated by Information Difference but not by Gartner are the veteran solution providers Melissa and Datactics. On the other side, Gartner has evaluated, for example, Talend, Information Builders and Ataccama. Gartner's evaluation is more spread out than that of Information Difference, where most vendors are rated about equal.

PS: If you need any help in your journey across the data quality world, here are some Popular Offerings.

Data Quality Tools are Vital for Digital Transformation

The Gartner Magic Quadrant for Data Quality Tools 2019 is out. It will take you 43 minutes to read through, so let me provide a short overview.

Gartner says that “data quality tools are vital for digital business transformation, especially now that many have emerging features like automation, machine learning, business-centric workflows and cloud deployment models.”

The data quality software tools market was at 1.61 billion USD in 2017, an increase of 11.6% compared to 2016.

Gartner sees that end-user demand is shifting toward having broader capabilities spanning data management and information governance. Therefore, the data quality tool market continues to interact closely with the markets for data integration tools and for Master Data Management (MDM) products.

Among the capabilities mentioned is multidomain support, meaning capabilities covering all the specific data subject areas, such as customer, product, asset and location. Interestingly, Gartner continues to focus on customer as the one party data domain among the several out there. In my experience, there are the same data quality challenges with vendor and other business partner data as well as with employee data.

According to Gartner, data quality tool vendors are competing to address shifting market requirements by introducing an array of new technologies, such as machine learning, interactive visualization and predictive/prescriptive analytics, all of which they are embedding in data quality tools. They are, according to Gartner, also offering new pricing models, based on open source and subscriptions.

The vendors included in the quadrant are positioned as seen below:

Gartner DQ 2019

If you want a full copy of the report you can, in exchange for providing your personal data, get it from Information Builders here.

Data Quality Dimensions in Motion

For the fifth year, Dan Myers of DQMatters is conducting an Annual Dimensions of Data Quality Survey.

There are some very interesting findings when looking at the trend in the previous years' surveys, as seen in the figure below.

Data Quality Dimensions 2015 to 2018

Among the data quality dimensions included in this survey we see that the use of consistency, validity and not least completeness has increased significantly over these years.

The possible use of consistency and completeness was examined here on the blog in the post Multi-Domain MDM and Data Quality Dimensions. Another dimension included in that post was uniqueness, which is a frequently addressed data quality dimension for customer master data in the quest to fight duplicates in databases.
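As a rough illustration of how two of these dimensions can be measured, here is a minimal sketch. It assumes records are plain dictionaries and that a duplicate is simply a record whose key fields match after trimming and lowercasing; real duplicate detection in customer master data is usually far fuzzier. The function names and the sample records are hypothetical and not taken from the survey or the referenced post.

```python
from typing import Mapping, Sequence


def completeness(records: Sequence[Mapping], field: str) -> float:
    """Share of records where `field` is present and not blank."""
    if not records:
        return 1.0
    # None, a missing key and whitespace-only strings all count as empty.
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def uniqueness(records: Sequence[Mapping], key_fields: Sequence[str]) -> float:
    """Share of distinct keys among all records, after simple normalization."""
    if not records:
        return 1.0
    keys = [
        tuple(str(r.get(f) or "").strip().lower() for f in key_fields)
        for r in records
    ]
    return len(set(keys)) / len(keys)


# Hypothetical customer master data with one duplicate and some gaps.
customers = [
    {"name": "Acme Ltd", "city": "Copenhagen", "email": "info@acme.example"},
    {"name": "acme ltd ", "city": "Copenhagen", "email": ""},
    {"name": "Nordic Foods", "city": "Aarhus", "email": None},
    {"name": "Baltic Shipping", "city": "", "email": "hello@baltic.example"},
]

email_completeness = completeness(customers, "email")
city_completeness = completeness(customers, "city")
name_city_uniqueness = uniqueness(customers, ["name", "city"])

print(f"email completeness:     {email_completeness:.0%}")
print(f"city completeness:      {city_completeness:.0%}")
print(f"name + city uniqueness: {name_city_uniqueness:.0%}")
```

Tracking ratios like these per field over time is one simple way to see whether a dimension is actually improving, though which fields and which thresholds matter will differ from one data domain to another.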

You can now be part of the 2019 Annual Dimensions of Data Quality Survey here.

Several Sources of Truth about MDM / PIM Solutions

The previous post on this blog was about Forrester vs Gartner on MDM/PIM. This post was about who is recognized as a major Master Data Management (MDM) / Product Information Management (PIM) solution vendor by the analyst firm Forrester versus who is recognized as a major MDM solution provider by the analyst firm Gartner.

MDM Truths

Now, let us have a look at how the individual solution providers are ranked, either in the same way or differently, by these major analyst firms, spiced with my humble take on where this will be going. In the interest of brevity, I will focus on vendors positioned by Forrester as an MDM / PIM leader or strong performer, or by Gartner as an MDM leader, visionary or challenger.

Informatica is an MDM leader both with Forrester and Gartner. When it comes to PIM, Forrester has Informatica a little behind the leaders, and back in the days when Gartner had specific customer MDM and product MDM quadrants, Informatica did better in customer MDM than in product MDM. Informatica has strengthened their grip on customer MDM with the recent AllSight acquisition. It will be interesting to see what moves Informatica will make to catch up on the product MDM / PIM battleground and thus consolidate their multidomain MDM leadership.

Orchestra Networks, which was recently acquired by Tibco, is a leader in the eyes of Gartner but a bit less prominently positioned as a strong performer in the eyes of Forrester. The question asked on the market is whether Tibco, in contrast to how earlier acquisitions turned out, will be able to uphold Orchestra’s position, as examined in the post Tibco, Orchestra and Netrics.

Reltio is a leader in the Forrester wave but still a niche player in the Gartner quadrant. This may say more about Forrester versus Gartner than about Reltio. Forrester seems to focus more on where the market is going, while Gartner emphasizes where the market has gone.

Riversand is a strong MDM and PIM performer at Forrester and a visionary in the Gartner quadrant. Perhaps Gartner sees a bit more on the vision side and Forrester a bit more on the offering side, but all in all the two analyst firms seem to be in agreement about Riversand. I think Riversand is on a good track.

SAP is a strong performer in the Forrester wave and a strong challenger according to Gartner. A lot of SAP ECC clients have chosen, and will choose, the SAP MDG offering based on IT landscape simplification considerations. The Forrester PIM wave has SAP trailing the other solutions, which corresponds with my impression that the SAP Hybris offering is struggling with really being a PIM solution.

Semarchy just made a high jump into the Gartner MDM quadrant challenger zone, and according to Forrester they have the strongest MDM strategy possible. There is no doubt that Semarchy is on the fast track.

Profisee just moved up from niche player to challenger in the latest Gartner MDM quadrant. However, they were not included in the Forrester MDM wave. In my eyes, Profisee belongs among the major MDM solution providers.

Stibo Systems is a challenger in the Gartner MDM quadrant. Forrester has Stibo Systems as a PIM leader but less prominent as an MDM contender. Stibo Systems has been on the same track as Riversand, going from being a PIM vendor to becoming a multidomain MDM vendor. Perhaps because they are self-funded, versus Riversand being funded from outside, their tracks seem different.

IBM hangs on as a challenger in the Gartner MDM quadrant. Forrester only has IBM as a contender, both for MDM and PIM. Nevertheless, large companies, not least in the financial sector, will continue to rely on IBM also when it comes to MDM.

EnterWorks is a PIM leader and also an MDM leader according to Forrester. According to Gartner they are still a niche player in MDM. Recently EnterWorks joined forces with WinShuttle, as told in the post The Recent Coupling on the MDM Market. It is not unlikely that the Forrester view and the Gartner view will be aligned in the future.

Pitney Bowes is a strong performer in the latest Forrester wave, sliding a bit from being a leader two years ago. They are not included in the Gartner quadrant. Pitney Bowes needs to promote itself as an MDM vendor and come up with new stuff to remain a major player on the MDM market.

Magnitude Software, whose MDM solution was formerly known as Kalido, has moved up from contender to strong performer in the Forrester Wave. They are not included in the Gartner quadrant. It will be exciting to see if Magnitude Software can reignite the momentum Kalido had back in the first MDM years. Agility Multichannel is a part of Magnitude Software and a strong PIM performer at Forrester – and in my eyes too.

Contentserv is a PIM leader on the Forrester wave. Contentserv is also an MDM niche player on the Gartner quadrant. inRiver and Salsify are strong PIM performers on the Forrester wave but not big enough (and perhaps not MDM focussed enough) to be on the Gartner MDM quadrant.

PS: You can learn more about many of the solutions mentioned here – and some more – on The Disruptive Master Data Management Solutions List.

Governing Product Information

The title of this blog post is also the title of a presentation I will do at the 2019 Data Governance and Information Quality Conference in San Diego, US in June.

There are some differences between how we can exercise data governance and information quality management when we are handling data about products versus handling the most common data domain, party data (customer, vendor/supplier, employee and other roles).

Multi-Domain MDM and Data Quality Dimensions

This topic was touched on here on the blog in the post called Data Quality for the Product Domain vs the Party Domain.

The conference session will go through these topics:

  • Product master data vs. product information
  • How Master Data Management (MDM), Product Information Management (PIM) and Digital Asset Management (DAM) stick together
  • The roles of 1st party data, 2nd party data and 3rd party data in MDM, PIM and DAM
  • Business ecosystem wide product data management
  • Cross company data governance and information quality alignment

You can have a look at the full agenda for the DGIQ 2019 Conference here.

dgiq 2019