Marathon, Spartathlon and Data Quality

Tomorrow there is a Marathon race in my home city Copenhagen. 8 years ago, a post on this blog revolved around some data quality issues connected with the Marathon race. The post was called How long is a Marathon?

Pheidippides at the end of his Marathon race in a classic painting

However, another information quality issue is whether there ever was a first Marathon race run by Pheidippides. Historians today do not think so. It has something to do with data lineage. The written mention of the 42.195 (or so) kilometre effort from Marathon to Athens by Pheidippides is from Plutarch, whose account was written 500 years after the events. The first written source about the Battle of Marathon is from Herodotus. It was written (from a historian's perspective) only 40 years after the events. He did not mention the Marathon run. However, he wrote that Pheidippides ran from Athens to Sparta. That is 245 kilometres.

By the way: His mission in Sparta was to get help. But the Spartans did not have time. They were in the middle of an SAP roll-out (or something similarly festive).

Some people run the 245-kilometre route in what is called a Spartathlon. In a data and information quality context, this reminds me that improving data and thereby information quality is not a sprint. Not even a Marathon. It is a Spartathlon.

 

Artificial Intelligence (AI) and Multienterprise MDM

The previous post on this blog was called Machine Learning, Artificial Intelligence and Data Quality. It examined how Artificial Intelligence (AI) is impacted by data quality and how AI can in turn be used to improve data quality.

Master Data Management (MDM) will play a crucial role in sustaining the needed data quality for AI and with the rise of digital transformation encompassing business ecosystems we will also see an increasing need for ecosystem wide MDM – also called multienterprise MDM.

Right now, I am working with a service called Product Data Lake where we strive to utilize AI including using Machine Learning (ML) to understand and map data standards and exchange formats used within product information exchange between trading partners.

The challenge in this area is that we have many different classification systems in play as told in the post Five Product Classification Standards. Besides the industry and cross sector standards we still have many homegrown standards as well.

Some of these standards (such as eClass and ETIM) also cover the attributes needed for a given product classification, but still, we have plenty of homegrown standards (or no standards) for attribute requirements as well.

Add to that the different preferences for exchange methods and we get a chaotic system where human intervention makes Sisyphus look like a lucky man. Therefore, we have great expectations about introducing machine learning and artificial intelligence in this space.

Next week, I will elaborate on the multienterprise MDM and artificial intelligence theme at the Master Data Management Summit Europe in London.

Machine Learning, Artificial Intelligence and Data Quality

Using machine learning (ML) and then artificial intelligence (AI) to automate business processes is a hot topic and on the wish list at most organizations. However, many, including yours truly, warn that automating business processes based on data with data quality issues is a risky thing.

In my eyes, we need to take a phased approach and use ML and AI twice over to ensure the right business outcomes from AI-automated business processes. ML and AI can be used to rationalize data and overcome data quality issues, as exemplified in the post The Art in Data Matching.

Instead of applying ML and AI to whatever dirty dataset is at hand for a given business process, the right way will be to first use ML and AI to understand and assess the relevant datasets within the organization. The data rationalized in that step can then be understood by machines and used for sustainable automation of business processes.

Most of these rationalized data will be master data, where there is a movement to include ML and AI in Master Data Management solutions by forward looking vendors as examined in the post Artificial Intelligence (AI) and Master Data Management (MDM).

A Master Data Mind Map

Please find below a mind map with some of the data elements that are considered to be master data.

Master Data Mind Map

The map is in no way exhaustive and if you feel some more very important and common data elements should be there, please comment.

The data elements are grouped within the most common master data domains being party master data, product master data and location master data.

Some of the data elements have previously been examined in posts on this blog. These include:

The mind map has a selection of flags around where master data are geographically dependent. Again, this is not exhaustive. If you have examples of diversities within master data, please also comment.

Solutions for Handling Product Master Data and Digital Assets

There are three kinds of solutions for handling product master data and related digital assets:

  • Master Data Management (MDM) solutions that either focus on product master data or are multi-domain MDM solutions covering the product domain as well as the party domain, the location domain, the asset domain and more.
  • Product Information Management (PIM) solutions.
  • Digital Asset Management (DAM) solutions.

According to Gartner analyst Simon Walker, a short distinction is:

  • MDM of product master data solutions help manage structured product data for enterprise operational and analytical use cases
  • PIM solutions help extend structured product data through the addition of rich product content for sales and marketing use cases
  • DAM solutions help users create and manage digital multimedia files for enterprise, sales and marketing use cases

The figure below shows what kind of data is typically included in an MDM solution, a PIM solution and/or a DAM solution, respectively.

MDM PIM DAM

This is further elaborated in the post How MDM, PIM and DAM Stick Together.

The solution vendors have varying offerings going from being best-of-breed in one of the three categories to offering a OneStopShopping solution for all disciplines.

If you are to compile a list of suitable and forward-looking solutions for MDM, PIM and/or DAM for your required mix, you can start looking at The Disruptive List of MDM/PIM/DAM solutions.

Looking at The Data Quality Tool World with Different Metrics

The latest market report on data quality tools from Information Difference is out. In the introduction to the data quality landscape Q1 2019, this example of the consequences of a data quality issue is mentioned: “Christopher Columbus accidentally landed in America when he based his route on calculations using the shorter 4,856 foot Roman mile rather than the 7,091 foot Arabic mile of the Persian geographer that he was relying on.”

Information Difference has the vendors on the market plotted this way:

Information Difference DQ Landscape Q1 2019

As reported in the post Data Quality Tools are Vital for Digital Transformation, Gartner also recently published a market report with vendor positions. The two reports are, in terms of evaluating vendors, like Roman and Arabic miles. Same same but different, and they may bring you to a different place depending on which one you choose to use.

Vendors evaluated by Information Difference but not Gartner are veteran solution providers Melissa and Datactics. On the other side Gartner has evaluated for example Talend, Information Builders and Ataccama. Gartner has a more spread out evaluation than Information Difference, where most vendors are equal.

PS: If you need any help in your journey across the data quality world, here are some Popular Offerings.

Who is on The Disruptive MDM / PIM List?

The Disruptive Master Data Management Solutions List is a sister site to this blog. The site aims to be a list of available:

  • Master Data Management (MDM) solutions
  • Customer Data Integration (CDI) solutions
  • Product Information Management (PIM) solutions
  • Digital Asset Management (DAM) solutions.

You can use this site as an alternative to the likes of Gartner, Forrester, MDM Institute and others when selecting an MDM / CDI / PIM / DAM solution, not least because this site will include both larger and smaller disruptive MDM solutions.

Vendors can register their solutions here and the crowd, being professional users, can review the solutions.

So far these solutions have been listed:

Reltio provides all the benefits of cloud like simplicity, scale, and security. On top of that, Reltio breaks down data silos by providing a unified data set with personalized views of data across departments like sales, marketing and compliance. Learn more about Reltio Cloud here.

Riversand is an innovative global pioneer in information management. The powerful MDM, PIM and DAM solution helps enterprises to transform their raw data into an engine of growth by making data usable, useful and meaningful. Learn more about Riversand here.

Semarchy xDM is a platform that enables Intelligent MDM and Collaborative Data Governance. It leverages smart algorithms, an agile design, and scales to meet enterprise complexity with solid ROI. Learn more about Semarchy xDM here.

Contentserv offers a real-time Product Experience Platform that is recognized and recommended by international analysts as one of the top worldwide innovators and strong performers in the PIM & MDM space. Learn more about Contentserv here.

EnterWorks, which recently joined with Winshuttle, is a multi-domain master data solution for acquiring, managing and transforming a company’s multi-domain master data into persuasive and personalized content for marketing, sales, digital commerce and new market opportunities. Learn about Enterworks here.

SyncForce helps international consumer & professional packaged goods manufacturers realize Epic Availability. With SyncForce, your product portfolio is digitally available with a click of a button, in every shape and form, both internal and external. Learn about SyncForce here.

Dynamicweb PIM brings you fewer applications, integrations and systems. It is fast and inexpensive to implement and maintain, because it is part of an all-in-one platform for omni-channel commerce. Learn more about Dynamicweb PIM here.

Agility® empowers marketers to acquire, enrich and deliver accurate and timely product content through every touchpoint, channel and region along with the analytical support required to maximize effectiveness in the market. Learn more about Agility here.

Magnitude Software’s Master Data Management solution offers enterprises the core capabilities to model multiple data entities, harmonize the data sources and manage governance processes for reference data and master data. Learn more about Magnitude MDM here.

AllSight, which is now a part of Informatica, is using state-of-the-art AI-driven technology in an MDM and Customer 360 solution. AllSight matches and links all customer data and provides multiple views of the customer for different users.

Product Data Lake, which is affiliated to this blog, is a cloud service for sharing product master data in the business ecosystems of manufacturers, distributors, merchants, marketplaces and large end users of product information. Learn more about Product Data Lake here.

The Need for Speed in Product Information Flow

One of the bottlenecks in Product Information Management (PIM) is getting product data ready for presentation to the buying audience as fast as possible.

Product data travels a long way from the origin at the manufacturing company, perhaps through distributors and wholesalers to the merchant or marketplace. In that journey the data undergo transformation (and translation) from the state it has at the producing organization to the state chosen by the selling organization.

However, time to market is crucial. This applies to when a new product range is chosen by the merchant or when there are changes and improvements at the manufacturer.

At Product Data Lake we enable a much faster pace in these quests than when doing this by using emails, spreadsheets and passive portals.

Take two minutes to test if your company is exchanging product data at the speed of a cheetah or a garden snail.

To use Excel or not to use Excel in Product Information Management?

Excel is used heavily throughout data management and this is true for Product Information Management (PIM) too.

The raison d'être of PIM solutions is often said to be eliminating the use of spreadsheets. However, the PIM solutions around have functionality to co-exist with spreadsheets, because spreadsheets are still a fact of life.

This is close to me as I have been working on a solution to connect PIM solutions (and other solutions for handling product data) between trading partners. This solution is called Product Data Lake.

Our goal is certainly also to eliminate the use of spreadsheets in exchanging product information between trading partners. However, as an intermediate state we must accept that spreadsheets exist, either as a replacement for PIM solutions or because PIM solutions do not (yet) fulfill all purposes around product information.

Consequently, we have added a little co-existence with Excel spreadsheets in today's public online release of Product Data Lake version 1.10.

The challenge is that product information is multi-dimensional as we for example have products and their attributes typically represented in multiple languages. Also, each product group has its collection of attributes that are relevant for that group of products.

Spreadsheets are basically two dimensional – rows and columns.

In Product Data Lake version 1.10 we have included a data entry sheet that mirrors spreadsheets. You can upload a two-dimensional spreadsheet into a given product group and language, and you can download that selection into a spreadsheet.
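The idea of fitting a two-dimensional sheet into a multi-dimensional store can be illustrated with a small sketch. This is not how Product Data Lake is implemented; the function names, the in-memory dictionary and the assumption that the first column holds a product id are all hypothetical and only serve to show how one (product group, language) selection maps to rows and columns.

```python
# Hypothetical in-memory store, not the Product Data Lake data model:
# (product_group, language) -> product_id -> {attribute: value}

def upload_sheet(store, product_group, language, rows):
    """Load a two-dimensional sheet (header row plus data rows) into one selection."""
    header, *data = rows
    attributes = header[1:]  # assumption: the first column holds the product id
    selection = store.setdefault((product_group, language), {})
    for row in data:
        product_id, values = row[0], row[1:]
        selection.setdefault(product_id, {}).update(zip(attributes, values))


def download_sheet(store, product_group, language):
    """Flatten one (product group, language) selection back into rows and columns."""
    selection = store.get((product_group, language), {})
    # Collect attribute names in first-seen order so columns stay stable.
    attributes = []
    for attrs in selection.values():
        for name in attrs:
            if name not in attributes:
                attributes.append(name)
    rows = [["product_id", *attributes]]
    for product_id, attrs in selection.items():
        rows.append([product_id, *(attrs.get(name, "") for name in attributes)])
    return rows


store = {}
english = [
    ["product_id", "name", "voltage"],
    ["D1", "Cordless drill", "18 V"],
    ["D2", "Hammer drill", "230 V"],
]
danish = [
    ["product_id", "name", "voltage"],
    ["D1", "Akku-boremaskine", "18 V"],
]
upload_sheet(store, "drills", "en", english)
upload_sheet(store, "drills", "da", danish)
```

Each upload fills only one slice of the store, so the same product can carry different values per language while every individual sheet stays flat.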

This functionality can typically be used by the original supplier of product information – the manufacturer. This simple representation of data will then be part of the data lake organisation of varieties of product information supplemented by digital assets, product relationships and much more.