Utilizing a knowledge graph overlaps with Master Data Management (MDM).
If we go back 10 years, MDM and Data Quality Management had a small niche discipline called (among other things) entity resolution, as explored in the post Non-Obvious Entity Relationship Awareness. The aim was the same as what today can be delivered at a much larger scale using knowledge graph technology.
During the past decade there have been examples of using graph technology for MDM, as mentioned for example in the post Takeaways from MDM Summit Europe 2016. However, most attempts to combine MDM and graph have been about visualizing the relationships in MDM using a graph presentation.
When utilizing knowledge graph approaches, you will be able to detect many more relationships than those currently managed in MDM. This fact is the foundation for a successful co-existence between MDM and knowledge graph, with these synergies:
MDM hubs can enrich the knowledge graph with proven descriptions of the entities that are the nodes (vertices) in the knowledge graph.
Additional detected relationships (edges) and entities (nodes) from the knowledge graph that are of operational and/or general analytic interest enterprise-wide can be proven and managed in MDM.
In this way you can create new business benefits from both MDM and knowledge graph.
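As a sketch of how these synergies could look in practice, here is a minimal Python example; all node IDs, records and relationship types are hypothetical:

```python
# Hypothetical sketch: MDM golden records enrich knowledge graph nodes,
# while newly detected graph relationships become candidates for MDM.

mdm_golden_records = {
    "C-1001": {"name": "Robert Smith", "type": "customer", "verified": True},
    "P-2001": {"name": "Widget Deluxe", "type": "product", "verified": True},
}

# Relationships detected by graph analytics, possibly unknown to the MDM hub
graph_edges = [
    ("C-1001", "bought", "P-2001"),
    ("C-1001", "reviewed", "P-2001"),
]

def enrich_nodes(edges, golden_records):
    """Attach MDM-proven descriptions to every node appearing in the graph."""
    nodes = {}
    for source, _, target in edges:
        for node_id in (source, target):
            # Fall back to an unverified stub when MDM has no golden record
            nodes[node_id] = golden_records.get(node_id, {"verified": False})
    return nodes

def edges_for_mdm(edges, managed_relations=frozenset({"bought"})):
    """Return detected relationships not yet managed in MDM as candidates."""
    return [e for e in edges if e[1] not in managed_relations]
```

Here `enrich_nodes` represents the first synergy (MDM proving graph entities) and `edges_for_mdm` the second (graph feeding new relationships back to MDM).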
As part of presenting the solutions on The Disruptive MDM / PIM / DQM List 2022, the next vendor is Magnitude Software.
Magnitude Software has two solutions on the list:
Kalido MDM, where you can define and model critical business information from any domain – customer, product, financial, vendor, supplier, location and more – to create and manage accurate, integrated, and governed data that business users trust.
Agility Multichannel PIM, which has the capabilities to get products to market faster with a simple-to-use, comprehensive Product Information Management solution that makes it easy to support commerce across digital and traditional channels.
When you are going to implement data governance, one key prerequisite is to work with a framework that outlines the key components of the implementation and the ongoing program.
There are many frameworks available. A few are public while most are legacy frameworks provided by consultancy companies.
Anyway, the seven main components that you will (or should) see in a data governance framework are these:
Vision and mission: Formalizing a statement of the desired outcome, the business objectives to be reached and the scope covered.
Organization: Outlining how the implementation and the continuing core team are to be organized, their mandate and job descriptions, as well as outlining the forums needed for business engagement.
Roles and responsibilities: Assigning the wider roles involved across the business, often set in a RACI matrix with responsible, accountable, consulted, and informed roles for data domains and the critical data elements within.
Business Glossary: Creation and maintenance of a list of business terms and their definitions that must be used to ensure the same vocabulary is used enterprise-wide when operating with and analyzing data.
Data Policies and Data Standards: Documentation of the overarching data policies enterprise-wide and for each data domain and the standards for the critical data elements within.
Data Quality Measurement: Identification of the key data quality indicators that support general key performance indicators in the business and the desired goals for these.
Data Innovation Roadmap: Forecasting the future need for new data elements and relationships to be managed to support key business drivers such as digitalization and globalization.
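The roles and responsibilities component above can be captured as a simple data structure. A minimal sketch, where all role names and domains are purely illustrative:

```python
# Illustrative RACI matrix for data domains; roles and domains are examples only
raci_matrix = {
    "customer": {
        "responsible": "Customer Data Steward",
        "accountable": "Head of Sales Operations",
        "consulted": ["CRM Team", "Legal"],
        "informed": ["Marketing"],
    },
    "product": {
        "responsible": "Product Data Steward",
        "accountable": "Head of Product Management",
        "consulted": ["Supply Chain"],
        "informed": ["E-commerce Team"],
    },
}

def accountable_for(domain):
    """Look up the single accountable role for a data domain."""
    return raci_matrix[domain]["accountable"]
```

Keeping the matrix in a machine-readable form like this makes it easy to surface the right contact in data catalogs and escalation workflows.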
Other common components in and around a data governance framework are the funding/business case, data management maturity assessment, escalation procedures and other processes.
What else have you seen or should be seen in a data governance framework?
One of the recurring entries on The Disruptive MDM/PIM/DQM List is Contentserv.
Contentserv operates under the slogan: Futurize your customers’ product experience.
Using Contentserv, you will be able to develop the groundbreaking product experiences your customers expect across multiple channels. Contentserv helps you unleash the potential of your product information, using its unique combination of advanced technologies.
Contentserv has combined multiple data management technologies in a single platform for controlling the total product experience. The platform facilitates collecting data from suppliers, enriching it into high-grade content, and then personalizing it for use in targeted marketing and promotions.
Product Information Management (PIM) has a sub discipline called Product Data Syndication (PDS).
While PIM is basically about how to collect, enrich, store and publish product information within a given organization, PDS is about how to share product information between manufacturers, merchants and marketplaces.
Marketplaces are the new kids on the block in this world. Amazon and Alibaba are the best known, but there are plenty of them internationally, within given product groups, and nationally. Merchants can provide product information related to the goods they are selling on a marketplace. A disruptive force in the supply (or value) chain world is that today manufacturers can sell their goods directly on marketplaces and thereby bypass the merchants. It is, though, still only a fraction of trade that has been diverted this way.
Each marketplace has its own requirements for how product information should be uploaded, encompassing which data elements are needed, the requested taxonomy and data standards, as well as the data syndication method.
One way of syndicating (or synchronizing) data from manufacturers to merchants is going through a data pool. The best known one is the Global Data Synchronization Network (GDSN) operated by GS1 through data pool vendors, of which 1WorldSync is the dominant one. Here, trading partners follow the same classification, taxonomy and structure for a group of products (typically food and beverage) and their most common attributes in use in a given geography.
There are plenty of other data pools available, focusing on given product groups either internationally or nationally. The concept here is also that everyone uses the same taxonomy and has the same structure and range of data elements available.
Product classifications can be used to apply the same data standards. GS1 has a product classification called GPC. Some marketplaces use the UNSPSC classification provided by the United Nations and – perhaps ironically – also operated by GS1. Other classifications, which in addition encompass the attribute requirements, are eClass and ETIM.
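Managing one product's codes across several classification systems can be sketched as below; all codes are made-up placeholders, not actual GPC, UNSPSC, eClass or ETIM codes:

```python
# Sketch of holding one product's codes across classification systems.
# All codes below are made-up placeholders, not real classification codes.
product_classifications = {
    "sku": "ABC-123",
    "codes": {
        "GPC": "10000000",      # placeholder GS1 GPC brick code
        "UNSPSC": "43211500",   # placeholder UNSPSC commodity code
        "ETIM": "EC000000",     # placeholder ETIM class
    },
}

def code_for(product, scheme):
    """Return the product's code in the requested classification scheme, if mapped."""
    return product["codes"].get(scheme)
```

A marketplace onboarding flow would typically call something like `code_for(product, "GPC")` and reject or flag products that are not yet mapped to the scheme it requires.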
A manufacturer can have product information in an in-house ERP, MDM and/or PIM application. In the same way a merchant (retailer or B2B dealer) can have product information in an in-house ERP, MDM (Master Data Management) and/or PIM application. Most often a pair of manufacturer and merchant will not use the same data standard, taxonomy, format and structure for product information.
1-1 Product Data Syndication
Data pools have not substantially penetrated the product data flows encompassing all product groups and all the needed attributes and digital assets. Besides that, merchants also have a desire to provide unique product information and thereby stand out in the competition with other merchants selling the same products.
Thus, the highway in product data syndication is still 1-1 exchange. This highway has these lanes:
Exchanging spreadsheets, typically orchestrated so that the merchant requests the manufacturer to fill in a spreadsheet with the data elements defined by the merchant.
A supplier portal, where the merchant offers an interface to their PIM environment where each manufacturer can upload product information according to the merchant’s definitions.
A customer portal, where the manufacturer offers an interface where each merchant can download product information according to the manufacturer’s definitions.
A specialized product data syndication service where the manufacturer can push product information according to their definitions and the merchant can pull linked and transformed product information according to their definitions.
In practice, the chain from manufacturer to the end merchant may have several nodes being distributors/wholesalers that reload the data by getting product information from an upstream trading partner and passing this product information on to a downstream trading partner.
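The linking and transformation step in a syndication service can be sketched as a simple field mapping; all field names here are hypothetical:

```python
# Hypothetical mapping from a manufacturer's field names to a merchant's field names
FIELD_MAP = {
    "item_no": "sku",
    "item_name": "product_title",
    "net_weight_kg": "weight_kg",
}

def transform(manufacturer_record, field_map=FIELD_MAP):
    """Re-key a manufacturer record into the merchant's definitions,
    dropping fields the merchant has not asked for."""
    return {
        merchant_field: manufacturer_record[mfr_field]
        for mfr_field, merchant_field in field_map.items()
        if mfr_field in manufacturer_record
    }
```

For example, `transform({"item_no": "ABC-123", "item_name": "Widget", "colour": "red"})` yields `{"sku": "ABC-123", "product_title": "Widget"}`; the unmapped `colour` field is dropped. A real syndication service would also convert units, taxonomies and value formats, not just field names.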
Data Quality Implications
Data quality is, as always, a concern when information producers and information consumers must collaborate, and in a product data syndication context the extended challenge is that the upstream producer and the downstream consumer do not belong to the same organization. This ecosystem-wide data quality and Master Data Management (MDM) issue was examined in the post Watch Out for Interenterprise MDM.
Data fabric has been named a key strategic technology trend in 2022 by Gartner, the analyst firm.
According to Gartner, “by 2024, data fabric deployments will quadruple efficiency in data utilization while cutting human-driven data management tasks in half”.
Master Data Management (MDM) and data fabric are overlapping disciplines as examined in the post Data Fabric vs MDM. I have seen data strategies where MDM is put as a subset to data fabric and data strategies where they are separate tracks.
In my head, there is a common theme being data sharing.
Then there is a difference in focus: data fabric seems to center on data integration. MDM is also about data integration, but more about data quality. Data fabric takes care of all data while MDM obviously is about master data, though the coverage of business entities within MDM seems to be broadening.
Another term closely tied to data fabric – and increasingly with MDM as well – is knowledge graph. A knowledge graph is usually considered a means to achieve a good state of data fabric. In the same way you can use a knowledge graph approach to achieve a good state of MDM when it comes to managing relationships – if you include a data quality facet.
Data quality dimensions are some of the most used terms when explaining why data quality is important, what data quality issues can be and how you can measure data quality. Ironically, we sometimes use the same data quality dimension term for two different things or use two different data quality dimension terms for the same thing. Some of the troubling terms are:
Validity / Conformity – same same but different
Validity is most often used to describe whether data filled in a data field obeys a required format or is among a list of accepted values. Databases are usually good at enforcing this, like ensuring that an entered date has the day-month-year sequence asked for and is a date in the calendar, or cross-checking data values against another table to see if the value exists there.
The problems arise when data is moved between databases with different rules and when data is captured in textual forms before being loaded into a database.
Conformity is often used to describe whether data adheres to a given standard, like an industry or international standard. Due to complexity and other circumstances, this standard may be implemented only partly or not at all as database constraints or by other means. Therefore, a given piece of data may seem to be a valid database value while not being in compliance with a given standard.
Sometimes conformity is linked to the geography in question. For example, whether a postal code conforms depends on the country where the address is. Therefore, the postal code 12345 conforms in Germany, but not in the United Kingdom.
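The postal code example can be sketched with per-country format rules; the patterns below are simplified approximations, not the full national standards:

```python
import re

# Simplified, illustrative postal code patterns per country (not complete standards)
POSTAL_PATTERNS = {
    "DE": r"^\d{5}$",                             # Germany: five digits
    "GB": r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$",  # UK: simplified outward + inward code
}

def conforms(postal_code, country):
    """Check whether a postal code matches the (simplified) pattern for a country."""
    pattern = POSTAL_PATTERNS.get(country)
    return bool(pattern and re.match(pattern, postal_code))
```

With these rules, `"12345"` conforms for `"DE"` but not for `"GB"`, illustrating how the same value can be valid in one geography and non-conformant in another.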
Accuracy / Precision – how correct and how exact
In the data quality realm, accuracy is most often used to describe whether a data value corresponds correctly to a real-world entity. If we for example have a postal address of the person “Robert Smith” being “123 Main Street in Anytown”, this data value may be accurate because this person (for the moment) lives at that address.
But if “123 Main Street in Anytown” has 3 different apartments each having its own mailbox, the value does not, for a given purpose, have the required precision.
If we work with geocoordinates, we have the same challenge. A given accurate geocode may have sufficient precision to tell the direction to the nearest supermarket, but not be precise enough to know in which apartment the out-of-milk smart refrigerator is.
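The relation between the number of decimals in a latitude and ground distance can be approximated, since one degree of latitude spans roughly 111 km; a rough sketch:

```python
# Approximate ground precision of a latitude value with a given number of decimals.
# One degree of latitude spans roughly 111 km; longitude spacing varies with latitude.
def latitude_precision_m(decimals):
    """Rough size in metres of the last retained decimal place of a latitude."""
    return 111_000 / 10 ** decimals

# Four decimals (~11 m) may suffice to point towards the nearest supermarket,
# while apartment-level use would need six or more (~0.1 m horizontally),
# and even then the floor of the apartment is a separate question.
```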
Timeliness / Currency – when time matters
Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.
Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
Uniqueness / Duplication – positive or negative
Uniqueness is the positive term while duplication is the negative term for the same issue.
We strive for uniqueness by avoiding duplicates. In data quality lingo, duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that
“Robert Smith at 123 Main Street, Suite 2 in Anytown”
is the same person as
“Bob Smith at 123 Main Str in Anytown”
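A minimal duplicate detection sketch, assuming a tiny nickname and abbreviation table, normalizes both descriptions before comparing them:

```python
import difflib
import string

# Illustrative normalization tables; real matching uses far larger dictionaries
NICKNAMES = {"bob": "robert"}
ABBREVIATIONS = {"str": "street"}

def normalize(text):
    """Lowercase, strip punctuation, and expand known nicknames/abbreviations."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = []
    for token in cleaned.split():
        token = NICKNAMES.get(token, token)
        token = ABBREVIATIONS.get(token, token)
        tokens.append(token)
    return " ".join(tokens)

def similarity(a, b):
    """Similarity ratio between two normalized descriptions (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

After normalization the two example records differ only by the missing suite number, so their similarity score lands well above a typical match threshold. Production-grade matching would add phonetic comparison, address parsing and survivorship rules on top of this.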
Completeness / Existence – to be, or not to be
Completeness is most often used to tell to what degree all required data elements are populated.
Existence can be used to tell if a given dataset has all the needed data elements for a given purpose defined.
So “Bob Smith at 123 Main Str in Anytown” is complete if we need name, street address and city, but only 75 % complete if we need name, street address, city and preferred colour, and preferred colour is an existing data element in the dataset.
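The 75 % figure can be reproduced with a small completeness calculation; the field names follow the example above:

```python
# Completeness as the share of required data elements that hold a value
REQUIRED = ["name", "street_address", "city", "preferred_colour"]

record = {
    "name": "Bob Smith",
    "street_address": "123 Main Str",
    "city": "Anytown",
    "preferred_colour": None,  # existing data element, but not populated
}

def completeness_pct(record, required):
    """Percentage of required fields that hold a non-empty value."""
    populated = sum(1 for field in required if record.get(field))
    return 100 * populated / len(required)
```

Three of the four required fields are populated, so `completeness_pct(record, REQUIRED)` returns 75.0.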
Data Quality Management
Master Data Management (MDM) solutions and specialized Data Quality Management (DQM) tools have capabilities to assess data quality dimensions and improve data quality within the different data quality dimensions.
Also, Tibco has acquired Information Builders and thus taken their position.
Again this year, Informatica is the most top-right positioned vendor. Good to know, as I am right now involved in some digital transformation programs where Informatica Data Quality (iDQ) is part of the technology stack.
You can get a free copy of the report from Ataccama here.
Multidomain MDM has moved on from the Trough of Disillusionment to climbing up the Slope of Enlightenment. I have been waiting for this to happen for 10 years – both in the hype cycle and in the real world – since I founded the Multi-Domain MDM Group on LinkedIn back then.
Interenterprise MDM has swapped places with Cloud MDM, so this term is now ahead of Cloud MDM. It is, though, hard to imagine Interenterprise MDM without Cloud MDM, and MDM in the cloud will also, according to Gartner, reach the Plateau of Productivity before ecosystem-wide MDM. The promise of this is also in accordance with a poll I made, as told in the post Interenterprise MDM Will be Hot.
You can get the full report from the MDM consultancy parsionate here.