What’s in a Data Governance Framework?

When you are going to implement data governance, one key prerequisite is to work with a framework that outlines the key components of the implementation and the ongoing program.

There are many frameworks available. A few are public, while most are proprietary frameworks provided by consultancy companies.

In any case, the seven main components that you will (or should) see in a data governance framework are these:

  • Vision and mission: Formalizing a statement of the desired outcome, the business objectives to be reached and the scope covered.
  • Organization: Outlining how the implementation and the ongoing core team are to be organized, their mandate and job descriptions, as well as outlining the forums needed for business engagement.
  • Roles and responsibilities: Assigning the wider roles involved across the business, often set out in a RACI matrix with responsible, accountable, consulted and informed roles for data domains and the critical data elements within (a minimal sketch follows after this list).
  • Business Glossary: Creation and maintenance of a list of business terms and their definitions that must be used to ensure the same vocabulary is used enterprise-wide when operating with and analyzing data.
  • Data Policies and Data Standards: Documentation of the overarching data policies enterprise-wide and for each data domain and the standards for the critical data elements within.
  • Data Quality Measurement: Identification of the key data quality indicators that support general key performance indicators in the business and the desired goals for these.
  • Data Innovation Roadmap: Forecasting the future need for new data elements and relationships to be managed to support key business drivers such as digitalization and globalization.
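To make the roles and responsibilities component concrete, here is a minimal sketch of how a RACI assignment for a single data domain could be represented. The domain, role holders and data element names are hypothetical examples, not prescriptions from any particular framework.

```python
# A minimal, hypothetical sketch of a RACI assignment for one data domain.
# Roles: R = Responsible, A = Accountable, C = Consulted, I = Informed.
raci_customer_domain = {
    "domain": "Customer",
    "critical_data_elements": {
        "customer_name": {"R": "Data Steward", "A": "Data Owner",
                          "C": "Sales Ops", "I": "Finance"},
        "billing_address": {"R": "Data Steward", "A": "Data Owner",
                            "C": "Finance", "I": "Sales Ops"},
    },
}

def who_is(role: str, element: str, matrix: dict) -> str:
    """Look up which party holds a given RACI role for a data element."""
    return matrix["critical_data_elements"][element][role]

print(who_is("A", "billing_address", raci_customer_domain))  # Data Owner
```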

Other common components in and around a data governance framework are the funding/business case, data management maturity assessment, escalation procedures and other processes.

What else have you seen or should be seen in a data governance framework?   

The Disruptive MDM/PIM/DQM List 2022: Contentserv

One of the recurring entries on The Disruptive MDM/PIM/DQM List is Contentserv.

Contentserv operates under the slogan: Futurize your customers’ product experience.

Using Contentserv, you will be able to develop the groundbreaking product experiences your customers expect – across multiple channels. Contentserv helps you unleash the potential of your product information, using its unique combination of advanced technologies.

Contentserv has combined multiple data management technologies in a single platform for controlling the total product experience. The platform facilitates collecting data from suppliers, enriching it into high-grade content, and then personalizing it for use in targeted marketing and promotions.

Learn more about the Contentserv Product Experience Platform here.

PS: You can also find some compelling success stories from Contentserv on the Case Study List here.

What is Product Data Syndication (PDS)?

Product Information Management (PIM) has a subdiscipline called Product Data Syndication (PDS).

While PIM is basically about how to collect, enrich, store and publish product information within a given organization, PDS is about how to share product information between manufacturers, merchants and marketplaces.

Marketplaces

Marketplaces are the new kids on the block in this world. Amazon and Alibaba are the best known, but there are plenty of them internationally, nationally and within given product groups. Merchants can provide product information related to the goods they are selling on a marketplace. A disruptive force in the supply (or value) chain world is that manufacturers today can sell their goods directly on marketplaces and thereby leave out the merchants. Still, only a fraction of trade has been diverted this way so far.

Each marketplace has its own requirements for how product information must be uploaded, encompassing which data elements are needed, the requested taxonomy and data standards, as well as the data syndication method.
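To illustrate, a marketplace's upload requirements could be captured in a machine-readable form and checked before syndication. The sketch below is hypothetical: the marketplace name, element list and taxonomy are invented for illustration.

```python
# Hypothetical description of one marketplace's product upload requirements.
marketplace_requirements = {
    "marketplace": "ExampleMarket",                # invented name
    "required_elements": ["gtin", "title", "brand", "description", "image_url"],
    "taxonomy": "ExampleMarket Category Tree v3",  # invented taxonomy
    "data_standard": "GS1 GPC",                    # classification asked for
    "syndication_method": "REST API upload",       # could also be a file feed
}

def missing_elements(product: dict, requirements: dict) -> list:
    """Return required data elements not populated on a product record."""
    return [e for e in requirements["required_elements"] if not product.get(e)]

product = {"gtin": "04012345123456", "title": "Example Widget", "brand": "Acme"}
print(missing_elements(product, marketplace_requirements))
# ['description', 'image_url']
```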

Data Pools

One way of syndicating (or synchronizing) data from manufacturers to merchants is going through a data pool. The best known one is the Global Data Synchronization Network (GDSN) operated by GS1 through data pool vendors, of which 1WorldSync is the dominant one. Here, trading partners follow the same classification, taxonomy and structure for a group of products (typically food and beverage) and their most common attributes in use in a given geography.

There are plenty of other data pools available focusing on given product groups, either internationally or nationally. The concept here is also that everyone uses the same taxonomy and has the same structure and range of data elements available.

Data Standards

Product classifications can be used to apply the same data standards. GS1 has a product classification called GPC. Some marketplaces use the UNSPSC classification, provided by the United Nations and – perhaps ironically – also operated by GS1. Other classifications that in addition encompass the attribute requirements are eClass and ETIM.

A manufacturer can have product information in an in-house ERP, MDM (Master Data Management) and/or PIM application. In the same way, a merchant (retailer or B2B dealer) can have product information in an in-house ERP, MDM and/or PIM application. Most often, a given pair of manufacturer and merchant will not use the same data standard, taxonomy, format and structure for product information.
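Bridging those differences often starts with a crosswalk between classification standards. Below is a minimal sketch of such a mapping; the codes are invented placeholders, not actual GPC or UNSPSC entries.

```python
# Hypothetical crosswalk between two product classification standards.
# The codes below are placeholders, not actual GPC or UNSPSC entries.
gpc_to_unspsc = {
    "10000123": "43211501",
    "10000456": "43211502",
}

def translate_classification(gpc_code: str) -> str:
    """Translate a (placeholder) GPC brick code to a (placeholder) UNSPSC code."""
    try:
        return gpc_to_unspsc[gpc_code]
    except KeyError:
        raise ValueError(f"No UNSPSC mapping defined for GPC code {gpc_code}")

print(translate_classification("10000123"))  # 43211501
```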

1-1 Product Data Syndication

Data pools have not substantially penetrated the product data flows encompassing all product groups and all the needed attributes and digital assets. Besides that, merchants also have a desire to provide unique product information and thereby stand out in the competition with other merchants selling the same products.

Thus, the highway in product data syndication is still 1-1 exchange. This highway has these lanes:

  • Exchanging spreadsheets, typically orchestrated so that the merchant requests the manufacturer to fill in a spreadsheet with the data elements defined by the merchant.
  • A supplier portal, where the merchant offers an interface to their PIM environment where each manufacturer can upload product information according to the merchant’s definitions.
  • A customer portal, where the manufacturer offers an interface where each merchant can download product information according to the manufacturer’s definitions.
  • A specialized product data syndication service where the manufacturer can push product information according to their definitions and the merchant can pull linked and transformed product information according to their definitions (sketched below).
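As a sketch of that last lane, the core of a syndication service is a transformation from the manufacturer's schema to the merchant's schema. All field names and the mapping below are hypothetical.

```python
# Hypothetical sketch of the transformation step in a 1-1 syndication service:
# the manufacturer pushes in its own schema, the merchant pulls in theirs.
manufacturer_record = {
    "item_no": "ACME-123",
    "item_name": "Example Widget",
    "net_weight_g": 250,
}

# Field mapping agreed between the syndication service and the merchant.
field_map = {
    "item_no": "supplier_sku",
    "item_name": "product_title",
    "net_weight_g": "weight_grams",
}

def transform(record: dict, mapping: dict) -> dict:
    """Rename manufacturer fields into the merchant's target schema."""
    return {mapping[src]: value for src, value in record.items() if src in mapping}

print(transform(manufacturer_record, field_map))
# {'supplier_sku': 'ACME-123', 'product_title': 'Example Widget', 'weight_grams': 250}
```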

In practice, the chain from manufacturer to the end merchant may have several nodes – distributors/wholesalers – that relay the data by getting product information from an upstream trading partner and passing this product information on to a downstream trading partner.

Data Quality Implications

Data quality is, as always, a concern when information producers and information consumers must collaborate, and in a product data syndication context the extended challenge is that the upstream producer and the downstream consumer do not belong to the same organization. This ecosystem-wide data quality and Master Data Management (MDM) issue was examined in the post Watch Out for Interenterprise MDM.

The Disruptive MDM/PIM/DQM List 2022: Datactics

A major rework of The Disruptive MDM/PIM/DQM List is in the making as the number of visitors keeps increasing and so does the number of requests for individual solution lists.

It is good to see that some of the most innovative solution providers commit to being part of the list next year as well.

One of those is Datactics.

Datactics is a veteran data quality solution provider who is constantly innovating in this space. This year Datactics was one of the rare new entries in The Gartner Magic Quadrant for Data Quality Solutions 2021.

It will be exciting to follow the ongoing development at Datactics, who is operating under the slogan: “Democratising Data Quality”.

You can learn more about what their self-service data quality and matching solution looks like here.

Core Datactics capabilities

Data Fabric and Master Data Management

Data fabric has been named a key strategic technology trend in 2022 by Gartner, the analyst firm.

According to Gartner, “by 2024, data fabric deployments will quadruple efficiency in data utilization while cutting human-driven data management tasks in half”.

Master Data Management (MDM) and data fabric are overlapping disciplines as examined in the post Data Fabric vs MDM. I have seen data strategies where MDM is put as a subset to data fabric and data strategies where they are separate tracks.

In my head, there is a common theme: data sharing.

Then there is a difference in focus: data fabric seems to center on data integration, while MDM is also about data integration but even more about data quality. Data fabric takes care of all data, while MDM obviously is about master data, though the coverage of business entities within MDM seems to be broadening.

Another term closely tied to data fabric – and increasingly to MDM as well – is knowledge graph. A knowledge graph is usually considered a means to achieve a good state of data fabric. In the same way, you can use a knowledge graph approach to achieve a good state of MDM when it comes to managing relationships – if you include a data quality facet.
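As a small illustration of that last point, master data relationships can be kept as a graph where each edge carries a data quality facet, here a confidence score. The entities and scores below are invented.

```python
# Minimal, hypothetical sketch: master data relationships as a graph,
# with a data quality facet (a confidence score) on each edge.
edges = [
    # (subject, relationship, object, confidence)
    ("Robert Smith", "works_for", "Acme Corp", 0.95),
    ("Robert Smith", "lives_at", "123 Main Street, Anytown", 0.70),
    ("Acme Corp", "subsidiary_of", "Example Holdings", 0.99),
]

def relationships_for(entity: str, min_confidence: float = 0.8):
    """Yield relationships for an entity that meet a confidence threshold."""
    for subject, rel, obj, conf in edges:
        if subject == entity and conf >= min_confidence:
            yield rel, obj, conf

for rel, obj, conf in relationships_for("Robert Smith"):
    print(rel, obj, conf)  # only works_for passes the 0.8 threshold
```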

What is your take on data fabric and MDM?

Five Pairs of Data Quality Dimensions

Data quality dimensions are some of the most used terms when explaining why data quality is important, what data quality issues can look like and how you can measure data quality. Ironically, we sometimes use the same data quality dimension term for two different things or use two different data quality dimension terms for the same thing. Some of the troubling terms are:

Validity / Conformity – same same but different

Validity is most often used to describe whether data filled into a data field obeys a required format or is among a list of accepted values. Databases are usually good at enforcing this, for example ensuring that an entered date follows the required day-month-year sequence and is a date in the calendar, or cross-checking data values against another table to see if the value exists there.

The problems arise when data is moved between databases with different rules and when data is captured in textual forms before being loaded into a database.

Conformity is often used to describe whether data adheres to a given standard, like an industry or international standard. Due to complexity and other circumstances, such a standard may not, or only partly, be implemented as database constraints or by other means. Therefore, a given piece of data may seem to be a valid database value while not being in compliance with a given standard.

Sometimes conformity is linked to the geography in question. For example, whether a postal code conforms depends on the country the address is in. Therefore, the postal code 12345 conforms in Germany but not in the United Kingdom.
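A minimal sketch of such a country-dependent conformity check, assuming simplified patterns (the United Kingdom pattern in particular only approximates the real postcode rules):

```python
import re

# Simplified, illustrative postal code patterns per country.
# The UK pattern is an approximation; real postcode rules are more involved.
postal_code_patterns = {
    "DE": r"\d{5}",                             # Germany: exactly five digits
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",  # United Kingdom (approximate)
}

def conforms(postal_code: str, country: str) -> bool:
    """Check whether a postal code conforms to the (simplified) country format."""
    return re.fullmatch(postal_code_patterns[country], postal_code) is not None

print(conforms("12345", "DE"))  # True
print(conforms("12345", "GB"))  # False
```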

Accuracy / Precision – true, false or not sure

The difference between accuracy and precision is a well-known statistical subject.

In the data quality realm, accuracy is most often used to describe whether a data value corresponds correctly to a real-world entity. If, for example, the person “Robert Smith” has the postal address “123 Main Street in Anytown”, this data value may be accurate because this person (for the moment) lives at that address.

But if “123 Main Street in Anytown” has 3 different apartments each having its own mailbox, the value does not, for a given purpose, have the required precision.

If we work with geocoordinates, we have the same challenge. A given accurate geocode may be precise enough to tell the direction to the nearest supermarket, but not precise enough to know in which apartment the out-of-milk smart refrigerator sits.
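The precision of a geocoordinate can be related to the number of decimal places it carries. A rough sketch, using the approximation that one degree of latitude is about 111 km:

```python
# Rough sketch: approximate ground resolution implied by the number of
# decimal places in a latitude value (one degree of latitude is ~111 km).
def approx_resolution_m(decimal_places: int) -> float:
    """Approximate ground resolution in meters for a given number of decimals."""
    return 111_000 / (10 ** decimal_places)

for places in (2, 4, 6):
    print(places, "decimals ->", approx_resolution_m(places), "m")
# 2 decimals -> ~1100 m (neighbourhood level)
# 4 decimals -> ~11 m   (building level)
# 6 decimals -> ~0.11 m (apartment-mailbox level, at best)
```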

Timeliness / Currency – when time matters

Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.

Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
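A minimal sketch of a currency check, assuming hypothetical address records that carry valid_from and valid_to fields:

```python
from datetime import date

# Hypothetical address records with validity periods (currency check sketch).
addresses = [
    {"person": "Robert Smith", "address": "99 Old Road, Anytown",
     "valid_from": date(2015, 1, 1), "valid_to": date(2020, 6, 30)},
    {"person": "Robert Smith", "address": "123 Main Street, Anytown",
     "valid_from": date(2020, 7, 1), "valid_to": None},  # None = still current
]

def current_address(person: str, on: date):
    """Return the address that is current for a person on a given date."""
    for rec in addresses:
        ends = rec["valid_to"] or date.max
        if rec["person"] == person and rec["valid_from"] <= on <= ends:
            return rec["address"]
    return None  # no timely value available

print(current_address("Robert Smith", date.today()))  # 123 Main Street, Anytown
```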

Uniqueness / Duplication – positive or negative

Uniqueness is the positive term and duplication the negative term for the same issue.

We strive to have uniqueness by avoiding duplicates. In data quality lingo duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that

  • “Robert Smith at 123 Main Street, Suite 2 in Anytown”

is the same person as

  • “Bob Smith at 123 Main Str in Anytown”
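A minimal sketch of why these two records may match: normalize common name and street variations before comparing. The nickname and abbreviation maps below are tiny illustrative samples; real matching engines go much further.

```python
# Minimal duplicate-detection sketch: normalize common variations, then compare.
# The nickname and abbreviation maps are tiny, illustrative samples.
NICKNAMES = {"bob": "robert", "rob": "robert"}
STREET_ABBREVIATIONS = {"str": "street", "st": "street"}

def normalize(record: str) -> str:
    """Lowercase, expand nicknames/abbreviations, drop unit designators."""
    words = record.lower().replace(",", "").split()
    words = [NICKNAMES.get(w, w) for w in words]
    words = [STREET_ABBREVIATIONS.get(w, w) for w in words]
    out, skip = [], False
    for w in words:
        if skip:            # skip the unit number following "suite"
            skip = False
            continue
        if w == "suite":
            skip = True
            continue
        out.append(w)
    return " ".join(out)

a = "Robert Smith at 123 Main Street, Suite 2 in Anytown"
b = "Bob Smith at 123 Main Str in Anytown"
print(normalize(a) == normalize(b))  # True: likely the same real-world entity
```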

Completeness / Existence – to be, or not to be

Completeness is most often used to tell to what degree all required data elements are populated.

Existence can be used to tell whether a given dataset has all the data elements needed for a given purpose defined.

So “Bob Smith at 123 Main Str in Anytown” is complete if we need name, street address and city, but only 75% complete if we need name, street address, city and preferred colour – and preferred colour is an existing data element in the dataset.
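That 75% figure can be reproduced with a simple population check over the required elements; a minimal sketch:

```python
# Minimal sketch: completeness as the share of required elements populated.
record = {"name": "Bob Smith", "street": "123 Main Str", "city": "Anytown",
          "preferred_colour": None}  # element exists in the dataset, but is empty

def completeness(rec: dict, required: list) -> float:
    """Fraction of required data elements that are populated on a record."""
    populated = sum(1 for field in required if rec.get(field))
    return populated / len(required)

print(completeness(record, ["name", "street", "city"]))                      # 1.0
print(completeness(record, ["name", "street", "city", "preferred_colour"]))  # 0.75
```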

Data Quality Management 

Master Data Management (MDM) solutions and specialized Data Quality Management (DQM) tools have capabilities to assess data quality against these dimensions and to improve data quality within them.

Check out the range of the best solutions to cover this space on The Disruptive MDM & PIM & DQM List.

Few Movements in the Gartner Magic Quadrant for Data Quality Solutions 2021

The new Gartner® Magic Quadrant™ for Data Quality Solutions is out.

There are only a few movements in this quadrant compared to the previous one, which was examined in the post From Where Will the Data Quality Machine-Learning Disruption Come?

In terms of vendor positioning, some movements are:

  • Ataccama has crossed the line into the leaders quadrant
  • Syniti has become a visionary
  • Datactics has entered the quadrant

Also, Tibco has acquired Information Builders and thus taken their position.

Again this year, Informatica is the vendor positioned furthest to the top right. Good to know, as I am right now involved in some digital transformation programs where Informatica Data Quality (iDQ) is part of the technology stack.

You can get a free copy of the report from Ataccama here.

MDM Terms on the Move in the Gartner Hype Cycle

The latest Gartner Hype Cycle for Data and Analytics Governance and Master Data Management includes some of the MDM trends that have been touched here on the blog.

If we look at the post peak side, there are these five terms in motion:

  • Single domain MDM represented by the two most common domains being MDM of Product Data and MDM of Customer Data.
  • Multidomain MDM.
  • Interenterprise MDM, which Gartner previously coined Multienterprise MDM and which I like to call Ecosystem Wide MDM.
  • Data Hub Strategy, which I like to call Extended MDM.
  • Cloud MDM.
Source: Gartner

The hype cycle from last year was examined in the post MDM Terms in Use in the Gartner Hype Cycle.

Compared to last year this has happened to MDM:

  • Multidomain MDM has moved on from the Trough of Disillusionment to climbing up the Slope of Enlightenment. I have been waiting for this to happen for 10 years – both in the hype cycle and in the real world – since I founded the Multi-Domain MDM Group on LinkedIn back then.
  • Interenterprise MDM has swapped places with Cloud MDM, so this term is now ahead of Cloud MDM. It is though hard to imagine Interenterprise MDM without Cloud MDM, and MDM in the cloud will also, according to Gartner, reach the Plateau of Productivity before ecosystem wide MDM. This promise is also in accordance with a poll I made, as told in the post Interenterprise MDM Will be Hot.

You can get the full report from the MDM consultancy parsionate here.

The Forrester Data Governance Wave 2021

Solutions for data governance are still rare. However, more and more organizations are looking to the technology part of the data governance discipline to underpin the otherwise predominant people and process parts of this challenge.

The Forrester Data Governance Wave 2021 is a list of solutions for data governance. As rightfully stated in the report: “Organizations have an ever-increasing appetite to leverage their data for business advantage, either through internal collaboration, data sharing across ecosystems, direct commercialization, or as the basis for AI-driven business decision-making. While doing so, organizations must take care to maintain employee, partner, and customer trust in their approach of leveraging data (and technology fueled by data). This requires data governance and data governance solutions to step up once again and enable data-driven businesses to leverage their data responsibly, ethically, compliantly, and accountably.”

The wave looks like this:

The solutions included seem to be a mix of data governance pure players, data privacy and data protection specialists and more general data management solution providers.

Erwin has been better known for its data modelling technology, which it still offers.

Infogix was recently acquired by Precisely, and as Precisely has also recently acquired PIM/MDM technology, the Infogix solution may become part of a wider stack.

Ataccama is also a recognized MDM and Data Quality Tool vendor.

Not surprisingly, Informatica is missing from the list, as Informatica and Forrester seem to have a dysfunctional relationship. I think the list is incomplete without Informatica – and IBM as well, though they do all the other data management stuff too. Like SAP, which is in there.

You can, against your Personally Identifiable Information, get a free copy of the report from Ataccama here.