Liliendahl on Data Quality

Interenterprise Metadata Handling

8th May 2018, Henrik Gabs Liliendahl

I am aware that the title of this blog post is a bit geeky.

However, the terms are important in data management and for your organisation's ability to prosper in an increasingly data-driven world.

The term interenterprise was part of the previous post on this blog. The post was called What is Interenterprise Data Sharing? In a comment on LinkedIn analyst Simon Walker of Gartner explained interenterprise data sharing this way: “Interenterprise data sharing = Organizations are increasingly required to provide data to, and receive data from, external trading partners (customers, suppliers, business partners and others).”

Metadata is data about data. Handling metadata is an important facet of data management, including data governance, data quality management and Master Data Management (MDM). When it comes to new trends in data management, such as big data and handling data in data lakes, the importance of metadata management will in my eyes become even more obvious.

In a current venture (Product Data Lake) we are working on building in metadata management for business ecosystems, meaning that trading partners can share product information either using the same metadata or linking their different metadata.

Using international, national and industry standards for product information would be the perfect solution for sharing metadata within business ecosystems, and indeed this is the preferred option we support. However, there are many competing standards for product information and they come in evolving versions, so having everyone on the same page at the same time is quite utopian.

Add to that, not everyone speaks English, and those who do, do not all speak the same variant of English. Metadata originates, and should exist, in the languages that are used in trading partnerships.

In Product Data Lake we have started out with these principles:

Product attributes can be tagged with an attribute type stating which standard (if any) they adhere to in terms of product identification, product classification or product feature. More about that in the post Connecting Product Information.
Attribute short and long descriptions can be represented in different languages.
Trading partners can link their product attributes and have visibility in the Product Data Lake of the standards used and of the descriptions in the different languages in which they exist.
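To make the three principles a bit more concrete, here is a minimal sketch in Python. It is not the actual Product Data Lake data model; the class names, field names and sample values are all assumptions made up for illustration. It shows an attribute tagged with a standard, descriptions held per language, and a link that gives both partners visibility of each other's descriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProductAttribute:
    """A product attribute as one trading partner defines it (hypothetical model)."""
    owner: str                             # the trading partner defining the attribute
    name: str                              # the partner's own attribute name
    attribute_type: Optional[str] = None   # standard adhered to, None if there is none
    descriptions: Dict[str, str] = field(default_factory=dict)  # language code -> text


@dataclass
class AttributeLink:
    """States that two partners' attributes carry the same meaning."""
    provider: ProductAttribute
    receiver: ProductAttribute

    def visible_descriptions(self) -> Dict[str, Dict[str, str]]:
        """Collect descriptions from both sides, keyed by language and then by owner."""
        merged: Dict[str, Dict[str, str]] = {}
        for attribute in (self.provider, self.receiver):
            for language, text in attribute.descriptions.items():
                merged.setdefault(language, {})[attribute.owner] = text
        return merged


# Illustrative values only.
provider = ProductAttribute(
    owner="Manufacturer",
    name="EAN",
    attribute_type="GTIN",
    descriptions={"en": "Global Trade Item Number", "de": "Globale Artikelnummer"},
)
receiver = ProductAttribute(
    owner="Merchant",
    name="Varenummer",
    attribute_type="GTIN",
    descriptions={"da": "Globalt varenummer", "en": "Item number (GTIN)"},
)
link = AttributeLink(provider, receiver)
visible = link.visible_descriptions()
```

Here `visible` holds entries for Danish, German and English, so each partner can see how the other side names and describes the linked attribute.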


What is Interenterprise Data Sharing?

4th May 2018 (updated 3rd July 2018), Henrik Gabs Liliendahl

The term “Interenterprise Data Sharing” has been used a couple of times by Gartner, the analyst firm, during the last two decades.

Lately it has appeared in an illustration accompanying a recent research document with the title Fundamentals for Data Integration Initiatives.

Data Integration.png — Source: Gartner Inc with red ovals added

The term was also used back in 2001 in the piece Data Ownership Extends Outside the Enterprise. Here on the blog it was included in the title of the post about Interenterprise Data Sharing and the 2016 Data Quality Magic Quadrant.

In my eyes interenterprise data sharing is closely related to how you can achieve business benefits from taking part in the ecosystem flavor of a digital business platform. Some of the data types where we will see such business ecosystem platforms flourish will be product model master data and data about, and coming from, things related to the Internet of Things (IoT) theme. This is further explained in the blog page about Master Data Share.

Diversities in Civil Registration

1st May 2018, Henrik Gabs Liliendahl

Citizen Registry

The way governments around the world have organized their Master Data Management (MDM) differs considerably. When it comes to registering citizens, the practice varies a lot, as described in the post Citizen Master Data Management.

I have lived most of my years in Denmark, where our national ID is unique and used for everything by public agencies and also a lot by private companies. Some years ago I lived in the United Kingdom, where the public agencies (and my bank) had no clue about who I was, when I came, what I did and when I left.

Recently the World Economic Forum has circulated some videos on LinkedIn telling about how stuff is done differently around the world. The video below is about the Danish civil registry (which by the way is similar in other Scandinavian countries):

What do you think? Would this public MDM and data quality practice work in the USA, the UK, Germany or wherever else you live?

A Business Oriented Data Mind Map

28th April 2018, Henrik Gabs Liliendahl

You can look at data in many ways.

Below is a mind map embracing some of the ways you can make a picture of data within your business.

data mind map

Data is often seen as the raw material that will be processed into information, which can be used to gather knowledge and thereby over time emerge as business wisdom.

When working with processing data we may distinguish between structured data that is already pre-processed into a workable format and unstructured data that is not easily ingested as information yet.

The main forms of structured data are:

Reference data, which is often defined and maintained in a wider scope than your organization, but where you may still consider yourselves more knowledgeable inside your organisation, as touched upon in the post The World of Reference Data.
Master data that describes the who, where and what in your business transactions. You can drill further down into this in the post A Master Data Mind Map.
Transactions that hold the details of the ongoing production events, about when we make purchases and sales and the financials related to all activities in the business.
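As a small illustration of how the three forms relate, the sketch below shows a transaction that points to master data, which in turn points to reference data. The class names, fields and sample records are assumptions invented for this example; the country codes are a tiny extract of ISO 3166-1 alpha-2.

```python
from dataclasses import dataclass
from datetime import date

# Reference data: defined and maintained outside the organization.
COUNTRIES = {"DK": "Denmark", "GB": "United Kingdom", "DE": "Germany"}


@dataclass(frozen=True)
class Customer:
    """Master data: the 'who' (and, via the country code, part of the 'where')."""
    customer_id: str
    name: str
    country_code: str   # refers to the reference data above


@dataclass(frozen=True)
class Product:
    """Master data: the 'what'."""
    product_id: str
    description: str


@dataclass(frozen=True)
class Sale:
    """Transaction: an event that refers to master data by key."""
    sale_date: date
    customer_id: str
    product_id: str
    quantity: int


def describe(sale, customers, products):
    """Resolve the keys in a transaction into readable information."""
    customer = customers[sale.customer_id]
    product = products[sale.product_id]
    country = COUNTRIES[customer.country_code]
    return (f"{sale.sale_date.isoformat()}: {customer.name} ({country}) "
            f"bought {sale.quantity} x {product.description}")


# Made-up sample records.
customers = {"C1": Customer("C1", "Example Retail A/S", "DK")}
products = {"P1": Product("P1", "LED bulb E27")}
sale = Sale(date(2018, 4, 28), "C1", "P1", 10)
summary = describe(sale, customers, products)
```

The transaction itself only carries keys, quantities and dates. The who, where and what come from the master and reference data it refers to.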

Unstructured data will in the end hold much more information than our structured data. This includes communication data, digital assets and big data. Some structured data sources are, though, also big, as examined in the post Five Flavors of Big Data.

We may also store the data in different places. For historical reasons within computer technology we have stored our data on premises, but organizations are, at different paces, increasingly deploying new data stores in the cloud.

In organisations with activities in multiple geographies and/or other organizational splits an ongoing consideration is whether a chunk of data is to be handled locally for each unit or to be handled globally (within the organization).

I am sure there are a lot of other ways in which you can look at data. What is on your mind?

The Cases for Data Matching in Multi-Domain MDM

26th April 2018, Henrik Gabs Liliendahl

Data matching has always been a substantial part of the capabilities in data quality technology and has become a common capability in Master Data Management (MDM) solutions.

We use the term data matching when talking about linking entities where we cannot just use exact keys in databases.

The most prominent example is matching names and addresses related to parties, where these attributes can be spelled differently and formatted using different standards but still refer to the same real-world entity. The most common scenarios are deduplication, where we clean up databases for duplicate customer, vendor and other party role records, and reference matching, where we identify and enrich party data records using external directories.

A way to pre-process party data matching is matching the locations (addresses) with external references, which have become more and more available around the world, so you have a standardized address in order to reduce the fuzziness. In some geographies you can even make use of more extended location data, such as whether the location is a single-family house, a high-rise building, a nursing home or a campus. Geocodes can also be brought into the process.

Handling the location as a separate unique entity can also be useful in many industries, such as utilities, telco, finance, transit and more.

For product data, achieving uniqueness is usually a lesser pain point, as told in the post Multi-Domain MDM and Data Quality Dimensions. But for sure requirements for matching products arise from time to time.
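The idea of standardizing the address first and only then comparing the fuzzy part can be sketched as below. This is a toy example under clear assumptions: the small abbreviation table stands in for a real external address reference, the 0.8 threshold is arbitrary, and Python's standard `difflib.SequenceMatcher` is used as a simple similarity measure rather than a dedicated matching engine.

```python
from difflib import SequenceMatcher

# Stand-in for an external address reference (assumption for this sketch).
ABBREVIATIONS = {"st": "street", "st.": "street", "rd": "road", "rd.": "road"}


def standardize_address(address: str) -> str:
    """Lower-case the address, drop commas and expand known abbreviations."""
    tokens = address.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def name_similarity(name_a: str, name_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def is_probable_duplicate(record_a: dict, record_b: dict, threshold: float = 0.8) -> bool:
    """Same standardized location plus sufficiently similar name."""
    same_location = (standardize_address(record_a["address"])
                     == standardize_address(record_b["address"]))
    similar_name = name_similarity(record_a["name"], record_b["name"]) >= threshold
    return same_location and similar_name


# Made-up party records.
first = {"name": "Jon Smith", "address": "12 Main St."}
second = {"name": "John Smith", "address": "12 main street"}
third = {"name": "Mary Jones", "address": "12 Main St."}
```

With these records, `first` and `second` are flagged as probable duplicates, while `first` and `third` share the location but not the name. Because the address comparison becomes exact after standardization, the fuzziness is confined to the name.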

In the old days this was quite difficult as you often only had a product description that had to be parsed into discrete elements as examined in the post Matching Light Bulbs.

With the rise of Product Information Management (PIM) we now often do have the product attributes in a granular form. However, using traditional matching technology made for party master data will not do the trick, as this is a different and more complex scenario. My thinking is that graph technology will help, as touched upon in the post Three Ways of Finding a Product.

Data Pool vs Data Lake

21st April 2018 (updated 25th April 2018), Henrik Gabs Liliendahl

Within Product Information Management (PIM) – or Product Master Data Management if you like – there is a concept of a data pool.

Recently Justine Rodian of Stibo Systems made a nice blog post with the title Master Data Management Definitions: The Complete A-Z of MDM. Herein Justine explains a lot of terms within Master Data Management (MDM). A data pool is described like this:

“A data pool is a centralized repository of data where trading partners (e.g., retailers, distributors or suppliers) can obtain, maintain and exchange information about products in a standard format. Suppliers can, for instance, upload data to a data pool that cooperating retailers can then receive through their data pool.”

Now, during the last couple of years I have been working on the concept of applying the data lake approach to product information exchange between trading partners. Justine describes a data lake this way:

“A data lake is a place to store your data, usually in its raw form without changing it. The idea of the data lake is to provide a place for the unaltered data in its native format until it’s needed…..”

Product Data Lake — MacRitchie Reservoir in Singapore

For a provider of product information, typically a manufacturer, the benefit of interacting via a data lake as opposed to a data pool is that they do not have to go through standardization before uploading. They thus avoid having to shoehorn the data into a specific form, which almost certainly means leaving out important information and depending on consensus between competing manufacturers.

For a receiver of information, typically a merchant such as a retailer or B2B dealer, the benefit of interacting via a data lake as opposed to a data pool is that they can request the data in the form they will use to be most competitive and thereby sell more and reduce costs in product information sharing. This will be further accelerated if the merchant uses several data pools.

In Product Data Lake we even combine the best of the two approaches by encompassing data pools in our reservoir concept – to stay in the water body lingo. Here data pools are refreshed with modern data management technology and less rigid incoming and outgoing streams as announced in the post Product Data Lake Version 1.3 is Live.

Seven Flavors of MDM

19th April 2018 (updated 25th April 2018), Henrik Gabs Liliendahl

Master Data Management (MDM) can take many forms. An exciting side of being involved in MDM implementations is that every implementation is a little bit different which also makes room for a lot of different technology options. There is no best MDM solution out there. There are a lot of options where some will be the best fit for a given MDM implementation.

The available solutions also change over the years – typically by spreading to cover more land in the MDM space.

In the following I will briefly introduce the basics of seven flavours of MDM. A given MDM implementation will typically be focused on one of these flavours with some elements of the other flavours, and a given piece of technology will have its origin in one of these flavours and to a greater or lesser degree encompass some of the others.

7 flavours

The traditional MDM platform

A traditional MDM solution is a hub for master data aiming at delivering a single source of truth (or trust) for master data within a given organization either enterprise wide or within a portion of an enterprise. The first MDM solutions were aimed at Customer Data Integration (CDI), because having multiple and inconsistent data stores for customer data with varying data quality is a well-known pain point almost everywhere. Besides that, similar pain points exist around vendor data and other party roles, product data, assets, locations and other master data domains and dedicated solutions for that are available.

Product Information Management (PIM)

A special breed of solutions for Product Information Management, aimed at having consistent product specifications across the enterprise to be published in multiple sales channels, has been around for years. We have seen a continuous integration of the market for such solutions into the traditional MDM space, as many of these solutions have morphed into being a kind of MDM solution.

Digital Asset Management (DAM)

Not least in relation to PIM we have a distinct discipline around handling digital assets such as text documents, audio files, video and other rich media data that are different from the structured and granular data we can manage in data models in common database technologies. A post on this blog examines How MDM, PIM and DAM Stick Together.

Big Data Integration

The rise of big data is having a considerable influence on what MDM solutions will look like in the future. You may handle big data directly inside MDM or link to big data outside MDM, as told in the post about The Intersection of MDM and Big Data.

Application Data Management (ADM)

Another area where you have to decide where master data stops and handling other data starts is when it comes to transactional data and other forms of data handled in dedicated applications such as ERP, CRM, PLM (Product Lifecycle Management) and plenty of other industry-specific applications. This conundrum was touched upon in a recent post called MDM vs ADM.

Multi-Domain MDM

Many MDM implementations focus on a single master data domain such as customer, vendor or product. Alternatively, you see MDM programs that have a multi-domain vision and overall project management but quite separate tracks for each domain. We have, though, seen many technology vendors preparing for the multi-domain future either by:

Being born in the multi-domain age, as for example Semarchy
Acquiring the stuff, as for example Informatica and IBM
Extending from PIM, as for example Riversand and Stibo Systems

MDM in the cloud

MDM follows the source applications up into the cloud. New MDM solutions naturally come as cloud solutions. The traditional vendors introduce cloud alternatives to, or based on, their proven on-premise solutions. There is only one direction here: more and more cloud MDM, also because customer and business partner engagement will take place in the cloud.

Ecosystem wide MDM

Doing MDM enterprise wide is hard enough. But it does not stop there. Increasingly every organization will be an integrated part of a business ecosystem where collaboration with business partners will be a part of digitalization and thus we will have a need for working on the same foundation around master data as reported in the post Ecosystem Wide MDM.

Ecosystem Wide MDM

14th April 2018 (updated 14th September 2021), Henrik Gabs Liliendahl

Doing Master Data Management (MDM) enterprise wide is hard enough. The ability to control master data across your organization is essential to enable digitalization initiatives and ensure the competitiveness of your organization in the future.

But it does not stop there. Increasingly every organization will be an integrated part of a business ecosystem where collaboration with business partners will be a part of digitalization and thus we will have a need for working on the same foundation around master data.

The different master data domains will have different roles to play in such endeavors. Party master data will be shared to some degree, but there are competitive factors as well as data protection and privacy factors to be observed. However, privacy regulations such as GDPR article 20 on data portability will make data sharing a must too.

MDM Ecosystem

Product master data – or product information if you like – is an obvious master data domain where you can gain business benefits from extending master data management to be ecosystem wide. This includes:

Working with the same product classifications or being able to continuously map between different classifications used by trading partners
Utilizing the same attribute definitions (metadata around products) or being able to continuously map between different attribute taxonomies in use by trading partners
Sharing data on product relationships (available accessories, relevant spare parts, updated succession for products, cross-sell information and up-sell opportunities)
Having access to latest versions of digital assets (text, audio, video) associated with products

The concept of ecosystem wide Multi-Domain MDM is explored further in the article about Master Data Share.

(PS: Ecosystem wide MDM is coined by Gartner, the analyst firm, as multienterprise MDM and later as Interenterprise MDM).

Spreadsheets, Business Process Re-engineering and Robots

10th April 2018, Henrik Gabs Liliendahl

Product information is the data a potential buyer of a product needs to make a purchasing decision. Today purchasing is more and more done through self-service, as in e-commerce. The product information is usually obtained through a supply chain between trading partners stretching from the manufacturer to the end merchant.

The most common way of exchanging product information between trading partners is using spreadsheets. Spreadsheets are marvellous, because you can do almost anything you want with them. However, spreadsheets are also horrendous, because you can do almost anything you want with them. Therefore, trading partners are often stuck with manual, cumbersome and error-prone processes on both the providing and receiving end.

At Product Data Lake we have developed a new mechanism that enables a whole new process for exchanging product information between trading partners. We have kept the flexibility of spreadsheets when it comes to choosing the data standards on the providing and receiving end but at the same time introduced automation and correctness when it comes to transferring, translating and transforming the data.
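As a rough illustration of what automated transferring, translating and transforming could look like, here is a small Python sketch. It is not the Product Data Lake mechanism itself. The link table, attribute names, translation table and conversion functions are all assumptions invented for the example.

```python
# Hypothetical translation table for attribute values (Danish -> English).
COLOUR_TRANSLATIONS = {"rød": "red", "blå": "blue"}


def inches_to_cm(value):
    """Transform a length from inches to centimetres, rounded to one decimal."""
    return round(value * 2.54, 1)


def translate_colour(value):
    """Translate a colour name, leaving unknown values unchanged."""
    return COLOUR_TRANSLATIONS.get(value, value)


# Hypothetical link table: provider attribute -> (receiver attribute, conversion).
LINKS = {
    "Item No": ("supplier_item_number", str),
    "Width (in)": ("width_cm", inches_to_cm),
    "Colour": ("color", translate_colour),
}


def transfer(provider_record, links):
    """Build the receiver's record from the provider's record using the links."""
    received = {}
    for provider_name, value in provider_record.items():
        if provider_name not in links:
            continue  # attributes without a link are not transferred in this sketch
        receiver_name, convert = links[provider_name]
        received[receiver_name] = convert(value)
    return received


# Made-up provider record in the provider's own standard.
provider_record = {
    "Item No": 4711,
    "Width (in)": 10,
    "Colour": "rød",
    "Internal note": "not linked",
}
received_record = transfer(provider_record, LINKS)
```

Each side keeps its own attribute names, units and language. Only the links need to be agreed on, and no spreadsheet is involved in between.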

When telling about our service I am often asked if we have a nice feature for on-boarding spreadsheets. We don't, because the process is designed to omit the spreadsheets and transfer directly from the provider's in-house product information data store(s) to the receiver's in-house product information data store.

This reminds me of when we talk about using robots to substitute human labor. Then we often think about a machine that looks like a human. But effective industrial robots do not look like humans. They are designed to do a specific process much more effectively than a human and will therefore not look like a human. The same is true in digitalization. When we redesign business processes to be much more effective, they should not include spreadsheets.
