Golden Records in Multi-Domain MDM

The term golden record is a core concept within Master Data Management (MDM). A golden record is a representation of a real-world entity that may be compiled from multiple different representations of that entity in one or more databases within the enterprise system landscape.

In multi-domain MDM we work with a range of different entity types such as party (with customer, supplier, employee and other roles), location, product and asset. The golden record concept applies to all of these entity types, but in slightly different ways.

Party Golden Records

Having a golden record that facilitates a single view of customer is probably the best-known example of using the golden record concept. Managing customer records and dealing with duplicates among them is the most frequent data quality issue around.

If you are not able to prevent duplicate records from entering your MDM world (prevention being the best approach), then you have to apply data matching capabilities. When you identify a duplicate, you must be able to intelligently merge the conflicting views into a golden record.
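That merge step can be sketched in a few lines. The following is a minimal survivorship sketch, not a production rule set: the field names and the "most recently updated non-empty value wins" rule are illustrative assumptions.

```python
# Minimal survivorship sketch: merge duplicate party records into one
# golden record. Field names and the "most recent non-empty value wins"
# rule are invented for illustration.

def merge_golden_record(records):
    """For each field, keep the value from the most recently updated
    record that actually has a value for that field."""
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for rec in ordered:
        for field, value in rec.items():
            if field != "updated" and value and field not in golden:
                golden[field] = value
    return golden

crm = {"name": "John Smith", "phone": "555-0101", "email": None, "updated": "2016-03-01"}
erp = {"name": "J. Smith", "phone": None, "email": "js@example.com", "updated": "2016-09-15"}
print(merge_golden_record([erp, crm]))
# → {'name': 'J. Smith', 'email': 'js@example.com', 'phone': '555-0101'}
```

Real MDM platforms typically let you configure survivorship per attribute (source ranking, recency, completeness); the recency-only rule above is just the simplest variant.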

To a lesser degree we see the same challenges in getting a single view of suppliers. And, which is one of my favourite subjects, you will ultimately want a single view of any business partner, including where the same real-world entity has customer, supplier and other roles in relation to your organization.

Location Golden Records

Having the same location represented only once in a golden record, and linking every party, product and asset record (and ultimately golden record) to that location record, may seem quite academic. Nevertheless, striving for that concept will solve many data quality conundrums.

Location management has different meanings and importance in different industries. One example is a brewery that does business with the legal entity (party) that owns a bar, café or restaurant. However, even when the owner of that place changes, which happens a lot, the brewery is still interested in being the brand served there. The brewery also wants to keep records of the logistics around that place and the historic volumes delivered there. Utilities and insurance are other examples of industries where the location golden record matters (or should matter) a lot.

Knowing the properties of a location also supports the party deduplication process. For example, if you have two records with the name “John Smith” on the same address, the probability of those being the same real-world entity depends on whether that location is a single-family house or a nursing home.
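The point can be illustrated with a toy scoring function. The location types and discount weights below are invented for illustration; a real matching engine would learn or configure such factors.

```python
# Toy illustration: the same name similarity should yield a different
# match confidence depending on the location type. Weights are invented.

LOCATION_DISCOUNT = {
    "single_family_house": 1.0,  # one household: same name is likely the same person
    "nursing_home": 0.4,         # many residents: same name may be coincidence
    "office_building": 0.5,
}

def match_score(name_similarity, location_type):
    """Scale a name similarity score by how 'shared' the location is."""
    return name_similarity * LOCATION_DISCOUNT.get(location_type, 0.7)

print(match_score(0.95, "single_family_house"))  # high confidence: auto-merge candidate
print(match_score(0.95, "nursing_home"))         # lower confidence: manual review
```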

Product Golden Record

Product Information Management (PIM) solutions became popular with the rise of multi-channel, where having the same representation of a product in offline and online channels is essential. The self-service approach in online sales also drove the requirement to manage a lot more product attributes than seen before, which again points to handling the product entity centrally.

In large organizations with many business units around the world, you struggle to reconcile a local view and a global view of products. A given product may be a finished product to one unit but a raw material to another unit. Even a global SAP rollout will usually not clarify this – rather the contrary.

While third-party reference data helps a lot with handling golden records for party and location, this is less the case for product master data. Classification systems and data pools do exist, but they will certainly not take you all the way. With product master data we must, in my eyes, rely more on second-party master data, meaning sharing product master data within the business ecosystems where you are present.

Asset (or Thing) Golden Records

In asset master data management you also have different purposes where having a single view of a real-world asset helps a lot. There are notably financial purposes and logistics purposes that have to be aligned, but also many other purposes depending on the industry and the type of asset.

With the rise of the Internet of Things (IoT) we will have to manage a lot more assets (or things) than we have usually considered. When a thing (a machine, a vehicle, an appliance) becomes intelligent and starts producing big data, master data management, and indeed multi-domain master data management, becomes imperative.

You will want to know a lot about the product model of the thing in order to make sense of the big data it produces. For that, you need the product (model) golden record. You will want deep knowledge of where the thing is located over time. You cannot get that without location golden records. You will want to know the different party roles related to the thing over time: the owner, the operator, the maintainer. If you want to avoid chaos, you need party golden records.

PIM Supplier Portals: Are They Good or Bad?

A recent discussion on the LinkedIn Multi-Domain MDM group is about vendor / supplier portals as a part of Product Information Management implementations.

A supplier portal (or vendor portal if you like) is usually an extension to a Product Information Management (PIM) solution. The idea is that the suppliers of products, and thus the providers of product information, to you as a downstream participant (distributor or retailer) in a supply chain can upload their product information into your PIM solution, thus relieving you of doing that. This process usually replaces receiving spreadsheets from suppliers in the many situations where data pools are not relevant.

In my opinion and experience, this is a flawed concept, because it is hostile to the supplier. A supplier will have hundreds of downstream receivers of products and thus of product information. If all of them introduced their own supplier portal, the supplier would have to learn and maintain hundreds of them. Only if you are bigger than your supplier and make up a substantial part of their business will they go along with you.

Another concept, the exact opposite, is also emerging: manufacturers and upstream distributors establishing PIM customer portals, where their customers can fetch product information. This concept is in my eyes flawed in exactly the opposite way.

And then let us imagine that every provider of product information had their PIM customer portal and every receiver had their PIM supplier portal. Then no data would flow at all.

What is your opinion and experience?

What Will You Complicate in the Year of the Rooster?

Today is the first day of the new year: the year of the rooster according to the lunar calendar observed in East Asia. One of the characteristics of the year of the rooster is that in this year, people will tend to complicate things.

People usually like to keep things simple. The KISS principle (Keep It Simple, Stupid) has many fans. But not me. Not that I do not like to keep things simple. I do. But only as simple as it should be, as Einstein probably said. Sometimes KISS is the shortcut to getting it all wrong.

When working with data quality, I have come across the three examples below of striking the right balance between making things a bit complicated and keeping them too simple:

Deduplication

One of the most frequent data quality issues around is duplicates in party master data. Customer, supplier, patient, citizen, member and many other roles of legal entities and natural persons, where the same real-world entity is described more than once with different values in our databases.

In solving this challenge, we can use methods such as match codes and edit distance to detect duplicates. However, these methods, often called deterministic, are far too simple to really automate the remedy. We can also use advanced probabilistic methods. These methods are better, but have the downside that the matching done is hard to explain, repeat and reuse in other contexts.

My best experience is to use something in between these approaches. Not too simple and not too overcomplicated.
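That middle road can be sketched as follows: a cheap deterministic blocking key (here simply the postal code, a stand-in for a real match code) narrows down the candidate pairs, and edit distance then scores the survivors. The field names and the distance threshold are illustrative assumptions.

```python
# Sketch of a hybrid approach: deterministic blocking plus edit distance.
# Field names and the max_distance threshold are invented for illustration.

def match_code(record):
    """Blocking key; a real match code would combine several normalized fields."""
    return record["postal"]

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_duplicates(records, max_distance=2):
    buckets = {}
    for rec in records:  # only compare records sharing a blocking key
        buckets.setdefault(match_code(rec), []).append(rec)
    pairs = []
    for bucket in buckets.values():
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                if edit_distance(bucket[i]["name"], bucket[j]["name"]) <= max_distance:
                    pairs.append((bucket[i], bucket[j]))
    return pairs

records = [
    {"name": "John Smith", "postal": "2100"},
    {"name": "Jon Smith", "postal": "2100"},
    {"name": "Jane Doe", "postal": "2100"},
]
print(find_duplicates(records))  # only the John/Jon pair survives
```

The blocking keeps the pairwise comparison cheap, while the edit-distance threshold stays explainable and repeatable, which is exactly what the purely probabilistic methods struggle with.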

Address verification

You can make a good algorithm to perform verification of postal and visiting addresses in a database for addresses coming from one country. However, if you try the same algorithm on addresses from another country, it often fails miserably.

Making an algorithm for addresses from all over the world will be very complicated. I have not yet seen one that works.

My best experience is to accept the complication of having almost as many algorithms as there are countries on this planet.
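In code, accepting that complication can look like a registry of per-country verification routines. The two rules below are drastically simplified illustrations, not complete validation for either country.

```python
# Sketch of per-country address verification: one routine per country,
# dispatched on the country code. The rules are deliberately simplistic.

import re

def verify_dk(address):
    # Danish postal codes are four digits
    return bool(re.fullmatch(r"\d{4}", address.get("postal", "")))

def verify_gb(address):
    # UK postcodes follow an alphanumeric outward/inward pattern (simplified)
    return bool(re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",
                             address.get("postal", "").upper()))

VERIFIERS = {"DK": verify_dk, "GB": verify_gb}

def verify_address(address):
    verifier = VERIFIERS.get(address.get("country"))
    if verifier is None:
        return None  # no algorithm for this country yet
    return verifier(address)

print(verify_address({"country": "DK", "postal": "2100"}))      # True
print(verify_address({"country": "GB", "postal": "SW1A 1AA"}))  # True
```

The registry grows country by country, which mirrors the "almost as many algorithms as there are countries" experience described above.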

Product classification

Classification of products controls a lot of the data quality dimensions related to product master data. The most prominent example is completeness of product information. Whether you have complete product information depends on the classification of the product. Some attributes will be mandatory for one product but make no sense at all for another product with a different classification.

If your product classification is too simple, your completeness measurement will not be realistic. A too granular or otherwise complicated classification system is very hard to maintain and will probably seem like overkill for many purposes of product master data management.
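A sketch of classification-driven completeness measurement, with invented classes and attributes:

```python
# Completeness depends on classification: each class defines which
# attributes are mandatory. Classes and attributes are invented examples.

MANDATORY = {
    "power_tools": {"voltage", "wattage", "weight"},
    "hand_tools": {"weight", "material"},
}

def completeness(product):
    """Share of the mandatory attributes (for this classification) that are filled."""
    required = MANDATORY.get(product["classification"], set())
    if not required:
        return 1.0  # nothing mandatory defined: trivially complete
    present = {a for a in required if product.get(a) not in (None, "")}
    return len(present) / len(required)

drill = {"classification": "power_tools", "voltage": "230V", "weight": "1.8kg"}
print(completeness(drill))  # 2 of 3 mandatory attributes present
```

Note how a coarser classification (fewer mandatory attributes) would report the same record as more complete, which is exactly why a too-simple classification gives unrealistic completeness figures.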

My best experience is that you have to maintain several classification systems and a linking between them, both inside your organization and with your trading partners.

Happy New Lunar Year

IT is not the opposite of the business, but a part of it

During my professional work, and not least when following the data management talk on social media, I often stumble upon sayings such as:

  • IT should not drive a CRM / MDM / PIM / XXX project. The business should do that.
  • IT should not be responsible for data quality. The business should be that.

I disagree with that. Not that the business should not do and be those things, but because IT should be a part of the business.

I have personally always disliked the concept of dividing a company into IT and the business. It is a concept practically only used by the IT (and IT-focused consulting) side. In my eyes, IT is part of the business just as much as marketing, sales, accounting and all the other departmental units.

With the rise of digitalization, the distinction between IT and the business becomes absolutely ridiculous, not to say dangerous.

We need business-minded IT people and IT-savvy business people to drive digitalization and take responsibility for data quality.

Used abbreviations:

  • IT = Information Technology
  • CRM = Customer Relationship Management
  • MDM = Master Data Management
  • PIM = Product Information Management

Party and Product: The Core Entities in Most Data Models

Party and product are the most frequent master data domains around.

Often you meet party in one of its most frequent roles, customer and supplier (or vendor), or by another term related to the context, such as citizen, patient, member, student, passenger and many more. These are the people and legal entities we are interacting with and with whom we usually exchange money – and information.

Products (or materials) are the things we buy, make and sell: the goods (or services) we exchange.

In my current venture called Product Data Lake, our aim is to serve the exchange of information about products between trading partners who are customers and suppliers in business ecosystems.

For that, we have been building a data model. Below you see our first developed conceptual data model, which has party and product as the core entities.

[Figure: Product Data Lake conceptual data model]

As this is a service for business ecosystems, another important entity is the partnership between suppliers and customers of products and the information about the products.

The product link entity in this data model handles the identification of products by pairs of trading partners. In the same way, the data model has link entities for the identification of product attributes at pairs of trading partners (built on the same standards or not) as well as for digital asset types.
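The product link idea can be sketched as a simple lookup structure. The class and attribute names below are illustrative assumptions, not the actual Product Data Lake schema.

```python
# Sketch of a product link entity: each pair of trading partners maps
# its own product identifiers to the other side's. Names are invented.

class ProductLink:
    def __init__(self):
        # (supplier, supplier_sku) -> {customer: customer_sku}
        self._links = {}

    def link(self, supplier, supplier_sku, customer, customer_sku):
        """Record that two partners identify the same product differently."""
        self._links.setdefault((supplier, supplier_sku), {})[customer] = customer_sku

    def translate(self, supplier, supplier_sku, customer):
        """Translate a supplier's SKU into the customer's own SKU, if linked."""
        return self._links.get((supplier, supplier_sku), {}).get(customer)

links = ProductLink()
links.link("AcmeMfg", "A-1001", "RetailCo", "R-77345")
print(links.translate("AcmeMfg", "A-1001", "RetailCo"))  # R-77345
```

The same pattern extends to attribute identification and digital asset types: a link table per pair of trading partners, whether or not both sides build on the same standard.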

If you are offering product information management services, and thus are a potential Product Data Lake ambassador, or you are part of a business ecosystem with trading partners, I will be happy to discuss with you adding the handling of trading partnerships and product information exchange to your current model.


Who will become Future Leaders in the Gartner Multidomain MDM Magic Quadrant?

Gartner emphasizes that the new Magic Quadrant for Master Data Management Solutions, published 19 January 2017, is not solely about multidomain MDM or a consolidation of the two retired MDM quadrants for customer and product master data. However, a long way down, the report still reads that way.

If you want a free copy, both Informatica here and Riversand here offer that.

The Current Pole Position and the Pack

The possible positioning was the subject of a post here on the blog a while ago, called The Gartner Magic Quadrant for MDM 2016. The term 2016 has, though, been omitted from the title of the final quadrant, probably because it took into 2017 to finalize the report, as reported in the post Gartner MDM Magic Quadrant in Overtime.

Below is my look at the positioning in the current quadrant:

[Figure: My view of the positioning in the MDM Magic Quadrant]

Starting with the multidomain MDM point of view, the two current leaders, Informatica and Orchestra Networks, have made their way to multidomain in two different ways. Pole position vendor Informatica has used mergers and acquisitions, with the old Siperian MDM solution and the Heiler PIM (Product Information Management) solution, to build its multidomain MDM leadership. Orchestra Networks has built a multidomain MDM solution from the ground up.

The visionary Riversand is coming in from the Product MDM / PIM world as a multidomain MDM wannabe, and so is the challenger Stibo. I think SAP is in its right place: enormous ability to execute with not so much vision.

If you go through the strengths and cautions of the various vendors, you will find a lot of multidomain MDM views from Gartner.

The Future Race

While the edges of the challengers and visionaries’ quadrants are usually empty in a Gartner magic quadrant, the top right in this first multidomain MDM quadrant from Gartner is noticeably empty too. So who will we see there in the future?

Gartner mentions some interesting upcoming vendors that are still earning too little to be included. Examples are Agility Multichannel (a Product Data Lake ambassador, by the way), Semarchy and Reltio.

The future race track will according to Gartner go through:

  • MDM and the Cloud
  • MDM and the Internet of Things
  • MDM and Big Data

PS: At Product Data Lake we are heading there at full speed too. Therefore, it will be a win-win to see more MDM vendors joining as ambassadors or even being more involved.

MDM: The Technology Trends

There are certainly many things going on in the Master Data Management (MDM) realm when it comes to technologies applied.

The move from on-premise solutions to cloud-based solutions has been visible for some years. It is not a rush yet, but we see more and more master data services being offered as cloud services, and many vendors of full-stack MDM platforms offer on-premise, cloud and even hybrid solutions.

As reported in the post Emerging Database Technologies for Master Data, new underlying database technologies are being put in place instead of the relational database solutions that until now have ruled the MDM world. As mentioned there, graph databases such as Neo4j and document databases such as MongoDB (which now also supports graph) are examples of popular new choices.

As examined by Gartner (the analyst firm), there are Two Ways of Exploiting Big Data with MDM: either directly or by linking. Either way, the ties between big data and master data management are in my eyes going to be a main focus of the technology trends in the years to come. Other important ties include the rise of Industry 4.0 / the Internet of Things and blockchain approaches.

We are still waiting for The Gartner Magic Quadrant for Master Data Management Solutions 2016 and the related Critical Capabilities document, so it will be very exciting, in fact more exciting than the vendor positioning, to learn how Gartner sees the technology trends affecting the MDM landscape.

What are your expectations about Master Data Management and new emerging technologies?