The Database versus the Hub

In the LinkedIn Multi-Domain MDM group we have an ongoing discussion about why you need a master data hub when you already have some workflow, a UI and a database.

I have been involved in several master data quality improvement programs without having the opportunity to store the results in a genuine MDM solution, for example as described in the post Lean MDM. And of course this may very well result in a success story.

However, there are some architectural reasons why many more organizations than those using an MDM hub today may sooner or later find benefits in having a master data hub.

Hierarchical Completeness

If we start with product master data, the main issue with storing it is the diversity in the requirements for which attributes are needed and when they are needed, depending on the categorization of the products involved.

Typically you will have hundreds or thousands of different attributes where some are crucial for one kind of product and absolutely ridiculous for another kind of product.

Modeling a single product table with thousands of attributes is not a good database practice and pre-modeling tables for each conceivable categorization is very inflexible.

Setting up mandatory fields on database level for product master data tables is asking for data quality issues, as you are bound to demand either too much or too little.

Also, product master data entities are seldom created in one single insertion, but are inserted and updated by several different employees, each responsible for a set of attributes, until the entity is ready to be approved as a whole.

A master data hub, not least those born in the product domain, is built for those realities.

The party domain has hierarchical issues too. One example is whether a state/province is mandatory on an address, which depends on the country in question.
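
To make the idea of hierarchical completeness a bit more concrete, here is a minimal sketch in Python. It is not how any particular hub does it. The category names, attribute names and the list of countries are assumptions made up for illustration; the point is only that "mandatory" is decided per category or per country rather than per table column.

```python
# Hypothetical rule tables; category and attribute names are illustrative only.
REQUIRED_BY_CATEGORY = {
    "power_tool": {"voltage", "weight_kg"},
    "garment": {"size", "material"},
}

# Assumption: countries where a state/province is treated as mandatory.
STATE_REQUIRED_COUNTRIES = {"US", "CA", "AU"}


def missing_product_attributes(category, attributes):
    """Return the required attributes that are absent or empty for this category."""
    required = REQUIRED_BY_CATEGORY.get(category, set())
    return sorted(name for name in required if not attributes.get(name))


def address_is_complete(address):
    """Country and city are always needed; state only where the country calls for it."""
    if not address.get("country") or not address.get("city"):
        return False
    if address["country"] in STATE_REQUIRED_COUNTRIES:
        return bool(address.get("state"))
    return True


# A product that is still being enriched by another employee.
drill = {"voltage": "18V"}
print(missing_product_attributes("power_tool", drill))               # ['weight_kg']
print(address_is_complete({"country": "DK", "city": "Copenhagen"}))  # True
print(address_is_complete({"country": "US", "city": "Austin"}))      # False
```

Because the check returns what is missing instead of rejecting the insertion, a half-finished product can be stored and completed later, which a NOT NULL constraint on a table would not allow.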

Single Business Partner View

I like the term “single business partner view” as a higher vision for the more common “single customer view”, as we have the same architectural requirements for supplier master data, employee master data and other master data concerning business partners as we have for the of course extremely important customer master data.

The uniqueness dimension of data quality has a really hard time in common database managers. Having duplicate customer, supplier and employee master data records is the most frequent data quality issue around.

In this sense, a duplicate party is not a record with exactly the same fields filled and with exactly the same values spelled exactly the same way, as a database will see it. A duplicate is one record reflecting the same real world entity as another record, and a duplicate group is several records reflecting the same real world entity.

Even though some database managers have fuzzy capabilities, they are still very inadequate at finding these duplicates based on several attributes at one time, and not least at finding duplicate groups.
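
As a rough illustration of matching on several attributes at one time and forming duplicate groups, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for real matching logic. The field names, the equal weighting and the 0.85 threshold are my assumptions for the example, and a genuine hub will be far more sophisticated.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(rec_a, rec_b, threshold=0.85):
    """Average the similarity over several attributes at once (assumed equal weights)."""
    fields = ("name", "street", "city")
    score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
    return score >= threshold


def duplicate_groups(records):
    """Cluster record indexes so that matches, and matches of matches, share a group."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


records = [
    {"name": "John Smith", "street": "12 Main Street", "city": "Springfield"},
    {"name": "Jon Smith", "street": "12 Main Street", "city": "Springfield"},
    {"name": "Mary Jones", "street": "4 Oak Avenue", "city": "Portland"},
]
print(duplicate_groups(records))  # [[0, 1]]
```

The pairwise comparison of every record against every other is only workable for small lists; it is here to show the grouping idea, not to suggest a scalable design.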

Finding duplicates when inserting supposedly new entities into your customer list and other party master data containers is only the first challenge concerning uniqueness. Next you have to solve the so-called survivorship questions, being what values will survive unavoidable differences.
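
One possible survivorship rule, sketched in Python under assumptions of my own: per field, the most recently updated non-empty value survives. The field names and the `updated` date are hypothetical, and in practice the rules often differ per attribute and per source.

```python
from datetime import date


def survive(group, fields=("name", "phone", "email")):
    """Build one surviving record from a duplicate group.

    Assumed rule: per field, keep the most recently updated non-empty value.
    """
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        golden[field] = next((r[field] for r in newest_first if r.get(field)), None)
    return golden


group = [
    {"name": "Jon Smith", "phone": "555-0100", "email": "", "updated": date(2010, 1, 5)},
    {"name": "John Smith", "phone": "", "email": "john@example.com", "updated": date(2011, 3, 1)},
]
print(survive(group))
# {'name': 'John Smith', 'phone': '555-0100', 'email': 'john@example.com'}
```

Note that the surviving record takes the phone number from the older record, because the newer one has none. That is exactly the kind of decision a plain database update will not make for you.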

Finally the results to be stored may have several structural outcomes. Maybe a new insertion must be split into two entities belonging to two different hierarchy levels in your party master data universe.

A master data hub will have the capabilities to solve this complexity, some for customer master data only, some also for supplier master data combined with similar challenges with product master data and eventually also other party master data.

Domain Real World Awareness

Building hierarchies, filling incomplete attributes, consolidating duplicates and other forms of real world alignment are most often achieved by including external reference data.

There are many sources available for party master data, such as address directories, business directories and citizen information, depending on the countries in question.

With product master data global data synchronization involving common product identifiers and product classifications is becoming very important when doing business the lean way.

Master data hubs know these sources of external reference data so you, once again, don’t have to reinvent the wheel.

Single Customer Hierarchy View

One of the things I do over and over again as part of my work is data matching.

There is a clear tendency that the goal of the data matching efforts increasingly is a master data consolidation taking place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than if the goal is a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data make the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role there are essentially two kinds of hierarchies to be aware of:

  • The hierarchies the involved party belongs to in the real world. This is for example an individual person seen as belonging to a household or a company belonging to a place in a company family tree.
  • The hierarchies of customer roles as seen in different business functions and by different departments. For example two billing entities may belong to the same account in a CRM system in one case, but in another case two CRM accounts have the same billing entity.
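
The two kinds of hierarchies can be kept apart in a data model along these lines. This is only a sketch in Python with class names, role names and identifiers that I have made up for the example; it is not taken from any specific MDM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    """A real world party; parent is for example a household or a parent company."""
    name: str
    parent: Optional["Party"] = None


@dataclass
class CustomerRole:
    """A role a party plays in one business function, e.g. billing or a CRM account."""
    party: Party
    function: str
    role_id: str


def ultimate_parent(party):
    """Walk up the real world hierarchy to the top of the family tree."""
    while party.parent is not None:
        party = party.parent
    return party


def roles_in_family(roles, party):
    """All role ids held by any party in the same real world family tree."""
    top = ultimate_parent(party)
    return [r.role_id for r in roles if ultimate_parent(r.party) is top]


group = Party("Acme Group")
subsidiary = Party("Acme Nordic", parent=group)
roles = [
    CustomerRole(subsidiary, "billing", "B-1"),
    CustomerRole(subsidiary, "billing", "B-2"),
    CustomerRole(group, "crm_account", "A-9"),
]
print(roles_in_family(roles, subsidiary))  # ['B-1', 'B-2', 'A-9']
```

The real world hierarchy lives on the party and the role hierarchy hangs off it, so different departments can arrange their roles differently without disagreeing about who the party is.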

The first type of hierarchy shouldn’t be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. It may however be true that your business rules and the regulatory requirements applying to your industry and geography narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous prevention of data quality issues as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all the term “single customer view” is a vision, but a vision worth striving for. Secondly customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up as a “single customer hierarchy view”.

Lean Social MDM

I have previously written some blog posts about “Social MDM” using the term “Social MDM” to describe the trend of having social media (master) data as a new complexity on top of the already known conundrum of mastering traditional master data.

Stephan Zoder of IBM Initiate discussed this topic in a recent post called CMM is Actually High-Frequency, Social MDM (where CMM is about Customer Motivation Management).

As I also briefly examined the term “Lean MDM” last week, I wonder if it is possible to start embracing social media (master) data under a term such as “Lean Social MDM”.

The lean MDM post included an actual real life project I have been involved in, which was about how the car rental giant Avis achieved lean MDM for the Scandinavian business.

An underlying business case for this project was that many decisions about car rental are made by individual persons who may act as an employee at (changing) employers and as private renters. Therefore the emphasis of the master data management was on the person in contact, user and private roles.

Having a “single person view” is in my eyes, if it wasn’t before, a good place to start your “Lean Social MDM” journey.

Proactive Data Governance at Work

“Data governance is 80 % about people and processes and 20 % (if not less) about technology” is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance in a good way.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology which must be something about discovering completeness issues like in data profiling and fixing uniqueness issues like in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or B2B customer involves many different employees each with responsibilities for a set of attributes.

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
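
A minimal Python sketch of the two routes sharing the same check could look like this. The hub is just a list here, and the two rules in `check_record` are placeholders I have invented; the point is only that the same upstream check guards both data entry and data feed.

```python
def check_record(record):
    """Return a list of issues found before the record reaches the hub."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("name is missing")
    if "@" not in record.get("email", ""):
        issues.append("email looks invalid")
    return issues


def enter_directly(hub, record):
    """Data entry route: refuse to save until the issues are fixed."""
    issues = check_record(record)
    if issues:
        return issues
    hub.append(record)
    return []


def import_feed(hub, records):
    """Data feed route: load clean records, set the rest aside for a data steward."""
    rejected = []
    for record in records:
        issues = check_record(record)
        if issues:
            rejected.append((record, issues))
        else:
            hub.append(record)
    return rejected


hub = []
print(enter_directly(hub, {"name": "Acme", "email": "info@acme.example"}))  # []
feed = [
    {"name": "", "email": "x@y.example"},
    {"name": "Beta", "email": "b@beta.example"},
]
rejected = import_feed(hub, feed)
print(len(hub), rejected[0][1])  # 2 ['name is missing']
```

In data entry the person gets the issues back immediately, while in the feed the bad records are parked with their issues attached, so nothing enters the hub unchecked by either route.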

It is a matter of taste if we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.

Hors Catégorie

Right now the yearly highlight of cycling, Le Tour de France, is under way, and today is probably the hardest stage in the race with three extraordinary climbs. In cycling races the climbs are categorized on a scale from 4 (the easiest) to 1 (the hardest) depending on their length and steepness. And then there are climbs beyond category, being longer and steeper than usual, like the three climbs today. The French description for such extreme climbs is “hors catégorie”.

Within master data management categorization is an important activity.

We categorize our customer master data, for example, depending on what kind of party we are dealing with, as in the list called Party Master Data Types that I usually use within customer data integration (CDI). Another way of categorizing is by geography, as the data quality challenges may vary depending on where the party in question resides.

In product information management (PIM) categorization of products is one of the most basic activities. Also here the categorization is important for establishing the data quality requirements as they may be very different between various categories as told in the post Hierarchical Completeness.

But there are always some master data records that are beyond categorization when it comes to fulfilling otherwise accepted requirements for data quality, as I experienced in the post Big Trouble with Big Names.

Managing Client On-Boarding Data

This year I will be joining FIMA: Europe’s Premier Financial Reference Data Management Conference for Data Management Professionals. The conference is held in London from 8th to 10th November.

I will present “Diversities In Using External Registries In A Globalised World” and take part in the panel discussion “Overcoming Key Challenges In Managing Client On-Boarding Data: Opportunities & Efficiency Ideas”.

As said in the panel discussion introduction: The industry clearly needs to normalise (or is it normalize?) regional differences and establish global standards.

The concept of using external reference data in order to improve data quality within master data management has been a favorite topic of mine for a long time.

I’m not saying that external reference data is a single source of truth. Clearly external reference data may have data quality issues as exemplified in my previous blog post called Troubled Bridge Over Water.

However I think there is a clear trend in encompassing external sources, increasingly found in the cloud, to make a shortcut in keeping up with data quality. I call this Data Quality 3.0.

The Achilles heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

Lately I have worked with a concept called instant Data Quality. The idea is to make simple yet powerful functionality that helps with hooking up with many external sources at the same time when on-boarding clients and making continuous maintenance possible.

One aspect of such a concept is how to exploit the different opportunities available in each country, as public administrative practices and privacy norms vary a lot over the world.

I’m looking forward to presenting and discussing these challenges and getting a lot of feedback.

Mutating Platforms or Intelligent Design

How do we go from single-domain master data management to multi-domain master data management? Will it be through evolution of single-domain solutions or will it require a completely new intelligent design?

The MDM journey

My previous blog post was a book review of “Master Data Management in Practice” by Dalton Cervo and Mark Allen – or the full title of the book is in fact “Master Data Management in Practice: Achieving True Customer MDM”.

The customer domain has until now been the most frequent and proven domain for master data management and, as said in the book, the domain where most organizations start the MDM journey, in particular by doing what is usually called Customer Data Integration (CDI).

However some organizations do start with Product Information Management (PIM). This is mainly due to the magic numbers being the fact that some organizations have a higher number of products than customers in the database.

Sooner or later most organizations will continue the MDM journey by embracing more domains.

Achieving Multi-Domain MDM

John Owens made a blog post yesterday called “Data Quality: Dead Crows Kill Customers! Dead Crows also Kill Suppliers!” The post explains how some data structures are similar between sales and purchasing. For example a customer and a supplier are very similar as a party.

Customer Data Integration (CDI) has a central entity being the customer, which is a party. Product Information Management (PIM) has an important entity being a supplier, which is a party. The data structures and the workflows needed to Create, Read, Update and perhaps Delete these entities are very similar, not least in business-to-business (B2B) environments.

So, when you are going from PIM to CDI, you don’t have to start from scratch, not least in a B2B environment.

The trend in the master data management technology market is that many vendors are working their way from being a single domain vendor to being a multi-domain vendor – and some are promoting their new intelligent design embracing all domains from day one.

Some other vendors are breeding several platforms (often based on acquisition) from different domains into one brand, and some vendors are developing from a single domain into new domains.

Each strategy has its pros and cons. It seems there will be plenty of philosophies to choose from when organizations are going to select the platform(s) to support the multi-domain MDM journey.

Book Review: Cervo and Allen on MDM in Practice

Master Data Management is becoming increasingly popular and so is writing books about Master Data Management.

Last month Dalton Cervo and Mark Allen published their contribution to the book selection. The book is called “Master Data Management in Practice: Achieving True Customer MDM”.

As disclosed in the first part of the title, the book emphasizes the practical aspects of implementing and maintaining Master Data Management and, as disclosed in the second part of the title, the book focuses on customer MDM, which, until now, is the most frequent and proven domain in MDM.

In my opinion the book has succeeded very well in keeping a practical view on MDM. And I think that limiting the focus to customer MDM supports the understanding of the issues discussed in a good way, though, as the authors also recognize in the final part, multi-domain MDM is becoming a trend.

Mastering customer master data is a huge subject area. In my eyes this book addresses all the important topics with a good balance, both in the sense of embracing business and technology angles with equal weight and in not presenting the issues in too simple or too complex a way.

I like how the authors are addressing the ROI question by saying: “Attempts to try to calculate and project ROI will be swag at best and probably miss the central point that MDM is really an evolving business practice that is necessary to better manage your data, and not a specific project with a specific expectation and time-based outcome that can be calculated up front”.

In the final summary the authors say: “The journey through MDM is a constantly learning, churning and maturing experience. Hopefully, we have contributed with enough insight to make your job easier”. Yep, Dalton and Mark, you have done that.

Party On

The most frequent data domain addressed in data quality improvement and master data management is parties.

Some of the issues related to parties that keep on creating difficulties are:

  • Party roles
  • International diversity
  • Real world alignment

Party roles

Party data management is often coined as customer data management or customer data integration (CDI).

Indeed, customers are the lifeblood of any enterprise – also if we refer to those who benefit from our services as citizens, patients, clients or whatever term is in use in different industries.

But the full information chain within any organization also includes many other party roles as explained in the post 360° Business Partner View. Some parties are suppliers, channel partners and employees. Some parties play more than one role at the same time.

The classic question “what is a customer?” is of course important to answer in your master data management and data quality journey. But in my eyes there are a lot of things to be solved in party data management that don’t need to wait for the answer to that question, which anyway won’t be as simple as cutting the Gordian Knot, as said in the post Where is the Business.

International diversity

As discussed in the post The Tower of Babel more and more organizations are met with multi-cultural issues in data quality improvement within party data management.

Whether and when an organization has to deal with international issues is of course dependent on whether and to what degree that organization is domestic or active internationally. Though in some countries with several official languages, like Switzerland and Belgium, the multi-cultural topic is mandatory anyway. Typically in large countries companies grow big before looking abroad, while in smaller countries, like my home country Denmark, even many fairly small companies must address international issues with data quality.

However, as Karen Lopez recently pondered in the post Data Quality in The Wild, Some Where …, actually everyone, even in the United States, has some international data somewhere looking very strange if not addressed properly.

Real world alignment

I often say that real world alignment, sometimes as opposed to the common definition of data quality as being fit for purpose, is the short cut to getting data quality right related to party master data.

It is however not a straightforward short cut. There are multiple challenges connected with getting your business-to-business (B2B) records aligned with the real world, as discussed in the post Single Company View. When it comes to business-to-consumer (B2C) or government-to-citizen (G2C), I think the dear people who sometimes comment on this blog did a fine job of balancing mutating tables and intelligent design in the post Create Table Homo_Sapiens.

Does One Size Fit Anyone?

Following up on a recent post about data silos I have been thinking (and remembering) a bit about the idea that one company can have all master data stored in a single master data hub.

Supply Chain Musings

If you for example look at a manufacturer, the procurement of raw materials is of course an important business process.

Besides purchasing raw materials the manufacturer also buys machinery, spare parts for the machinery and maintenance services for the machinery.

Like everyone else the manufacturer also buys office supplies – including rare stuff such as data quality tools and master data management consultancy.

If you look at the vendor table in such a company, the number of “supporting suppliers” is much higher than the number of the essential suppliers of raw materials. The business processes, data structures and data quality metrics for on-boarding and maintaining supplier data and product data are “same same but very different” for these groups of suppliers and the product data involved.

Supply Chain Centric Selling

I remember that at one client in manufacturing a side function in procurement was selling by-products from the production to a completely different audience than the customers for the finished products. They had a wonderful multi-domain data silo for that.

Hierarchical Customer Relations

A manufacturer may have a golden business rule saying that all sales of finished products go through channel partners. That will typically mean a modest number of customers in the basic definition being someone who pays you. Here you typically need a complex data structure and advanced workflows for business-to-business (B2B) customer relationship management.

Your channel partners will then have customers being either consumers (B2B2C) or business users within a wider range of companies. I have noticed an increasing interest in keeping some kind of track of the interaction with end users of your products, and I guess embracing social media will only add to that trend. The business processes, data structures and data quality metrics for doing that are “same same but very different” from your basic customer relationship management.


The above musings revolve around manufacturing companies, but I have met similar ranges of primary and secondary constructs related to master data management in all other industry verticals.

So, can all master data in a given company be handled in a single master data hub?

I think it’s possible, but it has to be an extremely flexible hub either having a lot of different built-in functionality or being open for integration with external services.
