The Database versus the Hub

In the LinkedIn Multi-Domain MDM group we have an ongoing discussion about why you need a master data hub when you already have a workflow, a UI and a database.

I have been involved in several master data quality improvement programs without having the opportunity to store the results in a genuine MDM solution, for example as described in the post Lean MDM. And of course this may very well end up as a success story.

However, there are some architectural reasons why many more organizations than those using an MDM hub today may, sooner or later, find benefits in having a master data hub.

Hierarchical Completeness

If we start with product master data, the main issue in storing it is the diversity in requirements for which attributes are needed, and when they are needed, depending on the categorization of the products involved.

Typically you will have hundreds or thousands of different attributes, where some are crucial for one kind of product and absolutely ridiculous for another kind of product.

Modeling a single product table with thousands of attributes is not good database practice, and pre-modeling tables for each conceivable categorization is very inflexible.

Setting up mandatory fields at the database level for product master data tables is asking for data quality issues, as you can't avoid either over-constraining or under-constraining.

Also, product master data entities are seldom created in one single insertion, but are inserted and updated by several different employees, each responsible for a set of attributes, until the entity is ready to be approved as a whole.

A master data hub, not least those born in the product domain, is built for these realities.

The party domain has hierarchical issues too. One example is whether a state/province is mandatory on an address, which depends on the country in question.
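Such context-dependent rules are better handled as data-driven validation than as fixed database constraints. A minimal sketch; the categories, attribute names and country list below are invented for illustration only:

```python
# Which attributes are mandatory depends on the product category;
# the categories and attributes here are illustrative examples.
MANDATORY_BY_CATEGORY = {
    "apparel": {"size", "color", "material"},
    "chemicals": {"hazard_class", "un_number"},
}

# Whether state/province is mandatory depends on the country.
STATE_REQUIRED_COUNTRIES = {"US", "CA", "AU"}

def missing_product_attributes(product: dict) -> set:
    """Return mandatory attributes missing for the product's category."""
    required = MANDATORY_BY_CATEGORY.get(product.get("category"), set())
    return {attr for attr in required if not product.get(attr)}

def address_is_complete(address: dict) -> bool:
    """State/province is only mandatory in certain countries."""
    if address.get("country") in STATE_REQUIRED_COUNTRIES:
        return bool(address.get("state"))
    return True
```

The point is that the rules live in (maintainable) reference data, not in database-level constraints that are either over-constraining or under-constraining.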

Single Business Partner View

I like the term “single business partner view” as a higher vision above the more common “single customer view”, as we have the same architectural requirements for supplier master data, employee master data and other master data concerning business partners as we have for the, of course extremely important, customer master data.

The uniqueness dimension of data quality has a really hard time in common database management systems. Having duplicate customer, supplier and employee master data records is the most frequent data quality issue around.

In this sense, a duplicate party is not a record with exactly the same fields filled and exactly the same values spelled exactly the same way, as a database would see it. A duplicate is one record reflecting the same real-world entity as another record, and a duplicate group is several records reflecting the same real-world entity.

Even though some database management systems have fuzzy capabilities, they are still very inadequate at finding these duplicates based on several attributes at a time, and not least at finding duplicate groups.
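To illustrate what this takes, here is a minimal sketch of fuzzy matching across several attributes at once, with duplicate groups built from the pairwise matches. The records, weights and threshold are invented for illustration; real matching engines use far more sophisticated similarity measures and blocking strategies:

```python
from difflib import SequenceMatcher

def similarity(a: dict, b: dict) -> float:
    """Weighted fuzzy similarity across name and address together."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    addr_sim = SequenceMatcher(None, a["address"].lower(), b["address"].lower()).ratio()
    return 0.6 * name_sim + 0.4 * addr_sim

def duplicate_groups(records: list, threshold: float = 0.8) -> list:
    """Cluster records whose pairwise similarity exceeds the threshold,
    using union-find so transitive matches end up in one group."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

Note that the grouping is transitive: record A may match B, and B may match C, putting all three in one duplicate group even if A and C alone would not match.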

Finding duplicates when inserting supposedly new entities into your customer list and other party master data containers is only the first challenge concerning uniqueness. Next you have to solve the so-called survivorship questions: which values will survive the unavoidable differences.
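The survivorship step can be sketched as field-level rules deciding which value survives a merge. The rules and fields below are illustrative choices; a real hub lets you configure this per attribute and per source:

```python
def most_recent(values):
    """Survivor = value from the most recently updated record."""
    return max(values, key=lambda v: v["updated"])["value"]

def most_complete(values):
    """Survivor = the longest non-empty value."""
    return max(values, key=lambda v: len(v["value"]))["value"]

# Illustrative rule assignment: which rule applies to which field.
SURVIVORSHIP_RULES = {"phone": most_recent, "name": most_complete}

def merge(duplicates: list) -> dict:
    """Build one golden record from a group of duplicate records."""
    golden = {}
    for field, rule in SURVIVORSHIP_RULES.items():
        candidates = [
            {"value": r.get(field), "updated": r["updated"]}
            for r in duplicates if r.get(field)
        ]
        if candidates:
            golden[field] = rule(candidates)
    return golden
```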

Finally, the results to be stored may have several possible outcomes. Maybe a new insertion must be split into two entities belonging to two different hierarchy levels in your party master data universe.

A master data hub will have the capabilities to handle this complexity: some for customer master data only, some also for supplier master data, combined with similar challenges in product master data and eventually also other party master data.

Domain Real World Awareness

Building hierarchies, filling incomplete attributes, consolidating duplicates and other forms of real-world alignment are most often fulfilled by including external reference data.

There are many sources available for party master data, such as address directories, business directories and citizen information, depending on the countries in question.

With product master data, global data synchronization involving common product identifiers and product classifications is becoming very important when doing business the lean way.

Master data hubs know these sources of external reference data, so you, once again, don't have to reinvent the wheel.
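As a minimal sketch of such enrichment, here is a hypothetical in-memory directory keyed by an external identifier; the directory content and field names are invented, while real sources are the directories mentioned above:

```python
# Hypothetical external business directory keyed by an identifier
# (think DUNS-number or national registration number).
BUSINESS_DIRECTORY = {
    "123456789": {"legal_name": "Acme A/S", "city": "Copenhagen", "status": "active"},
}

def enrich(record: dict, directory: dict) -> dict:
    """Fill missing attributes from the directory entry matching the
    record's external identifier; never overwrite captured values."""
    entry = directory.get(record.get("duns"))
    if not entry:
        return record
    enriched = dict(record)
    for field, value in entry.items():
        enriched.setdefault(field, value)  # only fills absent fields
    return enriched
```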

The trees never grow into heaven

This morning most of digital Denmark was closed. You couldn’t do anything at the online bank, you couldn’t do much at public sector websites and you couldn’t read electronic mail from your employer, pension institution and others.

It wasn’t because someone cut a big cable or a computer virus got a lucky strike. The problem was that the centralized internet login service had a three hour outage. It was a classic single point of failure incident.

In Denmark we have a single sign-on identity solution used by public sector, financial services and other organizations. The service is called NemID (Easy ID) and is based on an all-purpose unique national ID for every citizen.

As more and more interaction with the public sector and financial services, along with online shopping, takes place in the cloud, we are of course more and more vulnerable to these kinds of problems.

The benefits of having a single source of truth about who you are became a single point of failure here.

Well, we have this local saying: “The trees never grow into heaven”. All good things have their limit. Even in instant Identity Resolution.

Single Customer Hierarchy View

One of the things I do over and over again as part of my work is data matching.

There is a clear tendency that the goal of the data matching efforts increasingly is a master data consolidation taking place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than if the goal is a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data makes the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role there are essentially two kinds of hierarchies to be aware of:

  • The hierarchies the involved party belongs to in the real world. This is for example an individual person seen as belonging to a household, or a company belonging to a place in a company family tree.
  • The hierarchies of customer roles as seen in different business functions and by different departments. For example, two billing entities may belong to the same account in a CRM system, or conversely, two CRM accounts may share the same billing entity.

The first type of hierarchy shouldn't be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. It may however be true that your business rules and the regulatory requirements applying to your industry and geography may narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous data quality prevention as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all, the term “single customer view” is a vision, but a vision worth striving for. Secondly, customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up as a “single customer hierarchy view”.

Some Deduplication Tactics

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

  • Data matching in order to find duplicates internally in your master data, most often your customer database
  • Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

  • If you have mostly domestic data in a country with public company registration, you can obtain a national ID by matching with a business directory based on such a registry. An example is the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
  • Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
  • The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start by matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records, which will make a subsequent internal deduplication a no-brainer.
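The no-brainer part can be sketched like this: once each record carries a directory identifier, internal deduplication reduces to grouping on that identifier. The DUNS-numbers below are made up for illustration:

```python
from collections import defaultdict

def group_by_external_id(records: list, id_field: str = "duns") -> dict:
    """Records sharing the same external identifier are duplicates;
    records without an identifier are left out of the grouping."""
    groups = defaultdict(list)
    for record in records:
        if record.get(id_field):
            groups[record[id_field]].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```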

Common matching issues

A problem, however, is that you seldom get a 100 % hit rate in business directory matching, often not even close, as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing them on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save processing load.

However, a common pitfall is that an internal deduplication may merge two similar records that actually are represented by two different entities in the business directory (and the real world).

So, as with many things in data matching, the answer to the sequence question is often: both.

A good process sequence may be this one:

  1. An internal deduplication with very tight settings
  2. A match against an external registry
  3. An internal deduplication exploiting external identifiers and having looser settings for similarities not involving an external identifier
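The three steps above can be sketched as follows; the matching function, directory lookup and thresholds are placeholders for illustration, not a real implementation:

```python
TIGHT, LOOSE = 0.95, 0.80  # illustrative similarity thresholds

def deduplicate(records, threshold, match_fn):
    """One dedup pass: keep the first record of each matching pair."""
    survivors = []
    for record in records:
        if not any(match_fn(record, s) >= threshold for s in survivors):
            survivors.append(record)
    return survivors

def pipeline(records, match_fn, directory_lookup):
    # 1. Internal dedup with very tight settings: only near-identical
    #    records are merged, so distinct real-world entities survive.
    records = deduplicate(records, TIGHT, match_fn)
    # 2. Match against the external registry and attach identifiers.
    for record in records:
        record["external_id"] = directory_lookup(record)
    # 3. Looser internal dedup: a shared external identifier is a sure
    #    match; fuzzy similarity with looser settings covers the rest.
    def id_aware(a, b):
        if a["external_id"] and a["external_id"] == b["external_id"]:
            return 1.0
        return match_fn(a, b)
    return deduplicate(records, LOOSE, id_aware)
```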

Lean Social MDM

I have previously written some blog posts about “Social MDM”, using the term to describe the trend of having social media (master) data as a new complexity on top of the already known conundrum of mastering traditional master data.

Stephan Zoder of IBM Initiate discussed this topic in a recent post called CMM is Actually High-Frequency, Social MDM (where CMM is about Customer Motivation Management).

As I also briefly examined the term “Lean MDM” last week, I wonder if it is possible to start embracing social media (master) data under a term like “Lean Social MDM”.

The lean MDM post included a real-life project I have been involved in, about how the car rental giant Avis achieved lean MDM for its Scandinavian business.

An underlying business case for this project was that many decisions about car rental are made by individual persons who may act as an employee at (changing) employers and as private renters. Therefore the emphasis of the master data management was on the person in the contact, user and private roles.

Having a “single person view” is in my eyes, if it wasn’t before, a good place to start your “Lean Social MDM” journey.

Lean MDM

With a discipline like master data management there will of course always be an agile or lean way of doing things.

What is lean MDM?

A document from 2008 called A LEAN APPROACH TO MASTER DATA MANAGEMENT by Duff Bailey examines the benefits of lean MDM.

The document has a view close to mine, saying: “While there is little argument over what constitutes an individual person, many existing data models make the mistake of modeling “roles” (customer, employee, stock-holder, vendor contact, etc.) instead”.

As discussed in the article similar views can be made around organization entities, location entities and product entities.

In conclusion Duff says that: “Because of their universality and their abstract nature, these core data models can be established quickly, without the need for lengthy review that normally accompanies an enterprise data model. Thereafter, the focus of the lean data management effort will be to grow the models and populate the repositories in support of specific business objectives”.

MDM in the high gear

The fast time-to-value for lean MDM was also emphasized by MDM guru Aaron Zornes in a tweet yesterday:

The mentioned LeanMDM offering from Omikron Data Quality (which is one of my employers) is described in the link (in German). A short summary of the text is that you will, among other things, get this from lean MDM:

  • An increase in the corporate value of customer data
  • Short project times and fast results
  • Lower implementation costs through service-oriented architecture (SOA)

I have been involved in one of the implementations of the LeanMDM concept as described in this article (in English) about how the car rental giant Avis achieved lean MDM for the Scandinavian business.

Unmaintainability

Following up on my post about word quality, and inspired by a blog post by Joyce Norris-Montanari called “Things That Don’t Work So Well – Doing Analytics Before Their Time” in which the word “unmaintainable” is used, I want to challenge my English spell checker even further with the rare, apparently not really existing, word for a frequent issue: unmaintainability.

I have previously on this blog pondered that you can't expect that, because you get it Right the First Time, everything will be just fine from that day forward. Things change.

This argument is about the data as plain data.

But there is also a maintainability (apparently a real word) issue around how we store data. I have many times conducted data quality exercises such as deduplication, and matching with and enrichment from external reference data, in order to reach a single version of the truth as far as it goes.

An often encountered problem is that this kind of data processing can get us somewhere close to a single version of the truth. But then there is a huge obstacle: you can't get these great results back into the daily databases without destroying some of the correctness, because the data structures don't allow you to do that.

Such unmaintainability is in my eyes a good argument for looking into master data management platforms that allow you to maintain your master data with the complexity that supports the business rules that make your company more competitive.

The 20 Million Rupees Question

Here we go again. The same old question: “What is the definition of customer?” Lately, Informatica (a data quality, master data management and data integration firm) has hired David Loshin to find out, starting with the blog post The Most Dangerous Question to Ask Data Professionals.

In short, my take is that this question in practice has two major implications for data quality and master data management, but in theory it should only have one:

  • The first one is real world alignment. In theory real world alignment is independent of the definition of a customer as it is about the party behind the customer.
  • The second is party roles. It’s actually here we can have an endless discussion.

In practice we of course mix things up as discussed in the post Entity Revolution vs Entity Evolution.

And Now for Something Completely Different

Instead of saying that “What is the definition of customer?” is the million dollar question, it's probably more like the 20 million rupees question, as most data management these days is taking place in India.

The amount of money involved is taken from the film Slumdog Millionaire, where 20 million rupees is the top prize in the local version of “Who Wants to Be a Millionaire?” (Kaun Banega Crorepati), which by the way has the same jingle and graphics as everywhere else in the world.

And oh, how much is 20 million rupees? It's nearly half a million US dollars or 300.000 euro (with a dot as thousand separator), but a lot in buying power for a local customer. Exactly 2 crores (2,00,00,000 rupees in Indian digit grouping).

Party on.

AAA

A top theme in the economic news these days is about credit ratings for countries – also called sovereign credit ratings.

The credit rating practice is a good example of how a lot of data (with a given quality) is transformed into a very compact piece of information as an AAA or whatever rating (with a disputed quality).   

The focus of this blog post is however about how credit ratings may be attached to reference and master data entities.

The figure below is a data visualization of S&P credit ratings for European countries:

The big dark blue land in the upper left corner is the southern part of Greenland. Even though Greenland has an ISO country code (GL) and an internet TLD (.gl), Greenland hasn't actually been rated as a country, but is (my qualified guess) rated together with the Faroe Islands and continental Denmark as the Kingdom of Denmark.

On other maps Greenland isn’t included in the triple-A club:

So this is a good example of how a top-level reference data list such as a country list may have hierarchies and may be specific to a given context, a subject that is often pondered by fellow data geek and blogger Graham Rhind, most recently in the post: Have you checked your country drop down recently?

A much more frequent subject than sovereign credit rating is of course corporate credit rating.

Here we have the same hierarchical considerations.

A business-to-business (B2B) customer list may have a lot of entities belonging to the same enterprise that is credit rated as one. However, you shouldn't give each entity a credit limit equal to the limit you would assign to the enterprise as a whole. Avoiding that will be an important result of practicing good customer master data management.

An often observed data quality flaw in customer master data is that entities actually belonging to the same credit rated enterprise have different credit risk assignments, resulting in exposed financial risk. Avoiding that will also be an important result of practicing good customer master data management.
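The hierarchy check described above can be sketched by summing the limits given to individual customer records per credit-rated enterprise; the enterprise IDs and amounts below are made up for illustration:

```python
from collections import defaultdict

def exposed_enterprises(customers: list, enterprise_limits: dict) -> dict:
    """Return enterprises where the summed per-customer credit limits
    exceed the limit assigned to the credit-rated enterprise itself."""
    exposure = defaultdict(float)
    for customer in customers:
        exposure[customer["enterprise_id"]] += customer["credit_limit"]
    return {
        enterprise: total
        for enterprise, total in exposure.items()
        if total > enterprise_limits.get(enterprise, 0)
    }
```

A check like this of course presupposes that each customer record actually carries the right enterprise identifier, which is exactly the hierarchy management discussed above.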

How do you rate your customer master data management? AAA or less?   

Five Moments of Truth

Within Customer Relationship Management (CRM) and related Master Data Management (MDM) the party behind the business-to-business (B2B) customer is an important entity.

It is often said that the data capture is the most important moment, where it is essential to get data quality right. However, with a complex entity such as a B2B customer there are of course several moments of truth within the life cycle of such an entity.

These are probably the five most important ones:

  • A lead is born
  • Engaging a prospect
  • One more customer
  • Churn happens
  • Win-Back happiness

A lead is born

Leads are born in many different ways: a business card obtained from a little chit-chat at a conference, buying a list of leads, or even an engagement in social media as the new way of doing things.

One of the most important things to do when capturing the data at this point is checking whether you already have the party somewhere in the customer life cycle, or maybe even in other party roles, as examined in the post 360° Business Partner View.

Engaging a prospect

When a lead is qualified as a new prospect, you typically engage in a one-to-one dialogue, and this process includes capturing more data.

Such new data may include adding a visit address to the first captured mail address, or vice versa, and expanding the firmographic collection of data.

As explained in the post What are they doing? there are a lot of data quality issues in capturing such data, such as:

  • Unstructured versus structured data
  • Internal versus external reference data
  • One versus several values

One more customer

After a successful sales process a new customer can be added to the customer list, often with more data being captured, such as adding a billing address and setting credit risk parameters like credit limit and terms of payment.

This is the point where many party entities are split into data silos. Maybe the current customer master data lives on in the CRM system while new customer data are reentered and enriched in an ERP system and even other business applications.

Keeping these data silos aligned is the classic customer master data challenge as discussed in the post Boiling Data Silos.

Churn happens

There are actually two kinds of churn (loss of customers):

  • A customer stops a subscription or a service contract, or tells you that further buying will be at your competitors, or that there is no further need for the products and services in question
  • A customer dissolves

Sometimes you don’t even discover the latter one. So your data isn’t very useful or valuable if you don’t practice Ongoing Data Maintenance.

Win-Back happiness

In the first kind of churn you may work hard (or be lucky) and win back the customer.

Be sure to build on the data from the first engagement and not start from scratch again capturing master data and history. Avoiding this addresses some of the 55 reasons to improve data quality related to party master data uniqueness.
