Some Deduplication Tactics

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

  • Data matching in order to find duplicates internally in your master data, most often your customer database
  • Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

  • If you have mostly domestic data in a country with a public company registration you can obtain a national ID from matching with a business directory based on such a registry. An example will be the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
  • Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
  • The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start with matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records which will make a succeeding internal deduplication a no-brainer.

Common matching issues

A problem is however is that you seldom get a 100 % hit rate in a business directory matching, often not even close as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save process load.

However a common pitfall is that an internal deduplication may merge two similar records that actually are represented by two different entities in the business directory (and the real world).

So, as many things data matching, the answer to the sequence question is often: Both.

A good process sequence may be this one:

  1. An internal deduplication with very tight settings
  2. A match against an external registry
  3. An internal deduplication exploiting external identifiers and having more loose settings for similarities not involving an external identifier

Bookmark and Share

Lean Social MDM

I have previously written some blog posts about “Social MDM” using the term “Social MDM” to describe the trend of having social media (master) data as a new complexity on top of the already known conundrum of mastering traditional master data.

Stephan Zoder of IBM Initiate discussed this topic in a recent post called CMM is Actually High-Frequency, Social MDM (where CMM is about Customer Motivation Management).

As I also briefly examined the term “Lean MDM” last week I wonder if it is possible to start embracing social media (master) data under a term as “Lean Social MDM”.

The lean MDM post included an actual real life project I have been involved in, which was about how the car rental giant Avis achieved lean MDM for the Scandinavian business.

An underlying business case for this project was that many decisions about car rental is made by individual persons who may act as an employee at (changing) employers and as private renters. Therefore the emphasis of the master data management was at the person in contact, user and private roles.

Having a “single person view” is in my eyes, if it wasn’t before, a good place to start your “Lean Social MDM” journey.

Bookmark and Share

AAA

A top theme in the economic news these days is about credit ratings for countries – also called sovereign credit ratings.

The credit rating practice is a good example of how a lot of data (with a given quality) is transformed into a very compact piece of information as an AAA or whatever rating (with a disputed quality).   

The focus of this blog post is however about how credit ratings may be attached to reference and master data entities.

The figure below is a data visualization of S&P credit ratings for European countries:

The big dark blue land in the upper left corner is the southern part of Greenland. Even though that Greenland has an ISO country code (GL) and an internet TLD (.gl) Greenland hasn’t actually been rated as a country, but is (my qualified guess) rated together with the Faroe Islands and continental Denmark as the Kingdom of Denmark.

On other maps Greenland isn’t included in the triple-A club:

So this is a good example of how a top level reference data list as a country list may have hierarchies and may be specific in a given context, a subject that often is pondered by fellow data geek and blogger Graham Rhind latest in the post: Have you checked your country drop down recently?

A much more frequent subject than sovereign credit rating is of course corporate credit rating.

Here we have the same hierarchical considerations.

A business-to-business (B2B) customer list may have a lot of entities belonging to the same enterprise that is credit rated as one. However you shouldn’t give a credit limit to each entity which would be the credit limit you would assign to the enterprise as a whole. Avoiding that will be an important result from practicing good customer master data management.   

An often observed data quality flaw in customer master data is that entities actually belonging to the same credit rated enterprise has different credit risk assignments resulting in exposed financial risk. Avoiding that will also be an important result from practicing good customer master data management.   

How do you rate your customer master data management? AAA or less?   

Bookmark and Share

Five Moments of Truth

Within Customer Relationship Management (CRM) and related Master Data Management (MDM) the party behind the business-to-business (B2B) customer is an important entity.

It is often said that the data capture is the most important moment where it is essential to get data quality right. However with a complex entity as a B2B customer, there are of course several moments of truth within the life circle for such an entity.     

These are probably the five most important ones:

  • A lead is born
  • Engaging a prospect
  • One more customer
  • Churn happens
  • Win-Back happiness

A lead is born

Leads are born in many different ways: A business card obtained from a little chit-chat on a conference, buying a list of leads or even an engagement in social media as the new way of doing things.

One of the most important things to do when capturing the data at this point is ensuring if you already have the party somewhere in the customer life circle or maybe even in other party roles as examined in the post 360° Business Partner View.

Engaging a prospect

When a lead is qualified as a new prospect and you typically engage in a one-to-one dialogue this process includes capturing more data.

Such new data may include adding a visit address to the first captured mail address or vice versa and expanding the firmographic collection of data.  

As explained in the post What are they doing? there are a lot of data quality issues in capturing such data as:

  • Unstructured versus structured data
  • Internal versus external reference data
  • One versus several values

One more customer

After a successful sales process a new customer can be added to the customer list often with more data being captured as adding a billing address and stating credit risk as credit limit and terms of payment.

This is the point where many party entities are split into data silos. Maybe the current customer master data lives on in the CRM system while new customer data are reentered and enriched in an ERP system and even other business applications.

Keeping these data silos aligned is the classic customer master data challenge as discussed in the post Boiling Data Silos.

Churn happens

There are actually two kind of churns (loss of customers):

  • A customer stops a subscription, a service contract or tell you that further buying will be at your competitors or that there is no further need for the products and services in question
  • A customer dissolves

Sometimes you don’t even discover the latter one. So your data isn’t very useful or valuable if you don’t practice Ongoing Data Maintenance.

Win-Back happiness

In the first kind of churn you may work hard (or be lucky) and win back the customer.

Be sure to build on the data from the first engagement and not start from scratch again capturing master data and history. Avoiding this covers up for some of the 55 reasons to improve data quality related to party master data uniqueness.

Bookmark and Share

Good-Bye to the old CRM data model

Today I stumbled upon a blog post called Good-Bye to the “Job” by David Houle, a futurist, strategist and speaker.

In the post it is said: “In the Industrial Age, machines replaced manual or blue-collar labor. In the Information Age, computers replaced office or white-collar workers”.

The post is about that today we can’t expect to occupy one life-long job at a single employer.  We must increasingly create our own job.

My cyberspace friend Phil Simon also wrote about his advanced journey into this space recently in the post Diversifying Yourself Into a Platform Business.

The subject is close to me as I currently have approximately five different occupations as seen in my LinkedIn profile.

A professional angle to this subject is also how that development will turn some traditional data models upside down.

A Customer Relationship Management (CRM) system for business-to-business (B2B) environments has a basic data model with accounts having a number of contacts attached where the account is the parent and the contacts are the children in data modeling language.

Most systems and business processes have trouble when following a contact from account (company) to account (company) when the contact gets a new job or when the same real world individual is a contact at two or more accounts (companies) at the same time.

I have seen this problem many times and also failed to recognize it myself from time to time as told in the post A New Year Resolution.

My guess is that CRM systems in the B2B realm will turn to a more contact oriented view over time and this will probably be along with that CRM systems will rely more on Master Data Management (MDM) hubs in order to effectively reflect a fast, but not equally, changing world, as the development in the way we have jobs doesn’t happen at the same time at all places.  

Bookmark and Share

Proactive Data Governance at Work

Data governance is 80 % about people and processes and 20 % (if not less) about technology is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe if a given technology platform is able to support data governance in a good way.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology which must be something about discovering completeness issues like in data profiling and fixing uniqueness issues like in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform often implemented as workflows with assistance, checks, controls and permission management. We see this most often related to product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or B2B customer involves many different employees each with responsibilities for a set of attributes.

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.

It is a matter of taste if we call such technology proactive data governance support or upstream data quality. From what I have seen so far, it does work.

Bookmark and Share

Party On

The most frequent data domain addressed in data quality improvement and master data management is parties.

Some of the issues related to parties that keeps on creating difficulties are:

  • Party roles
  • International diversity
  • Real world alignment

Party roles

Party data management is often coined as customer data management or customer data integration (CDI).

Indeed, customers are the lifeblood of any enterprise – also if we refer to those who benefit from our services as citizens, patients, clients or whatever term in use in different industries.

But the full information chain within any organization also includes many other party roles as explained in the post 360° Business Partner View. Some parties are suppliers, channel partners and employees. Some parties play more than one role at the same time.

The classic question “what is a customer?” is of course important to be answered in your master data management and data quality journey. But in my eyes there is lot of things to be solved in party data management that don’t need to wait for the answer to that question which anyway won’t be as simple as cutting the Gordian Knot as said in the post Where is the Business.

International diversity

As discussed in the post The Tower of Babel more and more organizations are met with multi-cultural issues in data quality improvement within party data management.

Whether and when an organization has to deal with international issues is of course dependent on whether and in what degree that organization is domestic or active internationally. Even though in some countries like Switzerland and Belgium having several official languages the multi-cultural topic is mandatory. Typically in large countries companies grows big before looking abroad while in smaller countries, like my home country Denmark, even many fairly small companies must address international issues with data quality.

However, as Karen Lopez recently pondered in the post Data Quality in The Wild, Some Where …, actually everyone, even in the United States, has some international data somewhere looking very strange if not addressed properly.

Real world alignment

I often say that real world alignment, sometimes as opposed to the common definition of data quality as being fit for purpose, is the short cut to getting data quality right related to party master data.

It is however not a straight forward short cut. There are multiple challenges connected with getting your business-to-business (B2B) records aligned with the real world as discussed in the post Single Company View.  When it comes to business-to-consumer (B2C) or government-to-citizen (G2C) I think the dear people who sometimes comments on this blog did a fine job on balancing mutating tables and intelligent design in the post Create Table Homo_Sapiens.

Bookmark and Share

B2C versus B2B Data Quality

The data quality issues in doing business with private consumers (business-to-consumer = B2C) and doing business with other business’s (business-to-business = B2B) have a lot of similar challenges but also differs in a lot of ways.

Some of my experiences (and thoughts) related to different master data domains are:

Customer master data

In B2C the number of customers, prospects and leads is usually high and characterized by relatively few interactions with each entity.  In B2B you usually have a relatively small number of customers with a high number of interactions.

One of the most automated activities in data quality improvement is matching master data records with information about customers. Many of the examples we see in marketing material, research documents, blog posts and so on is about matching in the B2C realm. This is natural since the high number of records typically with a low attached value calls for automation.

Data matching in the B2B realm is indeed more complex due to numerous challenges like less standardized names of companies and typically more options in what constitutes a single customer. The high value attached to each customer also makes the risk of mistakes a showstopper for too much automation.

So in B2B we see an increasing adaption of creating workflows that insures data quality during data capture often by exploiting external reference data which also in general are more available related to business entities.

Location master data

The location of B2C customers means a lot. Accurate and timely delivery addresses for everything from direct mails to bringing goods to the premises are essential. Location data are used to recognize household relations, assigning demographic stereotypes and in many cases calculating fees of different kind. I had a near disaster experience with a really bad address in my early career.

Even though location data for B2B activities theoretically is just as important, I have often seen that a little less precision is fit for purpose or anyway lower prioritized than more pressing issues.

Product master data

Theoretically there should be no difference between B2C and B2B here, but I guess there is in practice?

The most interesting aspect is probably the multi-domain aspect examining the relations between customers and products.   

I had some experiences some years ago with the B2B realm as described in the post What is Multi-Domain MDM?: 1,000 B2B customers buying 1,000 different finished products can be a quite complicated data quality operation.

Within the B2C realm the most predominant multi-domain data quality issues I have met is related to analytics. As discussed in the post Customer/Product Matrix Management it is about typifying your customers correctly and categorizing your products adequately at the same time.

Bookmark and Share

Single Company View

Getting a single customer view in business-to-business (B2B) operations isn’t straight forward. Besides all the fuzz about agreeing on a common definition of a customer within each enterprise usually revolving around fitting multiple purposes of use, we also have complexities in real world alignment.

One Number Utopia

Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This practice has been live for many years now.

But in most other countries there are several different public registries for companies resulting in multiple numbering systems.

Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two letter ISO country code followed by the different formatted VAT number in each country – some with both digits and letters.

The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.  

2-Tier Reality

The common structure of a company is that you have a legal entity occupying one or several addresses.

The French company numbering system is a good example of how this is modeled. You have two numbers:

  • SIREN is a 9-digit number for each legal entity (on the head quarter address).
  • SIRET is a 14-digit (9 + 5) number for each business location.

This model is good for companies with several locations but strange for single location companies.

Treacherous Family Trees (and Restaurants)

The need for hierarchy management is obvious when it comes to handling data about customers that belongs to a global enterprise.

Company family trees are useful but treacherous. A mother and a daughter may be very close connected with lots of shared services or it may be a strictly matter of ownership with no operational ties at all.

Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.

Bookmark and Share

Single Business Partner View

If you search in google for “single customer view” you’ll get over 20,000 hits. If you search for “single business partner view” you’ll get zero – until I just posted this blog post.

Some time ago I wrote about getting a 360° Business Partner View elaborating on extending the 360° Customer View or Single Customer View (SVC) to embrace all sorts of party master data managed within the organization.

In fact there is at least the same amount of similar techniques used between

  • managing supplier master data and business-to business (B2B) customer master data

as there is between

  • managing business-to-business (B2B) customer master data and business-to-consumer (B2C) customer master data.

If you look at Customer Relation Management (CRM) systems almost every package is aimed at managing B2B data as the data model and the functionality supports real world B2B structures and how the sales force and other employees interacts with B2B customers and prospects.

Interacting with B2C customers and prospects is much more diverse and often supported by operational systems specialized for the industry in question like solutions for financial services, healthcare and so on.

A business partner is a party acting in the role as customer, prospect, supplier, reseller, distributor, agent and other forms of partnership. Sometimes the same party is acting in several roles at the same time thus potentially being both on the Sell–side and Buy-side of Master Data Quality management.

As sell side and buy side has intersections within party master data, in some industries we may also go deeper into identity resolution and find intersections between B2B entities and B2C entities. I’ve described these matters in the post So, how about SOHO homes. The business case is that some products in some industries are aimed at the households of business owners and the small businesses at the same time. This is for example true for industries as banking, insurance, telco, real estate and  law.

All in all achieving a single view of business partners is a task going beyond traditional customer data integration (CDI) and stretching into areas traditionally belonging to Product Information Management (PIM). This is a business case for multi-domain master data management.

Bookmark and Share