Data Matching – Page 14 – Liliendahl on Data Quality

Single Customer Hierarchy View

28th August 2011 | 29th August 2011 | Henrik Gabs Liliendahl

One of the things I do over and over again as part of my work is data matching.

Increasingly, the goal of data matching efforts is a master data consolidation that takes place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than when the goal is a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data make the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role there are essentially two kinds of hierarchies to be aware of:

The hierarchies the involved party belongs to in the real world. Examples are an individual person belonging to a household, or a company holding a place in a company family tree.
The hierarchies of customer roles as seen in different business functions and by different departments. For example, in one case two billing entities may belong to the same account in a CRM system, while in another case two CRM accounts share the same billing entity.

The first type of hierarchy shouldn’t be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. However, your business rules and the regulatory requirements applying to your industry and geography may narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous prevention of data quality issues as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all the term “single customer view” is a vision, but a vision worth striving for. Secondly customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up as a “single customer hierarchy view”.

Some Deduplication Tactics

24th August 2011 | Henrik Gabs Liliendahl | 2 Comments

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

Data matching in order to find duplicates internally in your master data, most often your customer database
Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

If you have mostly domestic data in a country with a public company registration you can obtain a national ID from matching with a business directory based on such a registry. An example is the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start by matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records, which will make a subsequent internal deduplication a no-brainer.

Common matching issues

A problem, however, is that you seldom get a 100 % hit rate in a business directory matching, and often not even close, as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save process load.

However a common pitfall is that an internal deduplication may merge two similar records that actually are represented by two different entities in the business directory (and the real world).

So, as with many things in data matching, the answer to the sequence question is often: Both.

A good process sequence may be this one:

An internal deduplication with very tight settings
A match against an external registry
An internal deduplication exploiting external identifiers and having looser settings for similarities not involving an external identifier
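The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular tool: the token-based Jaccard similarity, the thresholds (1.0 for "tight", 0.5 for "loose"), the `ext_id` field and the `registry_lookup` callable are all hypothetical stand-ins for a real matching engine and a real business directory service.

```python
import re

def tokens(name):
    """Lowercase a name and split it into a set of ASCII alphanumeric tokens."""
    return frozenset(t for t in re.split(r"[^a-z0-9]+", name.lower()) if t)

def jaccard(a, b):
    """Share of tokens the two names have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def cluster(records, threshold):
    """Greedy clustering against the first record of each existing cluster."""
    clusters = []
    for rec in records:
        for group in clusters:
            head = group[0]
            rid, hid = rec.get("ext_id"), head.get("ext_id")
            if rid and hid:
                # Both sides carry an external identifier: let it decide.
                match = rid == hid
            else:
                # Otherwise fall back to name similarity.
                match = jaccard(rec["name"], head["name"]) >= threshold
            if match:
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def run_sequence(records, registry_lookup):
    records = [dict(r) for r in records]  # work on copies
    # Step 1: internal deduplication with very tight settings.
    tight = cluster(records, threshold=1.0)
    # Step 2: one registry lookup per tight cluster (saves per-record fees).
    for group in tight:
        ext_id = registry_lookup(group[0]["name"])
        for rec in group:
            rec["ext_id"] = ext_id
    # Step 3: internal deduplication using identifiers, looser on names.
    flat = [rec for group in tight for rec in group]
    return cluster(flat, threshold=0.5)

# Hypothetical registry and customer names, for illustration only.
registry = {tokens("Acme Ltd"): "REG-1", tokens("Acme Ltd Paris"): "REG-2"}
customers = [{"name": n} for n in
             ["Acme Ltd", "ACME LTD.", "Acme Ltd Paris",
              "Nordic Foods A/S", "Nordic Foods"]]
result = run_sequence(customers, lambda name: registry.get(tokens(name)))
for group in result:
    print([rec["name"] for rec in group])
```

In this toy data the two similar "Acme" records stay apart in the last step because the registry gave them different identifiers, while the two "Nordic Foods" records, which got no registry hit, are merged by the looser name comparison.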

Lean MDM

17th August 2011 | 27th March 2012 | Henrik Gabs Liliendahl

With a discipline like master data management there will of course always be an agile or lean way of doing things.

What is lean MDM?

A document from 2008 called A LEAN APPROACH TO MASTER DATA MANAGEMENT by Duff Bailey examines the benefits of lean MDM.

The document takes a view close to mine, saying that: “While there is little argument over what constitutes an individual person, many existing data models make the mistake of modeling “roles” (customer, employee, stock-holder, vendor contact, etc.) instead”.

As discussed in the article similar views can be made around organization entities, location entities and product entities.

In conclusion Duff says that: “Because of their universality and their abstract nature, these core data models can be established quickly, without the need for lengthy review that normally accompanies an enterprise data model. Thereafter, the focus of the lean data management effort will be to grow the models and populate the repositories in support of specific business objectives”.

MDM in the high gear

The fast time-to-value for lean MDM was also emphasized by MDM guru Aaron Zornes in a tweet yesterday:

The mentioned LeanMDM offer from Omikron Data Quality (which is one of my employers) is described in the link (in German). A short summary of the text is that, among other things, you will get this from lean MDM:

An increase in the corporate value of customer data
Short project times and fast results
Lower implementation costs through service-oriented architecture (SOA)

I have been involved in one of the implementations of the LeanMDM concept as described in this article (in English) about how the car rental giant Avis achieved lean MDM for the Scandinavian business.

The 20 Million Rupees Question

11th August 2011 | Henrik Gabs Liliendahl | 4 Comments

Here we go again. The same old question: “What is the definition of customer?” Most recently, Informatica (a data quality, master data management and data integration firm) has hired David Loshin to find out, starting with the blog post The Most Dangerous Question to Ask Data Professionals.

In short, my take is that this question in practice has two major implications for data quality and master data management, but in theory it should only have one:

The first one is real world alignment. In theory real world alignment is independent of the definition of a customer as it is about the party behind the customer.
The second is party roles. It’s actually here we can have an endless discussion.

In practice we of course mix things up as discussed in the post Entity Revolution vs Entity Evolution.

And Now for Something Completely Different

Instead of saying that “What is the definition of customer?” is the million dollar question, it’s probably more like the 20 million rupees question, as most data management these days is taking place in India.

The amount of money involved is taken from the film Slumdog Millionaire where 20 million rupees is the top prize in the local “Who Wants to Be a Millionaire?” (Kaun Banega Crorepati), which by the way has the same jingle and graphics as all over the world.

And oh, how much is 20 million rupees? It’s close to ½ million US dollars or 300.000 euro (with a dot as thousand separator), but it is a lot more in buying power for a local customer. It is exactly 2 crores (2,00,00,000 rupees).
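The separators are a small data quality lesson in themselves: the same amount is grouped differently depending on locale. A minimal sketch of the two groupings mentioned above follows; the helper names `indian_grouping` and `dot_grouping` are made up for this illustration, and both assume non-negative whole numbers.

```python
def indian_grouping(amount):
    """Group digits the Indian way: last three digits, then pairs."""
    digits = str(amount)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    parts = []
    while len(head) > 2:
        parts.insert(0, head[-2:])
        head = head[:-2]
    parts.insert(0, head)
    return ",".join(parts + [tail])

def dot_grouping(amount):
    """Group digits in threes with a dot as thousand separator."""
    return f"{amount:,}".replace(",", ".")

print(indian_grouping(20_000_000))  # 2,00,00,000
print(dot_grouping(300_000))        # 300.000
```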

Party on.

Psychographic Data Quality

5th July 2011 | 12th July 2011 | Henrik Gabs Liliendahl

I have just read an article on Mashable by Jamie Beckland called The End of Demographics: How Marketers Are Going Deeper With Personal Data.

The article explains how new sources of available data make it possible for marketers to get a much closer look at potential customers and thereby go from delivering a broad message to a huge crowd to delivering a very targeted message to a small group of people with a high probability of getting a response. In short: Marketers are going from demographic marketing to psychographic marketing.

I believe this is true and ongoing (as I have also been involved in such activities).

The data quality issues we have always known in direct marketing are surely very similar in the psychographic marketing that is going on in the social media realm and in connection with eBusiness.

In my eyes, the concept of a single customer view is also a key to getting success in psychographic marketing.

You are not delivering a targeted message if you are delivering two different messages to two user profiles belonging to the same real world individual.

Your message will be very frustrating if you treat someone as a prospect when that someone already is an existing customer, perhaps in another channel.

The effectiveness of psychographic marketing depends on a match between the psychographic variables, the behavioral variables and the demographic variables. As seen in the example in the Mashable article, a good old thing like geocoding will be needed here.

An exciting thing in the rise of psychographic marketing is that it will add to the trend in data quality technology where it’s about much more than simple name and address cleansing and deduplication. Despite the virtual playground, rich location data will become even more important. The relations between customers and products as described in the post Customer Product Matrix Management will be further refined in psychographic marketing.

B2C versus B2B Data Quality

8th June 2011 | Henrik Gabs Liliendahl | 8 Comments

The data quality issues in doing business with private consumers (business-to-consumer = B2C) and in doing business with other businesses (business-to-business = B2B) share a lot of challenges but also differ in a lot of ways.

Some of my experiences (and thoughts) related to different master data domains are:

Customer master data

In B2C the number of customers, prospects and leads is usually high and characterized by relatively few interactions with each entity. In B2B you usually have a relatively small number of customers with a high number of interactions.

One of the most automated activities in data quality improvement is matching master data records with information about customers. Many of the examples we see in marketing material, research documents, blog posts and so on are about matching in the B2C realm. This is natural since the high number of records, typically with a low attached value, calls for automation.

Data matching in the B2B realm is indeed more complex due to numerous challenges like less standardized names of companies and typically more options in what constitutes a single customer. The high value attached to each customer also makes the risk of mistakes a showstopper for too much automation.

So in B2B we see an increasing adoption of workflows that ensure data quality during data capture, often by exploiting external reference data, which in general are also more available for business entities.

Location master data

The location of B2C customers means a lot. Accurate and timely delivery addresses for everything from direct mails to bringing goods to the premises are essential. Location data are used for recognizing household relations, assigning demographic stereotypes and in many cases calculating fees of different kinds. I had a near-disaster experience with a really bad address in my early career.

Even though location data for B2B activities theoretically is just as important, I have often seen that a little less precision is fit for purpose or anyway lower prioritized than more pressing issues.

Product master data

Theoretically there should be no difference between B2C and B2B here, but I guess there is in practice?

The most interesting aspect is probably the multi-domain aspect examining the relations between customers and products.

I had some experiences some years ago with the B2B realm as described in the post What is Multi-Domain MDM?: 1,000 B2B customers buying 1,000 different finished products can be a quite complicated data quality operation.

Within the B2C realm the most predominant multi-domain data quality issues I have met are related to analytics. As discussed in the post Customer/Product Matrix Management it is about typifying your customers correctly and categorizing your products adequately at the same time.

We All Hate To Watch It

14th May 2011 | Henrik Gabs Liliendahl

Tonight the Eurovision Song Contest final will be watched by over 100 million people, despite the fact that most people agree that the songs aren’t that good.

The winner will be selected by summing up an equal number of votes from each country. Usually there are big differences in how countries vote. A trend is that some neighboring groups of countries like to vote for each other. Such groups include a “Balkan Block” and a “Viking Empire”.

It’s a bit like survivorship when merging matched data rows into a golden record in an enterprise master data hub. Maybe the winning data isn’t that good and several departments probably don’t like it at all.
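To make the voting analogy concrete, here is a minimal sketch of one possible survivorship rule: every matched row gets one vote per attribute and the most frequent non-empty value wins. Majority voting is only an assumption chosen to mirror the song contest; real hubs often rank sources by trust or recency instead, and the field names and values are invented for the example.

```python
from collections import Counter

def survive(rows):
    """Build a golden record by majority vote per attribute."""
    golden = {}
    fields = {name for row in rows for name in row}
    for name in sorted(fields):
        # Empty or missing values abstain from the vote.
        votes = Counter(row[name] for row in rows if row.get(name))
        if votes:
            golden[name] = votes.most_common(1)[0][0]
    return golden

matched_rows = [
    {"name": "Acme Ltd", "city": "Copenhagen", "phone": ""},
    {"name": "Acme Ltd", "city": "Kobenhavn", "phone": "+45 1234"},
    {"name": "Acme Limited", "city": "Copenhagen"},
]
golden = survive(matched_rows)
print(golden)
```

The department that entered "Acme Limited" loses the vote, which is exactly the kind of outcome somebody will not like.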

So I see no reason why Denmark shouldn’t win tonight.

Compound Words

11th May 2011 | Henrik Gabs Liliendahl | 5 Comments

When working with data quality, and not least data matching, an ever-recurring issue is compound words. We even have the issue when talking about terms related to data quality: is it called “meta data” or “metadata”, and is it called “multi-domain MDM” or “multidomain MDM”? With MDM my spell checker likes the first option, but Gartner (the analyst firm) likes the last option.

In an international context the issue with compound words becomes much more frequent. In Germanic languages other than English, compound words are used much more. For example a street name like “Main Street” will be “Hauptstrasse” in German and “Hovedgade” in Danish.

If your first language has many compound words (like mine) you tend to use (and overuse) compound words even in English. I stumbled upon that when I was helping a family member look into search trends for “hair extensions”.

If you look at the regional interest in Google Insights the interest in “hair extensions” (figure 1) is big mostly in countries with English as first language while the interest in “hairextensions” (figure 2) is big mostly in countries having English as secondary or third language.
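In data matching, one simple way to neutralize the compound word issue is to compare values on a key where spaces and hyphens have been removed. The sketch below is an assumption about how such a key could look, not a feature of any particular tool; `compound_key` is a hypothetical helper name.

```python
import re

def compound_key(text):
    """Case-fold and drop whitespace and hyphens so split and joined forms agree."""
    return re.sub(r"[\s\-]+", "", text.casefold())

pairs = [
    ("hair extensions", "hairextensions"),
    ("meta data", "metadata"),
    ("multi-domain MDM", "multidomain MDM"),
    ("Haupt Strasse", "Hauptstraße"),  # casefold() maps ß to ss
]
for left, right in pairs:
    print(left, "|", right, "|", compound_key(left) == compound_key(right))
```

Such a key is deliberately crude: it also makes genuinely different word sequences collide, so it is better suited as one candidate-finding step than as the final match decision.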

Single Company View

27th April 2011 | Henrik Gabs Liliendahl | 2 Comments

Getting a single customer view in business-to-business (B2B) operations isn’t straightforward. Besides all the fuss about agreeing on a common definition of a customer within each enterprise, usually revolving around fitting multiple purposes of use, we also have complexities in real world alignment.

One Number Utopia

Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This practice has been live for many years now.

But in most other countries there are several different public registries for companies resulting in multiple numbering systems.

Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two letter ISO country code followed by the VAT number in each country's own format, some with both digits and letters.
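A minimal sketch of how the prefix-plus-national-number format can be split when standardizing captured VAT numbers is shown here. The length bounds in the pattern and the sample numbers are assumptions for illustration; the sketch neither checks the prefix against a list of member states nor applies any country-specific rules.

```python
import re

# Two letters, then a national part of digits and/or letters.
# The 2 to 12 character bound is an assumption, not an official rule.
VAT_PATTERN = re.compile(r"^([A-Z]{2})([0-9A-Z]{2,12})$")

def parse_vat(raw):
    """Return (country_prefix, national_number), or None if the shape is wrong."""
    cleaned = re.sub(r"[\s.\-]", "", raw).upper()
    match = VAT_PATTERN.match(cleaned)
    return (match.group(1), match.group(2)) if match else None

print(parse_vat("DK 12 34 56 78"))   # made-up number
print(parse_vat("nl123456789b01"))   # made-up number
print(parse_vat("12345"))            # no country prefix
```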

The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.

2-Tier Reality

The common structure of a company is that you have a legal entity occupying one or several addresses.

The French company numbering system is a good example of how this is modeled. You have two numbers:

SIREN is a 9-digit number for each legal entity (on the headquarters address).
SIRET is a 14-digit (9 + 5) number for each business location.
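Because the first 9 digits of a SIRET are the SIREN, the two-tier structure can be rebuilt from a flat list of location numbers. The sketch below only splits and groups; it does not validate the numbers, and the sample identifiers are made up for illustration.

```python
from collections import defaultdict

def split_siret(siret):
    """Split a 14-digit SIRET into (SIREN, 5-digit location suffix)."""
    digits = "".join(ch for ch in siret if ch.isdigit())
    if len(digits) != 14:
        raise ValueError(f"expected 14 digits, got {len(digits)}")
    return digits[:9], digits[9:]

def group_by_legal_entity(sirets):
    """Map each SIREN to the location suffixes found under it."""
    tree = defaultdict(list)
    for siret in sirets:
        siren, location = split_siret(siret)
        tree[siren].append(location)
    return dict(tree)

# Made-up numbers, not real companies.
locations = ["123 456 789 00011", "12345678900029", "987 654 321 00015"]
print(group_by_legal_entity(locations))
```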

This model is good for companies with several locations but strange for single location companies.

Treacherous Family Trees (and Restaurants)

The need for hierarchy management is obvious when it comes to handling data about customers that belong to a global enterprise.

Company family trees are useful but treacherous. A mother and a daughter may be very closely connected with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.

Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.

Happy Uniqueness

27th March 2011 | 8th May 2012 | Henrik Gabs Liliendahl

When making the baseline for customer data in a new master data management hub you often involve heavy data matching in order to de-duplicate the current stock of customer master data, so that you, so to speak, start with a cleansed, duplicate-free set of data.

I have been involved in such a process many times, and the result has never been free of duplicates. For two reasons:

Even with the best data matching tool and the best external reference data available you obviously can’t settle all real world alignments with the confidence needed, and manual verification is costly and slow.
In order to make data fit for the business purposes duplicates are required for a lot of good reasons.

Being able to store the full story from the result of the data matching efforts is what makes me, and the database, most happy.

The notion of a “golden record” is often not in fact a single record but a hierarchical structure that reflects both the real world entity, as far as we can get, and the instances of this real world entity in forms that are suitable for different business processes.
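One way to picture such a structure is a real world party node that keeps its source records as attached instances and can have child parties beneath it. The sketch below is a hypothetical data model for illustration; the class names, fields and sample systems are assumptions, not the layout of any specific MDM hub.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RoleInstance:
    """A record kept as-is because a business process needs it."""
    source_system: str
    local_id: str
    purpose: str

@dataclass
class Party:
    """The real world entity, as far as matching could establish it."""
    name: str
    external_id: Optional[str] = None
    instances: List[RoleInstance] = field(default_factory=list)
    children: List["Party"] = field(default_factory=list)

    def all_instances(self):
        """Collect instances from this party and everything below it."""
        found = list(self.instances)
        for child in self.children:
            found.extend(child.all_instances())
        return found

# Hypothetical example: one legal entity with one branch location.
branch = Party("Acme Ltd, Aarhus branch",
               instances=[RoleInstance("ERP", "B-77", "billing")])
company = Party("Acme Ltd", external_id="REG-1",
                instances=[RoleInstance("CRM", "A-1", "account"),
                           RoleInstance("CRM", "A-9", "account")],
                children=[branch])
print(len(company.all_instances()))
```

Nothing is thrown away here: the two CRM accounts remain as separate instances, but both hang off the same party, so the full story from the matching is stored.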

Some of the tricky constructions that exist in the real world and are usual suspects for multiple instances of the same real world entity are described in the blog posts:

The reasons for having business rules leading to multiple versions of the truth are discussed in the posts:

I’m looking forward to yet another party master data hub migration next week under the above conditions.
