Data Quality Evangelism

There is a famous painting by Leonardo da Vinci with Jesus and the Twelve Apostles having The Last Supper:

Now, most classic historical paintings have anachronisms. In The Last Supper there are oranges on the table, which is strange, since oranges weren’t known in EMEA in the 1st century.  

But, anachronisms aside:

Q: Isn’t it also strange that everyone is on one side of the table?

A: Not at all. It’s like with data quality evangelism: Everyone is on the IT side. The business side has other things to do.


Single Customer Hierarchy View

One of the things I do over and over again as part of my work is data matching.

Increasingly, the goal of data matching efforts is a master data consolidation taking place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data make the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role there are essentially two kinds of hierarchies to be aware of:

  • The hierarchies the involved party belongs to in the real world. Examples are an individual person belonging to a household, or a company having a place in a company family tree.
  • The hierarchies of customer roles as seen in different business functions and by different departments. For example, two billing entities may belong to the same account in a CRM system, while in another case two CRM accounts share the same billing entity.
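
The two kinds of hierarchies can be sketched as a small data model. This is only an illustration under my own assumptions: the names `Party`, `CustomerRole`, `parent`, `system` and `account_id`, and the sample companies, are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    """A real-world entity: a person, a household, or a company."""
    name: str
    parent: Optional["Party"] = None  # household or parent company, if any

    def root(self) -> "Party":
        """Walk up the real-world hierarchy to its top (e.g. the ultimate parent)."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


@dataclass
class CustomerRole:
    """How one business function sees a party, e.g. a CRM account or a billing entity."""
    party: Party
    system: str       # hypothetical label such as "CRM" or "Billing"
    account_id: str


# Hypothetical company family tree
group = Party("Acme Group")
subsidiary = Party("Acme Nordic", parent=group)

# Hypothetical roles held by these parties in different systems
roles = [
    CustomerRole(subsidiary, "CRM", "C-1"),
    CustomerRole(subsidiary, "Billing", "B-1"),
    CustomerRole(group, "Billing", "B-2"),
]

# All roles that resolve to the same real-world family tree
family_roles = [r.account_id for r in roles if r.party.root() is group]
```

The point of the separation is that the `Party` tree stays the same whoever looks at it, while the list of `CustomerRole` records can differ per department.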

The first type of hierarchy shouldn’t be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. It may however be true that your business rules and the regulatory requirements applying to your industry and geography narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous prevention of data quality issues as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all the term “single customer view” is a vision, but a vision worth striving for. Secondly customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up being a “single customer hierarchy view”.


Hit by an Outlier

Yesterday something weird happened on this blog. Usually I’m pleased to have between 100 and 250 so-called page views on workdays. But yesterday there were 751. This Saturday morning everything is back to normal again:

I have no clue about who visited and why. I didn’t write anything very clever yesterday. Most views were on the home page. The count of referrers indicates a quiet day in the office:

 

Also the search terms counter doesn’t help:

Well, I guess I just have to consider this an outlier, being an observation that appears to deviate markedly from other members of the sample in which it occurs.

That is anyway my gut feeling, without performing Grubbs’ test for outliers.
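
For those who prefer numbers to gut feelings, here is a minimal sketch of the first step of that test: computing the test statistic G, which is the distance of the most extreme value from the mean measured in sample standard deviations. The list of page views is made up (only the 751 and the usual 100 to 250 range come from this post), and the function name `grubbs_statistic` is my own.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the value farthest from the mean and its
    distance from the mean in sample standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    suspect = max(values, key=lambda v: abs(v - m))
    return abs(suspect - m) / s, suspect


# Hypothetical workday page views in the usual range, plus the 751 day
page_views = [100, 120, 150, 180, 200, 220, 250, 751]
g, suspect = grubbs_statistic(page_views)
```

To complete the test, G must be compared with the critical value for the sample size and the chosen significance level, which comes from a table or from the t-distribution. The test also assumes the remaining data is roughly normally distributed, which a handful of page view counts may not be.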


Some Flyover Information

My Follow Friday World Tour stop today was in some Flyover States, the states in the United States that bicoastal people only see from above when flying from coast to coast.

If I were to fly from (A) Copenhagen to (B) Los Angeles, one would think, by looking at a traditional flat world map, that the flight would also pass over these inland states.

But the world isn’t flat. The shortest route for an east to west flight follows the so-called great circle, which is a much more northerly swing.

For Copenhagen to Los Angeles this means a polar route across the Arctic, which is the shortcut in the real round world. Actually the Copenhagen (CPH) to Los Angeles (LAX) connection established in 1954 was the world’s first commercial polar route.
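
For the curious, the great-circle distance between the two airports can be estimated with the haversine formula. This is a rough sketch: the Earth is treated as a perfect sphere, the airport coordinates are approximate values I have filled in, and the function name `great_circle_km` is my own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees,
    computed with the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate airport coordinates (assumed), as (latitude, longitude)
CPH = (55.62, 12.65)
LAX = (33.94, -118.41)

distance_km = great_circle_km(*CPH, *LAX)
```

With these inputs the result lands at roughly nine thousand kilometres. A real flight will be somewhat longer, since winds, airways and airspace restrictions keep it from following the great circle exactly.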

I find great analogies between looking at a map and solving data and information quality issues, as in the post Sharing data is key to a single version of the truth, which was a blog-bout with a UK guy and a Flyover guy.


Some Deduplication Tactics

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

  • Data matching in order to find duplicates internally in your master data, most often your customer database
  • Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

  • If you have mostly domestic data in a country with a public company registration you can obtain a national ID from matching with a business directory based on such a registry. An example will be the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
  • Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
  • The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start with matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records which will make a succeeding internal deduplication a no-brainer.

Common matching issues

A problem, however, is that you seldom get a 100 % hit rate in a business directory matching, and often not even close, as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save process load.

However a common pitfall is that an internal deduplication may merge two similar records that actually are represented by two different entities in the business directory (and the real world).

So, as with many things in data matching, the answer to the sequence question is often: Both.

A good process sequence may be this one:

  1. An internal deduplication with very tight settings
  2. A match against an external registry
  3. An internal deduplication exploiting external identifiers and having looser settings for similarities not involving an external identifier
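
The three steps above can be sketched in a few lines of Python. This is a toy illustration, not a production matching engine: the company names, the `registry` lookup and its IDs, the field name `ext_id` and the thresholds are all made up, and plain string similarity from the standard library stands in for real fuzzy matching.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def similar(a, b, threshold):
    """True when the two normalized names are at least `threshold` alike."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def deduplicate(records, threshold):
    """Greedy clustering: a record joins the first cluster whose first record it matches.

    If both records carry an external ID, only the IDs decide.
    Otherwise the name similarity decides."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            head = cluster[0]
            if rec.get("ext_id") and head.get("ext_id"):
                matched = rec["ext_id"] == head["ext_id"]
            else:
                matched = similar(rec["name"], head["name"], threshold)
            if matched:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def match_registry(records, registry):
    """Attach an external ID where the (hypothetical) registry knows the name."""
    for rec in records:
        rec["ext_id"] = registry.get(normalize(rec["name"]))
    return records


customers = [
    {"name": "Nordic Trading A/S"},
    {"name": "nordic  trading a/s"},  # identical after normalization
    {"name": "Nordic Trading ApS"},   # similar name, different entity in the registry
    {"name": "Bolt Industries"},
    {"name": "Bolt Industry"},        # not found in the registry
]
registry = {
    "nordic trading a/s": "ID-1",
    "nordic trading aps": "ID-2",
    "bolt industries": "ID-3",
}

# 1. Internal deduplication with very tight settings; keep one survivor per cluster
survivors = [cluster[0] for cluster in deduplicate(customers, threshold=1.0)]
# 2. Match the survivors against the external registry
match_registry(survivors, registry)
# 3. Internal deduplication using external IDs, with looser name matching otherwise
final = deduplicate(survivors, threshold=0.8)
result = [[rec["name"] for rec in cluster] for cluster in final]
```

In this toy run the two Nordic Trading companies stay apart in step 3 because their external IDs differ, even though their names are similar enough to pass the loose threshold, while the unmatched Bolt Industry record is merged into Bolt Industries on name similarity alone.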


Lean Social MDM

I have previously written some blog posts using the term “Social MDM” to describe the trend of having social media (master) data as a new complexity on top of the already known conundrum of mastering traditional master data.

Stephan Zoder of IBM Initiate discussed this topic in a recent post called CMM is Actually High-Frequency, Social MDM (where CMM is about Customer Motivation Management).

As I also briefly examined the term “Lean MDM” last week, I wonder if it is possible to start embracing social media (master) data under a term such as “Lean Social MDM”.

The lean MDM post included an actual real life project I have been involved in, which was about how the car rental giant Avis achieved lean MDM for the Scandinavian business.

An underlying business case for this project was that many decisions about car rental are made by individual persons who may act both as employees at (changing) employers and as private renters. Therefore the emphasis of the master data management was on the person in contact, user and private roles.

Having a “single person view” is in my eyes, if it wasn’t before, a good place to start your “Lean Social MDM” journey.


The Right Mail Order

In the 70’s when I went to high school I also had a job on Saturdays as a postman.

I remember I had to be at the post office very early in the morning, which was hard after a Friday night out. As I wanted to be able to have a few extra hours of sleep before the Saturday Night Fever, it was crucial to get the job done as fast as possible.

The first function in the daily process was hand sorting (these were old times) the letters for my route. First all letters were sorted into streets, and then each street was sorted by house number (I lived in a fairly small town with short streets with mostly single family houses).

When sorting most streets I had two options:

  1. Have the even numbers sorted ascending followed by the odd numbers sorted descending. This was the easy way of sorting but the hard way of delivering later, as I had to move up and down the street.
  2. Sort the numbers in the order the houses were distributed along the street. This was the hard way of sorting, as I had to remember the order, because even (right side) and odd (left side) numbers weren’t necessarily distributed equally. But the delivery (if sorted properly) was easier, as I could move up the street in one pass and usually continue on the next street down.
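
The two options can be sketched in Python. The street below is invented, and the `position` lookup (metres from the start of the street to each house) stands in for what I had to keep in my head back then. The function names are my own.

```python
def easy_sort(house_numbers):
    """Option 1: even numbers ascending, then odd numbers descending."""
    evens = sorted(n for n in house_numbers if n % 2 == 0)
    odds = sorted((n for n in house_numbers if n % 2 == 1), reverse=True)
    return evens + odds


def walking_sort(house_numbers, position):
    """Option 2: order the letters by where each house sits along the street."""
    return sorted(house_numbers, key=lambda n: position[n])


# Hypothetical street: distance in metres from the start of the street to each house
position = {2: 10, 1: 15, 4: 30, 6: 45, 3: 50, 8: 60, 5: 90}
letters = [5, 2, 8, 3, 1, 6, 4]

easy = easy_sort(letters)               # [2, 4, 6, 8, 5, 3, 1]
walk = walking_sort(letters, position)  # [2, 1, 4, 6, 3, 8, 5]
```

The first function needs nothing but the numbers themselves. The second needs reference data about the real world, which is exactly what made it the hard way of sorting.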

I feel so lucky I was a postman then and not today. A postman today will get the small number of physical letters we send these days sorted optimally by a geocode aware automated mechanism. No chance of learning sorting mechanisms the hard way.


World Population Excluding Greenland?

According to a newly published paper called The population of the world (2011) we are now 6,987 million citizens on the planet Earth.

However something makes me wonder if they counted Greenland. It’s not that inclusion or exclusion of the 57,564 Greenlanders will rock the figure, but I think we should all be in there.

Greenland does cover a great deal of area on a world map as the big white island on top of the world, not least when the projection makes areas close to the poles bigger than on a globe.

But is Greenland visible in the population statistics at all?

First I looked for Greenland in North America where Greenland belongs in a geophysical context.

 

Not there.

Then I looked for Greenland in Northern Europe where Greenland belongs in a political context.

 

Not there – or maybe there as part of (the Kingdom of) Denmark?

The population of Denmark is stated as 5.6 million citizens.

If I look up the Kingdom of Denmark on Wikipedia we have these numbers:

It’s a close call. If we round the numbers, the 5.6 million citizens are counted without the North Atlantic dependencies, and Greenland and the Faroe Islands aren’t listed anywhere else. And anyway, the area clearly suggests that Greenland isn’t included as part of Denmark. So it could be a case of rounding or a case of timeliness, or most probably a case of incompleteness.

Maybe we have passed 7 billion people on earth already if someone else (also) is missing in the statistics.


Lean MDM

With a discipline such as master data management there will of course always be an agile or lean way of doing things.

What is lean MDM?

A document from 2008 called A LEAN APPROACH TO MASTER DATA MANAGEMENT by Duff Bailey examines the benefits of lean MDM.

The document holds a view close to mine, saying that: “While there is little argument over what constitutes an individual person, many existing data models make the mistake of modeling “roles” (customer, employee, stock-holder, vendor contact, etc.) instead”.

As discussed in the article similar views can be made around organization entities, location entities and product entities.

In conclusion Duff says that: “Because of their universality and their abstract nature, these core data models can be established quickly, without the need for lengthy review that normally accompanies an enterprise data model. Thereafter, the focus of the lean data management effort will be to grow the models and populate the repositories in support of specific business objectives”.

MDM in high gear

The fast time-to-value for lean MDM was also emphasized by MDM guru Aaron Zornes in a tweet yesterday:

The mentioned LeanMDM offer from Omikron Data Quality (which is one of my employers) is described in the link (in German). A short summary of the text is that you, among other things, will get this from lean MDM:

  • An increase in the corporate value of customer data
  • Short project times and fast results
  • Lower implementation costs through service-oriented architecture (SOA)

I have been involved in one of the implementations of the LeanMDM concept as described in this article (in English) about how the car rental giant Avis achieved lean MDM for the Scandinavian business.


Unmaintainability

Following up on my post about word quality, and inspired by a blog post by Joyce Norris-Montanari called “Things That Don’t Work So Well – Doing Analytics Before Their Time” in which the word “unmaintainable” is used, I want to challenge my English spell checker even further with unmaintainability: a rare and apparently not really existing word, but a frequent issue.

I have previously on this blog pondered that you can’t expect that because you get it Right the First Time then everything will be just fine from this day forward. Things change.

This argument is about the data as plain data.

But there is also a maintainability (this is apparently a real word) issue around how we store data. I have many times conducted data quality exercises such as deduplication and matching with, and enriching from, external reference data in order to reach a single version of the truth as far as it goes.

An often encountered problem is that this kind of data processing can get us somewhere close to a single version of the truth. But then there is a huge obstacle: You can’t get these great results back to the daily databases without destroying some of the correctness because the data structures don’t allow you to do that.

This kind of unmaintainability is in my eyes a good argument for looking into master data management platforms that allow you to maintain your master data in the complexity that supports the business rules that make your company more competitive.
