Data Quality: The Movie

Learning from courses, books, articles and so on is good – but sometimes it is a bit like watching a movie and then realizing that the real world – especially your world – isn't exactly like the movie.

Examples:

The parking experience:

The movie: You are going to visit someone in a huge building in the centre of a large city. You take your car to the front of the building and smoothly place it in the free parking spot next to the main entrance.

Real life: You drive round and round for ages until you finally find a free parking spot barely within walking distance of your destination.

My life: During my 30 years in the IT business I have visited a lot of companies and spent time in their IT departments. Nobody does everything by the book. Not even close.

Maybe large companies within financial services are the ones that, in my experience, come within some distance of doing things by the book. This is probably because most books about IT seem to be written by folks who got their experience from working in large financial service businesses.

(And no, I have absolutely no documentation for that. It is just a gut feeling.)

Hitting them hard:

The movie: You are a good guy observing a bad guy harassing a good-looking girl. You engage the bad guy in an intense fist fight; you are hit over and over again, but in the end you win. The good-looking girl thanks you by kissing your beautiful face.

Real life: Well, you may win the fight. But after that you have to go to the hospital and have them fix your face – and during the following month no girl can look at you without feeling very bad.

My life: Recently I was involved in a data management project aimed at producing some new business intelligence results. Executive sponsorship was no problem; the CEO was the initiator. Objectives were pretty clear. High-level business requirements were well known and, not to forget, everyone was fully aware of the impact of data quality. The only issues were the absence of more concrete detailed requirements and business rules for reporting – and, of course, a politically settled deadline.

Facing the business rule issue we took a data-centric and test-driven approach. We produced incremental results, verified test cases, negotiated business rules based on real data examples, and in the end a first report came out. The result was far from expected in the sense that the numbers were expected to be different. We dived into the data again, found an unexpected data quality issue and corrected accordingly. The result was still far from expected. Based on one specific expected result we dived into a section of the data, made detailed reports and compared them to the real world. In the end it turned out that the report was right: the gut-feeling perception of the real world had been wrong for a long time.

Now that's a winner, right? Well, the project is on hold now for political reasons, and the project also has a bad name for going over budget and deadline.

Looking great:

The movie: Morning scene from the nuclear family. Mommy is looking really great (stylish hair, perfect face) while cooking and serving a nice breakfast and helping the kids doing some last minute homework at the same time.

Real life: I think you know.

My life: Actually I have learned that you don't have to strive for perfection. With data quality: don't expect to be able to fix everything and have all data fit for every purpose of use at all times.


Aadhar (or Aadhaar)

The solution to the single most frequent data quality problem – party master data duplicates – is actually very simple: every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.

Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it, the project has just been named Aadhar (or Aadhaar). Google Translate tells me this word (आधार) means base or root – please correct me if anyone knows better.

In Denmark we have had such identifiers (one for citizens and one for companies) for many years. They are not used by everyone everywhere – so you can still make money as a data quality professional specializing in data matching.

The main reason that the unique citizen identifier is not used everywhere is of course privacy considerations. As for the unique company identifier, the reason is that data quality is often defined as fit for the immediate purpose of use.
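A minimal sketch of why the identifier matters (the sample records, threshold and function are invented for illustration): with a shared unique ID, duplicate detection is an exact comparison, while without it you are back to fuzzy matching.

```python
# Hypothetical party records: duplicate detection by unique ID when present,
# falling back to fuzzy name matching when the ID is missing.
from difflib import SequenceMatcher

parties = [
    {"id": "DK-0101701234", "name": "Hans Jensen"},
    {"id": "DK-0101701234", "name": "H. Jensen"},    # same person, exact ID match
    {"id": None,            "name": "Hanz Jensen"},  # no ID: fuzzy match needed
    {"id": "DK-3108655678", "name": "Lise Madsen"},
]

def is_duplicate(a, b, threshold=0.85):
    # A shared unique identifier settles the question immediately ...
    if a["id"] and b["id"]:
        return a["id"] == b["id"]
    # ... otherwise we are back to fuzzy name comparison.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= threshold

print(is_duplicate(parties[0], parties[1]))  # → True (same ID, despite different names)
print(is_duplicate(parties[0], parties[2]))  # → True (no ID, fuzzy name match)
```

The point of the sketch is the asymmetry: the exact comparison is trivially right, while the fuzzy fallback needs a tuned threshold and will still produce false positives and negatives – which is where the data matching profession comes in.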


A user experience

As a data quality professional, being the user yourself is a learning experience.

During the last years I have worked for a data quality tool vendor headquartered in Germany. As part of serving partners, prospects and customers in Scandinavia I have been a CRM system user. As a tool vendor we have taken our own medicine, which includes intelligent real-time duplicate checking, postal address correction, fuzzy search and other goodies built into the CRM system.

Sounds perfect? Sure, if it weren't for a few diversity glitches.

The address doesn’t exist

Postal correction is only activated for Germany. This actually makes some sense, since most activity is in Germany and postal correction is not that important in Scandinavia, where company (and citizen) information is more available and usually a better choice. Due to a less fortunate setup during the first years, my routine when inserting a new account was to pick correct data from a business directory, paste it into the CRM system and then angrily override the warning that the address doesn't exist (in Germany).

Dear worshipful Mr Doctor Oetker

In Germany salutation is paramount. In Scandinavia it is no longer common to use a prefixed salutation – and if you do, you are regarded as very old fashioned. So making the salutation field mandatory for a contact is an annoyance, and setting up an automated salutation generation mechanism is a complete waste of time.
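As a toy sketch of the fix (the country list and rule here are invented, not the vendor's actual configuration), such a mandatory-field rule could be made locale-dependent instead of global:

```python
# Hypothetical locale-dependent validation: salutation is only mandatory
# where local convention actually expects it.
SALUTATION_REQUIRED = {"DE": True, "AT": True, "DK": False, "SE": False, "NO": False}

def validate_contact(contact):
    errors = []
    required = SALUTATION_REQUIRED.get(contact.get("country"), False)
    if required and not contact.get("salutation"):
        errors.append("Salutation is mandatory for country " + contact["country"])
    return errors

print(validate_contact({"country": "DE", "name": "Dr. Oetker"}))
# → ['Salutation is mandatory for country DE']
print(validate_contact({"country": "DK", "name": "Hans Jensen"}))
# → []
```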


Data Quality and World Food

I have touched on the analogy between food (quality) and data (quality) several times before, for example in the posts “Bon Appétit” and “Under New Master Data Management”.

Why not continue down that road?

Let’s have a look at some local food that has become popular around the world.

寿司

Imagine you go to a restaurant and order a fish dish. When you start to consume your dinner you realize that the fish hasn’t been boiled, fried or in any other way exposed to heat. Then I guess it is perfectly normal to shout out: THE FISH IS RAW – and demand apologies from the chef, the head waiter, Gordon Ramsay or anyone else in charge. Unless of course you are in a sushi restaurant, where the famous Japanese dish that may include raw fish is prepared.

Köttbullar

Köttbullar is the Swedish word for meatballs. This would rightfully have stayed a fact known only to Swedes if it weren’t for the cheap furniture sold around the world by IKEA. For reasons still unclear to me, IKEA has chosen to serve Köttbullar in the store cafeterias and even sell the stuff along with the particle board furniture on their e-commerce sites.

Pizza

A dish of Italian origin, usually brought to you by someone on a bike or, in extreme cases, in a very old car.

McChicken

Selling food of various kinds in the form of a burger works in the United States – and, for reasons that I can’t explain, even in France.

Data Quality analogies

Well, let’s just say that data quality tools and services:

  • May be regarded very differently around the world,
  • Usually are sold along with tools and services made for something completely different,
  • Are brought to you in various ways by local vendors and
  • For reasons I can’t explain often are made for use in the United States (no other pun intended, just pure admiration of execution).

Bon appétit.


Merging Customer Master Data

One of the most frequent assignments I have had within data matching is merging customer databases after two companies have been merged.

This is one of the occasions where it doesn’t help to say the usual data quality mantras like:

  • Prevention and root cause analysis is a better option
  • Change management is a critical factor in ensuring long-term data quality success
  • Tools are not important

It is often essential for the newly merged company to have a 360-degree view of business partners as soon as possible in order to maximize synergies from the merger. If the volumes are above just a few thousand entities, it is not possible to achieve that using human resources alone. Automated matching is the only realistic option.

The types of entities to be matched may be:

  • Private customers – individuals and households (B2C)
  • Business customers (B2B) on account level: enterprises, legal entities and branches
  • Contacts for these accounts

I have developed a slightly extended version of this typification here.

One of the most common challenges in merging customer databases is that hierarchy management may have been done very differently in the past within the merging bodies. When aligning the different perceptions I have found that a real world approach often reconciles the different lines of reasoning.

The fuzziness needed for the matching basically depends on the common unique keys available in the two databases. These are keys such as citizen IDs (whatever they are labeled around the world) and public company IDs (the same applies). Matching both databases against an external source (per entity type) is an option. “Duns Numbering” is probably the best known example of such an approach. Maintaining a solution for assigning Duns Numbers to customer files from the D&B WorldBase is by the way one of my other assignments, as described here.
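As a minimal sketch of that dependence (the two sample files, the threshold and the function names are invented for illustration), matching can use the exact public key where both sides have one and fall back to fuzzy name plus city comparison where they don't:

```python
# Hypothetical merged-company scenario: two customer files, matched on a
# shared public company ID when present, fuzzily on name and city otherwise.
from difflib import SequenceMatcher

db_a = [
    {"company_id": "DK12345678", "name": "Nordisk Handel A/S", "city": "København"},
    {"company_id": None,         "name": "Jensen Byg ApS",     "city": "Aarhus"},
]
db_b = [
    {"company_id": "DK12345678", "name": "Nordisk Handel",     "city": "Copenhagen"},
    {"company_id": None,         "name": "Jensen Byg",         "city": "Aarhus"},
]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(db_a, db_b, name_threshold=0.8):
    pairs = []
    for a in db_a:
        for b in db_b:
            if a["company_id"] and b["company_id"]:
                if a["company_id"] == b["company_id"]:
                    pairs.append((a["name"], b["name"], "exact id"))
            elif similarity(a["name"], b["name"]) >= name_threshold and a["city"] == b["city"]:
                pairs.append((a["name"], b["name"], "fuzzy"))
    return pairs

for pair in match(db_a, db_b):
    print(pair)
```

Note how the exact-key pair survives the spelling variations in both name and city, while the fuzzy pair only matches because the city happens to agree – exactly the gap a shared external key such as a Duns Number is meant to close.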

The automated matching process may be divided into three steps.

During my many years of practice doing this I have found that the result of the automated process may vary considerably in quality and speed depending on the tools used.


What a Happy Day

I have got a lot of good news today.

First, a Nigerian gentleman wants to deposit 40.5 M $ into my bank account, and 35 % is for me to keep. Wow.

Next, it seems that data quality improvement and master data management are not about technology at all. Practically you only need smart people doing smart processes. Wow.

Nobody actually needs my assistance that much and soon I will have plenty of money in the bank.

Data Quality from the Cloud

One of my favorite data quality bloggers, Jim Harris, wrote a blog post this weekend called “Data, data everywhere, but where is data quality?”

I believe that data quality will be found in the cloud (not the current ash cloud but, to put it plainly, on the internet). Many of the data quality issues I encounter in my daily work with clients and partners are caused by adequate information not being available at data entry – or not being exploited. But the information needed will in most cases already exist somewhere in the cloud. The challenge ahead is how to integrate information available in the cloud into business processes.

Use of external reference data to ensure data quality is not new. Especially in Scandinavia, where I live, this has been in use for a long time because of the tradition of the public sector recording data about addresses, citizens, companies and so on far more intensely than in the rest of the world. The Achilles’ heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes – and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

The drivers for increased exploitation of external data are mainly:

  • Accessibility, where the fast growing (semantic) information store in the cloud helps – not least backed by the worldwide tendency of governments releasing public sector data
  • Interoperability, where an increased supply of Service Oriented Architecture (SOA) components will pave the way
  • Cost; the more subscribers to a certain source, the lower the price – plus many sources will simply be free

As said, smooth integration into business processes is key – or, sometimes even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these business processes using only a minimum of costly on-premise human resources.
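A minimal sketch of that pull pattern (everything here is hypothetical: in a real solution the stubbed lookup would be a call to an external business directory or address service):

```python
# Data entry that pulls reference data instead of trusting the typist.
def lookup_company(company_id):
    # Stand-in for a cloud/SOA reference-data call (stubbed with one record).
    directory = {
        "DK12345678": {"name": "Nordisk Handel A/S", "city": "København"},
    }
    return directory.get(company_id)

def enter_account(company_id, typed_name):
    reference = lookup_company(company_id)
    if reference is None:
        # No reference data available: accept the typed entry, flag it for review.
        return {"name": typed_name, "verified": False}
    # Reference data wins over whatever the user typed.
    return {**reference, "verified": True}

print(enter_account("DK12345678", "nordisk handel"))
# → {'name': 'Nordisk Handel A/S', 'city': 'København', 'verified': True}
```

The design choice worth noting is that the human only supplies the key and a rough name; the authoritative attributes are pulled from the reference source at entry time, which is cheaper than cleansing them afterwards.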


My Ash Cloud Prediction

The Master Data Management Summit Europe 2010 starts tomorrow. I have attended the IRM events in London several times (and also spoken there once). This year I didn’t plan to go to London in April because I predicted the no-fly havoc in Northern Europe that would follow the Icelandic volcanic eruption given the wind direction. Not?

Royal Exceptions

I am not a royalist, but anyway: today, 16th April 2010, is the 70th birthday of Queen Margrethe II of Denmark. Congratulations, Your Majesty.

Having a queen (or king) and a royal family is a good example of the fact that there are always exceptions. As a matter related to data quality: I would say that every person in our country has a first (given) name and a last (family) name. But the royal family has no last name – they only have first names, like those of Her Majesty: Margrethe Alexandrine Þórhildur Ingrid. (By the way: the third name is actually Icelandic; I guess that explains the ash cloud sent as a greeting from there.)

There are always exceptions. We may define data quality validation rules from here to doomsday – there will always be exceptions. We may write down business rules from now to eternity – tomorrow you will encounter the first exception. Data quality (and democracy) is never perfect – but it is worth striving for.
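As a toy illustration of living with such exceptions (the rule, field names and exception list are invented): rather than hard-rejecting every rule breaker, a validation rule can route known exceptions past the rule and unknown ones to manual review.

```python
# Hypothetical "everyone has a last name" rule with an explicit exception path.
MANDATORY_LAST_NAME_EXCEPTIONS = {"Margrethe Alexandrine Þórhildur Ingrid"}

def validate_person(first_names, last_name):
    if last_name:
        return "ok"          # rule satisfied
    if first_names in MANDATORY_LAST_NAME_EXCEPTIONS:
        return "exception"   # known exception: accepted without a last name
    return "review"          # rule violated: route to manual review

print(validate_person("Hans", "Jensen"))                             # → ok
print(validate_person("Margrethe Alexandrine Þórhildur Ingrid", ""))  # → exception
print(validate_person("Madonna", ""))                                # → review
```

The exception list will of course never be complete either – which is exactly why the third outcome is a review queue and not a rejection.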