Data Quality: The Movie

Learning from courses, books, articles and so on is good – but sometimes a bit like watching a movie and then realizing that the real world – especially your world – isn’t exactly as in the movie.

Examples:

The parking experience:

The movie: You are going to visit someone in a huge building in the centre of a large city. You drive up to the front of the building and smoothly park the car in the free parking spot next to the main entrance.

Real life: You drive round and round for ages until you finally find a free parking spot barely within walking distance of your destination.

My life: During my 30 years in the IT business I have visited a lot of companies and spent time in their IT departments. Nobody does everything by the book. Not even close.

In my experience, large companies within financial services are the ones that come closest to doing things by the book. This is probably because most books about IT seem to be written by folks whose experience comes from working in large financial services businesses.

(And no, I have absolutely no documentation on that. It is just a gut feeling).

Hitting them hard:

The movie: You are a good guy observing a bad guy harassing a good-looking girl. You engage the bad guy in an intense fist fight, you are hit over and over again, but in the end you win. The good-looking girl thanks you by kissing your beautiful face.

Real life: Well, you may win the fight. But after that you have to go to the hospital and have them fix your face – and for the following month no girl can look at you without feeling very bad.

My life: Recently I was involved in a data management project aimed at producing some new business intelligence results. Executive sponsorship was no problem, as the CEO was the initiator. Objectives were pretty clear. High-level business requirements were well known and, not to forget, everyone was fully aware of the impact of data quality. The only issue was the absence of more concrete, detailed requirements and business rules for reporting. And of course a politically settled deadline.

Facing the business rule issue, we took a data-centric and test-driven approach. We produced incremental results, verified test cases, negotiated business rules based on real data examples, and in the end a first report came out. The result was far from expected, in the sense that the numbers were expected to be different. We dived into the data again, found an unexpected data quality issue and corrected it accordingly. The result was still far from expected. Starting from a specific expected result, we dived into a section of the data, made detailed reports and compared them to the real world. In the end it turned out that the report was right; the gut-feeling perception of the real world had been wrong for a long time.

Now that’s a winner, right? Well, the project is on hold now for political reasons and also the project has a bad name for going over budget and deadline.

Looking great:

The movie: Morning scene from the nuclear family. Mommy is looking really great (stylish hair, perfect face) while cooking and serving a nice breakfast and helping the kids do some last-minute homework at the same time.

Real life: I think you know.

My life: Actually I have learned that you don’t have to strive for perfection. With data quality, don’t expect to be able to fix everything and have all data fit for every purpose of use at any time.


Data Quality and World Food

I have touched on the analogy between food (quality) and data (quality) several times before, for example in the posts “Bon Appétit” and “Under New Master Data Management”.

Why not continue down that road?

Let’s have a look at some local food that has become popular around the world.

寿司 (Sushi)

Imagine you go to a restaurant where you order a fish dish. When you start eating your dinner you realize that the fish hasn’t been boiled, fried or in any other way exposed to heat. Then I guess it is perfectly normal to shout out: THE FISH IS RAW – and demand apologies from the chef, the head waiter, Gordon Ramsay or anyone else in charge. Unless, of course, you are in a sushi restaurant, where the famous Japanese dish that may include raw fish is prepared.

Köttbullar

Köttbullar is the Swedish word for meatballs. This would rightfully have remained a fact known only to Swedes if it weren’t for the cheap furniture sold around the world by IKEA. For reasons still unclear to me, IKEA has chosen to serve Köttbullar in the store cafeterias and even sell the stuff along with the particle board furniture on their e-commerce sites.

Pizza

A dish of Italian origin, usually brought to you by someone on a bike or, in extreme cases, in a very old car.

McChicken

Selling different kinds of food in the form of a burger works in the United States – and, for reasons that I can’t explain, even in France.

Data Quality analogies

Well, let’s just say that data quality tools and services:

  • May be regarded very differently around the world,
  • Usually are sold along with tools and services made for something completely different,
  • Are brought to you in various ways by local vendors and
  • For reasons I can’t explain often are made for use in the United States (no other pun intended but pure admiration of execution).

Bon appétit.


What a Happy Day

I have got a lot of good news today.

First, a Nigerian gentleman wants to deposit $40.5 million in my bank account, and 35% is for me to keep. Wow.

Next, it seems that data quality improvement and master data management are not about technology at all. In practice, you only need smart people running smart processes. Wow.

Nobody actually needs my assistance that much and soon I will have plenty of money in the bank.

My Ash Cloud Prediction

The Master Data Management Summit Europe 2010 starts tomorrow. I have attended the IRM events in London several times (and also spoken there once). This year I didn’t plan to go to London in April because I predicted the no-fly havoc in Northern Europe that would follow the Icelandic volcanic eruption given the wind direction. Not?

Reduplication: The next big thing?

Today I got a very exciting Master Data Management assignment. Usually I do deduplication processes, which means that two or more rows in a database are merged into one golden record because the original rows represent the same real-world entity.

But in this case we are going to split one row into several rows with random keys (a so-called MNUID = Messy Non-Unique IDentifier). Names and addresses also have to be misspelled in different ways so they are not easily recognized as being the same.

My client, the Danish Tax Authorities, has for years tried to develop methods for taxation above 100% and has finally arrived at this simple but very efficient method. Until now you, as one person or one company, have paid up to 60% tax, but from now on each duplicate row will pay 60%. So in phase one you may in fact pay 120%, but in later phases this will be extended to larger duplicate groups paying much higher percentages.

Some foreign tax authorities have already shown deep interest in this model (called Intelligent Reduplication for Supertaxation). First of all our Scandinavian neighbors are very interested, but eventually it may spread to the rest of the world.
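For contrast, the ordinary deduplication mentioned at the top of this post, where rows representing the same real-world entity are merged into one golden record, might look roughly like the sketch below. It is only an illustration under simple assumptions: the field names (`id`, `name`, `address`, `phone`), the exact-match key on normalized name and address, and the "first non-empty value wins" survivorship rule are all made up for the example. Real matching is usually fuzzier than this.

```python
from collections import defaultdict


def match_key(row):
    """Hypothetical match key: name and address, lowercased, whitespace collapsed."""
    def norm(text):
        return " ".join(text.lower().split())
    return (norm(row["name"]), norm(row["address"]))


def deduplicate(rows):
    """Group rows sharing a match key and merge each group into one golden record."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)

    golden_records = []
    for members in groups.values():
        # Start from the first row and fill its empty fields from later duplicates
        # (an assumed, very simple survivorship rule).
        record = dict(members[0])
        for other in members[1:]:
            for field, value in other.items():
                if not record.get(field):
                    record[field] = value
        # Keep track of which original rows ended up in the golden record.
        record["source_ids"] = [m["id"] for m in members]
        golden_records.append(record)
    return golden_records


rows = [
    {"id": 1, "name": "John  Smith", "address": "1 Main St", "phone": ""},
    {"id": 2, "name": "john smith", "address": "1 main st", "phone": "555"},
    {"id": 3, "name": "Jane Doe", "address": "2 Oak Ave", "phone": ""},
]
golden = deduplicate(rows)
```

Here rows 1 and 2 collapse into a single golden record that picks up the phone number from row 2, while row 3 stays on its own. Reduplication would be the same thing run backwards.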


Santa Quality

On the 3rd of December I feel inspired to relate some data quality issues to Mr. Santa Claus – or what exactly is his name? Is it:

  • Saint Nicholas or
  • Père Noël as they say in French or
  • Weihnachtsmann as they say in German or
  • Julemand as we say in Denmark or
  • Plenty of other local names?

Santa Claus versus Saint Nicholas is an example of the use of nicknames, which is a main issue in name matching in many cultures.

It’s also important to observe that the German and Danish names are one word versus two words in English and French. Many company names and other names in the respective languages share the same linguistic characteristic.

Father Christmas is an alternative identification, perhaps more of a job title.
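The two matching issues above, nicknames and one-word versus two-word names, can be illustrated with a small sketch. It assumes a tiny hand-made nickname table (`NICKNAMES` is purely illustrative, not a real reference list) and simply removes the spaces between words before comparing. Both are simplifications for the sake of the example.

```python
# Illustrative nickname table (an assumption for this example only).
NICKNAMES = {
    "santa": "saint",
    "claus": "nicholas",
}


def name_key(name, nicknames=NICKNAMES):
    """Build a comparison key: lowercase, map nicknames, drop word boundaries."""
    tokens = name.lower().split()
    # Replace each known nickname token with its formal counterpart.
    canonical = [nicknames.get(token, token) for token in tokens]
    # Joining without spaces makes "Jule mand" and "Julemand" compare equal.
    return "".join(canonical)


def same_name(a, b):
    """Return True when two names share the same comparison key."""
    return name_key(a) == name_key(b)


nickname_match = same_name("Santa Claus", "Saint Nicholas")
compound_match = same_name("Weihnachts Mann", "Weihnachtsmann")
different = same_name("Santa Claus", "Julemand")
```

With this key the nickname pair and the split compound both match, while Santa Claus and Julemand do not. Recognizing that they are the same person across languages would take a translation table on top.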

Another question is where he lives.

The North Pole is acknowledged as the correct geographical address in Anglo countries – but there seem to be alternative mailing possibilities such as:

  • Santa Claus, North Pole, Canada, H0H 0H0
  • Father Christmas, North Pole, SAN TA1 (UK)

However, the Finns claim the valid address to be:

In my home country Denmark we will accept nothing but:

  • Julemanden, Box 1615, 3900 Nuuk, Greenland

Finally, I can imagine the data quality issues the Santa business has to face:

  • Too many duplicates on the “nice list” leading to heavy overhead in gift spending as well as extra costs in reindeer management.
  • Inaccurate product masters resulting in complaints from nice boys and girls and a lot of scrap and rework.
  • Fraudulent entries from children already on the “naughty list” may be a challenge.
  • A lot of missing chimney positions may cause severe delivery problems.

But then, why should Santa be smarter than everyone else?
