What’s best: Safe or sorry?

Now that I have moved much closer to downtown, I have changed my car accordingly: two months ago I squeezed myself into a brand new city car, the Fiat Nuova Cinquecento.

(Un)fortunately the car dealer’s service department called the other day and said that some part of the motor had to be replaced because there could be a problem with it. The manufacturer must have calculated that it’s cheaper (and maybe a better customer experience) to be proactive than to be reactive and deal with the problem only if it occurs in my car later.

(Un)fortunately that’s not the way we usually do it with possible data problems. So, back to work again. Someone’s direct marketing data just crashed in the middle of a campaign.    

Notes about the North Pole

This is the seventh post in a series of short blog posts focusing on data quality related to different countries around the world. However, today we will be at a place not belonging to any country (so far) and only reachable on foot because it is in the middle of an ocean covered by ice (so far).

Who lives on the North Pole?

Obviously no one – except of course that, according to tradition in some Western countries, the North Pole is the residence of Santa Claus. Canada Post has actually assigned the postal code “H0H 0H0” to the North Pole. So it’s a good data quality question whether “H0H 0H0” is a valid Canadian postal code.
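For the format side of that question, here is a minimal sketch in Python. It only checks the shape of the code (letter, digit, letter, optional space, digit, letter, digit) and the letters Canada Post leaves out; it does not check whether a code is actually in use. The function name is my own invention.

```python
import re

# Format-only check. Canadian postal codes do not use the letters
# D, F, I, O, Q or U, and W and Z do not appear in the first position.
POSTAL_CODE_PATTERN = re.compile(
    r"^[ABCEGHJ-NPRSTVXY][0-9][ABCEGHJ-NPRSTV-Z] ?[0-9][ABCEGHJ-NPRSTV-Z][0-9]$"
)


def looks_like_canadian_postal_code(value: str) -> bool:
    """Return True if value has the shape of a Canadian postal code."""
    return POSTAL_CODE_PATTERN.match(value.strip().upper()) is not None


print(looks_like_canadian_postal_code("H0H 0H0"))  # zeros: well-formed
print(looks_like_canadian_postal_code("HOH OHO"))  # letter O: rejected
```

So Santa’s code passes the format test as long as it is keyed in with zeros and not with the letter O, which is a classic data entry slip in itself.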

Santa Claus may also have several other residences: the Finns claim the correct address is “Santa Claus Village, FIN-96930 Arctic Circle, Finland”, and in Denmark we believe the correct address of Santa Claus to be “Box 1615, DK-3900 Nuuk, Greenland”.

If you are interested in identity resolution covering multiple countries, there is a discussion going on in the LinkedIn Data Matching Group.

Where is the North Pole?

The latitude is 90° – but there is no longitude. So if you don’t accept null in the longitude attribute of your geocodes you might get a data quality issue when Santa Claus becomes a customer and you believe the Canada Post is the only single version of the truth.
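A geocode that tolerates the poles could be sketched like this in Python. The class and field names are hypothetical, and the rule that longitude may only be missing at exactly ±90° latitude is my own assumption about what a sensible model would allow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Geocode:
    latitude: float
    longitude: Optional[float] = None  # None is allowed only at the poles

    def __post_init__(self) -> None:
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        at_pole = abs(self.latitude) == 90.0
        if self.longitude is None:
            # Assumed rule: a missing longitude is only meaningful at a pole.
            if not at_pole:
                raise ValueError("longitude is required away from the poles")
        elif not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")


north_pole = Geocode(latitude=90.0)  # accepted without a longitude
```

With a model like this, Santa Claus can be on-boarded without anyone having to invent a longitude for him.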

Previous Data Quality World Tour blog posts:

We All Hate To Watch It

Tonight the Eurovision Song Contest final will be watched by over 100 million people, despite the fact that most people agree that the songs aren’t that good.

The winner will be selected by summing up an equal number of votes from each country. Usually there are big differences in how countries vote. A trend is that some groups of neighboring countries like to vote for each other. Such groups include a “Balkan Block” and a “Viking Empire”.

It’s a bit like survivorship when merging matched data rows into a golden record in an enterprise master data hub. Maybe the winning data isn’t that good and several departments probably don’t like it at all.

So I see no reason why Denmark shouldn’t win tonight.

Using X Factor in Data Quality

Lately I have been experimenting with the X Factor (or Idol) approach to data quality – and I must say, with very promising results.

The basic idea with the X Factor approach to data quality is that it is not about accuracy of data, but all about data appeal.

Data appeal is initially measured by a panel of judges in a data audition. Usually you have 3 or 4 judges, where at least one judge is unbelievably nice and friendly and at least one judge is extremely rude (aka honest). After a following rootcamp the surviving data records are knocked out one by one by the users until we have a golden record as the winner. A secret data steward is usually hosting the show. 

The great thing about the X Factor approach is that the so-called “xingle version of the truth” doesn’t last very long. Soon we will have a new season where data goes through the same process again, with a completely new golden record as the winner.

Wonder about what Simon says?   

Fitness Data

About a month ago I wrote about how my personal data was on-boarded in the local fitness club in the post called Right the First Time.

Since then I have actually succeeded in visiting the gym twice a week and used the amazing technology necessary to get me in action.

As a complete data geek I of course use the full TV screen on the machine not to watch TV but to display the full dashboard with key performance indicators related to my workout. These include:

  • Time done / remaining
  • Pulse with red alert when I’m over the healthy threshold for my age
  • Distance I would have gone if I weren’t in the same fixed position
  • Calories burned

As with many data presentations, we have here a mix of hard facts, like the time done, and some assumed figures, like calories burned. The machine doesn’t really measure the actual calories burned but calculates an estimate as a function of power level, speed, my weight and age.

It’s a fair question whether I really want to know about the calories burned. My conclusion is yes. The time done is wasted anyway, the high pulse doesn’t last and the distance is virtual. So the calories burned fit the purpose of use. They keep me going.

So I’m not a Capricorn?

Yesterday was my birthday. Being born on the 14th of January makes me a Capricorn according to astrology.

Only there is a slight problem. As told in an article in the Huffington Post, an astronomer has kindly remarked that the assignment of signs to the calendar was made thousands of years ago. In the meantime the earth’s orbit has changed, so we should have completely new signs (and personalities?) today.

I guess astrology qualifies as a data and information quality train wreck by forgetting one of the most common pitfalls in data quality: Things change.

A Data Quality Immaturity Model

There are several maturity models related to data quality out there. I have found a good collection in this document from NASCIO.

I guess the mother of all maturity models is the Capability Maturity Model (CMM). This model is related to software development.

There is also a parody of that model called the Capability Immaturity Model (CIMM). Inspired by an article published yesterday by Jill Dyché on Information Management called Anti-Predictions for 2011, I have found that the CIMM is easily adapted to a data quality immaturity model with levels from zero to minus three, like this:

0 : Negligent

The organization pays lip service, often with excessive fanfare, to implementing data quality processes, but lacks the will to carry through the necessary effort. Whereas level 1 assumes eventual success in producing and measuring quality data, level 0 organizations generally fail to have any idea about the actual horrible quality of the data assets.

-1 : Obstructive

Processes, however inappropriate and ineffective, are implemented with rigor and tend to obstruct work. Adherence to process is the measure of success in a level -1 organization. Any actual creation of quality data is incidental. The quality of any data is not assessed, presumably on the assumption that if the proper process was followed, high quality data is guaranteed.

-2 : Contemptuous

While processes exist, they are routinely ignored by the staff and those charged with overseeing the processes are regarded with hostility. Measurements are fudged to make the organization look good.

-3 : Undermining

Not content with faking their own performance, undermining departments within the organization routinely work to downplay and sabotage the efforts of rival departments. This is worst where company policy causes departments to compete for scarce resources, which are allocated to the loudest advocates.

Despite Best Intentions

Sometimes you have the best intentions when improving things like data quality, but somewhere along the way you fail to see the big picture, and by then it is too late to correct.

From the sports world this apparently happened to the Singapore water polo team at the current Asian Games.

They have newly designed speedos honoring the nation’s flag.

But now a ministry has told them that the swimsuit is inappropriate, and you can’t change outfit during the games.

By the way: I also work at a company with this logo:

Fortunately we haven’t got company speedos.

Is a Small Difference a Big Deal?

The title of this blog post is stolen from/was inspired by a post on the Nation of Why Not blog. The Nation of Why Not is the brand name of Royal Caribbean. Royal Caribbean operates, among a lot of other vessels, the world’s two largest cruise ships: ‘Oasis of the Seas’ and ‘Allure of the Seas’. The younger ship, ‘Allure of the Seas’, has just left the shipyard in Turku, Finland and passed under the Great Belt Bridge in grey Danish waters on its way to the blue Caribbean Sea.

The Oasis and Allure are sister ships supposed to have exactly the same dimensions. But according to the official measures by DNV, Allure is 50 millimeters longer than Oasis. This has led to some teasing between the crews and now it has been suggested that NASA should make a new measurement (from up above I guess).

This is a good old classic data quality issue. Is it acceptable to assume that two similar things have the same attributes? Or do you need to measure each thing separately? And is a possible difference a difference in the real world or a difference in measurement?

Now, with the ships I think they are a bit different anyway, as I see that the new ship Allure, unlike Oasis, also has a Samba Grill, Rita’s Cantina and a Starbucks café inside.
