The Three Big M’s of Data Quality

Most organizations have a lot of data quality issues, and there is a wealth of possible solutions for dealing with them.

What you usually do is categorize the problems into three different types of best resolutions:


You could go ahead and solve the data quality problems today, but you probably have better and more important things to do right now.

Your organization may have a global SAP rollout going on or other resource-demanding implementations. Therefore it is wisest to deal with the data quality issues when everything is running smoothly.

Mission impossible

Maybe a resolution has been tried before and didn’t work. The chances that different people management, a different orchestration of processes and developments in available technology will change that are very slim.

May the force be with you

Many problems solve themselves over time or hopefully don’t get noticed by anyone. If things get ugly you always have your lightsaber.

Well Met, Stranger

Finally, the hosted version of WordPress that I am using has added geography to the stats.

The counter has been running for 14 days now, so I have taken a first look at the numbers.

First of all, I’m pleased that during these 14 days I have had visitors from 67 different countries around the globe:

Most visitors have been from the United States, followed by my current home country, the United Kingdom, and then my former home country, Denmark:

Note: This figure was made by copying the results into Excel.

If grouped by regions of the world, it looks like this:

The world has certainly become a small place. Of course your interactions are biased towards your neighborhood, but in blogging as well as in business our success will increasingly become dependent on meeting, understanding and interacting with (maybe not so) strange people of the world.

It’s Hard to Be a Data Geek

Sometimes I, along with other folks in my social network circles and groups, describe myself as a data geek.

Another non-anonymous data geek, Rich Murnane, recently started a series of excellent cartoons on his blog about DataGeek’s first days on a new job. Hard work indeed.

Then the data-geeky corporate Twitter account of IBM Initiate ran a Twitter poll asking: Do you consider yourself a data geek or a management geek?

It’s a hard question, because you know that a lot of what makes for better data is better management, and it’s much more admirable to be a management geek than a poor data geek.

Anyway, I stood firm and admitted that I am a data geek, because the world has always been crowded with management consultants who pay little attention to the needs of the data. Someone has to take care of the data. It’s hard, but it’s worth it.

A pain in the …

When we move around in traffic we may have different roles at different times. Sometimes I drive a car, sometimes I’m a pedestrian and sometimes I ride a bicycle. The traffic infrastructure tries to separate these roles by having roads for cars, sidewalks (pavements) for pedestrians and bicycle paths for bicycles. But at intersections these separations meet and create cases of who is to have the upper hand, and sometimes not all three constructions are available, so pedestrians and bicycle riders may have to use a road made for cars.

I have just completed a short (kind of) holiday where we took our bicycles on a tour around parts of the Baltic Sea coast through four different countries: Denmark, Germany, Poland and Sweden. Our start and end was in Copenhagen, which is known for having extremely good conditions for bicycling, captured by the term “Copenhagenization”.

The quality and availability of bicycle paths varied a lot along the route. Sometimes it felt as if the bicycle paths were constructed to make the life of bicycle riders as miserable as possible. When the bicycle path wasn’t there or was too bad we hit the road, which was extremely unpopular among the car drivers. Not least, German Mercedes drivers love their horns.

But I guess it’s nothing personal. When I drive my car I also think pedestrians and bicycle riders are a pain in the …

Such cases of not liking a role you have yourself at another time also apply to a lot of other situations in life. For example, I’m not very excited about all the data quality checks and mandatory fields I have to deal with in the CRM system when I have sold a data quality tool or service. I see them as a pain in the …

And oh yes, after finishing the cycling tour I did have some pain…

History of Data Quality

When did the first data quality issue occur? Wikipedia says in the data quality article section titled history that it began with the mainframe computer in the United States of America.

Fellow data quality blogger Steve Sarsfield made a blog post a few years ago called A Brief History of Data Quality where it is said “Believe it or not, the concept of data quality has been touted as important since the beginning of the relational database”.

However, a predominant sentiment in the data quality realm is that data quality is not about technology. It is about people. People are the culprits behind data quality flaws, and as the main part of the problem people should also be the overwhelming part, if not the only part, of the solution.

So I guess data quality challenges were introduced when people showed up in the real world. How and when that happened is a matter of debate, as discussed in the blog post Out of Africa.

As explained in the post Movable Types, the invention of movable type in printing some hundreds of years ago (the most important invention since someone invented the wheel for the first time) gave a big boost to knowledge sharing among people – and also a big boost to data and information quality issues.

But I think the saying “To err is human, but to really foul things up you need a computer” is valid. Consequently, I also think you may need a computer to help with cleaning up the mess and to prevent the mess from happening again. End of (hi)story.

A geek about Greek

This ninth Data Quality World Tour blog post is about Greece, a favorite travel destination of mine and the place of origin of so many terms and thoughts in today’s civilization.

Super senior citizens

Today Greece has a problem with keeping records of its citizens. A recent data profiling activity has exposed that over 9,000 Greeks receiving pensions are over 100 years old. It is assumed that relatives have failed to report the deaths of these people and are therefore taking care of the continuing stream of euros. News link here.

Diverse dimensions

I found these good pieces of advice for you when going to Greece today:

Timeliness: When coming to dinner, arriving 30 minutes late is considered punctual.

Accuracy: Under no circumstances should you publicly question someone’s statements.

Uniqueness: Meetings are often interrupted. Several people may speak at the same time.

(We all have some Greek in us I guess).

Don’t confuse me with facts of life

As humans we like to know about simple facts. With weather forecasts, we like to know exactly what the temperature is going to be, whether the sun will be shining or it is going to rain, and sometimes also the wind speed and direction for a given place and time in the future.

Meteorologists have struggled for ages to tell us about that. A traditional weather forecast will tell us the best guess for these few key indicators.

Many people today, including me, don’t really rely on the weather to do our work. But we may plan when to work, how to get to work and what to do besides work depending on the weather forecast.

So I usually study the weather forecast. Lately I have noticed that the Danish Meteorological Institute has experimented with how to show ordinary people that the weather forecast is a best guess. So, for example, instead of having single-colored blue bars indicating how much rain to expect, you now have the choice of blue bars in lighter or darker shades of blue indicating the risk (or chance, if you like) of rain.

Better data quality? I think so. Less confusing? I think not. It could rain anytime. But it probably won’t.

