Word Quality

One of the top pieces of blogging advice is to be careful about your spelling and grammar, and you might say that this should be even more important on a data quality blog.

Unfortunately I have to admit that I’m not particularly good at that.

Perhaps I’m somewhat excused because I’m blogging in English and English isn’t my mother tongue. When I write articles and other stuff in English for companies I work for, there is always someone with English skills to catch my mistakes. But when I’m blogging, I’m on my own.

I do strive to get it right. I always write my texts in a word processor with English spell check and grammar on. But there are a lot of mistakes that aren’t caught by the spell checker, such as using the wrong word, leaving out a word and not joining words that should (or might) be a compound word.

Many times I also try to google the terms I’m using. It’s a helpful trick, but sometimes you are misled by hitting other people’s mistakes.

Occasionally folks are kind enough to help me by saying that I should use another word instead of some rare word I have found in an English dictionary.

So, not least to the subscribers of this blog, who get my first takes: please forgive my occasional bad spelling, grammar and odd words. I’m constantly thinking about continuous word quality improvement.

AAA

A top theme in the economic news these days is credit ratings for countries – also called sovereign credit ratings.

The credit rating practice is a good example of how a lot of data (with a given quality) is transformed into a very compact piece of information such as an AAA or whatever rating (with a disputed quality).

The focus of this blog post is, however, how credit ratings may be attached to reference and master data entities.

The figure below is a data visualization of S&P credit ratings for European countries:

The big dark blue landmass in the upper left corner is the southern part of Greenland. Even though Greenland has an ISO country code (GL) and an internet TLD (.gl), Greenland hasn’t actually been rated as a country, but is (my qualified guess) rated together with the Faroe Islands and continental Denmark as the Kingdom of Denmark.

On other maps Greenland isn’t included in the triple-A club:

So this is a good example of how a top level reference data list such as a country list may have hierarchies and may be specific to a given context, a subject that is often pondered by fellow data geek and blogger Graham Rhind, most recently in the post: Have you checked your country drop down recently?

A much more frequent subject than sovereign credit rating is of course corporate credit rating.

Here we have the same hierarchical considerations.

A business-to-business (B2B) customer list may have a lot of entities belonging to the same enterprise that is credit rated as one. However, you shouldn’t give each entity a credit limit equal to the credit limit you would assign to the enterprise as a whole. Avoiding that is an important result of practicing good customer master data management.

An often observed data quality flaw in customer master data is that entities actually belonging to the same credit rated enterprise have different credit risk assignments, resulting in exposed financial risk. Avoiding that is also an important result of practicing good customer master data management.
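
As a minimal sketch of such a check (the enterprises, ratings and limits below are made up for illustration), detecting both flaws over a B2B customer list could look like this:

```python
from collections import defaultdict

# Hypothetical customer master records: each entity points to its parent enterprise.
entities = [
    {"id": "E1", "enterprise": "ACME Group", "credit_rating": "AA", "credit_limit": 500_000},
    {"id": "E2", "enterprise": "ACME Group", "credit_rating": "BBB", "credit_limit": 500_000},
    {"id": "E3", "enterprise": "Widget Corp", "credit_rating": "A", "credit_limit": 200_000},
]

enterprise_limit = {"ACME Group": 500_000, "Widget Corp": 200_000}

# Flaw 1: entities in the same enterprise carry different credit ratings.
ratings = defaultdict(set)
for e in entities:
    ratings[e["enterprise"]].add(e["credit_rating"])
for enterprise, rset in ratings.items():
    if len(rset) > 1:
        print(f"Inconsistent ratings for {enterprise}: {sorted(rset)}")

# Flaw 2: the sum of entity-level limits exceeds the enterprise-level limit.
exposure = defaultdict(int)
for e in entities:
    exposure[e["enterprise"]] += e["credit_limit"]
for enterprise, total in exposure.items():
    if total > enterprise_limit[enterprise]:
        print(f"Exposure for {enterprise} is {total}, above the enterprise limit "
              f"of {enterprise_limit[enterprise]}")
```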

How do you rate your customer master data management? AAA or less?   

Data Quality and Decision Intelligence

“The substitute for Business Intelligence is called Decision Intelligence” was the headline in an article on the Danish IT site Version2 last month. The article was an interview with Michael Borges, head of the Copenhagen-based data management system integrator Platon. The article is introduced in English on Platon’s Australian site.

The term Decision Intelligence as a successor to Business Intelligence (BI) has been around for a while. In an article from 2008 Claudia Imhoff and Colin White explain what Decision Intelligence does that Business Intelligence doesn’t. Very simplified, it is about embracing and integrating operational Business Intelligence, traditional Data Warehouse based Business Intelligence and (Business) Content Analytics.

It is said in the article: “This, of course, has implications for both data integration and data quality. This aspect of decision intelligence will be covered in a future article.” I haven’t been able to find that future article. Maybe it’s still pending.

Anyway, certainly this – call it Decision Intelligence or something else – has implications for data quality.

The operational BI side is about supporting, and maybe having the systems make, decisions based on events taking place here and now, based on incoming transactions and related master data. This calls for data quality prevention at data collection time, as opposed to data cleansing downstream, which may have served well for informed decisions in traditional Data Warehouse based BI.
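
As a minimal sketch of that difference (the field names and rules are made up), preventing at capture time versus cleansing downstream could look like this:

```python
def validate_at_capture(transaction: dict) -> list:
    """Check a transaction before it enters the operational flow."""
    errors = []
    if not transaction.get("customer_id"):
        errors.append("missing customer_id")
    if transaction.get("amount", 0) <= 0:
        errors.append("non-positive amount")
    return errors

def cleanse_downstream(transactions: list) -> list:
    """Batch cleansing after the fact, as in a traditional data warehouse load."""
    return [t for t in transactions if not validate_at_capture(t)]

incoming = {"customer_id": "", "amount": 100}

# Operational BI: stop the flawed transaction here and now.
if validate_at_capture(incoming):
    print("Rejected at collection time:", validate_at_capture(incoming))

# Traditional BI: the flaw is filtered out later, after decisions may already be made.
print("Rows surviving the downstream cleansing:", cleanse_downstream([incoming]))
```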

The content analysis side, which according to the Imhoff/White article includes information expertise, makes me consider the ever-recurring discussion in the data quality realm about the difference between data quality and information quality. Maybe we will come to an intelligent decision on that one when Business Intelligence is succeeded by Decision Intelligence.

When a Cloudburst Hit

Some days ago Copenhagen was hit by the most powerful cloudburst ever measured here.

More powerful cloudbursts may be common in warmer regions of the earth, but this one was very unusual at 55 degrees north.

Fortunately there was only material damage, but that damage was very extensive. When you take a closer look, you may divide the underground structures into two categories.

The first category is facilities constructed with only the immediate purpose of use in mind. Many of these facilities are still out of operation.

The second category is facilities constructed with the immediate purpose of use in mind but also designed to resist heavy downpours. These facilities kept working during the cloudburst. One example is the metro. If the metro had been constructed for only the immediate purpose of use, namely circulating trains below ground, it would have been flooded within minutes, with the risk of lost lives and a standstill for months.

We have the same situation in data management. Things may seem just fine if data are fit for the immediate purpose of use. But when a sudden change in conditions hits, then you find out about data quality.

The Data Quality Cuisine

Analogies between making and serving good food and improving data and information quality are among the recurring topics on this blog. Just as good food is a subjective matter, good information is a subjective matter too, though the ones who have the task of preparing either know that fresh and clean raw materials / data are a must, as explained in the post Bon Appetit.

Food preferences and data and information preferences differ around the world. Highly esteemed local dishes from one country may not have the same traction in other parts of the world. As discussed in the post Data Quality and World Food, this is also true for data and information quality.

The post Metadata Meatballs examines how the same diversity applies to metadata.

Sometimes you can’t trust data even if the data is captured correctly. If you, for example, ask people about their food consumption habits, they tend to give answers some distance from reality. That calls for a Survey Data Laundering.

Estimating the return on investment for improving data quality has always been hard. The post Miracle Food for Thought is about how that resembles the way following “good” advice about what you should eat and drink isn’t as simple as often stated.

Anyway, we all know that better food and better service in a restaurant do create more business, and sometimes we have to put the restaurant and the information bistro Under new Master Data Management.

And finally, tomorrow this blog is two years old. That calls for a Birthday Party in the cloud.

Don’t confuse me with facts of life

As humans we like to know simple facts. With weather forecasts we like to know exactly what the temperature is going to be, whether the sun will be shining or it’s going to rain, and sometimes also the wind speed and direction for a given place and time in the future.

Meteorologists have struggled for ages to tell us about that. A traditional weather forecast will tell us the best guess for these few key indicators.

Many people today, including me, don’t really rely on the weather to do our work. But we may plan when to work, how to get to work and what to do besides work depending on the weather forecast.

So I usually study the weather forecast. Lately I have noticed that the Danish Meteorological Institute has experimented with how to visualize to the common people that the weather forecast is a best guess. So, for example, instead of having single-colored blue piles indicating how much rain to expect, they now have the option of blue piles in lighter or darker shades of blue indicating the risk (or chance, if you like) of rain.

Better data quality? I think so. Less confusing? I think not. It could rain anytime. But it probably won’t.
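
As a rough illustration of that idea (this is not DMI’s actual tool, and the numbers are made up), shading forecast bars by the probability of rain could look like this:

```python
import matplotlib.pyplot as plt

# Made-up hourly forecast: expected rain in mm and the probability it falls at all.
hours = list(range(6))
rain_mm = [0.5, 2.0, 4.0, 1.0, 0.2, 0.0]
probability = [0.2, 0.6, 0.9, 0.5, 0.3, 0.1]

# Encode the uncertainty as color intensity: darker blue means higher chance of rain.
colors = [(0.1, 0.3, 0.8, p) for p in probability]  # RGBA, alpha = probability

plt.bar(hours, rain_mm, color=colors)
plt.xlabel("Hours from now")
plt.ylabel("Expected rain (mm)")
plt.title("Forecast with the chance of rain shown as a shade of blue")
plt.show()
```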

   

How long is a Marathon?

Many large cities around the world have a yearly marathon event. Today it’s Copenhagen (and possibly other cities too).

The marathon distance today is 42,195 kilometers (if I use a comma as the decimal separator), which corresponds to 26 miles and 385 yards, or 26.22 miles (if I use a dot as the decimal separator).
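
A quick worked example of the conversion, and of the decimal separator issue, might look like this (the mile and yard factors are the standard definitions):

```python
KM = 42.195                      # the marathon distance in kilometers
METERS_PER_MILE = 1609.344
METERS_PER_YARD = 0.9144

meters = KM * 1000
miles = int(meters // METERS_PER_MILE)                     # 26 whole miles
yards = (meters - miles * METERS_PER_MILE) / METERS_PER_YARD
print(f"{KM} km = {miles} miles and {yards:.0f} yards")    # 26 miles and 385 yards
print(f"{KM} km = {meters / METERS_PER_MILE:.2f} miles")   # 26.22 miles

# The same value rendered with a comma as the decimal separator (Danish style)
print(f"{KM:.3f}".replace(".", ","))                       # 42,195
```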

So even if we agree about the distance today, we might represent that distance in various ways. The distance has, however, varied through history, as seen in the table with the lengths of the Olympic marathons.

What about real world alignment?

Well, if the Greek runner called Pheidippides (sometimes spelled Phidippides or Philippides) took the long but flat Southern route from Marathon to Athens it would have been around 42 kilometers. If he took the shorter but steeper Northern route it would only have been around 35 kilometers.

What about me? Oh, I’ll go for 42,195 kilometers – on the bike.   

Quotes not originally about Data Quality

Yesterday I was looking for some quotations for a data quality presentation.

I stumbled upon these quotes by Niels Bohr:

An expert is a person who has made all the mistakes which can be made in a very narrow field

I found that this quote is most often used this way:

“An expert is a man who has made all the mistakes which can be made in a very narrow field”.

I am pretty sure Bohr said person – not man. There are just as many female experts as male experts around.

And indeed: Learning from mistakes is the path to expertise in data quality.

There are two sorts of truth: Trivialities, where opposites are obviously absurd and profound truths, recognized by the fact that the opposite is also a profound truth

Bohr was into quantum mechanics. I think data quality is very much like quantum mechanics. Sometimes there is a simple single version of the truth; sometimes there are several great versions of a complex truth.

Anyone who is not shocked by quantum theory has not understood it

Anyone who is not shocked by the actual quality of data has probably not measured it (yet).

The Value of Used Data

Motivated by a comment from Larry Dubov on the Data Quality ROI page on this blog, I looked up the term Information Economics on Wikipedia.

When discussing information quality, a frequent subject is whether we can compare quality in manufacturing (and the related methodology) with information and data quality. The predominant argument against this comparison is that raw data can be reused multiple times while raw materials can’t.

Information Economics revolves around that difference as well.

The value of data is very much dependent on how the data is being used, and in many cases the value increases with the number of times the data is used.

Data quality will probably increase with multiple uses, as the accuracy and timeliness are probed with each use, new conformity requirements may be discovered and the completeness may be expanded.

The usefulness of data (as information) may also be increased by each new use as new relations to other pieces of data are recorded.

In my eyes the value of (used) data relies very much on how well you are able to capture the feedback from how data is used in business processes. This is actually the same approach as in continuous quality improvement (Kaizen) in manufacturing, only there the improvement is only good for the next goods to be produced. In data management we have the chance to improve the quality and value of already used data.
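
A minimal sketch of that feedback loop, using a made-up record structure, might look like this: each use of a record either confirms it or corrects it, and both kinds of feedback improve the record for every later use.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MasterRecord:
    """A made-up customer record carrying quality feedback from each use."""
    name: str
    address: str
    uses: int = 0
    confirmations: int = 0
    corrections: list = field(default_factory=list)

    def use(self, confirmed: bool, corrected_address: Optional[str] = None):
        """Record one business process use and the feedback it produced."""
        self.uses += 1
        if confirmed:
            self.confirmations += 1
        if corrected_address:                 # feedback improves the record itself
            self.corrections.append(self.address)
            self.address = corrected_address

    def confidence(self) -> float:
        """Crude accuracy indicator that grows as uses confirm the data."""
        return self.confirmations / self.uses if self.uses else 0.0

record = MasterRecord("ACME Ltd", "1 Old Street")
record.use(confirmed=False, corrected_address="1 New Street")  # a delivery failed
record.use(confirmed=True)                                     # an invoice got paid
print(record.address, record.confidence())                     # 1 New Street 0.5
```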

Survey Data Laundering

There are a lot of different terms for data quality improvement activities, like data cleaning, data cleansing, data scrubbing and data hygiene.

Today I stumbled upon “data laundering” and the site http://www.datalaundering.com, which is owned by an old colleague of mine from way back when we were doing stuff not focused on data quality.

Joseph specializes in laundering data from surveys. The issue is that surveys always have some unreliable responses that lead to wrong conclusions, which in turn lead to wrong decisions. This is a trail well known in data and information quality.

Unreliable responses resemble outliers in business intelligence. These are responses from respondents who provide answers distant from the most conceivable result. What I like about the presentation of the business value is that the example is about food: what we say we eat versus what we actually consume. Then there is a lot of math and even an induction mechanism to support the proposition. Read all about it here.
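
This is not Joseph’s method, but as a rough illustration of spotting responses distant from the most conceivable result, a simple z-score filter over made-up survey answers could look like this:

```python
import statistics

# Made-up survey answers: self-reported weekly consumption of something (units).
responses = [3, 4, 2, 5, 3, 4, 42, 3, 2, 4]

mean = statistics.mean(responses)
stdev = statistics.stdev(responses)

# Flag responses more than two standard deviations from the mean as unreliable.
flagged = [r for r in responses if abs(r - mean) > 2 * stdev]
reliable = [r for r in responses if abs(r - mean) <= 2 * stdev]

print("Flagged as unreliable:", flagged)   # [42]
print("Mean before / after laundering:",
      round(mean, 1), round(statistics.mean(reliable), 1))
```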
