Liliendahl on Data Quality

My View

15th May 2011 | 16th May 2011 | Henrik Gabs Liliendahl | 7 Comments

This post is inspired by the view from our roof terrace, where I’m sitting with the laptop right now.

One of the buildings I can see in the skyline is the spectacular new Hotel Bella Sky that will open tonight.

The new hotel is situated by the main fair in Copenhagen called Bella Center, the venue of the recent disastrous climate change summit where Wen, Obama and Singh couldn’t agree about anything.

The Bella Sky isn’t the only new high-rise hotel in the nearby skyline. Actually there is currently an overcapacity of hotel rooms in Copenhagen. But as it is said, the new hotels were planned before the credit crunch and couldn’t be stopped.

Planning several years in advance has always been difficult. Within information technology it’s also a well-known fact that projects that are set to deliver some years ahead almost always fail to meet the actual business needs when that time is reached.

On the one hand we need some more agile hotel projects – and agile information technology projects – including agile master data management and data quality programs.

On the other hand, I like it when I see some nice hotel architecture and some good data architecture.

We All Hate To Watch It

14th May 2011 | Henrik Gabs Liliendahl

Tonight the Eurovision Song Contest final will be watched by over 100 million people, despite the fact that most people agree that the songs aren’t that good.

The winner will be selected by summing up an equal number of votes from each country. Usually there are big differences in how countries vote. A trend is that some neighboring groups of countries like to vote for each other. Such groups include a “Balkan Block” and a “Viking Empire”.

It’s a bit like survivorship when merging matched data rows into a golden record in an enterprise master data hub. Maybe the winning data isn’t that good and several departments probably don’t like it at all.

So I see no reason why Denmark shouldn’t win tonight.

Compound Words

11th May 2011 | Henrik Gabs Liliendahl | 5 Comments

When working with data quality, and not least data matching, an ever recurring issue is compound words. We even have the issue when talking about terms related to data quality: is it called “meta data” or “metadata”, and is it called “multi-domain MDM” or “multidomain MDM”? With MDM my spell checker likes the first option, but Gartner (the analyst firm) likes the last option.

In an international context the issue with compound words becomes much more frequent. In some languages, such as the Germanic languages other than English, compound words are used much more. For example a street name such as “Main Street” will be “Hauptstrasse” in German and “Hovedgade” in Danish.

If your first language has many compound words (like mine) you tend to use (and overuse) compound words even in English. I stumbled upon that when I was helping a family member look at search trends for “hair extensions”.

If you look at the regional interest in Google Insights, the interest in “hair extensions” (figure 1) is big mostly in countries with English as first language, while the interest in “hairextensions” (figure 2) is big mostly in countries having English as a second or third language.
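
For data matching, one simple way to cope with this is to compare on a key where spaces and hyphens are removed. The snippet below is only a minimal sketch of that idea: the function name compound_key is made up for illustration, and the approach does nothing for translations such as “Main Street” versus “Hauptstrasse”.

```python
import re


def compound_key(text: str) -> str:
    """Build a match key that ignores case, whitespace and hyphens.

    Hypothetical helper for illustration: "hair extensions" and
    "hairextensions" end up with the same key.
    """
    return re.sub(r"[\s\-]+", "", text.casefold())


pairs = [
    ("hair extensions", "hairextensions"),
    ("meta data", "metadata"),
    ("multi-domain MDM", "multidomain MDM"),
]

for left, right in pairs:
    same = compound_key(left) == compound_key(right)
    print(f"{left!r} vs {right!r}: same key = {same}")
```

A key like this is deliberately crude. It will also merge word pairs that only look alike once the space is gone, so it is better suited for finding candidates than for deciding a match on its own.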

Quotes not originally about Data Quality

4th May 2011 | Henrik Gabs Liliendahl | 5 Comments

Yesterday I was looking for some quotations for a data quality presentation.

I stumbled upon these by Niels Bohr:

An expert is a person who has made all the mistakes which can be made in a very narrow field

I found that this quote is most often used this way:

“An expert is a man who has made all the mistakes which can be made in a very narrow field”.

I am pretty sure Bohr said person – not man. There are just as many female experts as male experts around.

And indeed: Learning from mistakes is the path to expertise in data quality.

There are two sorts of truth: Trivialities, where opposites are obviously absurd and profound truths, recognized by the fact that the opposite is also a profound truth

Bohr was into quantum mechanics. I think data quality is very much like quantum mechanics. Sometimes there is a simple single version of the truth; sometimes there are several great versions of a complex truth.

Anyone who is not shocked by quantum theory has not understood it

Anyone who is not shocked by the actual quality of data has probably not measured it (yet).

Georgian Geography and History

1st May 2011 | 25th August 2011 | Henrik Gabs Liliendahl

This is the sixth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Georgia

Georgia is the English name for a sovereign state in the South Caucasus where Europe meets Asia. Georgia was a part of the Soviet Union under the English name Georgian SSR from 1922 to 1991. Back in the 4th century BC a unified kingdom of Georgia was established as an early example of an advanced state organization under one king and an aristocratic hierarchy.

Georgia

Georgia is a state located in the southeastern United States. Back in the 18th century the area was known as the Province of Georgia within the British colonies. Before the arrival of the Europeans some of current Georgia was part of the Cofitachequi paramount chiefdom.

Ambiguous place names and slowly changing dimensions

As with Georgia, there are lots of examples of place names belonging to more than one place on Earth. Besides that, location reference data like the two Georgias have slowly changing dimensions: what area is covered, where in a hierarchy the place belongs and what it is called at a certain time.
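
A slowly changing dimension for place names can be sketched as one row per version of a place, each with a validity period. Everything in the snippet is an assumption made for illustration: the identifiers GE-CAUCASUS and GE-USA are invented, years are used instead of full dates, and the row for the US state is left open-ended because no exact years are given above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaceVersion:
    place_id: str              # invented surrogate key for one real-world place
    name: str                  # what the place is called in this period
    parent: Optional[str]      # where it belongs in a hierarchy, if anywhere
    valid_from: Optional[int]  # first year (None = open start)
    valid_to: Optional[int]    # first year it no longer applies (None = current)


VERSIONS = [
    PlaceVersion("GE-CAUCASUS", "Georgian SSR", "Soviet Union", 1922, 1991),
    PlaceVersion("GE-CAUCASUS", "Georgia", None, 1991, None),
    # Simplification: no start year is modeled for the US state.
    PlaceVersion("GE-USA", "Georgia", "United States", None, None),
]


def version_in(place_id: str, year: int) -> Optional[PlaceVersion]:
    """Return the version of a place that applies in the given year."""
    for v in VERSIONS:
        starts_ok = v.valid_from is None or v.valid_from <= year
        ends_ok = v.valid_to is None or year < v.valid_to
        if v.place_id == place_id and starts_ok and ends_ok:
            return v
    return None


def places_named(name: str, year: int) -> List[str]:
    """Return every place that carries the given name in the given year."""
    return [
        v.place_id
        for v in VERSIONS
        if v.name == name and version_in(v.place_id, year) == v
    ]


print(version_in("GE-CAUCASUS", 1950))
print(places_named("Georgia", 2011))
```

The point of the design is that the name alone is never the key. A lookup needs both a name and a point in time, and it may still return more than one place.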

A Business Rule and a Missing Master Data Hub

28th April 2011 | Henrik Gabs Liliendahl

It seems that the United States of America has a problem: a business rule saying you have to be born in the country to become president, and no citizen master data hub that can tell who was born in the country.

This is an aspect of a previous blog post called Did They Put a Man on the Moon.

Single Company View

27th April 2011 | Henrik Gabs Liliendahl | 2 Comments

Getting a single customer view in business-to-business (B2B) operations isn’t straightforward. Besides all the fuss about agreeing on a common definition of a customer within each enterprise, usually revolving around fitting multiple purposes of use, we also have complexities in real world alignment.

One Number Utopia

Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This registry has been live for many years now.

But in most other countries there are several different public registries for companies resulting in multiple numbering systems.

Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two-letter ISO country code followed by the national VAT number, which is formatted differently in each country – some with both digits and letters.
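
The two-part structure can be sketched as a small splitter. This is only an illustration under loose assumptions: the length limits in the pattern are a guess at a reasonable shape, not the official rules, no national format or check digit is validated, and the sample number is made up.

```python
import re
from typing import NamedTuple


class VatNumber(NamedTuple):
    country_prefix: str  # the two leading letters
    national_part: str   # the rest, digits and possibly letters


# Loose shape only (assumption): two letters, then 2 to 12 letters or digits.
_VAT_SHAPE = re.compile(r"^([A-Z]{2})([0-9A-Z]{2,12})$")


def split_vat(raw: str) -> VatNumber:
    """Split a VAT number into country prefix and national part."""
    cleaned = re.sub(r"[\s.\-]", "", raw).upper()
    match = _VAT_SHAPE.match(cleaned)
    if match is None:
        raise ValueError(f"not shaped like an EU VAT number: {raw!r}")
    return VatNumber(match.group(1), match.group(2))


# Made-up sample value, not a real registration.
print(split_vat("dk 12 34 56 78"))
```

Keeping the prefix and the national part as separate fields makes it easier to apply country-specific rules later, since each member state has its own format for the second part.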

The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.

2-Tier Reality

The common structure of a company is that you have a legal entity occupying one or several addresses.

The French company numbering system is a good example of how this is modeled. You have two numbers:

SIREN is a 9-digit number for each legal entity (on the headquarters address).
SIRET is a 14-digit (9 + 5) number for each business location.

This model is good for companies with several locations but strange for single location companies.
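
The 9 + 5 structure means the legal entity can be derived from any location number. The snippet below is a minimal sketch of that relationship only. The function and field names are invented, the sample numbers are made up, and no check-digit validation is attempted.

```python
from typing import NamedTuple


class Siret(NamedTuple):
    siren: str            # first 9 digits: the legal entity
    location_suffix: str  # last 5 digits: the business location


def split_siret(raw: str) -> Siret:
    """Split a 14-digit SIRET into its SIREN and its location part."""
    digits = "".join(ch for ch in raw if not ch.isspace())
    if len(digits) != 14 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 14 digits, got {raw!r}")
    return Siret(digits[:9], digits[9:])


def same_legal_entity(siret_a: str, siret_b: str) -> bool:
    """Two locations belong to the same legal entity if their SIREN matches."""
    return split_siret(siret_a).siren == split_siret(siret_b).siren


# Made-up numbers: two locations of one hypothetical company.
print(split_siret("123 456 789 00011"))
print(same_legal_entity("123 456 789 00011", "123 456 789 00029"))
```

For a single location company the two levels collapse into one pair of numbers that always travel together, which is the strange part mentioned above.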

Treacherous Family Trees (and Restaurants)

The need for hierarchy management is obvious when it comes to handling data about customers that belong to a global enterprise.

Company family trees are useful but treacherous. A mother and a daughter may be very closely connected with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.

Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.

Japanese Jargon

20th April 2011 | 1st May 2011 | Henrik Gabs Liliendahl

This is the fifth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Home of quality philosophy

Japan is the home and inspiration of quality thinking. Therefore we also have some Japanese words used when talking quality. For example kaizen is used for continuous quality improvement, muda is the waste we should avoid and gemba is the real place where things happen and things could be changed.

Streets with no names

When sending letters to Japan the way of addressing is different from how it is done in most other parts of the world. Street names are seldom used in Japanese postal addresses, but the numbers/names of the blocks between the streets are used.

Would you like Kanji, Hiragana, Katakana or Romaji?

No, this is not a selection from the a la carte menu at a Japanese restaurant but the different writing systems to choose from in Japan, covering three different kinds of script. Kanji is the old symbolic writing system similar to Chinese writing. Hiragana and Katakana are syllabic writing systems, while Romaji is the transcription of Japanese into Roman alphabetic letters.

The Value of Used Data

17th April 2011 | Henrik Gabs Liliendahl | 5 Comments

Motivated by a comment from Larry Dubov on the Data Quality ROI page on this blog I looked up the term Information Economics on Wikipedia.

When discussing information quality a frequent subject is whether we can compare quality in manufacturing (and the related methodology) with information and data quality. The predominant argument against this comparison is that raw data can be reused multiple times while raw materials can’t.

Information Economics revolves around that difference as well.

The value of data is very much dependent on how the data is being used and in many cases the value increases with the times the data is being used.

Data quality will probably increase with multiple uses as the accuracy and timeliness are probed with each use, a new conformity requirement may be discovered and the completeness may be expanded.

The usefulness of data (as information) may also be increased by each new use as new relations to other pieces of data are recorded.

In my eyes the value of (used) data relies very much on how well you are able to capture the feedback from how data is used in business processes. This is actually the same approach as in continuous quality improvement (Kaizen) in manufacturing, except that in manufacturing the improvement is only good for the next goods to be produced. In data management we have the chance to improve the quality and value of already used data.

Finding Finland

15th April 2011 | 20th April 2011 | Henrik Gabs Liliendahl

This is the fourth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Let’s start with Finnish

Finland is situated in the north-eastern corner of Europe. The Finnish language is, together with Estonian and with Hungarian much further south in Europe, totally different from the languages of the neighboring countries, which are Germanic or Slavic. Swedish is also an official language in Finland, and in some parts of Finland cities and streets have both (usually totally different) Finnish and Swedish names.

Galoshes

By far the largest company in Finland is the cell phone maker Nokia. Before the cell phone was invented Nokia made paper and galoshes – the old way of connecting people. From 2006 to 2008 Nokia also owned the data quality firm Identity Systems, which was then sold to Informatica. I guess Identity Systems connected with the Gaelic Tiger firm Similarity Systems make up the data matching capabilities at Informatica.

Syslore

One of the remaining (relatively) larger independent data matching firms in the world is Syslore. Syslore is hiding in Finland.
