Marathon, Spartathlon and Data Quality

Tomorrow there is a Marathon race in my home city, Copenhagen. Eight years ago, a post on this blog revolved around some data quality issues connected with the Marathon race. The post was called How long is a Marathon?

Pheidippides at the end of his Marathon race in a classic painting

However, another information quality issue is whether there ever was a first Marathon race run by Pheidippides. Historians today do not think so. It has something to do with data lineage. The written mention of the 42.195 (or so) kilometre effort from Marathon to Athens by Pheidippides comes from Plutarch, who wrote his account some 500 years after the events. The first written source about the Battle of Marathon is from Herodotus. It was written (from a historian's perspective) only 40 years after the events. He did not mention the Marathon run. However, he did write that Pheidippides ran from Athens to Sparta. That is 245 kilometres.

By the way: His mission in Sparta was to get help. But the Spartans did not have time. They were in the middle of an SAP roll-out (or something similarly festive).

Some people run the 245-kilometre route in a race called the Spartathlon. In a data and information quality context, this reminds me that improving data quality, and thereby information quality, is not a sprint. Not even a Marathon. It is a Spartathlon.


When You Know that a Statement is Wrong

Oftentimes it still takes a human eye to establish if a number, year, term or other piece of information is wrong.

I had that experience today at Harvard Square in Cambridge (Boston) when looking at the sign in front of our lunch restaurant. Established 1271, it says. Hmmmm. North American natives were not known for establishing restaurants. Also, the Vikings did not stay that long or go that far south in North America.

The restaurant website actually admits the sign is wrong and this is a printing flaw (should have been 1971) that they have chosen to keep – maybe also in order to test the clever people hanging around Harvard.

Anyway, without attempting to turn this into a foodie blog: the food is OK, but the wait to be served does seem to span centuries.

A Product Information Management (PIM) Solar System

Hundreds of years ago the geocentric model was replaced by heliocentrism, meaning that we recognize that the earth travels around the sun and not the other way around.

When it comes to Product Information Management (PIM), we also need a Copernican Revolution, meaning that it is good to manage product information consistently inside a given company, but it is better to manage product information in the light of the business ecosystem in which we participate.

Exchanging product information in the business ecosystems of manufacturers, distributors and merchants cannot work properly by asking all your trading partners to use your version of a spreadsheet – if they don’t get to you first with their version. Nor will self-centered supplier / customer product data portals work as examined in the post PIM Supplier Portals: Are They Good or Bad?

Your company is not a lonely planet. You are part of a business ecosystem, where you may be:

  • Upstream as the maker of goods and services. For that you need to buy raw materials and indirect goods from the parties being your vendors. In a data-driven world you also need to receive product information for these items. You need to sell your finished products to the midstream and downstream parties being your B2B customers. For that you need to provide product information to those parties.
  • Midstream as a distributor (wholesaler) of products. You need to receive product information from upstream parties being your vendors, perhaps enrich and adapt the product information and provide this information to the parties being your downstream B2B customers.
  • Downstream as a retailer/etailer or large end user of product information. You need to receive product information from upstream parties being your vendors and enrich and adapt the product information so you will be the preferred seller to the parties being your B2B customers and/or B2C customers.

At Product Data Lake we support business ecosystems in Product Information Management (PIM). And this is not just a nice model. There are concrete business benefits too: five for you and five for your trading partner. Check our 10 business benefits.


No plan of operations extends with any certainty beyond the first contact with the full load of data

There is a famous saying from the military world: “No plan survives contact with the enemy.” At least one blogger has paraphrased it as: “No plan survives contact with the data.” A good read, by the way.

Helmuth von Moltke the Elder

Like most famous sayings, this phrase is simplified from the original version. In full length, the military observation made by Helmuth von Moltke the Elder is: “No plan of operations extends with any certainty beyond the first contact with the main hostile force.”

Translating this extended military lesson into data management makes a lot of sense too. You may plan data management activities using selected examples, and you may test them using nice little samples, like skirmishes before the real battle in warfare. But when your data management solution goes live on the full load of data for the first time, there will most often be news for you.

From my data matching days I remember this clearly as explained in the post Seeing is Believing.

The mitigation is to test with a full load of data before going live. In data management we actually have a realistic way of overcoming the observation made by Field Marshal Helmuth Karl Bernhard Graf von Moltke: we can revisit our plan of operations before the second and serious contact with the full load of data.


Who Discovered the Americas?

Today I read a strange story about who discovered the Americas. Turkish President Recep Tayyip Erdogan has said that Muslims, not Columbus, discovered the Americas. The supposed discovery should have happened in the year 1178 in the Gregorian calendar.

Well, in my history book it goes like this:

1st the indigenous peoples of the Americas, sometimes called Indians (as opposed to cowboys), found that land by crossing the Bering Strait thousands of years ago.

2nd there is much speculation that others crossed the oceans. The only archaeological evidence (so far) is that the Vikings were on Newfoundland off the coast of Canada at a place today called L’Anse aux Meadows. That happened around the year 1000 in the Gregorian calendar. (By the way, they came from Greenland, which geographically is a part of the Americas.)

3rd Christopher Columbus and his crew arrived in the Americas in the year 1492 in the Gregorian calendar.

That is the data quality part of the story. The rest is information quality.


Anachronism and Data Quality

The term anachronism is used for something misplaced in time. An example is classical paintings where a biblical event is shown with people dressed in the clothes of the time when the painting was made.

In data quality lingo such a flaw will be categorized as a lack of timeliness.

The most frequent example of lack of timeliness, or should we say example of anachronism, in data management today is having an old postal address attached to a party master data entity. A remedy for avoiding this kind of anachronism is explained in the post The Relocation Event.

In a recent blog post called 3-2-1 Start Measuring Data Quality, Janani Dumbleton of Experian QAS examines the timeliness dimension of data quality along with five other important dimensions. As stated there, one impact of anachronism could be:

“Not being aware of a change in address could result in confidential information being delivered to the wrong recipient.”

Hope you got it.


Famous False Positives

You should Beware of False Positives in Data Matching. A false positive in the data quality realm is a match of two (or more) identities that actually aren’t the same real-world entity.

Throughout history and within art we have seen some false positives too. Here are my three favorites:

The Piltdown Man

In 1912 a British amateur archaeologist apparently found a fossil claimed to be the missing link between apes and humans: the so-called Piltdown Man. Backed by the British Museum, it was regarded as a genuine discovery until 1953, when it was finally exposed as a hoax. It was, however, disputed throughout those years, but defended by the British establishment, maybe out of envy of the French, who had the Cro-Magnon man first found in their country, and of the Germans, whose genuine find in the Neandertal gave the Neanderthal man its name.

Eventually the Piltdown Man was exposed as a medieval human upper skull, an orangutan jawbone and chimpanzee teeth.

Jimmy Bond in Casino Royale

James and Jimmy Bond

As told in the post My Name is Bond. Jimmy Bond: James Bond is a British intelligence agent, and Jimmy Bond is an American agent. It’s always a question whether two identities residing in different countries are the same, as discussed (about me) in the post Hello Leading MDM Vendor.

Dupond et Dupont

In English they are known as Thomson and Thompson. In the original Belgian, French-language Tintin comics (and in the Danish comics of my childhood) they are known as Dupond et Dupont. They are two incompetent detectives who look alike and have names with a low edit distance and the same pronunciation. For their names in a lot of other languages, check the Wikipedia article here.
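To make "low edit distance" a bit more concrete, here is a minimal Python sketch of the classic Levenshtein distance (insertions, deletions and substitutions each cost one edit). The matcher naive_name_match and its threshold of one edit are hypothetical, chosen only for illustration; this is not how any particular data matching tool works. Both twin-name pairs are a single edit apart, so a name-only matcher like this would declare each pair the same entity, which is exactly a false positive, since the two detectives are different people.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def naive_name_match(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical name-only matcher: 'same entity' if the names are close enough."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


for x, y in [("Thomson", "Thompson"), ("Dupond", "Dupont")]:
    print(x, y, levenshtein(x, y), naive_name_match(x, y))
# Thomson Thompson 1 True
# Dupond Dupont 1 True
```

In practice, matching on the name alone is what makes such false positives likely; other attributes (address, birth date, identifiers) are usually needed before two records can be treated as the same real-world entity.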

And hey, today I’m going to Belgium, the home country of these two guys’ creator, to attend the Belgian Data Quality Association congress tomorrow.


The Letter Å

I have previously written about the letter Æ and the letter Ø. Now it’s time to write about another letter in Scandinavian alphabets that doesn’t belong to the English alphabet: The letter Å which is å in lower case.

When transliterated to the English alphabet, Å becomes AA and å becomes aa. When a name begins with Å, it becomes Aa. For example, the second-largest city in Denmark was spelled Århus, which is Aarhus in English. In fact, as reported here, the city council changed the name of the city to Aarhus as of 1 January 2011.
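The transliteration rules for Å are simple enough to sketch in code. Below is a minimal Python sketch; the function name transliterate_aa is hypothetical, and reading "when a name begins with Å it becomes Aa" as "an Å followed by a lowercase letter becomes Aa" is an assumption made here, so that all-caps text such as ÅRHUS still becomes AARHUS.

```python
import re


def transliterate_aa(text: str) -> str:
    """Transliterate the Scandinavian letter Å/å to the English alphabet.

    Assumed rules (a sketch, not an official standard):
      å -> aa
      Å -> Aa when the next character is lowercase (e.g. a name like Århus)
      Å -> AA otherwise (e.g. all-caps text like ÅRHUS)
    """
    def replace_upper(match: re.Match) -> str:
        # Look at the character right after the Å to decide on Aa vs AA
        following = match.string[match.end():match.end() + 1]
        return "Aa" if following.islower() else "AA"

    text = re.sub("Å", replace_upper, text)
    return text.replace("å", "aa")


print(transliterate_aa("Århus"))   # Aarhus
print(transliterate_aa("ÅRHUS"))   # AARHUS
print(transliterate_aa("Ålborg"))  # Aalborg
```

The same pattern could be extended with rules for Æ and Ø.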

The Master Data Management tool vendor Stibo Systems has its headquarters in an Aarhus suburb. As Stibo was founded in 1794, the company has spent part of its life in Århus.

The term Master Data Management (MDM) wasn’t known in 1794, and IT wasn’t invented then. Stibo is basically a printing company that became a specialist in making catalogues, later electronic catalogues and the software for doing this, which led to it becoming a Product Information Management (PIM) vendor and now a multi-domain MDM solution provider. By the way: å is pronounced as the o in catalogue. Catalåg.


Doing Census versus doing Master Data Management

“In those days Caesar Augustus issued a decree that a census should be taken of the entire Roman world. This was the first census that took place while Quirinius was governor of Syria. And everyone went to their own town to register.”

These are the famous words from the Gospel According to Luke that you, if you belong to the part of the world where Christianity is practiced, hear every Christmas.

Today scholars don’t think that there actually was a census of the whole Roman Empire, but there is evidence that a local census in Syria and Judea took place around year 1. This was in order to collect taxes in those provinces. As you know: The taxman is data quality’s best friend.

Today a census is still the most common method of knowing about the people living in a given country. The alternative is a public registry that is constantly updated with all the information needed about you. I had the chance to describe such a method in a post on a Canadian blog some years ago. The post is called How Denmark does it.

India has a similar scheme with a centralized citizen registry under way. This program is called Aadhaar.

As reported in the post Citizen ID and Biometrics, the United Kingdom was close to adopting citizen Master Data Management some years ago. But it didn’t happen, so it’s still possible to have multiple names and multiple addresses at the same time in different registries, while Cameron is Prime Minister of the United Kingdom, First Lord of the Treasury and Minister for the Civil Service.

Merry Christmas.

going to census


What Happened in 1013

At this time of year it is very popular to try to predict what will happen in the next year, being 2013, within your field of expertise.

However, predictions, not least about the future, may fail. And within data quality we don’t like flaws. So instead I will tell a little bit about what happened in the year 1013 with respect to data quality.

As always, Wikipedia is your friend when seeking knowledge. So I have picked a few of the highlights from the Wikipedia article about 1013:


In 1013 the Viking warlord Sweyn Forkbeard replaced Æthelred the Unready as King of England. These were the happy days when the letter Æ was part of the English alphabet. Today Æ only exists in some of the Viking alphabets.


Kaifeng, capital of China, becomes the largest city in the world in 1013, taking the lead from Córdoba in Al-Andalus. However, this is an estimate. And even today, as reported by the BBC, we actually can’t tell which city is the largest in the world.

Multiple versions of the truth

The anti-pope John XVI dies in 1013. An anti-pope is a person who, in opposition to the one who is generally seen as the legitimately elected Pope, makes a significantly accepted competing claim to be the Pope. Even today we can’t always establish a single version of the truth.
