The World of Measuring

A common data quality issue in data management is the use of different measuring systems. Let’s have a look at some of the issues.

Mile or Kilometer, Pound or Kilogram

There is the imperial system, with units such as the mile and the pound, and there is the metric system, with units such as the meter and the gram.

According to Wikipedia, the metric system is used almost everywhere, though with nuances in worldwide use. The notable exception is the United States.
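
When data from both systems ends up in the same column, one option is to normalize everything to a single system on the way in. Here is a minimal Python sketch of that idea. The function names and the unit codes "mi" and "km" are my own assumptions for illustration; the conversion factors are the exact international definitions of the mile and the pound.

```python
# Exact factors from the international definitions of the mile and the pound.
KM_PER_MILE = 1.609344
KG_PER_POUND = 0.45359237


def miles_to_km(miles: float) -> float:
    """Convert a distance in miles to kilometers."""
    return miles * KM_PER_MILE


def pounds_to_kg(pounds: float) -> float:
    """Convert a mass in pounds to kilograms."""
    return pounds * KG_PER_POUND


def to_kilometers(value: float, unit: str) -> float:
    """Normalize a distance to kilometers.

    The unit codes "km" and "mi" are hypothetical; use whatever codes
    your own data actually carries.
    """
    if unit == "km":
        return value
    if unit == "mi":
        return miles_to_km(value)
    # Refuse to guess: an unknown unit is a data quality issue in itself.
    raise ValueError(f"Unknown distance unit: {unit!r}")
```

Raising an error on an unknown unit, rather than assuming a default, keeps a missing unit from silently turning into a wrong value.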


Celsius or Fahrenheit

For temperature, the Celsius scale is used almost everywhere, while the Fahrenheit scale is used in the United States.
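
Converting between the two scales is simple arithmetic, as long as you know which scale a stored value is in. A minimal Python sketch, with function names of my own choosing:

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9
```

The hard part in practice is not the formula but the missing metadata: a bare 40 in a temperature column could be a hot day or a chilly one.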

Big-endian, Little-endian or Middle-endian

When expressing a date, the ISO standard (ISO 8601) uses a big-endian format, so today is 2013-04-27. Most of the world, however, uses a little-endian format, where today is 27-04-2013. The exception is the United States (and all the social networks coming from there), where today is expressed in a middle-endian format as 04-27-2013.
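
The trouble starts when the day is 12 or lower, because then the little-endian and middle-endian readings are both valid dates. A small Python sketch of that ambiguity, using the standard datetime module; the labels "big", "little" and "middle" and the function names are my own assumptions for illustration:

```python
from datetime import date, datetime

# One strptime pattern per convention, using hyphens as in the examples above.
FORMATS = {
    "big": "%Y-%m-%d",     # 2013-04-27
    "little": "%d-%m-%Y",  # 27-04-2013
    "middle": "%m-%d-%Y",  # 04-27-2013
}


def parse_date(text: str, endianness: str) -> date:
    """Parse a date string when the convention is known."""
    return datetime.strptime(text, FORMATS[endianness]).date()


def possible_readings(text: str) -> set[date]:
    """Return every distinct date the string could mean across the conventions."""
    readings = set()
    for pattern in FORMATS.values():
        try:
            readings.add(datetime.strptime(text, pattern).date())
        except ValueError:
            # The string is not a valid date under this convention.
            pass
    return readings
```

For example, possible_readings("04-05-2013") contains both the 4th of May and the 5th of April, while "27-04-2013" has only one valid reading. Without knowing where the data came from, the first string can't be resolved by parsing alone.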


Names, Addresses and National Identification Numbers

When working with customer (or rather party) master data management, and the related work of improving data quality and preventing data quality issues, for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis in data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, a “van” in a large share of names in the Netherlands or loads of surname-like middle names in Denmark.

With company names there are some differences to consider, like the inclusion of legal forms in company names, as described in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations relate to, for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As described in the post Postal Code Musings, the postal code system in a given country may (or may not) be the key to dealing with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers describes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

In the same way, there are different administrative practices for individuals, for example:

  • As I understand it, it is forbidden by the constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number that is in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models, and you need a well-equipped toolbox for preventing data quality issues and exploiting external reference data.


Happy New Year

Am I too late? Not at all. Today is the last day in the year of the dragon and tomorrow will be the first day in the year of the snake according to the Chinese calendar. It’s the Chinese New Year.

As globalization moves on we are becoming more and more aware of celebrations from different cultures and I guess we will end up having almost every day as a special day.

Next up, as far as I am aware, is Valentine’s Day this coming Thursday, a day that has gained much importance over the last decades in many European countries and other places, not least because it is touted by retailers.

In Chinese symbology, snakes are regarded as intelligent, but with a tendency to be somewhat unscrupulous. So I guess Valentine’s Day this year will be great (for retailers).

All of this is a good reminder of the diversity issues in data quality, which are a frequent subject on this blog.

Happy new year and for god’s sake don’t forget Valentine’s Day.



The Dangers of being a Global Shopper

The global shopper is a multi-channel beast.

A global shopper may be a tourist or a business traveler buying goods in exciting cities around the world, in shops most probably operated by the very same brands that occupy his local high street. The global shopper may also do his business from his living room by shopping online on sites with strange foreign privacy rules and unusual registration forms.

Being a global shopper is risky business.

For example, it’s unbelievable that Oxford Street in London wasn’t made into a pedestrian street a long time ago, like any other respectable high street in a major city. But no, global shoppers on Oxford Street are constantly in danger of being hit by a red double-decker bus when crossing the street for a good bargain while looking to the wrong side.

And how about shoe sizes? Measuring systems and standards around the world are a jungle, and as a global shopper you will in 8 ½ out of 10 tries pick the wrong number 42.

Going online isn’t any better.

When registering your home address on a foreign site you are on very slippery ground.

If the site is from the United States, and you are not, you have to choose to live in one of 50 different states that mean nothing to you. There is no way around it. My favorite state is then Alaska, which is usually near the top of the list.

Having a postal code with letters in it can be a no go. Not having a postal code is much like not existing at all.

But don’t give up. As a global shopper you will be able to find sites online with absolutely no clue about what an address looks like. The only question, of course, is whether you will actually get your goods or have to settle for the credit card withdrawal only.


MDM Summit Europe 2013 Wordle

The Master Data Management Summit Europe 2013, co-located with the Data Governance Conference Europe 2013, takes place in London from the 15th to the 17th of April.

Here is a wordle with the session topics:

MDMDG 2013 wordle

Some of the words catching my eye are:

Global is part of several headlines. There is no doubt that governing master data on a global scale is a very timely subject. Handling master data in a domestic context can be hard enough, but enterprises are facing a daunting task when embracing party master data, product master data and location master data covering the diversity of languages, script systems, measuring systems, national standards and regulatory requirements. However, there is no way around the challenges when synergies in global enterprises are to be harvested.

RDM (Reference Data Management) is becoming a popular subject as well. Being successful with governing master data requires a steady hand with the reference data layer that sits on top of the master data. Some reference data sets may be small, but the importance of getting them right must not be underestimated.

Business. Oh yes. All the data stuff is there to enable business processes, drive business transformation and make business opportunities.


Postal Code Musings

When working with master data management and data quality, including data matching, one of the most frequent pieces of information you work with is a postal code.

Wikipedia has a good article about postal codes.

Some of the data quality issues related to the datum postal code are:

Metadata

Around the world, different terms are used for a postal code:

  • ZIP code, the United States implementation of a postal code, is often used as a synonym for postal code in many databases and user interfaces. This is not seriously wrong, but not right either.
  • In India a postal code (in English) is called a PIN Code (Postal Index Number). This could definitely trick me.

Format

There are basically two different formats of postal codes around:

  • Numeric postal codes are the most common ones. The number of digits does however differ between countries. And there may be some additional considerations:
    • For example the 9-digit United States ZIP code is split into the original 5 digits and the additional 4 digits implemented later.
    • Postal codes may begin with 0 which may create formatting errors when treated as numeric.
  • Some countries, for example the United Kingdom, the Netherlands, Canada and Argentina, have alphanumeric postal codes.
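
The leading zero issue is easy to demonstrate. Here is a minimal Python sketch for United States ZIP codes; the function names and the regular expression are my own assumptions for illustration, not an official validation rule, and 02134 is just an example value:

```python
import re
from typing import Optional, Tuple

# Five digits, optionally followed by four more, with or without a hyphen.
ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")


def split_us_zip(code: str) -> Tuple[str, Optional[str]]:
    """Split a ZIP code into the original 5 digits and the optional 4 extra digits."""
    match = ZIP_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"Not a 5 or 9 digit ZIP code: {code!r}")
    return match.group(1), match.group(2)


def zip_from_number(value: int) -> str:
    """Restore a 5-digit ZIP code that was stored as a number by padding with zeros."""
    return f"{value:05d}"


# Treating the code as numeric silently drops the leading zero.
damaged = int("02134")
repaired = zip_from_number(damaged)
```

Padding with zeros only repairs codes where the expected length is known, so the better fix is to store postal codes as text in the first place. That also leaves room for the alphanumeric ones.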

Embedded Information

Numeric postal codes usually form some kind of hierarchy in which you can guess the geographical position within the country and make ranges representing smaller or larger geographical areas. But you never know.

This also goes for Dutch (you know, the ones in the Netherlands) postal codes as the first 4 characters are numeric.

The UK postal codes usually start with a mnemonic of the main city in the area, except in a lot of cases.

Precision

Some postal code systems have postal codes covering larger areas with many streets and some postal code systems are very granular where each street, or part of a street, has a distinct postal code.

The UK postal code system is very granular, which has paved the way for rapid addressing, as described in a recent article in the UK Database Marketing Magazine.

Coverage

Utilizing rapid addressing requires that reference data for postal codes covers practically every spot in the country and that updates are available on a near-real-time basis.

Some countries have postal code systems that don’t cover every corner, and some countries have no postal code system at all.

Uniqueness

The main reason for implementing postal code systems is that a town or city name in many cases isn’t unique within a country.

But that doesn’t mean that uniqueness works the other way as well. A postal code may in many countries cover several town names. France is an example.
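
If you have reference data as pairs of postal code and town name, a quick way to see how often this happens is to group the town names by postal code. A minimal Python sketch, where the function name and the record layout are my own assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def codes_with_several_towns(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return the postal codes that appear with more than one town name.

    Each record is a (postal_code, town_name) pair; this layout is an
    assumption, so adapt it to your own reference data.
    """
    towns_by_code: Dict[str, Set[str]] = defaultdict(set)
    for postal_code, town_name in records:
        towns_by_code[postal_code].add(town_name)
    return {
        code: towns for code, towns in towns_by_code.items() if len(towns) > 1
    }
```

Fed with placeholder pairs like ("10000", "Town A"), ("10000", "Town B") and ("20000", "Town C"), it returns only the entry for "10000". Those codes can't be used to look up a single town name.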

Consistency

While we basically have granular and not so granular postal code systems we of course also have hybrids.

In Denmark, for example, there is a granular system in the capital Copenhagen with a postal code for each street, named after the street, and a system in the rest of the country with a postal code for an area, named after the suburb or town.

Fit for purpose

A postal code is a hierarchical element in a postal address. We basically have two forms of postal addresses:

  • A geographical address, where the postal address including the postal code points to a place you can also visit to meet the people receiving the things sent there
  • A post-office box, which may have more or less geographical connection to where the people receiving the things sent there are

Penetration of post-office boxes differs around the world. In Namibia it is mandatory. In Sweden most companies have a post-office box address.

Trying to compare data with these different concepts is like comparing apples and oranges, which often goes bananas.


The Letter Å

I have previously written about the letter Æ and the letter Ø. Now it’s time to write about another letter in Scandinavian alphabets that doesn’t belong to the English alphabet: The letter Å which is å in lower case.

When transliterated to the English alphabet Å becomes AA and å becomes aa. When a name begins with Å it becomes Aa. For example, the second largest city in Denmark was called Århus, being Aarhus in English. In fact the city council, as reported here, changed the name of the city to Aarhus as of 1st January 2011.
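
The rule above can be sketched in a few lines of Python. The function name is my own, and the way of telling a capitalized name from an all-uppercase one (by looking at the letter that follows the Å) is an assumption on my part, not an official rule:

```python
import unicodedata


def transliterate_aa(name: str) -> str:
    """Replace Å and å with their two-letter English alphabet equivalents."""
    # Normalize first so that "A" plus a combining ring is treated as one letter Å.
    name = unicodedata.normalize("NFC", name)
    result = []
    for index, char in enumerate(name):
        if char == "å":
            result.append("aa")
        elif char == "Å":
            # Followed by a lower case letter: a capitalized name such as Århus.
            # Otherwise assume an all-uppercase spelling such as ÅRHUS.
            following = name[index + 1:index + 2]
            result.append("Aa" if following.islower() else "AA")
        else:
            result.append(char)
    return "".join(result)
```

With this, Århus becomes Aarhus and ÅRHUS becomes AARHUS. Going the other way is not safe to automate, since a name spelled with aa in the source may never have had an å in it.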

The Master Data Management tool vendor Stibo Systems has its headquarters in an Aarhus suburb. As Stibo was founded in 1794, the company has stayed in Århus for some of its life.

The term Master Data Management (MDM) wasn’t known in 1794 and IT wasn’t invented then. Stibo is basically a printing company that became a specialist in making catalogues, later electronic catalogues and the software for doing this, which led to being a Product Information Management (PIM) vendor and now a multi-domain MDM solution provider. By the way: å is pronounced as the o in catalogue. Catalåg.


What Happened in 1013

At this time of year it is very popular to try to predict what will happen in the next year, being 2013, within your field of expertise.

However, predictions, not least about the future, may fail. And within data quality we don’t like flaws. So instead I will tell a little bit about what happened in the year 1013 with respect to data quality.

As always, Wikipedia is your friend when seeking knowledge. So I have picked a few of the highlights from the Wikipedia article about 1013:

Diversity

In 1013 the Viking warlord Sweyn Forkbeard replaced Æthelred the Unready as King of England. These were the happy days when the letter Æ was part of the English alphabet. Today Æ only exists in some of the Viking alphabets.

Definition

Kaifeng, capital of China, becomes the largest city of the world in 1013, taking the lead from Córdoba in Al-Andalus. However, this is an estimate. And even today, as reported by the BBC, we actually can’t tell which one is the largest city in the world.

Multiple versions of the truth

The anti-pope John XVI dies in 1013. An anti-pope is a person who, in opposition to the one who is generally seen as the legitimately elected Pope, makes a significantly accepted competing claim to be the Pope. Even today we can’t always establish a single version of the truth.
