Your Point, My Comma

Spam mails can be great food for thought.

This morning I had this one in one of my many mailboxes:

So, the amount in question was:

It’s interesting to see how the spammer used points and commas in the large amount of money he wanted to trick me with. I don’t know if he was sloppy or struggled with showing an amount to an unsegmented worldwide audience made up of people who are:

  • Using point as decimal mark and comma as thousand separator
  • Using comma as decimal mark and point as thousand separator

The choice of sign for the decimal mark and the thousand separator is indeed divided across the globe, as seen on this map:

The blue countries are using point as decimal mark and comma as thousand separator and the green countries are doing the opposite.

Then there may be diversity within a country. In Canada there are always questions about Quebec, where they follow the French custom. India also has its own numbering system, which groups digits in hundreds above the first thousand, besides the English heritage.
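To make the difference concrete, here is a minimal sketch that writes the same amount in the two conventions from the list above plus the Indian grouping. The function names and the convention labels are my own, and the sketch assumes non-negative amounts with two decimals.

```python
from decimal import Decimal


def group_western(digits: str, sep: str) -> str:
    """Group an integer digit string in threes, counting from the right."""
    parts = []
    while len(digits) > 3:
        parts.insert(0, digits[-3:])
        digits = digits[:-3]
    parts.insert(0, digits)
    return sep.join(parts)


def group_indian(digits: str, sep: str = ",") -> str:
    """Keep the last three digits together, then group the rest in pairs."""
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    parts = []
    while len(head) > 2:
        parts.insert(0, head[-2:])
        head = head[:-2]
    parts.insert(0, head)
    return sep.join(parts + [tail])


def format_amount(amount: Decimal, convention: str) -> str:
    """Format a non-negative amount with two decimals in a named convention."""
    whole, _, frac = f"{amount:.2f}".partition(".")
    if convention == "point_decimal":   # point as decimal mark, comma as separator
        return group_western(whole, ",") + "." + frac
    if convention == "comma_decimal":   # comma as decimal mark, point as separator
        return group_western(whole, ".") + "," + frac
    if convention == "indian":          # point as decimal mark, Indian grouping
        return group_indian(whole) + "." + frac
    raise ValueError(f"unknown convention: {convention}")


amount = Decimal("12345678.5")
for convention in ("point_decimal", "comma_decimal", "indian"):
    print(convention, format_amount(amount, convention))
```

The same digits come out as 12,345,678.50, 12.345.678,50 and 1,23,45,678.50, which is exactly why an amount without a known convention is ambiguous.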

The pattern of approximately one half of the world using one standard and approximately the other half using the opposite standard is seen in other notations too, such as arranging person names and writing street addresses, place names and postal codes, as told in the post Having the Right Element to the Left.


A Sudden Change: South Sudan

This tenth Data Quality World Tour blog post is about South Sudan, a new country born today, the 9th of July 2011.

Reference data

The term “reference data” is often used to describe small collections of data that are basically maintained outside the enterprise and are common to all organizations. A list of countries is a good example of reference data.

Sometimes the terms “reference data” and “master data” are used interchangeably. I started a discussion on that subject on the mdm community some time ago.

One problem with reference data such as a country list is whether you are able to keep the list updated. A country list doesn’t change every day, but sometimes it actually does, like today with South Sudan as a new country.

Suddenly changing dimensions

If you have master data entities linking to reference data like a country list, things are not that simple when the reference data changes. If you have a customer located in what is South Sudan today, that entity should rightfully link to Sudan for yesterday’s transactions, but you may also have renamed Sudan to North Sudan, the continuing part of the former Sudan.

We call that kind of challenge “slowly changing dimensions”, but it actually looks like “suddenly changing dimensions” when we have to figure out who belonged where at a certain time.
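One common way to handle this is to give each reference row a validity period and look the country up as of the transaction date. The sketch below assumes that approach; the class, the field names and the two sample rows are illustrative only and describe a single location that was in Sudan yesterday and is in South Sudan today.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CountryVersion:
    code: str
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means the row is still current


# Illustrative history for one location, not a complete reference list.
LOCATION_HISTORY = [
    CountryVersion("SD", "Sudan", date(1956, 1, 1), date(2011, 7, 8)),
    CountryVersion("SS", "South Sudan", date(2011, 7, 9), None),
]


def country_on(history: List[CountryVersion], day: date) -> Optional[CountryVersion]:
    """Return the version that was valid on the given day, if any."""
    for version in history:
        starts_before = version.valid_from <= day
        ends_after = version.valid_to is None or day <= version.valid_to
        if starts_before and ends_after:
            return version
    return None


print(country_on(LOCATION_HISTORY, date(2011, 7, 8)).name)  # yesterday's transaction
print(country_on(LOCATION_HISTORY, date(2011, 7, 9)).name)  # today's transaction
```

Yesterday’s transactions keep pointing at Sudan while today’s point at South Sudan, and no existing row has to be overwritten.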

Previous Data Quality World Tour blog posts:

A geek about Greek

This ninth Data Quality World Tour blog post is about Greece, a favorite travel destination of mine and the place of origin of so many terms and thoughts in today’s civilization.

Super senior citizens

Today Greece has a problem with keeping records of citizens. A recent data profiling activity has exposed that over 9,000 Greeks receiving pensions are over 100 years old. It is assumed that relatives have failed to report the deaths of these people and are therefore taking care of the continuing stream of euros. News link here.
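A profiling rule of this kind is easy to sketch. The example below is not how the Greek authorities did it; it just assumes pension records with a date of birth and an optional date of death, and the field names and sample rows are made up.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Age in whole years on a given day."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday not reached yet this year
    return years


def suspicious_pensioners(records, today: date, max_age: int = 100):
    """Return records with no reported death and an age above max_age."""
    return [
        r for r in records
        if r["date_of_death"] is None
        and age_on(r["date_of_birth"], today) > max_age
    ]


# Made-up sample records for illustration.
records = [
    {"id": 1, "date_of_birth": date(1905, 3, 14), "date_of_death": None},
    {"id": 2, "date_of_birth": date(1950, 8, 2), "date_of_death": None},
    {"id": 3, "date_of_birth": date(1908, 1, 1), "date_of_death": date(1999, 5, 5)},
]

flagged = suspicious_pensioners(records, today=date(2011, 7, 1))
print([r["id"] for r in flagged])
```

The flagged records are candidates for a follow-up check, not proof of anything.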

Diverse dimensions

I found these good pieces of advice for you when going to Greece today:

Timeliness: When coming to dinner, arriving 30 minutes late is considered punctual.

Accuracy:  Under no circumstances should you publicly question someone’s statements.

Uniqueness: Meetings are often interrupted. Several people may speak at the same time.

(We all have some Greek in us I guess).


New Eyes on Iceland

This eighth Data Quality World Tour blog post is about Iceland.

Patronymics

Rather than using family names, Icelanders use patronymics. This means that the first Icelandic President Sveinn Björnsson must have been the son of Björn, and I guess current Prime Minister Jóhanna Sigurðardóttir is the daughter of Sigurður. This must create some havoc for well-proven algorithms for finding households. (Add to that the fact that the Prime Minister is in a same-sex marriage.)

Volcanoes

In the good old days air traffic wasn’t concerned with the recurring volcanic eruptions in Iceland. Today they seem to be a repeating cause of travel havoc. It is a bit like poor data quality, which wasn’t taken seriously in the good old days, but today dirty data creates havoc in business intelligence implementations.


Notes about the North Pole

This is the seventh post in a series of short blog posts focusing on data quality related to different countries around the world. However, today we will be at a place not belonging to any country (so far) and only reachable on foot because it is in the middle of an ocean covered by ice (so far).

Who lives on the North Pole?

Obviously no one – except of course that according to tradition in some Western countries the North Pole is described as the residence of Santa Claus. Actually Canada Post has assigned the postal code “H0H 0H0” to the North Pole. So it’s a good data quality question whether “H0H 0H0” is a valid Canadian postal code.
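Part of the answer depends on what “valid” means. The sketch below only checks the format, assuming the letter-digit-letter digit-letter-digit pattern and the letter restrictions as I understand them (no D, F, I, O, Q or U anywhere, and no W or Z in the first position). The function name is my own, and whether a code is actually in use would need a lookup in postal reference data.

```python
import re

# Format-only pattern; an assumption based on the commonly described rules.
POSTAL_CODE_PATTERN = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d$"
)


def looks_like_canadian_postal_code(value: str) -> bool:
    """Check the shape of the code only, not whether it is deliverable."""
    return POSTAL_CODE_PATTERN.match(value.strip().upper()) is not None


print(looks_like_canadian_postal_code("H0H 0H0"))  # Santa's code
print(looks_like_canadian_postal_code("D0H 0H0"))  # D is not used
```

So “H0H 0H0” passes a syntax check. Whether it passes a check against real places is a different dimension of data quality.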

Also Santa Claus may have several other residences, as the Finns claim the correct address is “Santa Claus Village, FIN-96930 Arctic Circle, Finland” and in Denmark we believe the correct address of Santa Claus to be “Box 1615, DK-3900 Nuuk, Greenland”.

If you are interested in identity resolution covering multiple countries, there is a discussion going on in the LinkedIn Data Matching Group.

Where is the North Pole?

The latitude is 90° – but there is no longitude. So if you don’t accept null in the longitude attribute of your geocodes, you might get a data quality issue when Santa Claus becomes a customer and you believe Canada Post is the single version of the truth.
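One way to model this is a geocode where longitude is optional, but only at the poles. The class below is a sketch of that rule under my own naming, not a reference to any particular geocoding product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Geocode:
    latitude: float
    longitude: Optional[float]  # None is only accepted at the poles

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
        at_pole = abs(self.latitude) == 90.0
        if self.longitude is None:
            if not at_pole:
                raise ValueError("longitude is required away from the poles")
        elif not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180")


north_pole = Geocode(latitude=90.0, longitude=None)  # accepted
print(north_pole)
```

A stricter model could also reject a longitude at the poles. This one tolerates it, since many sources will supply some value anyway.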


Georgian Geography and History

This is the sixth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Georgia

Georgia is the English name for a sovereign state in the South Caucasus where Europe meets Asia. Georgia was a part of the Soviet Union under the English name Georgian SSR from 1922 to 1991. Back in the 4th century BC a unified kingdom of Georgia was established as an early example of an advanced state organization under one king and an aristocratic hierarchy.

Georgia

Georgia is a state located in the southeastern United States. Back in the 18th century the area was known as the Province of Georgia within the British colonies. Before the arrival of the Europeans some of what is now Georgia was part of the Cofitachequi paramount chiefdom.

Ambiguous place names and slowly changing dimensions

As with Georgia, there are lots of examples of place names belonging to more than one place on Earth. Besides that, location reference data like the two Georgias has slowly changing dimensions: what area is covered, where in a hierarchy it belongs and what it is called at a certain time.
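For the ambiguity part, a name lookup should be allowed to return more than one candidate and leave the final choice to extra context. A minimal sketch, with a made-up two-row place list and my own field names:

```python
# Illustrative place list; a real one would come from location reference data.
PLACES = [
    {"name": "Georgia", "kind": "country", "parent": None},
    {"name": "Georgia", "kind": "state", "parent": "United States"},
]


def resolve(name, kind=None):
    """Return every place with the given name, optionally narrowed by kind."""
    matches = [p for p in PLACES if p["name"].lower() == name.lower()]
    if kind is not None:
        matches = [p for p in matches if p["kind"] == kind]
    return matches


print(len(resolve("Georgia")))                       # two candidates
print(resolve("Georgia", kind="state")[0]["parent"])  # narrowed by context
```

The name alone is not a key. Some context, such as the kind of place or its parent in the hierarchy, has to travel with it.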


Japanese Jargon

This is the fifth post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.

Home of quality philosophy

Japan is the home and inspiration of quality thinking. Therefore we also have some Japanese words used when talking quality. For example, kaizen is used for continuous quality improvement, muda is the waste we should avoid and gemba is the real place where things happen and where things could be changed.

Streets with no names

When sending letters to Japan the way of addressing is different from how it is done in most other parts of the world. Street names are seldom used in Japanese postal addresses, but the numbers/names of the blocks between the streets are used.

Would you like Kanji, Hiragana, Katakana or Romaji?

No, this is not a selection from the à la carte menu at a Japanese restaurant but the different writing systems to choose from in Japan, covering three different kinds of script. Kanji is the old symbolic writing system similar to Chinese writing. Hiragana and Katakana are syllabic writing systems, while Romaji is the transcription of Japanese into Roman alphabetic letters.
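If you want to know which of the four a given name or address field is written in, the Unicode character names give a rough answer. The sketch below uses Python’s standard `unicodedata` module; the mapping from name prefixes to the four labels is my own simplification and ignores rarer character blocks.

```python
import unicodedata


def script_of(char: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(char, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Kanji"
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("LATIN"):
        return "Romaji"
    return "Other"


def scripts_in(text: str) -> set:
    """Return the set of scripts used in a text, ignoring whitespace."""
    return {script_of(c) for c in text if not c.isspace()}


for word in ("東京", "とうきょう", "トウキョウ", "Tokyo"):
    print(word, scripts_in(word))
```

All four words are ways of writing Tokyo, so a matching routine has to treat them as the same place even though they share no characters.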

