Two Kinds of Business Rules within Data Governance

When laying out data policies and data standards within a data governance program, one of the most important inputs is the business rules that exist within your organization.

I have often found that it is useful to divide business rules into two different types:

  • External business rules, which are rules based on laws, regulations within industries and other rules imposed from outside your organization.
  • Internal business rules, which are rules made up within your organization in order to make your business more competitive than that of your peers in the industry.

Externally imposed business rules most often differ from country to country (or between groups of countries like the EU). Internal business rules may do so too, but tend to be rules that apply worldwide within an organization.

The scope of external business rules tends to be fairly fixed, and so does the deadline for implementing the derived data policy and standard. With internal business rules you may narrow or widen the scope and be flexible about the timetable for bringing them into force and formalizing the data governance around the rules. It is often a matter of prioritizing against other short-term business objectives.

The distinction between these two kinds of business rules may not be so important in the first implementation of a data governance program, but it comes very much into play in the ongoing management of data policies and data standards.

Foreign Addresses

There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in.

The same phenomenon is often seen in data management as told in the post Foreign Affaires.

If we, for example, work with postal addresses, we tend to think that postal addresses in our own country have a well-known structure while foreign addresses are a total mess.

In Denmark, where I was born and raised and have worked most of my life, we have two ways of expressing an address:

  • The envelope way, where there is a certain range of possibilities, especially on how to spell a street name and how to write the exact unit within a high-rise building, though there is a structure more or less known to native people.
  • The code way, as every street has a code too and there is a defined structure for units (known as the KVHX code). This code is used by the public sector as well as in private sectors such as financial services and utility companies, and this helps tremendously with data quality.

But around 3.5 percent of Danes, including yours truly, have a foreign address. And until now the way of registering and storing those addresses in the public sector and elsewhere has been totally random.

This is going to change. The public authorities have, with a little help from yours truly, made the first standard and governance principles for foreign addresses as seen in this document (in Danish).

At iDQ A/S we have simultaneously developed Master Data Management (MDM) services that help utility companies, financial services and other industries in getting foreign addresses right as well.

Sharing Big Location Reference Data

In the post Location Data Quality for MDM the different ways of handling location master data within many companies were examined.

A typical “as is” picture could be this:

[Image: “as is” picture of location data handling]

Location data are handled for different purposes using different kinds of systems. Customer data may be data quality checked by using address validation tools and services. This also serves as a prerequisite for better utilization of these data in a Geographical Information System (GIS) and for using internal customer master data in marketing research, for example by utilizing demographic classifications for current and prospective customers.

Often additional external location data are used for enrichment and for supplementing internal master data downstream in these specialized systems. It may very well be that the external location reference data used at different points do not agree in terms of precision, timeliness, conformity and other data quality dimensions.

A desired “to be” picture could be this:

[Image: “to be” picture of location data handling]

In this set-up everything that can be shared across different purposes is kept as common (big) reference data and/or is accessible within a data-as-a-service environment maintained by third party data providers.

Hello Leading MDM Vendor

This morning I received messages from a leading MDM vendor about an upcoming webinar on the 12th of September.

[Image: INFA 01]

As today is the 3rd of October this is strange, and the vendor of course sent out a correction later in the day:

[Image: INFA 02]

That’s OK. Shit happens. Even at data quality and MDM vendors’ marketing departments.

I am probably a somewhat strange person, having lived in two countries lately, so I got both the original message and the correction sent to my Scandinavian identity from the vendor’s Scandinavian body:

[Image: INFA 03]

As well as to my UK identity from the vendor’s UK body:

[Image: INFA 04]

That’s OK. Getting a 360 degree view of migrating persons is difficult as discussed in the post 180 Degree Prospective Customer View isn’t Unusual.

Both (double) messages have a salutation.

UK:

[Image: INFA 05]

Scandinavian:

[Image: INFA 06]

Being Mr. Sorensen in the UK is OK. Using Mister and surname fits with an English stiff upper lip, and the letter ø could be o in the English alphabet.

I’m not sure if Dear Mr. Sørensen is OK in a Scandinavian context. Hello Henrik would be a better fit.

The Postal Address Hierarchy

Using postal addresses is a core element in many data quality improvement and master data management (MDM) activities.

As touched upon many times on this blog, postal addresses are formatted very differently around the world. However, they may all be arranged in a sort of hierarchy with up to six general levels:

  • Country
  • Region
  • City or district
  • Thoroughfare (street) or block
  • Building number
  • Unit within building

In addition to that the postal code (postcode or zip code) is part of many address formats. Seen in the hierarchical light the postal code is a tricky concept as it may identify a city, district, thoroughfare, a single building or even a given unit within or section of a building. The latter is true for my company address in the United Kingdom, where we have a very granular postcode system.
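
The six levels can be sketched as a small data structure. This is only an illustration under my own assumptions: the class name `PostalAddress`, the field names and the `deepest_level` helper are made up for the example, and the sample address values are invented. The postal code is kept outside the hierarchy because, as said, the level it identifies differs between countries.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PostalAddress:
    """Hypothetical container for the six general levels, most general first."""
    country: str
    region: Optional[str] = None
    city_or_district: Optional[str] = None
    thoroughfare_or_block: Optional[str] = None
    building_number: Optional[str] = None
    unit: Optional[str] = None
    # Not a hierarchy level: it may point at a city, a street, a building or a unit.
    postal_code: Optional[str] = None

    def deepest_level(self) -> str:
        """Return the name of the most specific hierarchy level that is filled in."""
        levels = [f.name for f in fields(self) if f.name != "postal_code"]
        filled = [name for name in levels if getattr(self, name)]
        return filled[-1] if filled else ""


# Invented sample values, for illustration only.
address = PostalAddress(
    country="GB",
    city_or_district="Exampletown",
    thoroughfare_or_block="Main Grove",
    building_number="1",
)
country_only = PostalAddress(country="JP")

print(address.deepest_level())
print(country_only.deepest_level())
```

Levels that a given country does not use, such as the region, simply stay empty.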

Country

As discussed in the post The Country List even the top level of a postal address hierarchy isn’t a simple list fit for every purpose. Some issues are:

  • There are different sources with different perceptions of which are the countries on this planet
  • What we regard as countries comes in hierarchies
  • Several coding systems are available

Region

The region is an element in some address formats, like the states in the United States and the provinces in Canada, while other countries, like Germany, which is divided into quite independent Länder, do not have the region as a required part of the postal address. The same goes for Swiss cantons.

City or district

I once read that if you used the label city in a web form in Australia, you would get a lot of values like: “I do not live in a city”.

Anyway this level is often (but as mentioned certainly not always) where the postal code is applied. The postal code district may be a single town with surroundings, several villages or a district within a big city.

Thoroughfare (street) or block

Most countries use thoroughfares such as streets, roads, lanes, avenues, mews, boulevards and whatever else they are called around the world. Beware that the same street may have several spellings and even several names.

Japan is a counterexample to the use of thoroughfares, as here it’s the blocks between the thoroughfares that are part of the postal address.

Building number

Usually this element will be an integer. However formats with a letter behind the integer (example: 21 A) or a range of integers (example: 21-23) are most annoying. And then this British classic: One Main Grove. OMG.
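
A minimal sketch of how the three numeric shapes mentioned above (plain integer, integer with a letter, range of integers) could be parsed. The function name `parse_building_number` and the returned `(first, last, suffix)` tuple are assumptions made for this example, not a standard. Spelled-out numbers like the British classic are deliberately left unparsed.

```python
import re
from typing import Optional, Tuple

# An integer, an optional "-<integer>" range end, and an optional single letter.
_BUILDING_NUMBER = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+))?\s*([A-Za-z])?\s*$")


def parse_building_number(raw: str) -> Optional[Tuple[int, int, str]]:
    """Return (first, last, suffix), or None when the value is not numeric."""
    match = _BUILDING_NUMBER.match(raw)
    if match is None:
        return None
    first = int(match.group(1))
    # A single number is treated as a range of one.
    last = int(match.group(2)) if match.group(2) else first
    suffix = (match.group(3) or "").upper()
    return first, last, suffix


for value in ["21", "21 A", "21-23", "One"]:
    print(value, "->", parse_building_number(value))
```

Returning `None` instead of guessing keeps the odd cases visible, so they can be handled by a person or by a separate rule.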

Unit within a building

This element may or may not be present in a postal address, depending on whether the building is a single-family house or company site, whether the postal service sees it as such, and whether you may actually indicate where within the building the delivery goes or you go. The ups and downs of this level are examined in the post A Universal Challenge.

Where the Streets have one Name but Two Spellings

Last week’s post called Where The Streets have Two Names caught a lot of comments both on this blog and in LinkedIn groups, as here on Data Quality Professionals and on The Data Quality Association, with a lot of examples from around the world on how this challenge actually exists more or less everywhere.

Recently I had the pleasure of experiencing a variant of the challenge when driving around in a rented car in the Saint Petersburg area in Russia. Here the streets usually only have one name, but that name may be presented in two different alphabets: the local Cyrillic or the Latin alphabet I’m used to, which was also included in the reference data on the Sat Nav. So while it was nice for me to type destinations in Latin letters, it was also nice to have directions in Cyrillic in order to follow the progress on road signs.

So here standardization (or standardisation) to one preferred language, alphabet or script system isn’t the best solution. Best of breed solutions for handling addresses must be able to handle several right spellings for the same address.
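
As a rough sketch of what handling several right spellings could look like, each spelling can be an alias pointing at the same street identifier instead of being replaced by one preferred form. The class `StreetAliases` and the identifier `spb-0001` are hypothetical names made up for this example.

```python
import unicodedata
from typing import Dict, Iterable, Optional


class StreetAliases:
    """Hypothetical lookup where every spelling points at one street identifier."""

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}

    @staticmethod
    def _key(spelling: str) -> str:
        # Compare case-insensitively and ignore extra whitespace; works for
        # both Latin and Cyrillic letters.
        normalized = unicodedata.normalize("NFC", spelling)
        return " ".join(normalized.casefold().split())

    def add(self, street_id: str, spellings: Iterable[str]) -> None:
        for spelling in spellings:
            self._index[self._key(spelling)] = street_id

    def lookup(self, spelling: str) -> Optional[str]:
        return self._index.get(self._key(spelling))


aliases = StreetAliases()
aliases.add("spb-0001", ["Nevsky Prospekt", "Невский проспект"])

print(aliases.lookup("nevsky prospekt"))
print(aliases.lookup("НЕВСКИЙ  ПРОСПЕКТ"))
```

Both spellings stay available for presentation, so the form can show Latin letters while the road signs are followed in Cyrillic.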

[Image: Nevsky Prospekt, St Petersburg, street sign]
Street sign in Cyrillic with Latin subtitle

Where the Streets have Two Names

As told in the post The Art in Data Matching, a common challenge in matching names and addresses is that in some parts of the world the streets have more than one name at the same time because more than one language is in use.

We have the same challenge when building functionality for rapid addressing, that is, functionality that facilitates fast and quality assured entry of addresses supported by reference data that knows about postal codes / cities and street names.

The below example is taken from the instant Data Quality tool address form:

[Image: Finnish and Swedish suggestions in the address form]

The Finnish capital Helsinki also has an official name in Swedish being Helsingfors and the streets in Helsinki/Helsingfors have both Finnish and Swedish names. So when you start typing a letter suggestions could be in both Finnish and Swedish.
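
The idea can be sketched as a prefix search over reference data that holds both names for every street. This is not how the instant Data Quality tool is implemented; the `STREETS` list is a tiny hand-picked sample and `suggest` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# (Finnish name, Swedish name) for two streets in Helsinki/Helsingfors.
STREETS: List[Tuple[str, str]] = [
    ("Aleksanterinkatu", "Alexandersgatan"),
    ("Mannerheimintie", "Mannerheimvägen"),
]


def suggest(prefix: str, streets: Sequence[Tuple[str, str]] = STREETS) -> List[str]:
    """Return every street name, in either language, that starts with the prefix."""
    needle = prefix.casefold()
    hits = [
        name
        for names in streets
        for name in names
        if name.casefold().startswith(needle)
    ]
    return sorted(hits)


print(suggest("al"))
print(suggest("mannerheimv"))
```

Typing only a couple of letters offers both language variants, while a longer prefix narrows the list down to the language the user is actually writing in.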

What challenges have you encountered with street names in multiple languages?

Is Data Cleansing Bad for Data Matching?

Today I stumbled upon an article from Australia in BMC Medical Informatics and Decision Making. The article is called The effect of data cleaning on record linkage quality.

The result of the described research is:

“Data cleaning made little difference to the overall linkage quality, with heavy cleaning leading to a decrease in quality. Further examination showed that decreases in linkage quality were due to cleaning techniques typically reducing the variability – although correct records were now more likely to match, incorrect records were also more likely to match, and these incorrect matches outweighed the correct matches, reducing quality overall.”

This resonates very well with my experience too. Usually I like to match with both original data and standardized (cleansed) data in order to exploit the best of both approaches.
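
A minimal sketch of scoring a pair of values on both the original and the cleansed form. The `cleanse` rules are a deliberately light, made-up standardization, the similarity measure is simply `difflib.SequenceMatcher` from the Python standard library, and how the two scores are combined into a match decision is left open. None of this is taken from the cited article.

```python
import re
from difflib import SequenceMatcher
from typing import Dict


def cleanse(value: str) -> str:
    """Hypothetical light cleansing: lower-case, drop punctuation, squeeze spaces."""
    without_punctuation = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(without_punctuation.split())


def similarity(left: str, right: str) -> float:
    """Similarity between 0.0 and 1.0 as computed by difflib."""
    return SequenceMatcher(None, left, right).ratio()


def match_scores(left: str, right: str) -> Dict[str, float]:
    """Score the pair on the original values and on the cleansed values."""
    return {
        "original": similarity(left, right),
        "cleansed": similarity(cleanse(left), cleanse(right)),
    }


scores = match_scores("O'Brien, J.", "OBrien J")
print(scores)
```

Keeping both scores lets you see when a match only exists after cleansing, which is exactly where the reduced variability may produce false positives.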

What are your experiences?

The World of Measuring

A common data quality issue in data management is the use of different measuring systems. Let’s have a look at some of the issues.

Mile or Kilometer, Pound or Kilogram

There is the imperial system with units such as the mile and the pound. And there is the metric system with units such as the meter and the gram.

According to Wikipedia the metric system, though there are nuances in world-wide use, is used all over the world except, notably, in the United States.

[Image: Metric Penetration]

Celsius or Fahrenheit

For temperature we have the Celsius scale used all over and the Fahrenheit scale used in the United States.
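
One way to deal with mixed measuring systems is to store the unit with every value and convert to a common reference unit before comparing. The sketch below assumes hypothetical unit labels (`mi`, `lb`, `degF` and so on) and a made-up helper `to_reference_unit`. The conversion factors are the defined values for the international mile and the avoirdupois pound.

```python
from typing import Callable, Dict, Tuple

MILE_IN_KM = 1.609344      # international mile, exact by definition
POUND_IN_KG = 0.45359237   # avoirdupois pound, exact by definition

# Hypothetical unit labels mapped to (reference unit, conversion function).
_CONVERSIONS: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "mi": ("km", lambda v: v * MILE_IN_KM),
    "lb": ("kg", lambda v: v * POUND_IN_KG),
    "degF": ("degC", lambda v: (v - 32) * 5 / 9),
}
_REFERENCE_UNITS = {"km", "kg", "degC"}


def to_reference_unit(value: float, unit: str) -> Tuple[float, str]:
    """Convert a value to the reference unit, leaving reference units untouched."""
    if unit in _REFERENCE_UNITS:
        return value, unit
    if unit not in _CONVERSIONS:
        raise ValueError(f"Unknown unit: {unit!r}")
    target, convert = _CONVERSIONS[unit]
    return convert(value), target


print(to_reference_unit(1, "mi"))
print(to_reference_unit(212, "degF"))
print(to_reference_unit(5, "km"))
```

Raising an error on an unknown unit is a choice made here so that a value without a known unit is never silently treated as metric.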

Big-endian, Little-endian or Middle-endian

When expressing a date we have the ISO standard, a big-endian format where today is 2013-04-27. Most of the world uses a little-endian format where today is 27-04-2013. The exception is the United States (and all the social networks coming from there), where today is expressed in a middle-endian format as 04-27-2013.
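
A small sketch of why the format has to be known before a date can be read. It uses `datetime.strptime` from the Python standard library; the `FORMATS` table and the `parse_date` helper are names made up for this example.

```python
from datetime import date, datetime

# Hypothetical labels for the three orderings discussed above.
FORMATS = {
    "big": "%Y-%m-%d",     # ISO style, year first
    "little": "%d-%m-%Y",  # day first
    "middle": "%m-%d-%Y",  # month first
}


def parse_date(text: str, endianness: str) -> date:
    """Parse a date string using the format that belongs to the given ordering."""
    return datetime.strptime(text, FORMATS[endianness]).date()


# The same day written in three ways.
print(parse_date("2013-04-27", "big"))
print(parse_date("27-04-2013", "little"))
print(parse_date("04-27-2013", "middle"))

# The same string read in two ways gives two different days.
print(parse_date("04-01-2013", "little").isoformat())
print(parse_date("04-01-2013", "middle").isoformat())
```

The last two lines show the data quality issue: without metadata about the format, a value like 04-01-2013 cannot be interpreted safely.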

New Standards

This morning people in the United States will not wake up to the date being 04/01/2013. Instead the date will be 01/04/2013 as it is in the rest of the world. The days of the mm/dd/yyyy date format are numbered.

In a related statement a US government representative writes: What can be standardized must be standardised.

This is only the first step in a plan for the US to adapt to other more commonly used standards world-wide. The Fahrenheit temperature scale will be changed to Celsius by 04/01/2014 for degrees below 0 Celsius (formerly 01/04/2014 and 32 degrees Fahrenheit). When spring comes along on 01/04/2014 (formerly 04/01/2014) the change will also apply to all warm degrees.

In another move the United Kingdom has released plans for changing from driving on the wrong side of the road to driving on the right side of the road. There will be a phased implementation starting with lorries, then black London Taxis and red double-decker buses and finally all other vehicles.

The phased implementation is explained by a UK government spokesman by saying: We don’t believe in a big bang implementation.
