Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and issue prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis on data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, having a “van” in bulks of names in the Netherlands or having loads of surname-like middle names in Denmark.
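A minimal sketch of how two of these frequent issues might be handled in code. The nickname table and the list of Dutch surname prefixes are tiny illustrative assumptions, not complete reference data, and the function names are hypothetical:

```python
from __future__ import annotations

# Illustrative lookup tables only; real reference data would be far larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}
DUTCH_PREFIX_WORDS = {"van", "de", "der", "den", "ter", "te"}


def canonical_given_name(name: str) -> str:
    """Return a lower-cased given name, mapping known nicknames to a formal form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def split_dutch_surname(surname: str) -> tuple[str, str]:
    """Split a surname into (prefix, main part), e.g. 'van der Berg'."""
    words = surname.split()
    i = 0
    # Consume leading prefix words, but always leave at least one word as the main part.
    while i < len(words) - 1 and words[i].lower() in DUTCH_PREFIX_WORDS:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])
```

Keeping the prefix separate from the main part makes it possible to sort and match on "Berg" while still printing "van der Berg".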

With company names there are some differences to consider, like the inclusion of legal forms in company names, as described in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings, the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers describes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

The same way there are different administrative practices for individuals, for example:

  • As I understand it, all-purpose identification numbers for individuals are forbidden by the constitution down under.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number that is in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a well-equipped toolbox for preventing data quality issues and exploiting external reference data.

Hurray! I am on LinkedIn

These days LinkedIn is celebrating passing 200 million profiles.

This is done by sending us members an email about our part in the success.

The mail message is easily sharable on LinkedIn, Twitter and Facebook. What I’ve seen is that you can be among the 1 % most viewed (including yours truly), the 5 % most viewed, the 10 % most viewed and among the first 500,000 members in a given country.

The latter category includes, for example, being among the first 500,000 members in Malta.

I guess that will include every member in Malta, as Malta has a population of around 450,000, unless of course the Maltese are world champions in creating duplicate profiles.

My Name is Bond. Jimmy Bond.

Right now the 23rd James Bond film called Skyfall is out in cinemas. And oh yes, he does say that his name is Bond. James Bond.

There were actually some screen adaptations before the current series of James Bond films based on Ian Fleming’s character. The first one was Casino Royale from 1954, made for television. This was a pure American production and herein James Bond was an American agent mostly referred to as Jimmy Bond.

There are plenty of examples around of how films and TV series are adapted for a foreign audience by changing the characters to have local names and habits.

When preparing software, including data quality tools and master data management solutions, you have the same balancing to do. Should you emphasize the strength of the product based on a particular advantage within the country where the product was born, or do you have to rewrite some features and unique selling points to make it understandable and feasible in another part of the world?

This challenge is close to me as I’m working with internationalization of the iDQ service. This service was born in a Scandinavian context where there is good availability of public sector master data identifying and describing addresses, companies and individuals, which helps with getting high quality contact master data.

But this may not resonate as well in a British context, where the ability to do rapid addressing and support vanity addressing may be the current hot stuff, or in an American context, where external reference data are much more privatized.

Technically the service will be pretty much the same, but it has to be tweaked a bit, and so does the storytelling around the service.

Hotel Rating Data Quality

Whether you are traveling for business or pleasure, you like to stay in a hotel that suits your expectations.

What is good and what is bad differs between us individuals. But we may all belong to some type of stereotype depending on where in the world we are from. For example, if I walk into even a modestly rated American-managed hotel anywhere in the world, I am pretty sure that there will be a bed much larger than I actually need. In a locally managed hotel I’m not so sure.

The most commonly used hotel rating methodology is the one-to-five-star rating system. However, the classification criteria are not universal. They differ from country to country. Some countries have a publicly regulated system, in some countries the industry sets the standards and in some countries there are competing systems.

So, I can’t be sure that three stars in one country means the same as three stars in another country. One of my foremost personal requirements is that WiFi is available. In the Swiss criteria that counts for only 2 out of 863 possible points, so I couldn’t be sure even in a five-star hotel. Using the English criteria I would have to go for a four-star hotel to be sure.

Besides official ratings, social ratings have become more and more popular. Typically guests rate the hotels on the portal where they booked using a scale from 1 to 10, and they may add verbal descriptions of the appealing things and, even more popular, the appalling things.

Cross Border Data Quality

In data quality improvement you always have to find a balance between the almost impossible, and usually not sensible, vision of achieving zero percent defects and the good old 80-20 rule about aiming at the 80% most frequent issues and leaving the 20% not so frequent issues to a random fate.

One of the issues that usually fall into the 20% neglected issues is cross border challenges with contact master data.
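The 80-20 cut can be made concrete with a small sketch that ranks issue types by frequency and keeps the most frequent ones until the target share of occurrences is covered. The issue names and counts are made up for illustration, and `pareto_cut` is a hypothetical helper name:

```python
from typing import Dict, List


def pareto_cut(issue_counts: Dict[str, int], share_percent: int = 80) -> List[str]:
    """Return the most frequent issue types that together cover share_percent of all occurrences."""
    total = sum(issue_counts.values())
    covered = 0
    selected: List[str] = []
    ranked = sorted(issue_counts.items(), key=lambda item: item[1], reverse=True)
    for issue, count in ranked:
        # Integer comparison avoids floating point rounding at the boundary.
        if covered * 100 >= share_percent * total:
            break
        selected.append(issue)
        covered += count
    return selected


# Made-up counts of observed defects per issue type.
observed = {
    "missing postal code": 50,
    "misspelled street name": 30,
    "duplicate record": 15,
    "foreign address format": 5,
}
focus = pareto_cut(observed)
neglected = [issue for issue in observed if issue not in focus]
```

With these numbers the cross border issue ("foreign address format") ends up in the neglected tail, which is exactly the pattern described above.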

In a recent blog post on the Postcode Anywhere blog Graham Rhind describes the data quality flaws arising from his relocation from Holland in the Netherlands to Germany. The post is called Validate … intelligently.

Personally I have had a lot of similar issues when moving from Denmark to England in the United Kingdom as for example described in the post Staying in Doggerland.

My guess is that we will see an increasing demand for cross border data quality services, not least as regulators are increasingly looking into cross border issues. The FATCA regulation from the United States tax authorities is an example, as described in the post The Taxman: Data Quality’s Best Friend.

As globalization moves forward organizations will increasingly work cross border, people will move between countries and more frequently live in one country and work in another country and buy services in another country. In coping with this reality you can’t keep up with data quality by just using a National Change of Address service and other data quality services focused on and optimized for a single country.

Sometimes Google Translate is a Foolish Friendship

This morning I stumbled upon an article in a Norwegian online newspaper. A rather unlikely incident actually happened to a driver, as he avoided hitting an elk on the road, but then ran into a bear.

The original text in Norwegian is here:

As I wanted to see how that would be in English, I hit the Google Translate button:

In the headline the two animals are translated from “elg” to “elk” and from “bjørn” to “bear”. Very well.

But in the subtitle the two words are translated differently. Now “elg” is “moose” and “bjørn” is “disservice”.

Hmmm…

Not sure why elk is replaced by moose. The two words are used synonymously. As I understand it, it must have been a moose, which is called an elk. Wikipedia has the details here.

But how did the bear become a disservice? Well, I guess it relates to an old fable called “The Bear and the Gardener” or the variant “The Hermit and the Bear”. Here a human becomes friends with a bear. While the man takes a nap, the bear helps drive off the flies, but eventually crushes the man’s head in doing so. The moral is that you should not make foolish friendships.

In Danish/Norwegian such a well-meant but very bad attempt to help is a “bear’s service” (bjørnetjeneste), also known in German as a Bärendienst. In this case Google Translate itself became just such a disservice.

The Big Tower of Babel

Three years ago one of the first blog posts on this blog was called The Tower of Babel.

This post was the first of many posts about multi-cultural challenges in data quality improvement. These challenges include not only language variations but also different character sets reflecting different alphabets and script systems, naming traditions, address formats, units of measure, privacy norms and government registration practices, to name some of the ones I have experienced.

When organizations are working internationally it may be tempting to build a new Tower of Babel, imposing the same language for metadata (probably English) and the same standards for names, addresses and other master data (probably the ones of the country where the headquarters is).

However, building such a high tower may end up the same way as the Tower of Babel known from the old religious tales.

Alternatively a mapping approach may be technically a bit more complex but much easier when it comes to change management.

The mapping approach is used in the Universal Postal Union’s (UPU) attempt to make a “standard” for worldwide addresses. The UPU S42 standard is mentioned in the post Down the Street. The S42 standard does not impose the same way of writing on envelopes all over the world, but facilitates mapping the existing ways into a common tagging mapped to a common structure.

Building such a mapping-based “standard” for addresses, and other master data with international diversity, in your organization may be a very good way to balance the need for standardization against the risks in change management, including the risk of not having trusted and actionable master data.
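A rough sketch of what such a mapping approach might look like in code: local field names are mapped into a common set of element tags, and each country keeps its own rendering template. The tag names, field names and templates below are illustrative assumptions and are not taken from the S42 standard itself:

```python
from typing import Dict

# Hypothetical mapping from local field names to common element tags.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "DK": {"vejnavn": "thoroughfare", "husnr": "premise_number",
           "postnr": "postcode", "by": "locality"},
    "GB": {"house_number": "premise_number", "street": "thoroughfare",
           "post_town": "locality", "postcode": "postcode"},
}

# Hypothetical per-country templates describing how the common elements are written out.
TEMPLATES: Dict[str, str] = {
    "DK": "{thoroughfare} {premise_number}\n{postcode} {locality}",
    "GB": "{premise_number} {thoroughfare}\n{locality}\n{postcode}",
}


def to_common(country: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate a record with local field names into the common element tags."""
    mapping = FIELD_MAPS[country]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def render(country: str, common: Dict[str, str]) -> str:
    """Write out common elements in the layout used by the given country."""
    return TEMPLATES[country].format(**common)


danish = to_common("DK", {"vejnavn": "Hovedgaden", "husnr": "12",
                          "postnr": "4000", "by": "Roskilde"})
label = render("DK", danish)
```

The local ways of capturing and writing addresses stay as they are; only the mapping tables need to be maintained centrally.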

The principle of embracing and mapping international diversity is a core element in the service I’m currently working with. It’s not that the instant Data Quality service doesn’t stretch into the clouds. Certainly it is a cloud service pulling data quality from the cloud. It’s not that it isn’t big. Certainly it is based on big reference data.

Beyond Address Validation

The quality of contact master data is the number one data quality issue around.

Lately there has been a lot of momentum among data quality tool providers in offering services for getting at least the postal address in contact data right. The new services are improved by:

  • Being cloud-based, offering validation services that are implemented at data entry and based on fresh reference data.
  • Being international and thus providing address validation for customer and other party data embracing a globalized world.

Capturing an address that is aligned with the real world may have a significant effect on business outcomes as reported by the tool vendor WorldAddresses in a recent blog post.

However, a valid address based on address reference data only tells you if the address is valid, not if the addressee is (still) at the address, and you are not sure if the name and other master data elements are accurate and complete. Therefore you often need to combine address reference data with other big reference data sources such as business directories and consumer/citizen reference sources.

Using business directories is not new at all. Big reference sources such as the D&B WorldBase and many other directories have been around for many years and have been a core element in many data quality initiatives with customer data in business-to-business (B2B) environments and with supplier master data.

Combining address reference data and business entity reference data makes things even better, also because business directories don’t always come with a valid address.

Using publicly available reference data when registering private consumers, employees and other citizen roles has until now been practiced in some industries and for special reasons. Therefore the big reference data and the services are out there and being used today in some business processes.

Mashing up address reference data, business entity reference data and consumer/citizen reference data is a big opportunity for many organizations in the quest for high quality contact master data, as most organizations actually interact with both companies and private persons if we look at the total mix of business processes.
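As a minimal sketch of such a mashup, the check below combines three in-memory stand-ins for address reference data, a business directory and a citizen reference source. All names, addresses and the `check_contact` function are made-up assumptions for illustration; real sources would be external services:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Contact:
    name: str
    address: str
    is_company: bool


# Made-up stand-ins for the three kinds of reference data.
VALID_ADDRESSES = {"1 Example Street, Exampletown", "2 Sample Road, Sampleville"}
BUSINESS_DIRECTORY = {"Example Ltd": "1 Example Street, Exampletown"}
CITIZEN_REFERENCE = {"Jane Doe": "2 Sample Road, Sampleville"}


def check_contact(contact: Contact) -> Dict[str, bool]:
    """Check a contact against address data and the matching party reference source."""
    source = BUSINESS_DIRECTORY if contact.is_company else CITIZEN_REFERENCE
    registered_address = source.get(contact.name)
    return {
        "address_valid": contact.address in VALID_ADDRESSES,
        "party_known": registered_address is not None,
        "party_at_address": registered_address == contact.address,
    }


moved = check_contact(Contact("Jane Doe", "1 Example Street, Exampletown", False))
```

Here `moved` reports a valid address and a known party, but not at that address, which is the case that address validation alone cannot detect.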

The next big source is going to be social network profiles. As told in the post Social Master Data Management, social media will be an additional source of knowledge about our business partners. Again, you won’t find the full truth here either. You have to mash up all the sources.

Sharing Bigger Data

Yesterday I attended an event called Big Data Forum 2012 held in London.

Big data seems to be yet another buzzword with many definitions. Anyway, surely it is about datasets that are bigger (and more complex) than before.

The Olympics is Going to be Bigger

One session at the Big Data Forum was about how the BBC will use big data in covering the upcoming London Olympics on the BBC website.

James Howard, whom I know as speckled_jim on Twitter, said that the bulk of the content on the BBC Sports website is not produced by the BBC. The data is sourced from external data providers and the structure of the content is actually also based on the external sources.

So for the Olympics there will be rich content about all the 10,000 athletes coming from all over the world. The BBC’s editorial content will be linked to this content, of course with an emphasis on the British athletes.

I guess that other broadcasting bodies and sports websites from all over the world will base the bulk of their content on the same sources and then more or less link targeted, self-produced content in the same way and with their own look and feel.

There are some data quality issues related to sourcing such data, Jim said. For example, you may have your own guideline for how to spell names from other script systems.

I have noticed exactly that issue in the news from major broadcasters. For example, the BBC spells the name of the new Egyptian president Mursi while CNN says his name is Morsi.
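One way to apply your own spelling guideline to externally sourced names is a fuzzy lookup against a list of house spellings. The sketch below uses `difflib.get_close_matches` from the Python standard library; the house list, the cutoff and the `house_spelling` helper are illustrative assumptions, and a production solution would need proper transliteration rules rather than plain string similarity:

```python
import difflib
from typing import List

# Hypothetical list of spellings preferred by your own editorial guideline.
HOUSE_SPELLINGS: List[str] = ["Morsi", "Mubarak"]


def house_spelling(sourced_name: str, cutoff: float = 0.75) -> str:
    """Return the preferred spelling if a close match exists, else the sourced name unchanged."""
    matches = difflib.get_close_matches(sourced_name, HOUSE_SPELLINGS, n=1, cutoff=cutoff)
    return matches[0] if matches else sourced_name
```

Calling `house_spelling("Mursi")` links the sourced spelling to the house spelling "Morsi", while a name with no close match is passed through as delivered.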

Bigger Data in Party Master Data Management

The postal validation firm Postcode Anywhere recently had a blog post called Big Data – What’s the Big Deal?

The post has the well known sentiment that you may use your resources better by addressing data quality in “small data” rather than fighting with big data and that getting valid addresses in your party master data is a very good place to start.

I couldn’t agree more about getting valid addresses.

However I also see some opportunities in sharing bigger datasets for valid addresses. For example:

  • The reference dataset for UK addresses typically based on the Royal Mail Postal Address File (PAF) is not that big. But the reference dataset for addresses from all over the world is bigger and more complex. And along with increasing globalization we need valid addresses from all over the world.
  • Rich address reference data will be more and more available. The UK PAF file is not that big. The AddressBase from Ordnance Survey in the UK is bigger and more complex. So are similar location reference data with more information than basic postal attributes from all over the world, not least when handled together.
  • A valid address based on address reference data only tells you if the address is valid, not if the addressee is (still) at the address. Therefore you often need to combine address reference data with business directories and consumer/citizen reference sources. That means bigger and more complex data as well.

Similar to how the BBC is covering the Olympics, my guess is that organizations will increasingly share bigger public address, business entity and consumer/citizen reference data and link it to private master data that they find more accurate (like the spelling example), along with essential data elements that better support their way of doing business and make them more competitive.

My recent post Mashing Up Big Reference Data and Internal Master Data describes a solution for linking bigger data within business processes in order to get a valid address and beyond.

Pulling Data Quality from the Cloud

In a recent post here on the blog the benefits of instant data enrichment were discussed.

In the contact data capture context these are some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.

However, if you are doing business in many countries it is a daunting task to connect with the best of breed sources of big reference data. Add to that the fact that many enterprises are doing both business-to-business (B2B) and business-to-consumer (B2C) activities, including interacting with small business owners. This means you have to link to the best sources available for addresses, companies and individuals.

A solution to this challenge is using Cloud Service Brokerage (CSB).

An example of a Cloud Service Brokerage suite for contact data quality is the instant Data Quality (iDQ™) service I’m working with right now.

This service can connect to big reference data cloud services from all over the world. Some are open data services in the contact data realm, some are international commercial directories, some belong to the wealth of national reference data services for addresses, companies and individuals, and even social network profiles are on the radar.
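The brokerage idea can be sketched as a small registry that routes a lookup to whichever providers are registered for a given country and kind of party data. This is only an illustration of the pattern under stated assumptions, not a description of how the iDQ service is implemented; the class, the stub providers and their data are all hypothetical:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# A provider takes a search string and returns matching reference entries.
Provider = Callable[[str], List[str]]


class ReferenceDataBroker:
    """Routes lookups to the providers registered for a country and a data kind."""

    def __init__(self) -> None:
        self._providers: DefaultDict[Tuple[str, str], List[Provider]] = defaultdict(list)

    def register(self, country: str, kind: str, provider: Provider) -> None:
        self._providers[(country, kind)].append(provider)

    def search(self, country: str, kind: str, query: str) -> List[str]:
        hits: List[str] = []
        # .get() avoids creating empty entries for unknown country/kind pairs.
        for provider in self._providers.get((country, kind), []):
            hits.extend(provider(query))
        return hits


def dk_company_stub(query: str) -> List[str]:
    """Stand-in for a national company register service (made-up data)."""
    companies = ["Eksempel A/S", "Eksempel Byg ApS"]
    return [c for c in companies if query.lower() in c.lower()]


def gb_address_stub(query: str) -> List[str]:
    """Stand-in for an address reference service (made-up data)."""
    addresses = ["1 Example Street, Exampletown"]
    return [a for a in addresses if query.lower() in a.lower()]


broker = ReferenceDataBroker()
broker.register("DK", "company", dk_company_stub)
broker.register("GB", "address", gb_address_stub)
```

The calling application only talks to the broker, so adding a new country or a better source means registering another provider rather than changing the data entry process.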
