Diversities in Civil Registration

Citizen Registry

The way governments around the world have organized their Master Data Management (MDM) differs quite a lot. When it comes to registering citizens, the practice varies widely, as described in the post Citizen Master Data Management.

I have lived most of my years in Denmark, where our national ID is unique and used for everything by public agencies and also a lot by private companies. Some years ago I lived in the United Kingdom, where the public agencies (and my bank) had no clue about who I was, when I came, what I did and when I left.

Recently the World Economic Forum has circulated some videos on LinkedIn telling about how stuff is done differently around the world. The video below is about the Danish civil registry (which by the way is similar in other Scandinavian countries):

What do you think? Would this public MDM and data quality practice work in the USA, the UK, Germany or wherever else you live?

Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner being a company, government body or other form of organization is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some countries, like those in Scandinavia, the public sector assigns a unique number to every company, to be used in every relation with the public sector and open to be used by the private sector as well for identification purposes.

As reported in the post Single Company View I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number and, very embarrassingly for a data quality and master data geek, these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS-number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible, the use of DUNS-numbers and similar business directory IDs is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately around implementing yet another numbering system for legal entities, with an identifier usually abbreviated as LEI. Wikipedia has the details about a Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.


Instant Data Enrichment

Data enrichment is one of the core activities within data quality improvement. Data enrichment is about updating your data in order to be more real world aligned by correcting and completing with data from external reference data sources.

Traditionally, data enrichment has been a follow-up activity to data matching. Doing data matching as a prerequisite for data enrichment has been a good part of my data quality endeavor during the last 15 years, as reported in the post The GlobalMatchBox.

During the last couple of years I have tried to be part of the quest for doing something about poor data quality by moving the activities upstream. Upstream data quality prevention is better than downstream data cleansing wherever applicable. Doing the data enrichment at data capture is the fast track to improve data quality for example by avoiding contact data entry flaws.

It’s not that you have to enrich with all the possible data available from external sources at once. The most important thing is that you are able to link back to external sources without having to do (too much) fuzzy data matching later. Some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data at a later point.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot when you later have to match with other sources.

In that way your data will be fit for current and future multiple purposes.
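As a minimal sketch of that idea, the snippet below stores external keys on a contact record at capture time, so a later enrichment is a plain lookup rather than a fuzzy match. The record layout, the key name `address_id` and the sample reference data are all made up for illustration; they do not come from any real registry or tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    name: str
    address: str
    # Keys captured at data entry, e.g. a standardized address ID or a
    # company registration number (key names here are hypothetical).
    external_keys: dict = field(default_factory=dict)


# Hypothetical external reference source, keyed by standardized address ID.
GEO_CODES = {"ADDR-0001": {"lat": 55.676, "lon": 12.568}}


def enrich_later(contact: Contact, key_name: str, source: dict) -> Optional[dict]:
    """Look up reference data by the stored external key; no fuzzy matching."""
    key = contact.external_keys.get(key_name)
    return source.get(key) if key is not None else None


contact = Contact("Example Person", "1 Example Street", {"address_id": "ADDR-0001"})
print(enrich_later(contact, "address_id", GEO_CODES))
```

The point of the design is that the expensive step (resolving the address or company to an external key) happens once, at capture, while each later enrichment is an exact key lookup.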


Finding Me

Many people have many names and addresses. So have I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

Green T is positive in the Danish Telephone Books. Red C is negative in the Danish Citizen hub. Green C is positive in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom I’m registered here with more different names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom doesn’t have a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly to me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.
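A small sketch of how a matching routine might cope with such variants is shown below. It folds the Danish letters to plain ASCII and then treats two names as candidates for the same person when the name parts of one are contained in the other. The folding table and the subset rule are illustrative assumptions, not how any particular registry or tool works, and the spelling with Ø is assumed from the description above. Note that Ø needs an explicit mapping, because it is a letter in its own right and not an O with a detachable accent.

```python
# Explicit folding table for Danish letters (an illustrative assumption;
# other conventions map "ø" to "oe" instead of "o").
FOLD = str.maketrans({
    "ø": "o", "Ø": "O",
    "æ": "ae", "Æ": "Ae",
    "å": "a", "Å": "A",
})


def fold_name(name: str) -> str:
    """Transliterate Danish letters and ignore case."""
    return name.translate(FOLD).casefold()


def candidate_match(a: str, b: str) -> bool:
    """True when the name parts of one name are a subset of the other's."""
    parts_a = set(fold_name(a).split())
    parts_b = set(fold_name(b).split())
    return parts_a <= parts_b or parts_b <= parts_a


print(candidate_match("Henrik Liliendahl Sørensen", "Henrik Sorensen"))
```

A rule this loose only produces candidates; a dropped middle name matches many different people, so the result still needs confirmation from an address or another identifier.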

How about you, do you also have several identities?


Costs of a Single Citizen View

Recently Andrew Dean made a blog post called National Identity Numbers. The post generated some comments in the Data Matching group on LinkedIn.

Andrew’s post is based on the ongoing project in India called Aadhaar, where every citizen is assigned a unique identification number to be used for multiple purposes when interacting with the government and financial institutions.

As Andrew mentions, the United Kingdom cancelled such a project a few years ago. This cancellation was, in some part, due to fear of excessive costs. The question that Andrew and the comments in the LinkedIn group pose is whether the benefits of getting a “single citizen view” will justify the (feared) costs.

Indeed, large governmental projects have a bad name these days all over the world, as far as I know.

Back in the late 60’s the United States was able to put a man on the moon.

It was at the same time that the Scandinavian countries implemented their “single citizen view”.

Besides digitalizing the national identification number, Sweden also managed, in 1967, to change from driving on the left side of the road to driving on the right side. I’m not sure if Sweden could afford turning to the right side today, not to mention the United Kingdom doing the same.


Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article in the Danish Computerworld about a ferry line now dropping a solution that used a lightweight fingerprint stored on an access card to check whether the passenger using the card is in fact the paying customer. The reason for dropping it was, by the way, the cost of upgrading the solution compared to the future business value, not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well, as described in the post Real World Alignment. There the question was about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity. Doing that in the fight against terrorism has been going on for long. Regulatory compliance will add to that trend, as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but aren’t the real world.

We will however probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprint is maybe a bit old fashioned, but as said, more and more biometric passports are issued and the technology for iris and retinal scanning is used around for access control even on mobile devices.

In the story starting this post, the business value of reinvesting in a biometric solution wasn’t deemed positive. But looking from the print on my fingers down to my hand lines, I foresee some more identity resolution going beyond name and address strings into things closer to the real world, such as facial recognition and biometrics.


Citizen Master Data Management

Citizen Master Data Management in the public sector is the equivalent of Customer Master Data Management in the private sector.

Where are we?

Just as private organizations find different solutions for managing customer master data, governments around the world have also found their particular solutions for managing citizen master data.

Most descriptions of data management originate in the United States, and so do many examples and issues related to citizen master data management. One example is this blog post from IBM Initiate called The End of the Social Security Number?

As mentioned in the post there are different administrative practices around the world where governments may learn from experiences with alternative solutions in other countries.

During last year’s discussion in Canada about the census form I had the chance to write a guest blog post on a Canadian blog about How Denmark does it.

The way of the world does change. One example is the program in India called Aadhaar aiming at providing a unique national ID for the over one billion people living in India.

When to register?

The question about when a citizen has to be included in a citizen master data registry of course depends on the purpose of the registry. If the single purpose for example is driving license administration it will depend on when a citizen may obtain a driving license and that will exclude citizens under a certain age depending on the rules in place. The same applies to an electoral roll.

In my country we have an all-purpose citizen master data hub, which today means that a newborn is registered and provided with a unique Citizen ID within seconds.

Similar considerations apply to immigration and cross-border employment.

What to store?

Citizen master data registries typically hold attributes such as an identifier, name, address and status information.

As new technologies mature, governments of course consider whether such technologies may be feasible and may add benefits as part of the master data stored about citizens.

Using biometrics is a controversial topic here. The pros and cons were discussed, based on the cancelled program in the United Kingdom, in the post Citizen ID and Biometrics.

Who will share?

Privacy considerations are paramount in most discussions around citizen master data hubs.

Even if you have an all-purpose citizen registry, there will be laws limiting how the public sector may exploit data identified with the registry and the identifier in use.

On the other hand, in some countries even private sector organizations may benefit from such a master data hub.

An example from Sweden is shown here in the post No Privacy Customer Onboarding.


The trees never grow into heaven

This morning most of digital Denmark was closed. You couldn’t do anything at the online bank, you couldn’t do much at public sector websites and you couldn’t read electronic mail from your employer, pension institution and others.

It wasn’t because someone cut a big cable or a computer virus got a lucky strike. The problem was that the centralized internet login service had a three hour outage. It was a classic single point of failure incident.

In Denmark we have a single sign-on identity solution used by public sector, financial services and other organizations. The service is called NemID (Easy ID) and is based on an all-purpose unique national ID for every citizen.

As more and more interaction with the public sector and financial services, along with online shopping, is taking place in the cloud, we are of course more and more vulnerable to this kind of problem.

The benefits of having a single source of truth about who you are became a single point of failure here.

Well, we have this local saying: “The trees never grow into heaven”. All good things have their limit. Even in instant Identity Resolution.


Some Deduplication Tactics

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

  • Data matching in order to find duplicates internally in your master data, most often your customer database
  • Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

  • If you have mostly domestic data in a country with a public company registration you can obtain a national ID from matching with a business directory based on such a registry. An example will be the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
  • Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
  • The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start with matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records, which will make a subsequent internal deduplication a no-brainer.
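Once the external identifier is attached, the internal deduplication becomes a simple grouping on that identifier. The sketch below assumes each record is a dictionary with an optional `duns` field; the field name and the sample numbers are invented for illustration and are not real DUNS-numbers.

```python
from collections import defaultdict


def group_by_external_id(records, id_field="duns"):
    """Group records sharing an external ID; return (groups, unmatched)."""
    groups = defaultdict(list)
    unmatched = []
    for record in records:
        external_id = record.get(id_field)
        if external_id:
            groups[external_id].append(record)
        else:
            # No hit in the directory: still needs conventional matching.
            unmatched.append(record)
    return dict(groups), unmatched


customers = [
    {"name": "Example Trading Ltd", "duns": "000000001"},
    {"name": "Example Trading Limited", "duns": "000000001"},
    {"name": "Another Company A/S", "duns": "000000002"},
    {"name": "Unknown Shop", "duns": None},
]

groups, unmatched = group_by_external_id(customers)
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
print(duplicates)
print(unmatched)
```

Records that come back without an identifier end up in `unmatched`, which is where the hit rate issue discussed next comes in.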

Common matching issues

A problem, however, is that you seldom get a 100 % hit rate in a business directory matching, often not even close, as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save process load.

However, a common pitfall is that an internal deduplication may merge two similar records that are actually represented by two different entities in the business directory (and the real world).

So, as with many things in data matching, the answer to the sequence question is often: Both.

A good process sequence may be this one:

  1. An internal deduplication with very tight settings
  2. A match against an external registry
  3. An internal deduplication exploiting external identifiers and having more loose settings for similarities not involving an external identifier
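The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: name similarity is measured with Python’s standard `difflib.SequenceMatcher`, the thresholds 0.9 and 0.8 are arbitrary, the external registry is faked as a dictionary with exact lookup on the lower-cased name, and the identifiers `ID-1` and `ID-2` are invented. Real matching tools use far richer comparisons.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(records, same):
    """Greedy clustering: a record joins the first cluster whose first member it matches."""
    clusters = []
    for record in records:
        for members in clusters:
            if same(members[0], record):
                members.append(record)
                break
        else:
            clusters.append([record])
    return clusters


def dedupe_pipeline(records, registry, tight=0.9, loose=0.8):
    # Step 1: internal deduplication with very tight settings.
    tight_clusters = cluster(
        records, lambda a, b: similarity(a["name"], b["name"]) >= tight
    )
    survivors = [dict(members[0]) for members in tight_clusters]

    # Step 2: match against the external registry (hypothetical exact lookup).
    for record in survivors:
        record["ext_id"] = registry.get(record["name"].lower())

    # Step 3: internal deduplication exploiting the external identifiers,
    # with looser name matching only where an identifier is missing.
    def same(a, b):
        if a["ext_id"] and b["ext_id"]:
            return a["ext_id"] == b["ext_id"]
        return similarity(a["name"], b["name"]) >= loose

    return cluster(survivors, same)


records = [
    {"name": "Acme Ltd"},
    {"name": "Acme Ltd."},
    {"name": "ACME Limited"},
    {"name": "Acme Holdings Ltd"},
]
registry = {
    "acme ltd": "ID-1",
    "acme limited": "ID-1",
    "acme holdings ltd": "ID-2",
}

for members in dedupe_pipeline(records, registry):
    print([m["name"] for m in members])
```

In this sample the tight first pass only merges the near-identical “Acme Ltd” and “Acme Ltd.”, which saves one registry lookup, while the external identifiers both bring “ACME Limited” into the same cluster and keep “Acme Holdings Ltd” apart.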


A Business Rule and a Missing Master Data Hub

It seems that the United States of America has a problem with the business rule saying you have to be born in the country to become president and a missing citizen master data hub telling about who’s born in the country.

This is an aspect of a previous blog post called Did They Put a Man on the Moon.
