Citizen Master Data Management

Citizen Master Data Management in the public sector is the equivalent of Customer Master Data Management in the private sector.

Where are we?

Just as private organizations find different solutions for managing customer master data, governments around the world have found their own particular solutions for managing citizen master data.

Most descriptions of data management originate in the United States, and so do many examples and issues related to citizen master data management. One example is this blog post from IBM Initiate called The End of the Social Security Number?

As mentioned in the post, administrative practices differ around the world, and governments may learn from experiences with alternative solutions in other countries.

During last year’s discussion in Canada about the census form I had the chance to write a guest blog post on a Canadian blog about How Denmark does it.

The way of the world does change. One example is the program in India called Aadhaar aiming at providing a unique national ID for the over one billion people living in India.

When to register?

The question of when a citizen has to be included in a citizen master data registry of course depends on the purpose of the registry. If the single purpose is, for example, driving license administration, it will depend on when a citizen may obtain a driving license, and that will exclude citizens under a certain age depending on the rules in place. The same applies to an electoral roll.

In my country we have an all-purpose citizen master data hub, which today means that a newborn is registered and provided with a unique Citizen ID within seconds.

Similar considerations apply to immigration and cross-border employment.

What to store?

Citizen master data registries typically hold attributes such as an identifier, name, address and status information.

As new technologies mature, governments of course consider whether such technologies are feasible and may add benefits as part of the master data stored about citizens.

Using biometrics is a controversial topic here. The pros and cons were discussed, based on the cancelled program in the United Kingdom, in the post Citizen ID and Biometrics.

Who will share?

Privacy considerations are paramount in most discussions around citizen master data hubs.

Even if you have an all-purpose citizen registry, there will be laws limiting how the public sector may exploit data identified with the registry and the identifier in use.

On the other hand, in some countries even private sector organizations may benefit from such a master data hub.

An example from Sweden is shown here in the post No Privacy Customer Onboarding.


Some Voter Musings

Tomorrow there is a general election in my home country Denmark.

Voter registration

There are different systems of voter registration around the world.

In some countries the electoral rolls are data silos of citizen master data, more or less integrated with other citizen master data silos for other purposes such as driving license administration, social security and taxation.

In Denmark we have an all-purpose single master data hub for citizens. When we have to vote, the ballots are extracted from the hub based on your age (from 18 on election day) and citizen status (excluding citizens of other countries living or working here).
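The extraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Citizen` record, its field names, the sample data and the election date are hypothetical, and a real hub would of course apply more rules than age and citizenship.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citizen:
    citizen_id: str    # hypothetical identifier, not a real ID format
    birth_date: date
    citizenship: str   # ISO 3166-1 alpha-2 country code


def age_on(birth_date: date, on_day: date) -> int:
    """Return the age in whole years on the given day."""
    had_birthday = (on_day.month, on_day.day) >= (birth_date.month, birth_date.day)
    return on_day.year - birth_date.year - (0 if had_birthday else 1)


def electoral_roll(hub, election_day, min_age=18, country="DK"):
    """Select citizens of the country who have reached the voting age on election day."""
    return [
        c for c in hub
        if c.citizenship == country and age_on(c.birth_date, election_day) >= min_age
    ]


# Made-up sample data; the election day is just an example date.
hub = [
    Citizen("A", date(1993, 9, 15), "DK"),  # turns 18 on election day
    Citizen("B", date(1993, 9, 16), "DK"),  # turns 18 the day after
    Citizen("C", date(1970, 1, 1), "SE"),   # citizen of another country living here
]
roll = electoral_roll(hub, election_day=date(2011, 9, 15))
print([c.citizen_id for c in roll])
```

The point of the sketch is that with a single hub the electoral roll is a query, not a separate registry to maintain.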

The political scope

The voter’s role is to select members for the parliament. Then the parliament will select a prime minister.

One of the two most likely candidates for next prime minister is the current one with the nickname “Little Lars”, who came to power when the former one became Secretary General of NATO and moved to the HQ in Brussels. Lars is head of the political party called Left (Venstre), which is a right-wing party. He is going to defend the welfare state, including universal healthcare and free college.

His main opponent has the nickname “Gucci Helle”. She is leading the left bloc. She is going to defend the welfare state, including universal healthcare and free college.

Head of state

As voters we are not trusted to select the head of state. The queen was born to be queen, and her eldest son will be the next king. On the other hand, the members of the Royal Family are not allowed to vote in the election. This is the exception that confirms the rule.


The Location Domain

When talking master data management we usually divide the discipline into domains, where the two most prominent domains are:

  • Customer, or rather party, master data management
  • Product, sometimes also named “things”, master data management

One of the most frequently mentioned additional domains is locations.

But even though locations are all around, we seldom see a business initiative aimed at enterprise-wide location data management under a slogan of having a 360-degree view of locations. Most often locations are seen as a subset of either the party master data or, in some cases, the product master data.

Industry diversity

The need for having locations as a focus area varies between industries.

In some industries like public transit, where I have been working a lot, locations are implicit in the delivered services. Travel and hospitality is another example of a tight connection between the product and a location. Also some insurance products have a location element. And do I have to mention real estate: Location, Location, Location.

In other industries the location has a more moderate relation to the product domain. There may be some considerations around plant and warehouse locations, but that’s usually not high volume and complex stuff.  

Locations as a main factor in exploiting demographic stereotypes are important in retailing and other business-to-consumer (B2C) activities. When doing B2C you often want to see your customer as the household where the location is a main, but treacherous, factor in doing so. We had a discussion on the house-holding dilemma in the LinkedIn Data Matching group recently.

Whenever you, or a partner of yours, are delivering physical goods or a physical letter of any kind to a customer, it’s crucial to have high quality location master data. The impact of not having that is of course dependent on the volume of deliveries.   

Globalization

If you ask me about London, I will instinctively think about the London in England. But there is a pretty big London in Canada too, that would be top of mind to other people. And there are other smaller Londons around the world.

Master data with location attributes increasingly comes in populations covering more than one country. It’s not that ambiguous place names don’t exist in single-country sets. Ambiguous place names were the main reason why many countries have a postal code system. However, the British and the Canadians invented systems that include letters, unlike most other systems, which use only numbers, typically with an embedded geographic hierarchy.
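To illustrate why multi-country data breaks single-country habits, here is a minimal sketch of a postal code format check keyed by country. The patterns are deliberately simplified assumptions for three countries (real postal rules have more exceptions, and UK postcodes are left out because their format is more involved), and the function name is hypothetical.

```python
import re

# Simplified, illustrative patterns; not complete postal rules.
POSTAL_CODE_PATTERNS = {
    "DK": re.compile(r"\d{4}"),                    # four digits
    "US": re.compile(r"\d{5}(-\d{4})?"),           # ZIP or ZIP+4
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # letters and digits alternate
}


def is_plausible_postal_code(country: str, code: str) -> bool:
    """Check a postal code against the simplified pattern for its country."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        raise ValueError(f"no postal code rule known for country {country!r}")
    return pattern.fullmatch(code.strip().upper()) is not None


print(is_plausible_postal_code("DK", "2450"))     # numeric system
print(is_plausible_postal_code("CA", "N6A 3K7"))  # alphanumeric system
print(is_plausible_postal_code("CA", "2450"))     # a Danish-style code fails in Canada
```

A check that is a one-liner for one country becomes a lookup table that has to be maintained per country.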

Apart from the different standards in use, the possibilities for exploiting external reference data vary widely between countries in data quality dimensions such as timeliness, consistency, completeness and conformity, and in price.

Handling location data from many countries at the same time ruins many best practices that have worked for handling location data for a single country.

Geocoding

Instead of identifying locations in a textual way by having country codes, state/province abbreviations, postal codes and/or city names, street names and types or blocks and house numbers and names, it has become increasingly popular to use geocoding as a supplement or even an alternative.

There are different types of geocodes out there suitable for different purposes. Examples are:

  • Latitude and longitude, picturing a round world
  • UTM X, Y coordinates, picturing peels of the world
  • Web Mercator X, Y coordinates (built on the WGS84 datum), picturing a world as flat as your computer screen

While geocoding has a lot to offer in identification and global standardization, we of course have a gap between geocodes and everyday language. If you want to learn more, then come and visit me at N55°38′47″, E12°32′58″.
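The gap between notations is easy to show. Below is a small sketch that converts a position given in degrees, minutes and seconds into the decimal degrees most mapping tools expect. It assumes the coordinates above are read as 55 degrees, 38 minutes, 47 seconds north and 12 degrees, 32 minutes, 58 seconds east; the function name is my own.

```python
def dms_to_decimal(hemisphere: str, degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal("N", 55, 38, 47)
lon = dms_to_decimal("E", 12, 32, 58)
print(round(lat, 4), round(lon, 4))
```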


Geocoding from 100 Feet Under

I stumbled upon this image posted by Ellie K. on Google+

The title is World map of Flickr and Twitter locations and the legend is that red dots are locations of Flickr pictures, blue dots are locations of Twitter tweets and white dots are locations that have been posted to both.

You may be able to see your city following this link.

For example Copenhagen looks like this:

Here you have Copenhagen in Denmark to the left and Malmoe in Sweden to the right.

The strip between is the fixed link known as the Øresund Bridge.

However, the connection isn’t entirely a bridge. If you look at a flyover picture you may think that there wasn’t enough money to finish the connection. Fortunately there was. The part closest to Copenhagen Airport is a 4-kilometer (2.5-mile) undersea tunnel.

So what puzzles me is the dots apparently representing Flickr uploads and tweets made from the tunnel. Are you able to upload to Flickr from down there? How are the tweets geocoded with that precision? My GPS never works when passing the tunnel.

(PS: I know you may geotag once you are back on the surface)


Klout Data Quality

Today it was announced that yet another social media service has passed the 100 million mark, as now 100 Million People have Klout.

Klout is a service that measures your online influence based on your activity on Twitter, LinkedIn, Facebook and so on. The main measure is a score between 1 and 100.

Like many others I have from time to time been tempted to have a narcissistic look at my profile. I haven’t recorded it, but it seems to me that some of the other attributes on Klout change a lot. Or maybe it’s just me who is moving around in the social media realm in all directions.

Today my Klout style is being a “broadcaster”. And that may be right, as I’m re-tweeting a lot of links. But I’m sure I was a “specialist” the last time I checked, and that is in the opposite corner of the style quadrant. Well, never mind, every description of the styles is positive.

Klout also has beliefs about which topics you are influential about. One of my top 10 topics is “magic”. I think I must be more careful about tweeting about “data quality magic”. Another topic of mine is “Tripoli”. That’s right too; I did make one tweet about Tripoli that ended up as an information quality trainwreck.

Unfortunately I’m not influential about data quality or MDM at all. I’ll have to work on that.


International Data Steward of the Year

The 11th October is declared International Data Steward Day by the Data Roundtable, and yesterday I threw in my candidate for The Data Steward of the Year. So for the next month I will be lobbying the fine selection of judges.

It’s going to be hard work as my candidate is behind from the start, as she will not see the 11th October 2011 as 10.11.11 but as 11.10.11. Let’s see if the contest is truly international or if the US candidates are playing on home ground.


The Database versus the Hub

In the LinkedIn Multi-Domain MDM group we have an ongoing discussion about why you need a master data hub when you already have some workflow, UI and a database.

I have been involved in several master data quality improvement programs without having the opportunity of storing the results in a genuine MDM solution, for example as described in the post Lean MDM. And of course this may very well result in a success story.

However, there are some architectural reasons why many more organizations than those using an MDM hub today may find benefits in sooner or later having a master data hub.

Hierarchical Completeness

If we start with product master data, the main issue with storing it is the diversity in the requirements for which attributes are needed, and when they are needed, depending on the categorization of the products involved.

Typically you will have hundreds or thousands of different attributes, where some are crucial for one kind of product and absolutely ridiculous for another kind of product.

Modeling a single product table with thousands of attributes is not good database practice, and pre-modeling a table for each conceivable category is very inflexible.

Setting up mandatory fields at database level for product master data tables is asking for data quality issues, as you can’t avoid making either too many or too few fields mandatory.

Also, product master data entities are seldom created in one single insertion, but are inserted and updated by several different employees, each responsible for a set of attributes, until the entity is ready to be approved as a whole.

A master data hub, not least those born in the product domain, is built for those realities.
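As a rough sketch of what “built for those realities” can mean: completeness rules are kept as data per category rather than as mandatory columns, and a product is only checked as a whole when it is up for approval. The categories, attribute names and function names below are hypothetical and chosen purely for illustration; no particular MDM product works exactly like this.

```python
# Hypothetical categories and attribute names, for illustration only.
REQUIRED_BY_CATEGORY = {
    "apparel": {"name", "size", "color"},
    "power_tool": {"name", "voltage", "weight_kg"},
}


def missing_attributes(category: str, attributes: dict) -> set:
    """Return the required attributes that are absent or empty for this category."""
    required = REQUIRED_BY_CATEGORY[category]
    return {key for key in required if attributes.get(key) in (None, "")}


def ready_for_approval(category: str, attributes: dict) -> bool:
    """A product may be approved once nothing required is missing."""
    return not missing_attributes(category, attributes)


drill = {"name": "Cordless drill"}   # one employee enters the basics
drill["voltage"] = 18                # another adds the technical data
print(sorted(missing_attributes("power_tool", drill)))
drill["weight_kg"] = 1.4             # a third completes the record
print(ready_for_approval("power_tool", drill))
```

The incomplete record can be saved at every step; the rule set, not the table definition, decides when it is complete.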

The party domain has hierarchical issues too. One example is whether a state/province is mandatory on an address, which depends on the country in question.
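The address example can be sketched the same way. The set of countries below is an illustrative assumption and by no means a complete list, and the field names are hypothetical.

```python
# Illustrative subset of countries where a state/province is part of the address.
STATE_REQUIRED = {"US", "CA", "AU"}


def address_errors(address: dict) -> list:
    """Return a list of completeness problems for one address record."""
    errors = []
    country = address.get("country")
    if not country:
        errors.append("country is required")
    elif country in STATE_REQUIRED and not address.get("state"):
        errors.append(f"state/province is required for {country}")
    return errors


print(address_errors({"country": "DK", "city": "Copenhagen"}))
print(address_errors({"country": "CA", "city": "London"}))
print(address_errors({"country": "CA", "city": "London", "state": "ON"}))
```

A plain NOT NULL constraint on the state column would either reject the Danish address or accept the incomplete Canadian one.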

Single Business Partner View

I like the term “single business partner view” as a higher vision for the more common “single customer view”, as we have the same architectural requirements for supplier master data, employee master data and other master data concerning business partners as we have for the of course extremely important customer master data.

The uniqueness dimension of data quality has a really hard time in common database managers. Having duplicate customer, supplier and employee master data records is the most frequent data quality issue around.

In this sense, a duplicate party is not a record with exactly the same fields filled and with exactly the same values spelled exactly the same way, as a database will see it. A duplicate is one record reflecting the same real world entity as another record, and a duplicate group is several records reflecting the same real world entity.

Even though some database managers have fuzzy capabilities, they are still very inadequate at finding these duplicates based on several attributes at one time, and not least at finding duplicate groups.
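To make the idea of matching on several attributes and forming duplicate groups concrete, here is a minimal sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real matching engine, a plain average of name and address similarity, and an arbitrary threshold of 0.8. All of these are assumptions for illustration; production matching uses far more refined comparison and blocking.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_duplicate(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Compare several attributes at once: average of name and address similarity."""
    score = (similarity(r1["name"], r2["name"])
             + similarity(r1["address"], r2["address"])) / 2
    return score >= threshold


def duplicate_groups(records: list, threshold: float = 0.8) -> list:
    """Merge pairwise matches into groups using a simple union-find."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for r1, r2 in combinations(records, 2):
        if is_probable_duplicate(r1, r2, threshold):
            parent[find(r1["id"])] = find(r2["id"])

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    # Only groups with more than one member are duplicates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up sample records.
records = [
    {"id": 1, "name": "John Smith", "address": "12 Main Street"},
    {"id": 2, "name": "Jon Smith", "address": "12 Main St"},
    {"id": 3, "name": "J. Smith", "address": "12 Main Str."},
    {"id": 4, "name": "Mary Jones", "address": "4 Oak Avenue"},
]
print(duplicate_groups(records))
```

Note that the comparison is pairwise, so the work grows quadratically with the number of records; that is one reason why this is not something you casually bolt onto a database.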

Finding duplicates when inserting supposedly new entities into your customer list and other party master data containers is only the first challenge concerning uniqueness. Next you have to solve the so-called survivorship questions, meaning which values will survive the unavoidable differences between duplicate records.
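One possible survivorship rule, sketched under assumptions: for each attribute, the most recently updated non-empty value wins. The rule, the `updated` field and the sample records are hypothetical; real hubs typically let you configure rules per attribute and per source.

```python
from datetime import date


def golden_record(duplicates: list, attributes: list) -> dict:
    """Build a surviving record: newest non-empty value per attribute."""
    newest_first = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    survivor = {}
    for attribute in attributes:
        survivor[attribute] = next(
            (r[attribute] for r in newest_first if r.get(attribute)), None
        )
    return survivor


# Made-up duplicate group.
group = [
    {"name": "Jon Smith", "phone": "", "email": "jon@example.com",
     "updated": date(2011, 3, 1)},
    {"name": "John Smith", "phone": "555-0100", "email": "",
     "updated": date(2010, 6, 1)},
]
print(golden_record(group, ["name", "phone", "email"]))
```

Here the newer name and email survive, while the phone number is rescued from the older record because the newer one has none.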

Finally the results to be stored may have several constructing outcomes. Maybe a new insertion must be split into two entities belonging to two different hierarchy levels in your party master data universe.

A master data hub will have the capabilities to solve this complexity, some for customer master data only, some also for supplier master data combined with similar challenges with product master data and eventually also other party master data.

Domain Real World Awareness

Building hierarchies, filling incomplete attributes, consolidating duplicates and other forms of real world alignment are most often achieved by including external reference data.

There are many sources available for party master data, such as address directories, business directories and citizen information, depending on the countries in question.

With product master data, global data synchronization involving common product identifiers and product classifications is becoming very important when doing business the lean way.

Master data hubs know these sources of external reference data so you, once again, don’t have to reinvent the wheel.


Down the Street

Having an address consisting of a house number and a street name, or vice versa, is the usual way of addressing in most parts of the world. This construct is also featured in the presentation of the Universal Postal Union’s (UPU) international standard initiative (S42).

Somehow I always end up living at a place with issues in relation to this construct.

Our current address is (without unit):

“Kenny Drews Vej 27”, which would be “27 Kenny Drews Way” in an Anglophone country.

But our area has a new style of block buildings with canals in between, as we like to pretend that we live in Venice or Amsterdam.

This means that the house numbers aren’t sequenced down the street, but are spread round the block as if we were living in Japan. Google Maps has the position exactly as it is.

Number 27 on Kenny Drews Vej is actually much closer to two other streets, which makes it very difficult when people are visiting us the first time and for some also the second time.

But that’s because I, and some of our visitors, are old fashioned. As Prashanta Chan says in his blog post Geocoding: Accurate Location Master Data: It will be much better to invite folks to your geocode.

The same thing applies to when you want some goods delivered to your premises or want a taxi as close to your front door as possible.

And regarding letters delivered by the good old postman: They will probably all be sent electronically before the UPU S42 addressing mapping standard is adopted by everyone.


The trees never grow into heaven

This morning most of digital Denmark was closed. You couldn’t do anything at the online bank, you couldn’t do much at public sector websites and you couldn’t read electronic mail from your employer, pension institution and others.

It wasn’t because someone cut a big cable or a computer virus got a lucky strike. The problem was that the centralized internet login service had a three-hour outage. It was a classic single-point-of-failure incident.

In Denmark we have a single sign-on identity solution used by the public sector, financial services and other organizations. The service is called NemID (Easy ID) and is based on an all-purpose unique national ID for every citizen.

As more and more interaction with the public sector and financial services, along with online shopping, is taking place in the cloud, we are of course more and more vulnerable to this kind of problem.

The benefits of having a single source of truth about who you are became a single point of failure here.

Well, we have this local saying: “The trees never grow into heaven”. All good things have their limit. Even in instant Identity Resolution.


Oranges, Apples and Pears go Bananas

My post yesterday about Data Quality Evangelism included the fruit oranges and a comment from Jim Harris added apples to the analogies by using the idiom about comparing apples and oranges.

There are a lot of linguistic musings around the words apples and oranges.

In many languages we use a similar idiom about comparing apples and pears. But it may be geographically dependent, as in European French it is apples and pears but in Quebec French it is apples and oranges.

In some Germanic languages the fruit orange can be translated as “Chinese apple”. For example the Dutch word is “sinaasappel” and the Danish/Norwegian word is “appelsin”. In Germany it is “Apfelsine” in the North and “Orange” in the South. The linguistic line across Germany is by the way called the apple-line, but for the opposite reason.

In English a “Chinese apple” is a pomegranate.

The word orange has two meanings in English: a fruit and a color (as they write in American English) or a colour (as they write in British English).

The two meanings make Google Translate go bananas. When Google translates between languages it does it via English. So if I translate “appelsin” from Danish to Dutch I don’t get “sinaasappel”. Instead I get “oranje”, the Dutch national color.
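The effect can be mimicked with a toy model. The three dictionaries below are invented for illustration and say nothing about how Google Translate actually stores or processes language; they only show how an ambiguous pivot word loses the sense it started with.

```python
# Toy dictionaries, invented for illustration only.
DANISH_TO_ENGLISH = {"appelsin": "orange"}
ENGLISH_TO_DUTCH = {"orange": "oranje"}         # the pivot word is read as the color
DANISH_TO_DUTCH = {"appelsin": "sinaasappel"}   # a direct mapping keeps the fruit


def translate_via_pivot(word: str) -> str:
    """Danish to Dutch through English, losing the sense on the way."""
    return ENGLISH_TO_DUTCH[DANISH_TO_ENGLISH[word]]


def translate_direct(word: str) -> str:
    """Danish to Dutch without a pivot language."""
    return DANISH_TO_DUTCH[word]


print(translate_via_pivot("appelsin"))
print(translate_direct("appelsin"))
```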

No wonder Data Quality Evangelism most often isn’t fruitful.
