Getting a single customer view in business-to-business (B2B) operations isn’t straightforward. Besides all the fuss about agreeing on a common definition of a customer within each enterprise, which usually revolves around fitting multiple purposes of use, we also have complexities in real-world alignment.
One Number Utopia
Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This practice has been live for many years now.
But in most other countries there are several different public registries for companies resulting in multiple numbering systems.
Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two-letter ISO country code followed by the differently formatted VAT number of each country – some with both digits and letters.
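A minimal sketch of how that common format could be taken apart. The length limits in the pattern and the function name split_vat_number are assumptions made for illustration; the national part is treated as opaque and no country-specific rule is checked.

```python
import re

# Assumed shape only: two ASCII letters, then 2 to 12 letters or digits.
# Real per-country rules are stricter and are NOT checked here.
VAT_SHAPE = re.compile(r"^([A-Z]{2})([A-Z0-9]{2,12})$")


def split_vat_number(raw):
    """Return (country_prefix, national_part), or None if the shape is wrong."""
    # Drop spaces, dots and hyphens that people often type into the number.
    compact = re.sub(r"[\s.\-]", "", raw).upper()
    match = VAT_SHAPE.match(compact)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up number, used only to show the call.
print(split_vat_number("dk 12 34 56 78"))
```

Keeping the prefix and the national part as two fields makes it possible to add per-country checks later without touching the stored key.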
The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.
The common structure of a company is that you have a legal entity occupying one or several addresses.
The French company numbering system is a good example of how this is modeled. You have two numbers:
- SIREN is a 9-digit number for each legal entity (on the headquarters address).
- SIRET is a 14-digit (9 + 5) number for each business location.
This model is good for companies with several locations but strange for single location companies.
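The 9 + 5 structure can be sketched as below. This is an illustration only: the function name split_siret is hypothetical, the sample number is made up, and no check-digit validation is attempted.

```python
def split_siret(siret):
    """Split a 14-digit SIRET into (siren, location_suffix)."""
    digits = siret.replace(" ", "")
    # Only the length and the character set are checked in this sketch.
    if len(digits) != 14 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("a SIRET is expected to be exactly 14 digits")
    # First 9 digits: the legal entity (SIREN). Last 5: the business location.
    return digits[:9], digits[9:]


# Made-up number, used only to show the call.
siren, location = split_siret("123 456 789 00012")
print(siren, location)
```

Two locations of the same legal entity share the first element, so grouping on it gives the legal-entity level while the full 14 digits give the location level.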
Treacherous Family Trees (and Restaurants)
The need for hierarchy management is obvious when it comes to handling data about customers that belong to a global enterprise.
Company family trees are useful but treacherous. A mother and a daughter may be very closely connected with lots of shared services, or it may be strictly a matter of ownership with no operational ties at all.
Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.
This is the second post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country is able to clarify in a comment.
India’s culture is marked by a high degree of syncretism and cultural pluralism. Every state and union territory has its own official languages, and the constitution also recognizes 21 languages.
National Identification Number for 1.2 Billion People
The government of India has initiated a program for assigning a unique citizen ID to the more than 1.2 billion people living in India. The program, called Aadhaar, is the largest of its kind in the world.
A System Integration Superpower
Tata, Satyam, Infosys and Wipro are just some of the many mega system integrators within master data management and data quality with headquarters in India. Add to that the fact that companies like Cognizant and many others have most of their professionals based in India.
This post is a follow-up on today’s #DataKnightsJam happening on Twitter. Today’s subject was data quality and data privacy.
Diversity in data quality is a subject discussed many times on this blog.
So I want to share a real-life example of a good upstream, get-it-right-first-time data sharing approach that might compromise privacy thresholds in other places.
The image to the right is the data entry form from a Swedish webshop used for customer self-registration. The main flow is that:
- You type your national ID (personnummer in Swedish)
- You press the following button
- The system fetches your name and address data from the public citizen hub
- The webshop gets an accurate, complete single customer view
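The flow above can be sketched in a few lines. Everything named here is hypothetical: FAKE_CITIZEN_HUB is an in-memory stand-in for the public citizen hub, register_customer is an invented function, and the ID and person are obviously fake.

```python
# Hypothetical stand-in for the public citizen hub. A real lookup would be
# a call to a national registry service, which is not modelled here.
FAKE_CITIZEN_HUB = {
    "000000-0000": {"name": "Test Person", "address": "Example Street 1"},
}


def register_customer(national_id, hub=FAKE_CITIZEN_HUB):
    """Build a customer record from the hub, keyed by the national ID."""
    record = hub.get(national_id)
    if record is None:
        # Unknown ID: nothing is guessed, the caller decides what to do next.
        return None
    # Name and address come from the hub, not from what the customer typed.
    return {"customer_key": national_id, **record}


print(register_customer("000000-0000"))
```

The point of the design is that the customer types one value only, so there is very little room for spelling variations to creep in.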
The webshop www.jula.se sells tools for home improvement.
As I have stated earlier on this blog: The solution to the single most frequent data quality problem being party master data duplicates is actually very simple: Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.
Some countries, like Denmark where I live, have a unique Citizen ID (National identification number). Some countries, like India with the Aadhaar project, are on the way. But some of the countries with the largest economies in the world, like the United Kingdom, Germany and the United States, don’t seem to be getting it in the near future.
I think the United Kingdom came close recently, but as I understand it the project was cancelled. As seen in a tweet from a discussion on Twitter today, the main obstacles were privacy considerations and costs.
A considerable cost in the suggested project in the United Kingdom, and also, as I have seen in discussions, for a US project, may be that an implementation today should also include biometric technology.
The question is however if that is necessary.
If we look at the systems in force today, for example in Scandinavia, they were implemented more than 40 years ago, and the Swedish citizen ID was actually implemented without digitalization in 1947. There are discussions going on about biometrics there too, as this is inevitable for issuing passports anyway. In the meantime, however, the systems continue to make a lot of data quality prevention and party master data management a lot easier than elsewhere around the world without having biometrics as a component.
No doubt biometrics will solve some problems related to fraud and so on. But these are rare exceptions. So the cost/benefit analysis for enhancing an existing system with biometrics seems to be negative.
I guess the alleged need for biometrics may have something to do with privacy considerations in a strange way: Privacy considerations are often overruled by the requirements for fighting terrorism – and here you need biometrics in identity resolution.
Since I have just relocated (and we have just passed the new year resolution point) I have become a member of the nearby fitness club.
Guess what: They got my name, address and birthday absolutely right the first time.
Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think that her main competency rightfully is being a splendid fitness instructor.
What she did was that she asked for my citizen ID card and took the data from there. A little less privacy yes, but surely a lot better for data quality – or data fitness (credit Frank Harland) you might say.
Recently I have been reading some blog posts circling around having a national ID for citizens in the United States including a post from Steve Sarsfield and another post from Jeffrey Huth of Initiate.
In Denmark where I live we have had such a national ID for about half a century. So if you are a vendor with a great solution for data matching and master data management in healthcare and want to approach a Danish prospect in healthcare (which is mainly public sector here), they will tell you that the solution looks really nice, but they don’t have that problem. You can’t stay many seconds as a patient in a Danish hospital before you are asked to provide your national ID. And if you came in inside your mother you will be given an ID for life within seconds after you are born.
The same national ID is the basis when we have elections. Some weeks before, the authorities will push the button and every person with the right status and age gets a ballot. Therefore we are in disbelief when, every fourth year, we follow the United States electing a president and learn about all the mess in voter registration.
Is that happening in the nation that put a man on the moon in 1969? Or did they? Was it after all a studio recording?
I am currently involved in a data management program dealing with multi-entity (multi-domain) master data management described here.
Besides covering several different data domains such as business partners, products, locations and timetables, the data also serves multiple purposes of use. The client is within public transit, so the subject areas have names such as production planning (scheduling), operation monitoring, fare collection and use of service.
A key principle is that the same data should only be stored once, but in a way that makes it serve as high quality information in the different contexts. Doing that often means balancing between the two ways data may be of high quality:
- Either they are fit for their intended uses
- Or they correctly represent the real-world construct to which they refer
Some of the balancing has been:
For some intended uses you don’t have to know the precise identity of a passenger. For some other intended uses you must know the identity. The latter cases at my client include giving discounts based on age and on transport need, such as attending educational activities. Also when fighting fraud it helps knowing the identity. So the data governance policy (and a business rule) is that customers for most products must provide a national identification number.
Like it or not: Having the ID makes a lot of things easier. Uniqueness isn’t a big challenge like in many other master data programs. It is also a straightforward process when you want to enrich your data. An example here is accurately geocoding where your customers live, which is rather essential when you provide transportation services.
You may use a range of different coordinate systems to express a position as explained here on Wikipedia. Some systems refer to a round globe (and yes, the real world, the earth, is round), but it is a lot easier to use a system like the one called UTM where you easily may calculate the distance between two points directly in meters assuming the real world is as flat as your computer screen.
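As a small illustration of the flat-screen assumption, the distance between two UTM points is plain Pythagoras. This sketch assumes both points are given as easting and northing in meters within the same UTM zone; the function name and the coordinates are made up for the example.

```python
import math


def utm_distance_m(easting_a, northing_a, easting_b, northing_b):
    """Planar distance in meters between two points in the same UTM zone."""
    # Treats the zone as a flat grid, so it is only meaningful when both
    # points lie in the same zone and reasonably close to each other.
    return math.hypot(easting_b - easting_a, northing_b - northing_a)


# Made-up coordinates: 300 m east and 400 m north of each other.
print(utm_distance_m(500000, 6170000, 500300, 6170400))  # 500.0
```

With latitude and longitude the same question needs spherical or ellipsoidal formulas, which is the extra work the flat grid lets you skip.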
Here is a picture of my grandson Jonas taken minutes after he was born. He has a ribbon around his wrist showing his citizen ID which has just been assigned. There is even a barcode with it on the ribbon.
Now, I have mixed feelings about that. It is indeed very impersonal. But as a data quality professional I do realize that this is a way of solving a problem at the root. Duplicate master data in healthcare is a serious problem as Dylan Jones reported last year when he had a son in this article from DataQualityPro.
A unique citizen ID (National identification number) assigned in seconds after a birth has a lot of advantages. As said, it is a foundation for data quality in healthcare from the very start of a life. Later when you get your first job you hand the citizen ID to your employer and tax is collected automatically. When the rest of the money is in the bank you are uniquely identified there. When you turn 18 you are seamlessly put on the electoral roll. Later your marriage is merely a relation in a government database between your citizen ID and the citizen ID of your beloved one.
Oh joy, Master Data Management at the very best.
The solution to the single most frequent data quality problem being party master data duplicates is actually very simple. Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.
Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it the project has just been named Aadhar (or Aadhaar). Google translate tells me this word (आधार) means base or root – please correct if anyone knows better.
In Denmark we have had such an identifier (one for citizens and one for companies) for many years. It is not used by everyone everywhere – so you still are able to make money being a data quality professional specializing in data matching.
The main reason that the unique citizen identifier is not used all over is of course privacy considerations. As for the unique company identifier the reason is that data quality often is defined as fit for immediate purpose of use.
With the risk of having the comment area on this blog filled up with SQL statements I will follow the track and tone from the last post called Create Table Homo_Sapiens.
In the last post some challenges around modelling people in databases were discussed with focus on uniqueness. Now we will have a look at the same challenges with companies – the other big part of party master data.
Companies often act in the same role as individual people in business processes – not least in the role as a customer. Companies also behave as persons in a lot of ways like being born (establish), change name, relocate, marry (mergers and acquisitions), divorce (split) and decease (dissolve).
All over the world a lot of people spend their days entering and updating the data held on business partners in numerous databases. The worldwide sum of B2B connections between a customer and a vendor, each entering and maintaining the data about the other, resembles (though less aggressively) the grains on a chessboard story:
- 2 companies both exchanging goodies makes 1+1 customers and 1+1 vendors = 4 rows
- 3 companies all exchanging goodies makes 2+2+2 customers and 2+2+2 vendors = 12 rows
- 4 companies all exchanging goodies makes 3+3+3+3 customers and 3+3+3+3 vendors = 24 rows
- 5 companies all exchanging goodies makes 4+4+4+4+4 customers and 4+4+4+4+4 vendors = 40 rows
- n companies all exchanging goodies makes n*(n-1) customers and n*(n-1) vendors = 2*n*(n-1) rows
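The list above can be checked with a few lines of code. The function name master_data_rows is made up for the example, and it assumes the worst case from the list, where every company trades with every other company in both directions.

```python
def master_data_rows(n):
    """Rows held in total when n companies all trade with each other."""
    # Each of the n companies keeps a customer row for each of the other n - 1.
    customers = n * (n - 1)
    # And likewise a vendor row for each of the other n - 1.
    vendors = n * (n - 1)
    return customers + vendors


for n in range(2, 6):
    print(n, master_data_rows(n))
# 2 4
# 3 12
# 4 24
# 5 40
```

The growth is quadratic rather than the doubling of the chessboard story, which is why it is the less aggressive of the two.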
Last time I checked the D&B WorldBase held more than 150 million companies. Some are dissolved and fortunately? everyone doesn’t do business with everyone – but as said, the sum of B2B connections is huge and the work in entering and maintaining the master data seems overwhelming.
If we look at one single company and how it may be represented differently in databases around the world, taking only basic data such as name and address into account, there will be lots of variations. Even in the same table the same real-world company often occupies several rows spelled differently.
One of the most effective methods for avoiding duplicates of company master data is plugging into a business directory. By using an externally sourced company ID as a key in your master data you are able to hold a golden record of that real-world entity. As a bonus you are offered updates and access to a lot of additional data you would never be able to collect yourself.
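A rough sketch of what using the external ID as a key buys you: rows that are spelled differently collapse into one group as soon as they carry the same ID. The field name external_id, the function name and the sample rows are all made up for illustration; how the ID gets assigned to each row in the first place is the hard part and is not shown.

```python
from collections import defaultdict


def group_by_external_id(rows):
    """Group local rows that carry the same external company ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["external_id"]].append(row)
    return dict(groups)


# Made-up rows: two spellings of one company plus one other company.
sample_rows = [
    {"external_id": "X1", "name": "Acme Corp."},
    {"external_id": "X1", "name": "ACME Corporation"},
    {"external_id": "X2", "name": "Other Ltd"},
]

for external_id, members in group_by_external_id(sample_rows).items():
    print(external_id, [m["name"] for m in members])
```

Any group with more than one member is a duplicate candidate that can be merged into a single golden record without comparing names at all.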