The solution to the single most frequent data quality problem, party master data duplicates, is actually very simple. Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.
Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it, the project has just been named Aadhar (or Aadhaar). Google Translate tells me this word (आधार) means base or root – please correct me if anyone knows better.
In Denmark we have had such an identifier (one for citizens and one for companies) for many years. It is not used by everyone everywhere, so you are still able to make money as a data quality professional specializing in data matching.
The main reason that the unique citizen identifier is not used all over is of course privacy considerations. As for the unique company identifier, the reason is that data quality is often defined as fit for the immediate purpose of use.
I’m keen to learn more about the Danish experience.
When may a person choose NOT to provide their unique citizen identifier? I assume they MUST provide the number when opening a bank account, dealing with the State, etc.?
Regarding companies, who allocates the unique identifier?
Could you please elaborate on your final sentence: “As for the unique company identifier, the reason is that data quality is often defined as fit for the immediate purpose of use.”
Thanks Ken for your comment and questions.
You’re right, the Danish citizen ID is used by the public sector, by financial services (as you say, when opening a bank account), when getting employed, and often when obtaining credit, for example a postpaid phone subscription.
The company ID is assigned by a public authority.
My brief remark about fit for immediate purpose of use is aimed at the situation where a public company ID may not be needed for the business process in which master data is entered, for example when adding a customer to be invoiced. Only later do you regret that you didn’t capture that data, for example when you want to consolidate and enrich master data from all over the enterprise.
On Twitter @ashwinmaslekar says:
Aadhar/Aadhaar means Support
Even the SSN, which in the US is unique, sometimes has to be looked at from a human error perspective, which complicates data quality in systems.
To draw a parallel, here in India we have what is called a PAN – Permanent Account Number – which is typically used for tax filing, DEMAT operations, bank operations etc. With it, quoting and tracking the millions of employed people in my country was thought to be easier. That held until we ran into the problem of people having multiple PANs because of address changes or because the name is written in various forms.
Now, forget MDM and quality of data: assigning an ID to and identifying an individual is a nightmare given that we are 1.3 billion+ in number!
I am adding more info to your post here. We have now embarked on a Unique Identification project, details of which you can find here – http://uidai.gov.in – which has the mandate to provide a unique ID to all resident Indians.
But given the kind of anomalies we have in name, address, city, state, zip codes etc., the difficulties that any MDM implementation will run into will still need a long time to be sorted out. That said, there have been a lot of e-governance initiatives in the recent past which are in the process of uniquely identifying assets, locations, addresses etc. The UID, once available, would seamlessly integrate them.
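As a rough illustration of the multiple-ID problem described above, here is a minimal sketch of how records that carry different IDs but probably refer to the same person might be flagged. Everything in it is an assumption made for illustration: the record layout, the made-up IDs (not real PAN formats), the use of date of birth as a second matching attribute, and the very naive name normalization. Real data matching needs far more than this.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase the name, drop punctuation, and sort the tokens so that
    'Kumar, Ravi' and 'Ravi Kumar' produce the same string."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def find_suspected_duplicates(records):
    """Group records by (normalized name, date of birth) and return only
    the groups that contain more than one distinct ID."""
    groups = defaultdict(set)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["dob"])
        groups[key].add(rec["id"])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data: the first two rows are the same person
# registered twice with the name written in different forms.
records = [
    {"id": "ID-001", "name": "Kumar, Ravi", "dob": "1980-05-17"},
    {"id": "ID-002", "name": "Ravi Kumar", "dob": "1980-05-17"},
    {"id": "ID-003", "name": "Anita Desai", "dob": "1975-11-02"},
]

suspects = find_suspected_duplicates(records)
for (name, dob), ids in suspects.items():
    print(f"{name} ({dob}) appears under IDs: {', '.join(ids)}")
```

An exact match on a normalized key like this only catches reordering and punctuation differences. Spelling variants, initials and transliteration differences would need fuzzy matching on top.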
Venkat, thanks for commenting and adding more information on the subject.
As I understand the SSN (Social Security Number) in the US, this number is not used for all citizen roles. There are also other tax IDs, and the registration of voters is a separate process with some nasty data quality issues exposed to the whole world every fourth year.
I can imagine the challenges faced by the project in India when such a large number of people in a large, diverse country are going to be uniquely identified.
Has anyone ever checked that the Danish social security numbers are in fact unique at person level?
Hi Jane, thanks for joining. It does happen that one individual is assigned more than one ID, not least in the case of repeated immigration. As evidence, a status code in the citizen hub (CPR registret) handles exactly that situation.
The idea of a single unique identifier for a person is very seductive. However, the use of a code for this presents major problems.
In a recent post I wrote about the “Data Quality Paradox”, which is that the use of keys in databases as the UID is the single greatest contributor to duplication.
Keys are never UIDs. They cannot be unique identifiers as they do not in any way identify the article to which they refer. They are simply a Code linked to that article.
Say to a person “Bring me Item 12579A” and they will have no idea what you mean. Tell them to bring you a “35cm Red Plastic Ball” and they will know exactly what you mean. For the code 12579A to be of use they would need to have a document somewhere telling them the item for which it is a code. Then they could say, “Aha, you mean a 35cm Red Plastic Ball!”
I call such codes QUACKs = Quick Alternative Code or Key. They are very useful to have in business but they are not identifiers.
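The point can be sketched in a few lines of code. This is only an illustration under my own assumptions: the catalog is a plain dictionary, and the second code (12580B) is invented to show what happens when the same article is entered twice.

```python
# A code means nothing without a lookup document; here that document is a dict.
catalog = {
    "12579A": "35cm Red Plastic Ball",
}


def describe(code, catalog):
    """Return the description a code stands for, or None if the code is unknown."""
    return catalog.get(code)


def codes_for(description, catalog):
    """Return every code that points at the given description."""
    return sorted(code for code, desc in catalog.items() if desc == description)


# The same article is entered again under a new (hypothetical) code.
# Both keys are unique, so the database accepts them, yet the article
# itself is now duplicated.
catalog["12580B"] = "35cm Red Plastic Ball"

print(describe("12579A", catalog))
print(codes_for("35cm Red Plastic Ball", catalog))
```

The key constraint is satisfied in both inserts, which is why uniqueness of the code alone does nothing to prevent the duplicate.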
The full post can be read at http://www.integrated-modeling-method.com/data-modeling/unique-keys-are-the-primary-cause-of-duplication-in-databases
John, thanks for the comment.
Though I agree with your thoughts in general, I must say that I have no doubt that a unique citizen ID helps with data quality, not least with the duplication issue in party master data.
For example, a single view of private customers is believed to be a big issue in financial services in some countries, but it’s not an issue in Scandinavia, where you can’t have a private bank account that is not attached to your unique citizen ID.