Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner, be it a company, government body or other form of organization, is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some countries, such as those in Scandinavia, the public sector assigns a unique number to every company. The number is used in every relation to the public sector and is open for the private sector to use for identification purposes as well.

As reported in the post Single Company View, I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number and, very embarrassingly for a data quality and master data geek, these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS-number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible, the use of DUNS-numbers and similar business directory IDs is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately around implementing yet another numbering system for legal entities, with an identifier usually abbreviated as LEI. Wikipedia has the details about a Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.
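As a minimal sketch of what "multiple sources making up some kind of the truth" can look like in a data model, the record below carries one number per numbering scheme, and two records are linked when they agree on any scheme they both hold. The class name, the scheme labels and all the numbers are made up for illustration; nothing here is taken from a specific MDM product.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessParty:
    """One business entity carrying identifiers from several numbering schemes."""
    name: str
    # Maps a scheme label (hypothetical labels such as "VAT" or "DUNS") to the number.
    identifiers: dict[str, str] = field(default_factory=dict)


def shared_identifiers(a: BusinessParty, b: BusinessParty) -> set[str]:
    """Return the schemes where both records carry the same non-empty number."""
    return {
        scheme
        for scheme, number in a.identifiers.items()
        if number and b.identifiers.get(scheme) == number
    }


# Two records for the same made-up company, captured with different names.
crm_record = BusinessParty(
    "Example Trading Ltd",
    {"REGISTRATION": "01234567", "VAT": "GB000000000"},
)
erp_record = BusinessParty(
    "Example Trading Limited",
    {"VAT": "GB000000000", "DUNS": "000000000"},
)

# The names differ, but the shared VAT number links the two records.
print(shared_identifiers(crm_record, erp_record))
```

Keeping every available number on the record, rather than picking one scheme as the key, means a link can be made through whichever identifier two sources happen to have in common.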


Staying in Doggerland

Currently I’m travelling a lot between my present home in London, United Kingdom and Copenhagen, Denmark, where I have most of my family and where the iDQ headquarters is.

When flying between London and Copenhagen you pass the southern North Sea. In the old days (8,000 years ago) this area was a land occupied by human beings. This ancient land is known today as Doggerland.

Sometimes I feel like a citizen of Doggerland not really belonging in the United Kingdom or Denmark.

I still have some phone subscriptions in Denmark that I use there and that my family uses there. The phone company seems to have a hard time getting a 360 degree customer view, as I have two different spellings of my name and two different addresses, as seen on the screen when I look myself up in the iDQ service:

Besides having a Customer Relationship Mess (CRM) the phone company has recently shifted their outsourcing partner (from CSC to TCS). This has caused a lot of additional mess, apparently also closing one of my subscriptions because they failed to register my payments. They did, however, send a chaser, they say, but to the oldest of the addresses, where I don’t pick up mail anymore.

I called to settle the matter and asked if they could correct the address not in use anymore. They couldn’t. The operator did some kind of query into the citizen hub similar to what I can do on iDQ:

However the customer service guy’s screen just showed that I have no address in Denmark in the citizen hub (called CPR), so he couldn’t change the address.

Apparently the phone company correctly picked up an accurate address in the citizen hub when I got the subscription, but failed to update it (along with the other subscriptions) when I moved to another domestic address, and now doesn’t have an adequate business rule for when I’m registered at a foreign address.

So now I’m staying in Doggerland.


Deduplication vs Identity Resolution

When working with data matching you often find that there basically is a bright view and a dark view.

Traditional data matching as seen in most data quality tools and master data management solutions is the bright view: Being about finding duplicates and making a “single customer view”. Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.

These two poles were discussed in a blog post and the following comments last year. The post was called What is Identity Resolution?

While deduplication and identity resolution may be treated as polar opposites and seemingly contrary disciplines they are in my eyes interconnected and interdependent. Yin and Yang Data Quality.

At the MDM Summit in London last month one session was about the Golden Nominal, Creating a Single Record View. Here Corinne Brazier, Force Records Manager at the West Midlands Police in the UK told about how a traditional data quality tool with some matching capabilities was used to deal with “customers” who don’t want to be recognized.

In the post How to Avoid Losing 5 Billion Euros it was examined how both traditional data matching tools and identity screening services can be used to prevent and discover fraudulent behavior.

Deduplication becomes better when some element of identity resolution is added to the process. That includes embracing big reference data in the process. Knowing what is known in available sources about the addresses that are being matched helps. Knowing what is known in business directories about companies helps. Knowing what is known in appropriate citizen directories when deduping records holding data about individuals helps.

Identity resolution techniques are based on the same data matching algorithms we use for deduplication. Here, for example, a fuzzy search technology helps a lot compared to using wildcards. And of course the same sources as mentioned above are a key to the resolution.
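To illustrate the difference between wildcards and fuzzy search, here is a minimal sketch using only the Python standard library. The similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold and the small directory are assumptions chosen for the example; they are not the algorithm of any particular data quality tool.

```python
import difflib
import fnmatch

# Made-up directory entries; the second spelling contains a typing error.
directory = [
    "Henrik Liliendahl Sorensen",
    "Henrik Liliendahl Sorenson",
    "Basil Fawlty",
]


def wildcard_search(pattern, candidates):
    """Classic wildcard lookup: '*' matches any run of characters."""
    return [c for c in candidates if fnmatch.fnmatchcase(c.lower(), pattern.lower())]


def fuzzy_search(query, candidates, threshold=0.85):
    """Keep candidates whose similarity ratio to the query reaches the threshold."""
    hits = []
    for candidate in candidates:
        score = difflib.SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        if score >= threshold:
            hits.append(candidate)
    return hits


# The wildcard needs the fixed parts spelled exactly, so it misses the typo.
print(wildcard_search("Henrik*Sorensen", directory))
# The fuzzy search tolerates the single wrong letter.
print(fuzzy_search("Henrik Liliendahl Sorensen", directory))
```

The threshold is a tuning choice: set it too low and unrelated names come along, set it too high and the search behaves like an exact lookup again.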

Right now I’m dipping deep into the world of big reference data such as address directories, business directories, citizen directories and the next big thing, social network profiles. I have no doubt that deduplication and identity resolution will be more yinyang than yin and yang in the future.


How to Avoid Losing 5 Billion Euros

Two years ago I made a blog post about how 5 billion Euros were lost due to bad identity resolution at European authorities. The post was called Big Time ROI in Identity Resolution.

In the carbon trade scam criminals were able to trick authorities with fraudulent names and addresses.

One way of possible discovery of the fraudster’s pattern of interrelated names and physical and digital locations was, as explained in the post, to have used an “off the shelf” data matching tool in order to achieve what is sometimes called non-obvious relationship awareness. When examining the data I used the Omikron Data Quality Center.

Another and more proactive way would have been upstream prevention by screening identity at data capture.

Identity checking may be a lot of work you don’t want to include in business processes with a high volume of master data capture, and not least screening the identity of companies and individuals at foreign addresses seems a daunting task.

One way to cut down the time spent on identity screening covering many countries is to use a service that embraces many data sources from many countries at the same time. A core technology in doing so is cloud service brokerage. Here your IT department only has to deal with one interface, as opposed to having to find, test and maintain hundreds of different cloud services for getting the right data available in business processes.

Right now I’m working with such a solution called instant Data Quality (iDQ).

Really hope there’s more organisations and organizations out there wanting to avoid losing 5 billion Euros, Pounds, Dollars, Rupees, Whatever or even a little bit less.


Social MDM and Systems of Engagement

Social Master Data Management has been an interest of mine for the last couple of years, and last week I tried to reach out to others in exploring this new era of Master Data Management by creating a group on LinkedIn called Social MDM.

When reading a nice blog with the slogan “Welcome to the Real (IT) World!” by Max J. Pucher I came across a good illustration by John Mancini showing the history of IT and how the term “Systems of Record” is being replaced (or at least supplemented) by the term “Systems of Engagement”:

Master Data Management (MDM) includes having a System of Record (SOR) describing the core entities that take part in the transactional systems of record that support the daily business in every organization. For example, a golden MDM record describes the party that acts as a customer on an order record, while the products in the underlying order lines are described in golden MDM records for the things dealt with within the organization.

Social Master Data Management (Social MDM) will be about supplementing that System of Record so we are able to further describe the parties taking part in the new Systems of Engagement and link them with the old Systems of Record. These parties are reflected as social network profiles that are owned by the same human beings who are our (prospective) customers, are part of the same household or are a contact for a company being a (prospective) customer or any other business partner.

For a guy like me who started in IT in the mainframe era (just after it had ended according to the above illustration) and went on with mini computers, PCs and the internet, it’s very exciting to be moving on into the social and cloud era.

It will be good to be joined by even more data quality and MDM practitioners and anyone else in the LinkedIn Social MDM group.


Finding Me

Many people have many names and addresses. So have I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

Green T is positive in the Danish Telephone Books. Red C is negative in the Danish Citizen Hub. Green C is positive in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom I’m registered here with more different names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom hasn’t a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly for me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.

How about you, do you also have several identities?


The Taxman: Data Quality’s Best Friend

Collection of taxes has always been a main driver for having registries and means of identifying people, companies and properties.

5,000 years ago the Egyptians made the first known census in order to effectively collect taxes.

As reported on the Data Value Talk blog, the Netherlands have had 200 years of family names thanks to Napoleon and the higher cause of collecting taxes.

Today the taxman goes cross border and wants to help with international data quality, as examined in the post Know Your Foreign Customer. The US FATCA regulation is about collecting taxes from activities abroad and, as said on the Trillium blog: Data Quality is The Core Enabler for FATCA Compliance.

My guess is that this is only the beginning of a tax based opportunity for having better data quality in relation to international data.

In a tax agenda for the European Union it is said: “As more citizens and companies today work and operate across the EU’s borders, cooperation on taxation has become increasingly important.”

The EU has a program called FISCALIS in the making. Soon we will not only have to identify Americans doing something abroad but practically everyone taking part in globalization.

For that we all need comprehensive accessibility to the wealth of global reference data through “cutting-edge IT systems” (a FISCALIS choice of wording).

I am working on that right now:


Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article on the Danish Computerworld telling about a ferry line now dropping a solution for checking if the passenger using an access card is in fact the paying customer by using a lightweight fingerprint stored on the card. The reason for dropping was by the way due to the cost of upgrading the solution compared to future business value and not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well as described in the post Real World Alignment. Here it was the question about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity wherever we go. The fight against terrorism has driven that for a long time. Regulatory compliance will add to that trend, as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but aren’t the real world.

We will, however, probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprint is maybe a bit old fashioned, but as said, more and more biometric passports are issued and the technology for iris and retinal scanning is used around for access control even on mobile devices.

In the story starting this post the business value for reinvesting in a biometric solution wasn’t deemed positive. But looking from the print on my fingers down to my hand lines I foresee some more identity resolution going beyond name and address strings into things closer to the real world as facial recognition and biometrics.


Small Business Owners

A challenge I encounter over and over again within Data Matching and customer Master Data Management is what to do with small business owners.

Examples of small business owners are:

  • Farmers
  • Healthcare professionals with an own clinic
  • Small family driven shop owners
  • Modest membership organisation administrators
  • Local hospitality providers as Basil Fawlty of Fawlty Towers
  • Independent Data Quality consultants as myself

When handling customer master data we often like to divide those into Business-to-consumer (B2C) or Business-to-business (B2B). We may have different source systems, different data models and different data owners and data stewards for each of the two divisions.

But small business owners usually belong to both divisions. In some transactions they act as private persons (B2C) and in some other transactions they act as a business contact (B2B). If you like to know your customer, have a single customer view, engage in social media and all that jazz, you must have a unique view of the person, the business and the household.
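One way to picture such a unique view is a party model where the person exists once and the B2C and B2B appearances are roles pointing to that person. The sketch below is only an illustration under that assumption; the class names, field names and identifiers are hypothetical and do not come from any particular MDM solution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    person_id: str
    name: str
    household_id: Optional[str] = None


@dataclass
class CustomerRole:
    """One appearance of a person as a customer, private or on behalf of a business."""
    person_id: str
    segment: str  # "B2C" or "B2B"
    business_id: Optional[str] = None


def single_view(person, roles):
    """Collect everything known about one person across both divisions."""
    own = [r for r in roles if r.person_id == person.person_id]
    return {
        "person": person.name,
        "household": person.household_id,
        "segments": sorted({r.segment for r in own}),
        "businesses": sorted({r.business_id for r in own if r.business_id}),
    }


# A local hospitality provider who is also a private customer (made-up IDs).
basil = Person("P1", "Basil Fawlty", household_id="H1")
roles = [
    CustomerRole("P1", "B2C"),
    CustomerRole("P1", "B2B", business_id="FAWLTY-TOWERS"),
    CustomerRole("P2", "B2C"),
]

print(single_view(basil, roles))
```

Because the roles only reference the person, adding a new business or moving the person to another household does not require touching the records in the other division.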

In several industries small business owners, the business and the household are a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.

So here are plenty of business cases for multi-domain Master Data Management embracing customer master data and product master data.

The capability to handle a single customer view of small business owners is in my experience very poorly fulfilled in Data Quality and Master Data Management solutions around. Here is certainly room for improvement and entrepreneurship.


The Present Birthday

Today (or maybe yesterday) Steve Jones of Capgemini wrote a blog post called Same name, same birth date – how likely is it? The post examines the likelihood that two records with the same name and birthday represent the same real world individual. The chance that a match is a false positive of course mainly depends on the frequency of the name.
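To get a feel for how name frequency drives that chance, here is a rough sketch of the classic birthday calculation. It is not the calculation from Steve Jones' post; it assumes that birth dates are independent and evenly spread over the possible dates, which real populations are not, and the figures used are made up.

```python
def chance_of_shared_birth_date(people_with_name: int, possible_dates: int) -> float:
    """Probability that at least two of the people share a birth date,
    assuming birth dates are independent and evenly spread."""
    if people_with_name > possible_dates:
        return 1.0
    p_all_different = 1.0
    for i in range(people_with_name):
        p_all_different *= (possible_dates - i) / possible_dates
    return 1.0 - p_all_different


# Roughly 80 years of possible full birth dates (day, month and year).
dates = 80 * 365

for count in (2, 100, 1000):
    chance = chance_of_shared_birth_date(count, dates)
    print(f"{count} people with the same name: {chance:.2%}")
```

The more people who carry a given name, the faster the chance grows that two different individuals share both name and birth date.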

Another angle in this context I have observed over and over again is the chance of a false negative if the name and other data are the same, but the birthday is different. In this case you may miss matching two records that are actually reflecting the same real world individual.

One would think that a datum like a birthday should usually be pretty accurate. My practical experience is that in many cases it isn’t.

Some examples:

Running against the time

Every fourth year when we have the Olympic Games there are always controversies about whether a tiny female athlete really is as old as said.

I noticed the same phenomenon when I had the chance to match data about contestants from several years of subscription data at a large city marathon in order to identify “returning customers”.

I’m always looking for false positives in data matching and was really surprised when I found several examples of the same name and contact data but a birthday that had been raised by one year for each appearance at the marathon.
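A pattern like this can be flagged with a simple comparison: same day and month, different year. The sketch below is one possible check with made-up entries; the function name and the ten-year limit are assumptions for the illustration.

```python
from datetime import date


def same_day_other_year(a: date, b: date, max_gap_years: int = 10) -> bool:
    """True when day and month agree but the year differs by a limited amount."""
    same_day = (a.month, a.day) == (b.month, b.day)
    gap = abs(a.year - b.year)
    return same_day and 0 < gap <= max_gap_years


# Made-up entries for one runner: (race year, birthday as registered that year).
entries = [
    (2008, date(1968, 4, 12)),
    (2009, date(1969, 4, 12)),
    (2010, date(1970, 4, 12)),
]

# Compare each registration with the following one.
suspicious = [
    (year_1, year_2)
    for (year_1, born_1), (year_2, born_2) in zip(entries, entries[1:])
    if same_day_other_year(born_1, born_2)
]
print(suspicious)
```

Records flagged this way are candidates for a match that an exact comparison of the birthday would have rejected.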

That’s not my birthday, this is my birthday

Swedish driving license numbers include the birthday of the holder, as the driving license number is the same as the all-purpose national ID that starts with the birthday.

In a database with both a birthday field and a driving license number field there were heaps of records with mismatch between those two fields.

This wasn’t usually discovered because this rule only applies to Swedish driving license numbers and the database also had registrations for a lot of other nationalities.

When investigating the root cause of this there was, as usual, not a single explanation, and the problem could be either that the birthday belonged to someone else or that the driving license belonged to someone else.

Using both fields cut down the number of false negatives here.
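The cross-field rule can be sketched as below. It assumes the number is written with the birthday first, either as ten digits (YYMMDD plus four more) or twelve digits (YYYYMMDD plus four more), with optional separators. The function name is hypothetical, the numbers are made up, and no check digit is validated. In line with the observation above, the rule should only be applied to Swedish driving licenses.

```python
import re
from datetime import date
from typing import Optional


def birthday_matches_swedish_number(birthday: date, number: str) -> Optional[bool]:
    """Compare a birthday field with the date part of a Swedish ID-based number.

    Returns None when the number has an unexpected length, so that such
    records can be reviewed instead of being counted as mismatches.
    """
    digits = re.sub(r"\D", "", number)  # drop separators such as '-'
    if len(digits) == 12:
        return digits[:8] == birthday.strftime("%Y%m%d")
    if len(digits) == 10:
        return digits[:6] == birthday.strftime("%y%m%d")
    return None


born = date(1970, 5, 17)
print(birthday_matches_swedish_number(born, "700517-1234"))    # fields agree
print(birthday_matches_swedish_number(born, "19700517-1234"))  # fields agree
print(birthday_matches_swedish_number(born, "700518-1234"))    # mismatch
```

When the two fields disagree, a matching process can try both the birthday field and the date derived from the number before concluding that two records are different people.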

Today’s date format is?

In the United States and a few other countries it’s customary to use the month-day-year format when typing a date. In most other places we have the correct sequence of either day-month-year or year-month-day.

Once I matched data concerning foreign seamen working on ships in the Danish merchant fleet. When tuning the match process I found great numbers of good matches when twisting the date formats for birthdays, as the same seaman was registered on different ships with different captains and at different ports around the world.

When adding the fact that many birthdays were typed as 1st January of the known year of birth or the 1st day of the known month of birth, a lot of false positives were saved.
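The two tricks, twisting day and month and treating the 1st as a possible placeholder, can be sketched as a small comparison function. The function names and result labels are made up for the illustration, and the actual tuning of that match process is not known from this post.

```python
from datetime import date
from typing import Optional


def twisted(d: date) -> Optional[date]:
    """Swap day and month; None when the result is not a valid date."""
    try:
        return date(d.year, d.day, d.month)
    except ValueError:
        return None


def compare_birthdays(a: date, b: date) -> str:
    """Classify how two registered birthdays relate to each other."""
    if a == b:
        return "exact"
    if twisted(a) == b:
        return "day and month swapped"
    if a.year == b.year:
        # Try both directions, as either record may hold the placeholder.
        for x, y in ((a, b), (b, a)):
            if (x.month, x.day) == (1, 1):
                return "placeholder 1st January"
            if x.day == 1 and x.month == y.month:
                return "placeholder 1st of month"
    return "different"


print(compare_birthdays(date(1960, 3, 4), date(1960, 4, 3)))
print(compare_birthdays(date(1960, 1, 1), date(1960, 7, 23)))
print(compare_birthdays(date(1960, 7, 1), date(1960, 7, 23)))
```

Anything other than "exact" is weaker evidence than a true match on the birthday, so these outcomes are best used together with agreement on name and other data rather than on their own.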

The question about occupation in the merchant fleet was actually a political hot potato at that time and until then the parliament had discussed the matter based on wrong statistics.

PS

I have used birthday synonymously with “date of birth” which of course is a (meta) data quality problem.
