Finding the Truth in Social Business Directories

LinkedIn has a section called Companies. When browsing around on LinkedIn, you are sometimes prompted to follow a company that LinkedIn thinks will be of interest to you.

The other day my hint included two identical logos for the old Master Data Management (MDM) vendor called Siperian. Curious and data quality geeky as I am, I checked, and there actually are two Siperians on LinkedIn companies:

Both have an identical headquarters address in California, USA.

So, even MDM vendors have created duplicates.

Also, Siperian was acquired by the data integration giant Informatica some years ago, so you would expect the Siperians to have been emptied. But that is not the case. Some Siperian folks still claim to be working for one of the Siperian duplicates (though many also for Informatica at the same time).

Now, I was not sure about the legal status of the old Siperian company. So I went to another social network called Companybook. On that site the company registry is based on an external business directory.

Here it seems that the Siperian company in Toronto, Canada actually still exists, though marked as owned by Informatica.

So, I’m still looking for that single source of the truth out there. Until then I will mash up the external sources out there with my internal MDM vendor knowledge, as told in the post yesterday called Mashing Up Big Reference Data with Internal Master Data.


Sometimes Big Brother is Confused

Google Maps knows a lot. It knows about addresses and it knows about companies on these addresses.

As with most services it seems that Google Maps gets the reference data from different sources.

The other day I went to visit “Channel 4”, the British TV channel that hosted the UK “Big Brother” reality show until recently.

I typed in the address “124 Horseferry Road, London, United Kingdom” and got the point:

However, it seems that there is a large building up to the left called “Channel 4 Television”. Strange. Then I tried with “Channel 4, 124 Horseferry Road, London, United Kingdom”:

Oh, so I will find “Channel Four Television, 124 Horseferry Road” in the “Channel 4 Television” building only 0.2 miles west of “124 Horseferry Rd”:


The Data Quality Tool Vendor Difference

How do analysts look at the data quality tool vendor market? As with everything data quality there are differences and apparently no single source of truth.

Gartner has its magic quadrant. They sell it for money, but usually you are able to get a free copy from the leading vendors.

The Information Difference has its DQ Landscape in the cloud for free.

It is interesting to compare which vendors are included in the latest main pictures, as I have tried below:

The number of x’s is a rough measure of the ability to execute / market strength.

Three smaller vendors are considered by Gartner, but not by The Information Difference, and vice versa. Two midsize vendors are included by The Information Difference, but not by Gartner. Experian QAS is included as a big one by The Information Difference, but did not (yet) meet the inclusion criteria used by Gartner.


At Least Two Versions of the Truth

Precisely one year ago I wrote a post called Single Company View examining the challenges of getting a single business partner view in business-to-business (B2B) party master data.

Yesterday Robert Hawker of Vodafone gave a keynote at the MDM Summit Europe 2012 about supplier master data management.

One of the points was that sometimes you really want exactly the same real world entity to be two golden records in your master data hub, as there may be totally different business activities carried out with the same legal entity. The Vodafone example was:

  • Having an antenna placed on the top of a building owned by a certain company and thus paying a fee for that
  • Buying consultancy services from the same company
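
A minimal Python sketch of that idea, using a hypothetical `GoldenRecord` type and made-up identifiers (nothing here comes from the Vodafone keynote): the golden record is keyed by the legal entity plus the business relationship, so one legal entity can legitimately appear twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenRecord:
    legal_entity_id: str  # hypothetical identifier of the real world company
    relationship: str     # the business activity this golden record serves


# Two golden records on purpose: same legal entity, different activities.
hub = {
    GoldenRecord("LE-001", "site landlord"),
    GoldenRecord("LE-001", "consultancy supplier"),
}


def relationships_for(entity_id: str) -> list[str]:
    """List every relationship the hub holds for one legal entity."""
    return sorted(r.relationship for r in hub if r.legal_entity_id == entity_id)
```

The shared `legal_entity_id` keeps the link between the two records, so a single business partner view can still be assembled when needed.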

I have met such examples many times when doing data matching as told in the post Entity Revolution vs Entity Evolution.

However, on one occasion, many years ago, I worked in a company where not having a single business partner view nearly became a small disaster.

Our company delivered software for membership administration and was at the same time a member of an employer organisation that also happened to be a customer.

A new director got the brilliant idea that cancelling the membership of the employer organisation was an obvious cost reduction.

The cancellation was sent. The employer organisation confirmed the cancellation, adding that they were very sorry that internal business rules at the same time forced them to stop being a customer.

Cancellation was cancelled of course and damage control was initiated.


Finding Me

Many people have many names and addresses. So have I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

Green T is positive in the Danish Telephone Books. Red C is negative in the Danish Citizen hub. Green C is positive in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom, I’m registered here under even more names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.
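
A matching tool has to cope with this kind of spelling variation. Below is a minimal Python sketch of one way to build a comparison key for names. The `fold_name` function and its mapping table are my own assumptions for illustration, not how any registry or tool actually works. Note that Ø has no Unicode decomposition, so simply stripping accents would leave it untouched; it needs an explicit mapping.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping
# (assumed table, far from complete).
SPECIAL = str.maketrans({"Ø": "O", "ø": "o", "Æ": "AE", "æ": "ae"})


def fold_name(name: str) -> str:
    """Return a lower-case, accent-free key for comparing names."""
    mapped = name.translate(SPECIAL)
    decomposed = unicodedata.normalize("NFKD", mapped)
    # Drop combining marks such as the ring in Å or the acute in é.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace.
    return " ".join(stripped.lower().split())
```

With this key, the Danish spelling and the anglicised spelling of the same surname compare as equal, while the omitted middle name remains a separate matching problem.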

But as the United Kingdom hasn’t a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly for me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.

How about you, do you also have several identities?


Updating a Social Business Directory

Business directories have been around for ages. In the old days they were paper based, as in the yellow pages of a phone book. The yellow pages have since become searchable online. We also know commercial business directories such as the Dun & Bradstreet WorldBase, as well as government operated nationwide directories of companies and industry specific business directories.

Such business directories often take a crucial role in master data quality work as sources for data enrichment in the quest for getting as close as possible to a single version of the truth when dealing with B2B customer master data, supplier master data and other business partner master data.

A classic core data model for Master Data in CRM systems, SCM solutions and Master Data hubs when doing B2B is that you have:

  • Accounts being the BUSINESS entities who are your customers, suppliers, prospects and all kinds of other business partners
  • Contacts being the EMPLOYEEs working there and acting in the roles as decision makers, influencers, gate keepers, users and so on
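
As a rough illustration, the account/contact model could be sketched in Python like this. The class and field names are my own assumptions, not taken from any particular CRM or MDM product.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    role: str  # e.g. "decision maker", "influencer", "gate keeper", "user"


@dataclass
class Account:
    name: str
    relationship: str  # e.g. "customer", "supplier", "prospect"
    contacts: list[Contact] = field(default_factory=list)


# Hypothetical example data.
acme = Account("Acme Ltd", "customer")
acme.contacts.append(Contact("Jane Doe", "decision maker"))
```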

Today we also have to think about social master data management, that is, exploiting reference data in social media as a supplementary source of external data.

As with all social activity, this exercise goes two ways:

  • Finding and monitoring your existing and wanted business partners in the social networks
  • Updating your own data

Most business entities in this world are actually one-man-bands. So is mine. Therefore I went to the LinkedIn company pages this morning and updated data about my company Liliendahl Limited: Unlimited Data Quality and Master Data Management consultancy for tool and service vendors.


Inaccurately Accurate

The public administrative practice for keeping track of the citizens within a country is very different in Denmark, where I used to live, and in the United Kingdom, where I live now.

In Denmark there is an all-purpose citizen registry where you are registered “once and for all” seconds after you are born as told in the post Citizen ID within Seconds.

In the United Kingdom there are separate registries for different purposes. For example there is a registry dealing with your health care master data and there is a registry, called the electoral roll, dealing with your master data as a voter.

Today I was reading a recent report about data quality within the British electoral roll. The report is called Great Britain’s electoral registers 2011.

The report revolves around the two data quality dimensions: Accuracy and completeness.

In doing so, these two bespoke definitions are used:

There is a note about accuracy saying:

 

This is a very interesting precision, so to speak. Having fitness for the purpose of use is indeed the most common approach to data quality.

This does of course create issues when such data are used for other purposes. For example credit risk agencies here in the UK use appearance on the electoral roll as a parameter for their assessment of credit risk related to individuals.

Surely, often there isn’t a single source of the truth as pondered in the post The Big ABC of Reference Data.

However, this mustn’t make us stop in the search for getting high quality data. We just have to realize that we may look in different places in order to mash up a best picture of the real world as explained in the post Reference Data at Work in the Cloud.  


The Big ABC of Reference Data

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are the data defined and (initially) maintained outside a single organisation. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organisations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

In the party master data realm such reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living on an address in a given country.

There is often no single source of truth for such data. Some of the challenges I have met for each type of data are:

Addresses

The depth (or precision if you like) of an address is a common problem. If the depth of address data is at the level of building numbers on streets (thoroughfares) or blocks, you have issues as described in the blog post called Multi-Occupancy.

Address reference data of course have issues with common data quality dimensions such as:

  • Timeliness, because for example new addresses will exist in the real world but not yet in a given address directory.
  • Accuracy, as you are always amazed when comparing two official sources which should have the same elements, but haven’t.

Business Entities

Business directories have been accessible for many years and are often used when handling business-to-business (B2B) customer master data and supplier master data management. Some hurdles in doing this are:

  • Uniqueness, as your view of what a given business entity is occasionally doesn’t match the view in the business directory as discussed in the post 3 out of 10
  • Conformity, because for example an apparently simple exercise such as assigning an industry vertical can be a complex matter as mentioned in the post What are they doing?

Consumers (or Citizens)

In business-to-consumer (B2C) or other activities involving citizens a huge challenge is identifying the individuals living on this planet as pondered in the post Create Table Homo Sapiens. Some troubles are:

  • Consistency isn’t easy, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative effectiveness.
  • Completeness, as the rules and traditions not only between countries, but also within different industries, certain activities and various channels, are different.

Big Reference Data as a Service

Even though I have emphasized some data quality dimensions for each type of data, all dimensions apply to all types of data.

For organisations operating multinationally and/or across multiple channels, exploiting the wealth and diversity of external reference data is a daunting task.

This is why I see reference data as a service embracing many sources as a good opportunity for getting data quality right the first time. There is more on this subject in the post Reference Data at Work in the Cloud.


The Database versus the Hub

In the LinkedIn Multi-Domain MDM group we have an ongoing discussion about why you need a master data hub when you already have some workflow, a UI and a database.

I have been involved in several master data quality improvement programs without having the opportunity of storing the results in a genuine MDM solution, for example as described in the post Lean MDM. And of course this may very well result in a success story.

However, there are some architectural reasons why many more organizations than those using an MDM hub today may sooner or later find benefits in having a master data hub.

Hierarchical Completeness

If we start with product master data, the main issue with storing it is the diversity in the requirements for which attributes are needed, and when they are needed, depending on the categorization of the products involved.

Typically you will have hundreds or thousands of different attributes, where some are crucial for one kind of product and absolutely ridiculous for another kind of product.

Modeling a single product table with thousands of attributes is not a good database practice, and pre-modeling tables for each conceivable categorization is very inflexible.

Setting up mandatory fields at the database level for product master data tables is asking for data quality issues, as you can’t avoid either over-killing or under-killing.

Also, product master data entities are seldom created in one single insertion, but are inserted and updated by several different employees, each responsible for a set of attributes, until the entity is ready to be approved as a whole.

A master data hub, not least those born in the product domain, is built for those realities.
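
To make the point concrete, here is a minimal Python sketch of category-dependent completeness. The categories, attribute names and function names are hypothetical examples of mine, and a real hub would hold such rules as configurable metadata rather than in code. The idea is that a record may be saved incomplete and is only checked against the rules for its own category when it is up for approval.

```python
# Hypothetical rules: which attributes each product category requires.
REQUIRED_BY_CATEGORY = {
    "laptop": {"screen_size", "ram_gb"},
    "shirt": {"size", "colour"},
}


def missing_attributes(product: dict) -> set:
    """Return the required attributes that are still empty for this product."""
    required = REQUIRED_BY_CATEGORY.get(product["category"], set())
    attrs = product.get("attributes", {})
    return {name for name in required if attrs.get(name) in (None, "")}


def ready_for_approval(product: dict) -> bool:
    """A product is ready when nothing required for its category is missing."""
    return not missing_attributes(product)


# One employee fills in a first attribute; the record is saved as a draft.
draft = {"category": "laptop", "attributes": {"screen_size": "14 inch"}}
```

Here `missing_attributes(draft)` reports that `ram_gb` is still open, without the database having rejected the first insertion.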

The party domain has hierarchical issues too. One example is whether a state/province is mandatory on an address, which depends on the country in question.
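
The address case can be sketched the same way. The set of countries below is an assumed sample for illustration only, not an authoritative list.

```python
# Assumed sample of countries where a state/province is part of the address.
STATE_REQUIRED = {"US", "CA", "AU"}


def address_problems(address: dict) -> list[str]:
    """Return a list of completeness problems for one address record."""
    problems = []
    country = address.get("country")
    if not country:
        problems.append("country is missing")
    elif country in STATE_REQUIRED and not address.get("state"):
        problems.append("state/province is required for " + country)
    return problems
```

A plain NOT NULL constraint on the state column would wrongly reject the Danish address, while leaving the column optional would let the incomplete US address through.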

Single Business Partner View

I like the term “single business partner view” as a higher vision for the more common “single customer view”, as we have the same architectural requirements for supplier master data, employee master data and other master data concerning business partners as we have for the of course extremely important customer master data.

The uniqueness dimension of data quality has a really hard time in common database managers. Having duplicate customer, supplier and employee master data records is the most frequent data quality issue around.

In this sense, a duplicate party is not a record with exactly the same fields filled and with exactly the same values spelled exactly the same way, as a database will see it. A duplicate is one record reflecting the same real world entity as another record, and a duplicate group is several records reflecting the same real world entity.

Even though some database managers have fuzzy capabilities, they are still very inadequate at finding these duplicates based on several attributes at one time, and not least at finding duplicate groups.
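
To illustrate what is meant by matching on several attributes and by duplicate groups, here is a deliberately naive Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real matching engine; the equal weighting, the 0.8 threshold and the sample records are assumptions of mine, and the pairwise comparison would not scale to a real party table.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Compare several attributes at once: average of name and address scores."""
    score = (similarity(r1["name"], r2["name"])
             + similarity(r1["address"], r2["address"])) / 2
    return score >= threshold


def duplicate_groups(records: list, threshold: float = 0.8) -> list:
    """Group record indexes that match directly or through other records."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_duplicate(records[i], records[j], threshold):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Made-up sample records.
records = [
    {"name": "Acme Corp", "address": "1 Main Street"},
    {"name": "ACME Corp.", "address": "1 Main St"},
    {"name": "Globex Ltd", "address": "99 Harbour Road"},
]
```

An exact comparison, which is what a unique index gives you, would treat the first two records as different companies.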

Finding duplicates when inserting supposedly new entities into your customer list and other party master data containers is only the first challenge concerning uniqueness. Next you have to solve the so-called survivorship questions, that is, which values will survive the unavoidable differences.
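
A survivorship rule can be as simple or as elaborate as the business requires. The Python sketch below shows one possible rule, chosen only for illustration: per attribute, the most recently updated non-empty value wins. The `updated` field and the sample records are assumptions of mine.

```python
from datetime import date


def survive(duplicates: list) -> dict:
    """Build a golden record: the newest non-empty value wins per attribute."""
    golden = {}
    # Process oldest first so that newer values overwrite older ones.
    for record in sorted(duplicates, key=lambda r: r["updated"]):
        for key, value in record.items():
            if key != "updated" and value not in (None, ""):
                golden[key] = value
    return golden


# Made-up duplicate group.
older = {"name": "Acme Corp", "phone": "555-0100", "updated": date(2011, 1, 1)}
newer = {"name": "Acme Corporation", "phone": "", "updated": date(2012, 1, 1)}
```

Here the golden record takes the name from the newer record but keeps the phone number from the older one, because the newer record left it empty.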

Finally the results to be stored may have several constructing outcomes. Maybe a new insertion must be split into two entities belonging to two different hierarchy levels in your party master data universe.

A master data hub will have the capabilities to solve this complexity, some for customer master data only, some also for supplier master data combined with similar challenges with product master data and eventually also other party master data.

Domain Real World Awareness

Building hierarchies, filling incomplete attributes, consolidating duplicates and other forms of real world alignment are most often accomplished by including external reference data.

There are many sources available for party master data, such as address directories, business directories and citizen information, depending on the countries in question.

With product master data global data synchronization involving common product identifiers and product classifications is becoming very important when doing business the lean way.

Master data hubs know these sources of external reference data so you, once again, don’t have to reinvent the wheel.


The trees never grow into heaven

This morning most of digital Denmark was closed. You couldn’t do anything at the online bank, you couldn’t do much at public sector websites and you couldn’t read electronic mail from your employer, pension institution and others.

It wasn’t because someone cut a big cable or a computer virus got a lucky strike. The problem was that the centralized internet login service had a three hour outage. It was a classic single point of failure incident.

In Denmark we have a single sign-on identity solution used by public sector, financial services and other organizations. The service is called NemID (Easy ID) and is based on an all-purpose unique national ID for every citizen.

As more and more interaction with public sector and financial services, along with online shopping, is taking place in the cloud, we are of course more and more vulnerable to this kind of problem.

The benefits of having a single source of truth about who you are became a single point of failure here.

Well, we have this local saying: “The trees never grow into heaven”. All good things have their limit. Even in instant Identity Resolution.
