The Country List

It’s the second day of the MDM Summit Europe 2013 in London today.

The last session I attended today was an expert panel on Reference Data Management (RDM).

I guess the list of countries on this planet is the prime example of reference data, and today’s session was no exception.

Even though a list of countries is fairly small and there shouldn’t be everyday changes to the list, maintaining a country list isn’t as simple as you might think.

First of all, official sources for a country list aren’t in agreement. The set of countries given an ISO code isn’t the same as the set of countries where, for example, the Universal Postal Union (UPU) says you can make a delivery.

Another example I have had some challenges with is the D&B WorldBase (a large world-wide business directory), which has four country codes for what is generally regarded as the United Kingdom. The D&B country reference data was probably defined by a soccer fan recognizing the distinct national soccer teams of England, Wales, Scotland and Northern Ireland.

The expert panel moderator, Aaron Zornes, went as far as suggesting that a graph database may be the best technology for reflecting the complexity in reference data. Oh yes, and in master data too, you would think, though I doubt that the relational database and hierarchy management will go out of fashion anytime soon.

Crap, Damned Crap, and Big Data

Lately Jim Harris published a thought-provoking post on the Mike2 blog. The post is called A Contrarian’s View of Unstructured Data.

In it Jim wrote:

“My contrarian’s view of unstructured data is that it is, in large part, gigabytes of gossip and yottabytes of yada yada digitized, rumors and hearsay amplified by the illusion-of-truth effect and succumbing to the perception-is-reality effect until the noise amplifies so much that its static solidifies into a signal.”

Indeed, the sound of social data may be like that. Yesterday I wrote a post called Keep It Real, Stupid. In it I mentioned an apparently fake quote attributed to Albert Einstein:

“If you can’t explain it simply, you don’t understand it well enough”.

Today I tried to see how the fake quote was doing on Twitter.

OMG: It is running at more than one tweet per minute, along with some mutations of the quote:

“If you can’t explain it to a six-year-old, you don’t understand it yourself”.

“You do not really understand something unless you can explain it to your grandmother”.

OK folks: Sense-making of social data is not going to be simple. Not even relatively simple.

[Image: Simply Einstein Tweets]

[Image: Simply Einstein Tweet 2]

Right:

[Image: Simply Einstein Tweet 3]

instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud-based solution I’m working with right now does this by:

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting genuinely new entities or updating current entities by picking as much data as possible from external sources

Some essential capabilities in doing this are:

  • Searching is error tolerant so you will find entities even if the spelling is different
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives as explained in the post called A Place in Time
    • You may have multiple means of contact attached like many phones, email addresses and social identities
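The search-then-insert-or-update flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual solution: the function names, the record layout and the 0.8 threshold are assumptions, and Python's standard `difflib` stands in for whatever error tolerant matching a real product uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 score that tolerates small spelling differences."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def search(name: str, records: list[dict], threshold: float = 0.8) -> list[dict]:
    """Return records whose name is similar enough, best match first."""
    scored = [(similarity(name, r["name"]), r) for r in records]
    hits = [(score, r) for score, r in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


def capture(name: str, internal: list[dict], external: list[dict]):
    """Search external and internal data, then update or insert.

    Hypothetical rule: data from the external reference source wins over
    what was typed in, as in "picking as much data as possible from
    external sources".
    """
    reference = search(name, external)
    known = search(name, internal)
    best_reference = reference[0] if reference else {}
    if known:
        # The entity already exists internally: enrich it, don't duplicate it.
        known[0].update(best_reference)
        return "update", known[0]
    new_record = {"name": name, **best_reference}
    internal.append(new_record)
    return "insert", new_record
```

With an internal record named "Jon Smith" and an external reference record named "John Smith", entering "John Smyth" should update the existing record rather than create a second one.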

How do you achieve a Single Customer View?

The Greenland Problem in MDM

In a recent comment here on this blog the relevance of Master Data Management (MDM) solutions was questioned because, in real business life, different business units see master data very differently even though the data describes the same real world entity. And it’s not the first time I have heard this argument.

The issue is similar to the Greenland problem in geography. When using the most common projection for visualizing a round earth on a flat map, the Mercator projection, Greenland has a true shape but looks about the same size as Africa, though Africa is over 10 times as large as Greenland.
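The size distortion can be checked with a little arithmetic. The Mercator projection stretches a map by 1/cos(latitude) in both directions, so areas are inflated by 1/cos²(latitude) relative to the equator. The sketch below assumes 72° north as a rough mid-latitude for Greenland, which is my own simplification.

```python
import math


def mercator_area_inflation(latitude_deg: float) -> float:
    """Factor by which Mercator enlarges areas at a latitude, relative to the equator."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Assumption: 72 degrees north is taken as a representative latitude for Greenland.
greenland_factor = mercator_area_inflation(72)
# Africa straddles the equator, where the factor is 1.
equator_factor = mercator_area_inflation(0)
```

At that latitude the inflation factor comes out a little above 10, which is in line with Greenland looking roughly as large as Africa on the map.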

As examined in the post Sharing data is key to a single version of the truth, this is similar to the problems in fulfilling multiple uses embracing all business units in an enterprise:

  • If a map shows a limited part of the world the difference doesn’t matter that much. This is similar to fitting the purpose of use in a single business unit.
  • If the map shows the whole world we may have all kinds of different projections offering different kinds of views on the world, each with advantages and disadvantages, like when we do enterprise MDM.

Today we have new technology coming to the rescue. If you go into Google Earth the world indeed looks round and you may have any high altitude view of an apparently round world. If you go closer the map tends to be more and more flat.

My guess is that solutions to the multiple uses conundrum within MDM will also be offered from the cloud: innovative solutions that reflect the real world entities and relate them to a variety of business functions used in different business units, offering a range of views that support multiple purposes of use.

What Happened in 1013

At this time of year it is very popular to try to predict what will happen in the next year, being 2013, within your field of expertise.

However, predictions, not least about the future, may fail. And within data quality we don’t like flaws. So instead I will tell a little bit about what happened in the year 1013 with respect to data quality.

As always Wikipedia is your friend when seeking knowledge. So I have picked a few of the highlights from the Wikipedia article about 1013:

Diversity

In 1013 the Viking warlord Sweyn Forkbeard replaced Æthelred the Unready as King of England. These were the happy days when the letter Æ was part of the English alphabet. Today Æ only exists in some of the Viking alphabets.

Definition

Kaifeng, capital of China, becomes the largest city of the world in 1013, taking the lead from Córdoba in Al-Andalus. However, this is an estimate. And even today, as reported by the BBC, we actually can’t tell which one is the largest city in the world.

Multiple versions of the truth

The anti-pope John XVI dies in 1013. An anti-pope is a person who, in opposition to the one who is generally seen as the legitimately elected Pope, makes a significantly accepted competing claim to be the Pope. Even today we can’t always establish a single version of the truth.

Beyond True Positives in Deduplication

The most common data quality improvement process around is deduplication of party master data.

A core functionality of many data quality tools is the capability to find duplicates in large datasets with names, addresses and other party identification data.

When evaluating the result of such a process we usually divide the result of found duplicates into:

  • False positives being automated match results that actually do not reflect real world duplicates
  • True positives being automated match results reflecting the same real world entity

The difficulties in reaching the above result aside, you would think the rest is easy. Take the true positives, merge them into a golden record and purge the unneeded duplicate records in your database.
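As a rough sketch of that "easy" part: the code below splits automated match results into true and false positives against a manually verified set, clusters the true positives and builds a golden record per cluster. All names are hypothetical, and the survivorship rule (first non-empty value wins) is only one assumption among many possible rules.

```python
def evaluate(matched_pairs, verified_duplicates):
    """Split automated match results into true and false positives."""
    matched = {frozenset(pair) for pair in matched_pairs}
    verified = {frozenset(pair) for pair in verified_duplicates}
    return matched & verified, matched - verified


def cluster(record_ids, pairs):
    """Group record ids connected by duplicate pairs (simple union-find)."""
    record_ids = list(record_ids)
    parent = {rid: rid for rid in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)  # assumes each pair holds two distinct ids
        parent[find(a)] = find(b)

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), []).append(rid)
    return list(groups.values())


def golden_record(records):
    """Assumed survivorship rule: first non-empty value per field wins."""
    golden = {}
    for record in records:
        for field, value in record.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden
```

The false positives are simply left alone here; in practice they are the pairs that need a human look or a better matching rule.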

Well, I have seen so many well-executed deduplication jobs ending just there, because there are a lot of reasons for not making the golden records.

Sure, a lot of duplicates “are bad” and should be eliminated.

But many duplicates “are good” and have actually been put into the databases for a good reason, supporting different kinds of business processes where one view is needed in one case and another view is needed in another case.

Many, many operational applications, including very popular ERP and CRM systems, do have inferior data models that are not able to reflect the complexity of the real world.

Only a handful of MDM (Master Data Management) solutions are able to do so, but even then the solutions aren’t easy, as most enterprises have an IT landscape with all kinds of applications with other business relevant functionality that isn’t replaced by an MDM solution.

What I like to do when working with getting business value from true positives is to build a so-called Hierarchical Single Source of Truth.

Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that a “single source of truth” is a nice term, but not what data quality and master data management is really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those people, including me, who think an emphasis on real world alignment could help in getting better data and information quality, as opposed to focusing on fitness for multiple different purposes of use, there is acknowledgement that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity are often in disagreement.

When working with improvement of data quality in party master data, which is the most frequent and common master data domain with issues, you encounter the same issues over and over again, like:

  • Many organizations have a considerable overlap of real world entities that are customers and suppliers at the same time. Expanding to other party roles, this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private is a mixed case, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in non-profits, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are a lot of good fit for purpose of use reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible and not always desirable single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit for purpose of use master data.
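One way to picture such a model is as a small hierarchy where each node keeps a link to an external reference entity and to the internal records that relate to it, instead of merging everything into one record. The sketch below is only an illustration under that assumption; the class and field names are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntity:
    """A real world aligned entity taken from an external reference source."""
    source: str
    external_id: str
    name: str


@dataclass
class InternalRecord:
    """A fit for purpose master data record living in some application."""
    system: str
    record_id: str
    role: str  # for example "customer" or "supplier"


@dataclass
class GoldenNode:
    """A node holding golden relations rather than one merged golden record."""
    reference: ReferenceEntity
    records: list[InternalRecord] = field(default_factory=list)
    children: list["GoldenNode"] = field(default_factory=list)

    def roles(self) -> set[str]:
        """All party roles played by this entity and everything below it."""
        found = {record.role for record in self.records}
        for child in self.children:
            found |= child.roles()
        return found
```

A company node with one customer record in a CRM system and one supplier record in an ERP system then reports both roles, while the two internal records stay where they are.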

Right now I’m having an exciting time doing just that as described in the post Doing MDM in the Cloud.

Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner being a company, government body or other form of organization is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some countries, like those in Scandinavia, the public sector assigns a unique number to every company, to be used in every relation to the public sector and open to be used by the private sector as well for identification purposes.

As reported in the post Single Company View, I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number and, very embarrassingly for a data quality and master data geek, these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS-number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible, the use of DUNS-numbers and similar business directory IDs is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately around implementing yet another numbering system for legal entities, with an identifier usually abbreviated as LEI. Wikipedia has the details about a Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.
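In data model terms this means a business partner should be able to carry several external identifiers at once. A minimal sketch of such a cross-reference could look like the following; the class name, the scheme labels and the normalization rule are assumptions for illustration, and any identifier values used with it would be made up.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Assumed normalization: keep letters and digits only, upper-cased."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


class IdentifierIndex:
    """Link several external identifiers to one internal business partner key."""

    def __init__(self):
        self._by_identifier = {}              # (scheme, value) -> partner key
        self._by_partner = defaultdict(dict)  # partner key -> {scheme: value}

    def register(self, partner_key: str, scheme: str, value: str) -> None:
        key = (scheme, normalize(value))
        owner = self._by_identifier.setdefault(key, partner_key)
        if owner != partner_key:
            # The same number on two partners is a likely duplicate to review.
            raise ValueError(f"{scheme} {value} already belongs to {owner}")
        self._by_partner[partner_key][scheme] = key[1]

    def lookup(self, scheme: str, value: str):
        """Return the partner key for an identifier, or None if unknown."""
        return self._by_identifier.get((scheme, normalize(value)))

    def identifiers(self, partner_key: str) -> dict:
        """Return all identifiers registered for a partner, keyed by scheme."""
        return dict(self._by_partner.get(partner_key, {}))
```

Registering the same number for a second partner raises an error, which is a cheap way of surfacing possible duplicates across numbering systems.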

Beyond Address Validation

The quality of contact master data is the number one data quality issue around.

Lately there has been a lot of momentum among data quality tool providers in offering services for getting at least the postal address in contact data right. The new services are improved by:

  • Being cloud-based, offering validation services that are implemented at data entry and based on fresh reference data.
  • Being international and thus providing address validation for customer and other party data embracing a globalized world.

Capturing an address that is aligned with the real world may have a significant effect on business outcomes as reported by the tool vendor WorldAddresses in a recent blog post.

However, a valid address based on address reference data only tells you if the address is valid, not if the addressee is (still) at the address, and you can’t be sure that the name and other master data elements are accurate and complete. Therefore you often need to combine address reference data with other big reference data sources such as business directories and consumer/citizen reference sources.

Using business directories is not new at all. Big reference sources such as the D&B WorldBase and many other directories have been around for many years and have been a core element in many data quality initiatives with customer data in business-to-business (B2B) environments and with supplier master data.

Combining address reference data and business entity reference data makes things even better, also because business directories don’t always come with a valid address.

Using publicly available reference data when registering private consumers, employees and other citizen roles has until now been practiced in some industries and for special reasons. So the big reference data and the services are out there and are being used today in some business processes.

Mashing up address reference data, business entity reference data and consumer/citizen reference data is a big opportunity for many organizations in the quest for high quality contact master data, as most organizations actually interact with both companies and private persons if we look at the total mix of business processes.

The next big step is going to be exploiting social network profiles as well. As told in the post Social Master Data Management, social media will be an additional source of knowledge about our business partners. Again, you won’t find the full truth here either. You have to mash up all the sources.