Aadhar (or Aadhaar)

The solution to the single most frequent data quality problem, party master data duplicates, is actually very simple: every person (and every legal entity) gets a unique identifier, which is used everywhere by everyone.

Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it, the project has just been named Aadhar (or Aadhaar). Google Translate tells me this word (आधार) means base or root – please correct me if anyone knows better.

In Denmark we have had such identifiers (one for citizens and one for companies) for many years. They are not used by everyone everywhere – so you are still able to make money as a data quality professional specializing in data matching.

The main reason that the unique citizen identifier is not used everywhere is of course privacy considerations. As for the unique company identifier, the reason is that data quality is often defined as fitness for the immediate purpose of use.


A user experience

As a data quality professional, it is a learning experience to be the user yourself.

For the last few years I have worked for a data quality tool vendor headquartered in Germany. As part of my role serving partners, prospects and customers in Scandinavia, I have been a CRM system user. Being a tool vendor, we take our own medicine, which includes intelligent real-time duplicate checking, postal address correction, fuzzy search and other goodies built into the CRM system.

Sounds perfect? Sure, if it wasn’t for a few diversity glitches.

The address doesn’t exist

Postal correction is only activated for Germany. This actually makes some sense, since most activity is in Germany and postal correction is not that important in Scandinavia, where company (and citizen) information is more available and then usually a better choice. Due to a less fortunate setup during the first years, my routine when inserting a new account was to pick the correct data from a business directory, paste it into the CRM system and then angrily override the warning that the address doesn't exist (in Germany).

Dear worshipful Mr Doctor Oetker

In Germany salutation is paramount. In Scandinavia it is no longer common to use a prefixed salutation – and if you do, you are regarded as very old fashioned. So having the salutation field for a contact as mandatory is an annoyance, and setting up an automated salutation generation mechanism is a complete waste of time.
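As a sketch of why this is a per-country configuration question rather than a global one, here is what a country-aware salutation rule could look like. The country codes, phrases and defaults are my own illustrative assumptions, not the actual setup of any particular CRM system:

```python
from typing import Optional

# Illustrative sketch only: country-aware salutation generation.
# Countries where a generated formal salutation is customary (assumed).
SALUTATION_COUNTRIES = {"DE", "AT", "CH"}

def make_salutation(country: str, gender: str,
                    title: str, last_name: str) -> Optional[str]:
    """Return a formal salutation, or None where it is not customary."""
    if country not in SALUTATION_COUNTRIES:
        # Scandinavia: leave the field empty instead of making it mandatory
        return None
    prefix = "Sehr geehrter Herr" if gender == "M" else "Sehr geehrte Frau"
    parts = [prefix]
    if title:
        parts.append(title)  # e.g. "Dr." for "Sehr geehrter Herr Dr. Oetker"
    parts.append(last_name)
    return " ".join(parts)

print(make_salutation("DE", "M", "Dr.", "Oetker"))  # Sehr geehrter Herr Dr. Oetker
print(make_salutation("DK", "M", "", "Sørensen"))   # None
```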


Merging Customer Master Data

One of the most frequent assignments I have had within data matching is merging customer databases after two companies have been merged.

This is one of the occasions where it doesn't help to recite the usual data quality mantras, like:

  • Prevention and root cause analysis is a better option
  • Change management is a critical factor in ensuring long-term data quality success
  • Tools are not important

It is often essential for the new merged company to have a 360 degree view of business partners as soon as possible in order to maximize synergies from the merger. If the volumes are above just a few thousand entities it is not possible to obtain that using human resources alone. Automated matching is the only realistic option.

The types of entities to be matched may be:

  • Private customers – individuals and households (B2C)
  • Business customers (B2B) at account level: enterprises, legal entities and branches
  • Contacts for these accounts

I have developed a slightly extended version of this typification here.

One of the most common challenges in merging customer databases is that hierarchy management may have been done very differently in the past within the merging bodies. When aligning different perceptions, I have found that a real world approach often reconciles the different lines of reasoning.

The fuzziness needed for the matching basically depends on the common unique keys available in the two databases. These are keys such as citizen IDs (however labeled around the world) and public company IDs (the same applies). Matching both databases against an external source (per entity type) is an option; "Duns Numbering" is probably the most commonly known type of such an approach. Maintaining a solution for assigning Duns Numbers to customer files from the D&B WorldBase is by the way one of my other assignments, as described here.

The automated matching process may be divided into three steps.
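The steps can be illustrated with a minimal Python sketch. The decomposition below – standardization, candidate selection via a blocking key, and pairwise comparison – is a common one and my own assumption, as are the keys and the threshold:

```python
from itertools import combinations

def standardize(rec):
    """Step 1: normalize name and postal code for comparison."""
    return {
        "id": rec["id"],
        "name": " ".join(rec["name"].lower().split()),
        "zip": rec["zip"].replace(" ", ""),
    }

def block_key(rec):
    """Step 2: group records sharing a cheap key to limit comparisons."""
    return (rec["zip"], rec["name"][:3])

def name_similarity(a, b):
    """Step 3: compare candidate pairs; here a naive token overlap."""
    ta, tb = set(a["name"].split()), set(b["name"].split())
    return len(ta & tb) / max(len(ta | tb), 1)

def match(records, threshold=0.7):
    """Yield pairs of record ids judged to be the same real world entity."""
    std = [standardize(r) for r in records]
    blocks = {}
    for r in std:
        blocks.setdefault(block_key(r), []).append(r)
    for group in blocks.values():
        for a, b in combinations(group, 2):
            if name_similarity(a, b) >= threshold:
                yield a["id"], b["id"]
```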

During my many years of practice in doing this I have found that the result of the automated process may vary considerably in quality and speed depending on the tools used.


Data Quality from the Cloud

One of my favorite data quality bloggers, Jim Harris, wrote a blog post this weekend called "Data, data everywhere, but where is data quality?".

I believe that data quality will be found in the cloud (not the current ash cloud, but to put it plainly: on the internet). Many of the data quality issues I encounter in my daily work with clients and partners are caused by adequate information not being available at data entry – or not being exploited. But the information needed will in most cases already exist somewhere in the cloud. The challenge ahead is how to integrate information available in the cloud into business processes.

Use of external reference data to ensure data quality is not new. Especially in Scandinavia, where I live, this has been in use for a long time because of the tradition of the public sector recording data about addresses, citizens, companies and so on far more intensely than in the rest of the world. The Achilles heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes – and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

The drivers for increased exploitation of external data are mainly:

  • Accessibility, where the fast growing (semantic) information store in the cloud helps – not least backed up by the worldwide tendency of governments releasing public sector data
  • Interoperability, where an increased supply of Service Oriented Architecture (SOA) components will pave the way
  • Cost: the more subscribers to a certain source, the lower the price – plus many sources will simply be free

As said, smooth integration into business processes is key – or, sometimes even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these business processes using only a minimum of costly on-premise human resources.
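As a sketch of what such a pull could look like, here is a minimal data entry hook calling an external address validation service. The endpoint and the response format are hypothetical, invented for illustration; any real cloud service will differ:

```python
import json
import urllib.request

VALIDATE_URL = "https://example.com/address/validate"  # hypothetical endpoint

def validate_address(street: str, city: str, country: str) -> dict:
    """Ask an external (cloud) service to correct and enrich an address."""
    payload = json.dumps(
        {"street": street, "city": city, "country": country}
    ).encode("utf-8")
    req = urllib.request.Request(
        VALIDATE_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        # Assumed response shape, e.g. {"status": "corrected", "street": ...}
        return json.load(resp)
```

At data entry the corrected version would be offered to the user rather than silently overwriting what was typed.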


My Ash Cloud Prediction

The Master Data Management Summit Europe 2010 starts tomorrow. I have attended the IRM events in London several times (and also spoken there once). This year I didn't plan to go to London in April, because I predicted the no-fly havoc in Northern Europe that would follow the Icelandic volcanic eruption given the wind direction. Not?

Beyond Home Improvement

During my many years in customer master data quality improvement I have worked with a lot of clients having data from several countries. In almost every case the data has been prioritized in two pots:

  • Master Data referring to domestic customers
  • Master Data referring to foreign customers

Even though the enterprise defines itself as an international organization, the term domestic is still in a lot of cases easily assigned to the country where the headquarters is situated and where the organization was born.

Signs of this include:

  • Data formats are designed to fit domestic customers
  • Internal reference data are richer for domestic locations
  • External reference data services are limited to domestic customers

The high prioritization of domestic data is of course natural for historical reasons: because domestic customers almost certainly are the largest group, and because the rules are common to most participants in a data quality program.

If we accept the fact that improving data quality will be reflected in an improved bottom line, there is still a margin you may improve by not stopping once you have optimal procedures for domestic data.

One easy way of dealing with this is to apply general formats, services and rules that may work for data from all over the world, and this approach may in some cases be the best considering costs and benefits.

But I have no doubt that achieving the best data quality with customer master data is done by exploiting the specific opportunities that exist for each country / culture.

Examples are:

  • The completeness and depth of address (location) data available in each country is very different – so are the rules of the postal services operating there
  • Public sector company and citizen registration practice also differs, which is why the quality of external reference data is different, as are the rules of access to the data
  • Using local character sets, script systems, naming conventions and addressing formats besides (or instead of) those of the headquarters helps data quality through real world alignment – as in the sketch after this list
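A sketch of what exploiting country-specific rules could look like, as opposed to one global lowest common denominator. The two postal code patterns below are simplified approximations of the Danish and UK formats, and the fallback behavior is an illustrative assumption:

```python
import re

POSTAL_PATTERNS = {
    "DK": re.compile(r"^\d{4}$"),                              # e.g. 2100
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),   # e.g. SW1A 1AA
}

def check_postal_code(country: str, code: str) -> bool:
    """Use the country-specific rule when we have one; otherwise accept
    anything non-empty rather than rejecting valid foreign data."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        return bool(code.strip())
    return bool(pattern.match(code.strip().upper()))
```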

My guess is that we will see services in the cloud in the near future helping us make the global village come true for master data quality too.


Matchback and Master Data Management

The term matchback is used by marketers for the process of determining which marketing activity triggered a given purchase. In these times where multichannel marketing and sales are embraced by more and more companies, doing matchback is becoming more and more complicated.

The core functionality in matchback is good old data matching, like: does the name and address in a catalogue mailing match (with a certain similarity) the name and address of a new buyer? But you also have to ask questions such as: Is this buyer in fact a new buyer, or did he buy before – in this channel or in another? Was this buyer also included in a concurrent email campaign? If private: is the new buyer in the same household as an old buyer? If business: does the new buyer belong to the same company family tree as an old buyer? Was the contact actually a contact at an old business customer?
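As a minimal sketch of the first of those questions, here is a naive name-and-address comparison in Python. The standard library's SequenceMatcher is a crude stand-in for a real data matching engine, and the field names and threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matchback(buyer: dict, mailing: list, threshold: float = 0.85) -> list:
    """Return mailing recipients similar enough to the new buyer to
    count the purchase as triggered by the catalogue mailing."""
    key = lambda r: f'{r["name"]} {r["street"]} {r["city"]}'
    return [r for r in mailing if similarity(key(buyer), key(r)) >= threshold]
```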

Answering these questions will be a total mess if you don't have a solid party master data management program in place. You need to:

  • Store (or at least reference) all party entities from all channels in one single so-called golden copy
  • Identify the same real world entities
  • Build the hierarchies necessary for current and possible future uses of data

Doing matchback is only one of many activities setting the requirements for a party master data management program within an enterprise. And by the way: when that is up and running, the next thing you need is to manage your product master data the same way in order to make further analyses – and probably you also need better structure and data quality for your location master data.

I keep my notes about Master Data Management here.


Enterprise Data Mashup and Data Matching

A mashup is a web page or application that uses or combines data or functionality from two or more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups were recently touched on in the post "MDM Mashups: All the Taste with None of the Calories" by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two often separated contexts:

  • Matching internal data, like deduplication and consolidation
  • Matching internal data against an external source, like address correction and business directory matching

Increased utilization of multiple functions and multiple sources – like a mashup – will help make better matching. Some examples I have tried include:

  • If you know whether an address is unique or not, this information can be used to settle the confidence of an individual or household duplicate.
  • If you know whether an address is a single residence or a multiple residence (like a nursing home or campus), this information can be used to settle the confidence of an individual or household duplicate.
  • If you know the frequency of a name (in a given country), this information can be used to settle the confidence of a private, household or contact duplicate – as in the sketch after this list.
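A minimal sketch of how such mashed-up context could adjust a match confidence. The weights and cut-offs are illustrative assumptions; a real matching engine would calibrate them against known outcomes:

```python
def adjusted_confidence(base_score: float,
                        name_frequency: int,
                        single_residence: bool) -> float:
    """Raise confidence for a rare name at a single residence; lower it
    for a common name at a multiple residence (nursing home, campus)."""
    score = base_score
    if name_frequency < 100:        # rare name in the country in question
        score += 0.10
    elif name_frequency > 100_000:  # very common name
        score -= 0.10
    if not single_residence:        # many unrelated people share the address
        score -= 0.15
    return max(0.0, min(1.0, score))
```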

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may also help at that point, like:

  • An address may be suggested from an external source.
  • A business entity may be picked from an external business directory.
  • Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business?

Also, the rise of social media adds new possibilities for mashup content during data entry, data maintenance and other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not least LinkedIn are going to be matched and mashed up.


Reduplication: The next big thing?

Today I got a very exciting Master Data Management assignment. Usually I do deduplication processes, which means that two or more rows in a database are merged into one golden record because the original rows represent the same real world entity.

But in this case we are going to split one row into several rows with random keys (a so-called MNUID = Messy Non-Unique IDentifier). Also, names and addresses have to be misspelled in different ways so they are not easily recognized as being the same.

My client, the Danish Tax Authorities, has for years tried to develop methods for taxation above 100% and has finally arrived at this simple but very efficient method. Until now you, as one person or one company, pay up to 60% tax, but now each duplicate row will pay 60%. Hereby, in phase one, you may in fact pay 120%, but in later phases this will be extended to larger duplicate groups paying much higher percentages.

Already some foreign tax authorities have shown deep interest in this model (called Intelligent Reduplication for Supertaxation). First of all our Scandinavian neighbors are very interested, but eventually it may spread to the rest of the world.


Which came first, the chicken or the egg?

The most common symbol for Easter, which is just around the corner in countries with Christian cultural roots, is the decorated egg. What a good occasion to have a little "which came first" discussion.

So, where do you start if you want better information quality: Data Governance or Data Quality improvement?

To exemplify with something that is known in nearly everyone's business, let's look at party master data, where we face the ever recurring question: What is a customer? Do you have to know the precise answer to that question (which looks like a Data Governance exercise) before correcting your party master data (which is often a Data Quality automation implementation)?

I think this question is closely related to the two ways of having high quality data:

  • Either they are fit for their intended uses
  • Or they correctly represent the real-world construct to which they refer

In my eyes the first way, making data fit for their intended uses, is probably the best way if you aim for information quality in one or two silos, but the second way, alignment with the real world, is the better and less cumbersome way if you aim for enterprise wide information quality where data are fit for current and future multiple purposes.

So, starting with Data Governance and then, a long way down the line, applying some Data Quality automation like Data Profiling and Data Matching seems to be the way forward if you go for intended use.

On the other hand, if you go for real world alignment, it may be best to start with some Data Profiling and Data Matching in order to realize what the state of your data is and make the first corrections towards having your party master data aligned with the real world. From there you go forward on an iterative Data Governance and Data Quality automation (never ending) journey, which includes discovering what a customer role really is.
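As a sketch of what such a first profiling pass could look like, here is a minimal field profile in Python. The field names are illustrative assumptions:

```python
from collections import Counter

def profile(records: list, field: str) -> dict:
    """Basic profile of one field in party master data: fill rate,
    distinct values and the most frequent values."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "filled": len(filled),
        "fill_rate": len(filled) / max(len(values), 1),
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(5),
    }

# e.g. profile(customers, "country") quickly reveals whether "customer"
# means the same thing across the silos being examined.
```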
