At Least Two Versions of the Truth

Precisely one year ago I wrote a post called Single Company View examining the challenges of getting a single business partner view in business-to-business (B2B) party master data.

Yesterday Robert Hawker of Vodafone gave a keynote at the MDM Summit Europe 2012 about supplier master data management.

One of the points was that sometimes you really want the exact same real world entity to be two golden records in your master data hub, as there may be totally different business activities conducted with the same legal entity. The Vodafone example was (a data model sketch follows below):

  • Having an antenna placed on the top of a building owned by a certain company and thus paying a fee for that
  • Buying consultancy services from the same company
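
In a party master data hub this could, as a minimal sketch with purely hypothetical table and column names, be modelled by letting one legal entity row sit behind two golden records, one per business relationship:

  -- Hypothetical sketch: one real world legal entity, two golden records,
  -- each representing a different business relationship with that entity.
  CREATE TABLE legal_entity (
    legal_entity_id INTEGER PRIMARY KEY,
    registered_name VARCHAR(200),
    company_reg_no  VARCHAR(50)           -- national business register number
  );

  CREATE TABLE golden_record (
    golden_record_id  INTEGER PRIMARY KEY,
    legal_entity_id   INTEGER REFERENCES legal_entity (legal_entity_id),
    relationship_type VARCHAR(50)         -- e.g. 'SITE LANDLORD', 'CONSULTANCY SUPPLIER'
  );

  INSERT INTO legal_entity VALUES (1, 'Acme Holdings Ltd', 'UK12345678');
  INSERT INTO golden_record VALUES (10, 1, 'SITE LANDLORD');
  INSERT INTO golden_record VALUES (11, 1, 'CONSULTANCY SUPPLIER');

The point is that both golden records still resolve to the same legal entity, so the single company view is not lost even though the business activities are kept apart.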

I have met such examples many times when doing data matching, as described in the post Entity Revolution vs Entity Evolution.

However, on one occasion many years ago, I worked in a company where not having a single business partner view nearly became a small disaster.

Our company delivered software for membership administration and was at the same time a member of an employer organisation that also happened to be a customer.

A new director got the brilliant idea that cancelling the membership of the employer organisation was an obvious cost reduction.

The cancellation was sent. The employer organisation confirmed the cancellation, adding that they were very sorry, but internal business rules now forced them to stop being a customer as well.

The cancellation was cancelled, of course, and damage control was initiated.


Eating the MDM Elephant

The idiom of eating the elephant one bite at a time is often used when trying to envision a roadmap for Master Data Management (MDM).

It’s a bit of a contradiction to look at it that way, because the essence of MDM is an enterprise-wide single source of truth, eventually for all master data domains.

But it may be the only way.

To use a cliché, MDM is (as any discipline) about people, processes and technology.

In an earlier post called Lean MDM I described a technology focused approach, centred on data quality and entity resolution, to start consuming the elephant: building universal data models for party master data and rationalizing the data within a short time frame.

I have often encountered that many organizations actually don’t want an entity revolution but are more comfortable with entity evolution when it comes to entity resolution, as examined in the post Entity Revolution vs Entity Evolution.

The term “Evolutionary MDM” is used by the MDM vendor Semarchy, as seen on the page called What is Evolutionary MDM?

The idea is to have technology that supports an evolutionary way of implementing MDM. This is in my eyes very important, as people, processes and technology may be prioritized in that order, but shouldn’t be handled in a serial manner that reveals the opportunities and restrictions related to technology at a very late stage of implementing MDM.


Mutating Platforms or Intelligent Design

How do we go from single-domain master data management to multi-domain master data management? Will it be through evolution of single-domain solutions or will it require a completely new intelligent design?

The MDM journey

My previous blog post was a book review of “Master Data Management in Practice” by Dalton Cervo and Mark Allen – or rather, the full title of the book is “Master Data Management in Practice: Achieving True Customer MDM”.

The customer domain has until now been the most frequent and proven domain for master data management and, as said in the book, the domain where most organizations start the MDM journey, in particular by doing what is usually called Customer Data Integration (CDI).

However, some organizations do start with Product Information Management (PIM). This is mainly a matter of the magic numbers: those organizations have a higher number of products than customers in the database.

Sooner or later most organizations will continue the MDM journey by embracing more domains.

Achieving Multi-Domain MDM

John Owens made a blog post yesterday called “Data Quality: Dead Crows Kill Customers! Dead Crows also Kill Suppliers!” The post explains how some data structures are similar between sales and purchasing. For example, a customer and a supplier are very similar as parties.

Customer Data Integration (CDI) has the customer as its central entity, which is a party. Product Information Management (PIM) has the supplier as an important entity, which is also a party. The data structures and the workflows needed to Create, Read, Update and perhaps Delete these entities are very similar, not least in business-to-business (B2B) environments.
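
As a rough illustration of that similarity (all names here are hypothetical, not taken from any particular product), a shared party table can carry the common structure while thin role tables hold what is specific to being a customer or a supplier:

  -- Hypothetical shared party structure for B2B master data.
  CREATE TABLE party (
    party_id     INTEGER PRIMARY KEY,
    legal_name   VARCHAR(200),
    address_line VARCHAR(200),
    country_code CHAR(2)
  );

  -- Role tables only hold role specific attributes and point back to the party.
  CREATE TABLE customer_role (
    party_id      INTEGER PRIMARY KEY REFERENCES party (party_id),
    payment_terms VARCHAR(50)
  );

  CREATE TABLE supplier_role (
    party_id       INTEGER PRIMARY KEY REFERENCES party (party_id),
    preferred_flag CHAR(1)
  );

  -- A company that is both customer and supplier is still just one party row.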

So, when you are going from PIM to CDI, you don’t have to start from scratch, not least in a B2B environment.

The trend in the master data management technology market is that many vendors are working their way from being a single-domain vendor to being a multi-domain vendor – and some are promoting their new intelligent design embracing all domains from day one.

Some other vendors are breeding several platforms (often based on acquisition) from different domains into one brand, and some vendors are developing from a single domain into new domains.

Each strategy has its pros and cons. It seems there will be plenty of philosophies to choose from when organizations are going to select the platform(s) to support the multi-domain MDM journey.


Survival of the Fit Enough

When working with data quality and master data management at the same time, you are constantly met with the challenge that data quality is most often defined as data being fit for the purpose of use, while master data management is about using the same data for multiple purposes at the same time.

Finding the right solution to such a challenge within an organization isn’t easy, because, despite all good intentions, it is difficult to find someone in the business with an overall answer to that kind of problem, as explained in the blog post by David Loshin called Communications Gap? Or is there a Gap between Chasms?

An often used principle for overcoming these issues may (with a nod to Darwin) be seen as “survival of the fittest”. You negotiate some survivorship rules between “competing” data providers and consumers, and then the data that is the fittest, measured by these rules, wins. All other data gets the KISS of death. Most such survivorship rules are indeed simple, often based on a single dimension such as timeliness, completeness or provenance.
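
A minimal sketch of such a single-dimension rule, assuming a hypothetical source_record staging table and using timeliness as the only fitness measure, could look like this:

  -- Hypothetical survivorship by timeliness: for each real world entity,
  -- the most recently updated source record wins; all other records lose.
  SELECT entity_id, source_system, party_name, address_line, last_updated
  FROM (
    SELECT entity_id, source_system, party_name, address_line, last_updated,
           ROW_NUMBER() OVER (PARTITION BY entity_id
                              ORDER BY last_updated DESC) AS fitness_rank
    FROM source_record
  ) ranked
  WHERE fitness_rank = 1;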

Recently it has been suggested that the phrase “survival of the fittest” in evolutionary theory be changed to “survival of the fit enough”, because it seems that many times specimens haven’t competed but have instead found a way into empty alternate spaces.

It seems that master data management and related data quality is going that way too. Data that is fit enough will survive in the master data hub in alternate spaces where the single source of truth exists in perfect symbiosis with multiple realities.


Entity Revolution vs Entity Evolution

Entity resolution is the discipline of uniquely identifying your master data records, typically those holding data about customers, products and locations. Entity resolution is closely related to the concept of a single version of the truth.

Questions to be asked during entity resolution include:

  • Is a given customer master data record representing a real world person or organization?
  • Is a person acting as a private customer and a small business owner going to be seen as the same?
  • Is a product coming from supplier A going to be identified as the same as the same product coming from supplier B?
  • Is the geocode for the center of a parcel the same place as the geocode of the point where the parcel borders a public road?

We may come a long way in automating entity resolution by using advanced data matching and exploiting rich sources of external reference data. We may also be able to handle the complex structures of the real world by using sophisticated hierarchy management, and hereby make an entity revolution in our databases.
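
The data matching part can be sketched very crudely (hypothetical customer table, deliberately naive match key) as grouping records on a normalised key; real matching adds fuzzy comparison and external reference data on top of this:

  -- Deliberately naive candidate matching: normalise name and postal code
  -- into a match key and list keys shared by more than one record.
  SELECT UPPER(TRIM(party_name)) || '|' || UPPER(REPLACE(postal_code, ' ', '')) AS match_key,
         COUNT(*) AS candidate_records
  FROM customer
  GROUP BY UPPER(TRIM(party_name)) || '|' || UPPER(REPLACE(postal_code, ' ', ''))
  HAVING COUNT(*) > 1;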

But I am often faced with the fact that most organizations don’t want an entity revolution. There are always plenty of good reasons why different frequent business processes don’t require full entity resolution and will only be complicated by having it (unless drastically reengineered). The tangible immediate negative business impact of an entity revolution trumps the softer positive improvement in business insight from such a revolution.

Therefore we are mostly making entity evolutions, balancing current business requirements with the distant ideal of a single version of the truth.


Out-of-Africa

Besides being a memoir by Karen Blixen (or her literary double Isak Dinesen), Out-of-Africa is a hypothesis about the origin of modern humans (Homo Sapiens). Of course there is a competing scientific hypothesis called the Multiregional Origin of Modern Humans. Besides that there are of course religious beliefs.

The Out-of-Africa hypothesis suggests that modern humans emerged in Africa 150,000 years ago or so. A small group migrated to Eurasia about 60,000 years ago. Some made it across the Bering Strait to America maybe 40,000 years ago or maybe 15,000 years ago. The Vikings said hello to the Native Americans 1,000 years ago, but cross-Atlantic movement only gained pace from 500 years ago, when Columbus discovered America yet again.

Half a year ago (or so) I wrote a blog post called Create Table Homo_Sapiens. The comment follow-up added to the nerdish angle by discussing subjects such as mutating tables versus intelligent design and MAX(GEEK) counting.

But on the serious side the comments also touched on the intended subject: making data models reflect real world individuals.

Tables with persons are the most common entity type in databases around. As in the Out-of-Africa hypothesis, they could all have had a single common global structural origin. But that is not the way of the world. Some of the basic differences practiced in modeling the person entity are listed below, with a sketch of one possible alternative structure after the list:

  • Cultural diversity: Names, addresses, national IDs and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where they are designed.
  • Intended purpose of use: Person master data are often stored in tables made for specific purposes like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
  • “Impersonal” use: Person data is often stored in the same table as other party master data types such as business entities, projects, households et cetera.
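
One possible alternative structure, sketched here with hypothetical names only, keeps the individual separate from both the role specific attributes and the “impersonal” party types:

  -- Hypothetical sketch: the party table covers persons, organisations and
  -- households; person identity and role specific data live in their own tables.
  CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    party_type VARCHAR(20)             -- 'PERSON', 'ORGANISATION', 'HOUSEHOLD'
  );

  CREATE TABLE person (
    party_id     INTEGER PRIMARY KEY REFERENCES party (party_id),
    given_name   VARCHAR(100),
    family_name  VARCHAR(100),
    country_code CHAR(2)               -- drives country specific name/address handling
  );

  CREATE TABLE subscriber_role (
    party_id          INTEGER PRIMARY KEY REFERENCES party (party_id),
    subscription_type VARCHAR(50)      -- role specific attributes only
  );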

Many, many data quality struggles around the world are caused by how we have modeled real world – old world and new world – individuals.


Create Table Homo_Sapiens

Create Table is a basic statement in SQL, the most widespread computer language used for structuring data in databases.

The most common entity in databases around must be rows representing real world human beings (Homo Sapiens) and the different groups we form. Tables for that could have the name Homo_Sapiens but are usually called Customer, Member, Citizen, Patient, Contact and so on.

The most common data quality issues around are related to accuracy, validity, timeliness, completeness and not least uniqueness in the data we hold about people.

In databases tables are supposed to have a unique primary key. There are two basic types of primary keys:

  • Surrogate keys are typically numbers with no relation (and binding) to the real world. They are made invisible to the users of the applications operating on the database.
  • Natural keys are derived from existing codes or other data identifying an entity in the real world or made for that purpose. They are visible to users and part of electronic, written and verbal communication.

As surrogate keys obviously don’t help with real world uniqueness, and there is no common global natural key for all human beings on earth, we have a challenge in creating a good primary key for a Homo_Sapiens table.

Inside a given country we have different forms of citizen IDs (national identification numbers) with widely varying terms of use between countries. But even in Scandinavia, where I live and where we have widespread use of unique citizen IDs, most tables that could have the name Homo_Sapiens cannot use a citizen ID as the (unique) primary key, for several reasons and not least because that data is not present in a lot of situations.
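
A hypothetical compromise, in the spirit of the table name from the title, is a surrogate primary key with the citizen ID kept as an optional attribute that is unique when present:

  -- Hypothetical Homo_Sapiens table: surrogate primary key plus an optional
  -- citizen ID (how UNIQUE treats NULLs varies between database products).
  CREATE TABLE homo_sapiens (
    homo_sapiens_id BIGINT PRIMARY KEY,  -- surrogate key, invisible to users
    given_name      VARCHAR(100),
    family_name     VARCHAR(100),
    birth_date      DATE,
    country_code    CHAR(2),
    citizen_id      VARCHAR(20),         -- natural key candidate, often missing
    UNIQUE (country_code, citizen_id)
  );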

Most often we name the tables holding data about human beings after the role people act in within the purpose of use for the data we collect, for example a Customer table. A customer may be an individual but also a household or a business entity. A human being may be a private consumer, but also an employee at a business making a purchase, or a business owner making both private purchases and business purchases.

Every business activity always comes down to interacting with individual persons. But as our data is collected for the different roles that individual may have acted in, we have a need for viewing these data as related to single human beings. The methods for facilitating this come in different flavours, such as the ones below (a sketch of a common supporting structure follows the list):

  • Deduplication is the classic term used for describing processes where records are linked, merged or purged in order to make a golden copy having only one (parent) database row for each individual person (and other legal entities). This is usually done by matching data elements in internal tables with names and addresses within a given organisation.
  • Identity Resolution is about the same but – if a distinction is considered to exist – uses a wider range of data, rules and functionality to relate collected data rows to real world entities. In my eyes, exploiting external reference data will add considerable efficiency to deduplication / identity resolution in the years to come.
  • Master Data Hierarchy Management again has the same goal of establishing a golden copy of collected data, with emphasis on reflecting the complex structure of relationships in the real world as well as the related history.
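
Common to all three flavours is some form of linkage between the collected source rows and the golden copy; a hypothetical cross-reference table for that could look like this:

  -- Hypothetical cross-reference linking each source row to a golden record,
  -- keeping the match method and score for later review and hierarchy building.
  CREATE TABLE golden_record_xref (
    source_system    VARCHAR(50),
    source_record_id VARCHAR(50),
    golden_record_id INTEGER,
    match_method     VARCHAR(20),        -- e.g. 'EXACT', 'FUZZY', 'MANUAL'
    match_score      DECIMAL(5,2),
    PRIMARY KEY (source_system, source_record_id)
  );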

Next time I am involved in a data modelling exercise I will propose a Homo_Sapiens table. I wonder about the odds of buy-in from other business and technical delegates.
