Data model – Page 5 – Liliendahl on Data Quality

Holistic Accuracy

29th March 2011 | Henrik Gabs Liliendahl

In community economics you have two terms called

Partitive accuracy and
Holistic accuracy

In short, partitive accuracy is the accuracy of a single measure being part of a model while holistic accuracy is the accuracy of the model structure and its use. More information here.

I find these terms very useful in data quality and master data management as well.

The distinction between partitive accuracy and holistic accuracy resembles the distinction between data quality and information quality.

One problem with the term information quality is that it implies a certain context of use. That makes it hard to prepare data for multiple uses in any other way than by assuring the accuracy of the single data elements, which is similar to partitive accuracy.

One clue for assuring better information quality is looking at the model structure of data, which is similar to holistic accuracy. Here I am thinking beyond traditional data modeling, which is anchored in the technical world, and into how end users of master data hubs are able to build structures of data (with partitive accuracy) that fit the daily business use.

Examples of such holistic information capabilities in master data management are building flexible product hierarchies and hierarchies of party master data that at the same time reflect hierarchies in the real world, such as households and company family trees, and hierarchies of related accounts and addresses used within the enterprise.

While a single data element, such as an address component like a postal code, may be partitively accurate, holistic accuracy is about how data elements contribute as part of a data structure that fits multiple purposes of use.

A Place in Time

22nd January 2011 (updated 29th May 2012) | Henrik Gabs Liliendahl | 6 Comments

I remember that in our first chemistry lesson in high school our teacher told us to forget all about the chemistry we had learned in primary school, because it was too simple a model and did not reflect how the real world of chemistry actually works.

Since I started working with data quality and master data management I have had a pet peeve in data modeling, namely what is probably the most common example in data modeling: the classic customer table. Example from a SQL tutorial here:

Compared to how the real world works this example has some diversity flaws, like:

state code as a key to a state table will only work with one country (the United States)
zipcode is a United States-only term, as opposed to the more generic “Postal Code”
fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
The lengths of the state, zipcode and most other fields are obviously too small for almost anywhere else

More seriously we have:

fname and lname (First name and Last name) and probably also phone should belong to a separate party entity acting as a contact related to the company
company name should belong to a separate party entity acting in the role of customer
address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company

Now I know this is just a simple example from a tutorial where you should not confuse readers by adding too much complexity. Agreed.

However, many home-grown solutions in business life and even many commercial ready-made applications use that kind of data model to describe one of the most important business entities, namely our customers.

It may be that such a model does fit the purpose of use in some operations. Sometimes yes, sometimes no. But when reusing data from such a model at enterprise level and when adding business intelligence you are in big trouble. That is why we need master data hubs and why we need to transform data coming into the master data hub.

From such a customer record we don’t create just one golden record. We make or link several different related multi-domain entities, such as:

The contact as a person in our party domain – maybe we knew her before
The company in our party domain – maybe we knew the sister as a supplier before
The address in our place (location) domain – maybe we knew that address as a place in time before
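
As a rough illustration of that transformation, here is a minimal Python sketch. The column names fname, lname, address1, address2, city, state and zipcode come from the tutorial table discussed above; the column name company, the classes Party, Place and Link, the relation names and the function split_customer_record are my own assumptions for the example, not anything a particular master data hub prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    kind: str               # "person" or "organization"
    name: str = ""          # used for organizations
    given_name: str = ""    # kept separate so no name order is imposed
    family_name: str = ""


@dataclass(frozen=True)
class Place:
    address_lines: tuple
    city: str
    region: str             # "state" in the flat table
    postal_code: str        # "zipcode" in the flat table


@dataclass(frozen=True)
class Link:
    source: object
    relation: str           # hypothetical relation names, see below
    target: object


def split_customer_record(row):
    """Split one flat customer row into related party and place entities."""
    contact = Party("person", given_name=row["fname"], family_name=row["lname"])
    company = Party("organization", name=row["company"])
    place = Place(
        address_lines=tuple(
            line for line in (row.get("address1"), row.get("address2")) if line
        ),
        city=row["city"],
        region=row["state"],
        postal_code=row["zipcode"],
    )
    links = [
        Link(contact, "contact_for", company),
        Link(company, "visiting_place", place),
    ]
    return contact, company, place, links
```

Each of the three entities can then be matched separately against what the hub already knows in the party and place domains before anything new is created.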

Things Change

16th January 2011 | Henrik Gabs Liliendahl | 7 Comments

Yesterday I posted a small piece called So I’m not a Capricorn? about how astrology may (also) be completely wrong because something has changed.

On the serious side: Don’t expect that because you get it Right the First Time then everything will be just fine from this day forward. Things change.

The best-known example in data quality prevention is probably that it is of course important to get the address right when you enter it for a customer. But as people (and companies) relocate, you must also have procedures in place for tracking those movements by establishing an Ongoing Data Maintenance program in order to ensure the timeliness of your data.

The other thing, so to speak, is that having things right (the first time) is always seen in the context of what was right at that time. Maybe you always asked your customers for a physical postal address, but because your way of doing business has changed, you actually become much more interested in having the eMail address. And, because What’s in an eMail Address, you would actually like to have had all of them. So your completeness went from being just fine to being just awful by following the same procedure as last year.
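
The drop in completeness can be shown with a few lines of Python. This is only a sketch under my own assumptions: the function completeness, the field names postal_address and email, and the sample records are invented for illustration.

```python
def completeness(records, required_fields):
    """Share of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if str(record.get(field) or "").strip()
    )
    return filled / total


# Same records, same data entry procedure, two different requirements.
customers = [
    {"postal_address": "Hovedgaden 12 A", "email": ""},
    {"postal_address": "Hovedgaden 14", "email": "someone@example.com"},
]
last_year = completeness(customers, ["postal_address"])
this_year = completeness(customers, ["postal_address", "email"])
```

Nothing in the data changed between the two measurements. Only the definition of what is required did.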

Predicting accuracy is hard. Expect to deal with Unpredictable Inaccuracy.

Entity Revolution vs Entity Evolution

18th November 2010 (updated 27th March 2012) | Henrik Gabs Liliendahl | 8 Comments

Entity resolution is the discipline of uniquely identifying your master data records, typically being those holding data about customers, products and locations. Entity resolution is closely related to the concept of a single version of the truth.

Questions to be asked during entity resolution are like these:

Is a given customer master data record representing a real world person or organization?
Is a person acting as a private customer and a small business owner going to be seen as the same?
Is a product coming from supplier A going to be identified as the same as the same product coming from supplier B?
Is the geocode for the center of a parcel the same place as the geocode of where the parcel is bordering a public road?

We may come a long way in automating entity resolution by using advanced data matching and exploiting rich sources of external reference data. We may also be able to handle the complex structures of the real world by using sophisticated hierarchy management, and thereby make an entity revolution in our databases.

But I am often faced with the fact that most organizations don’t want an entity revolution. There are always plenty of good reasons why various frequent business processes don’t require full entity resolution and will only be complicated by having it (unless drastically reengineered). The tangible immediate negative business impact of an entity revolution trumps the softer positive improvement in business insight from such a revolution.

Therefore we are mostly making entity evolutions, balancing the current business requirements with the distant ideal of a single version of the truth.

Big Trouble with Big Names

10th October 2010 | Henrik Gabs Liliendahl

An often seen issue in party master data management is handling information about your most active customers, suppliers and other roles of interest. These are often big companies with many faces.

I remember meeting that problem way back in the 80’s when I was designing a solution for the Danish Maritime Authorities.

In relation to a ship there are three different main roles:

The owner of the ship, who has some legal rights and obligations
The operator of the ship, who has responsibilities regarding the seaworthiness of the ship
The employer, who has responsibilities regarding the seamen onboard the ship

Sometimes these roles don’t belong to the same company (or person) for a given ship. That real world reality was modeled all right. But even if it is practically the same company, the data for each role materializes very differently. I remember this was certainly the case with the biggest ship-owner in Denmark (and also by far the biggest company in Denmark), the A.P. Moller – Maersk Group.

We really didn’t make a golden record for that golden company in my time on the project.
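
For illustration only, the three roles above could be modeled along these lines in Python. This is not the design of the original solution; the classes Party and Ship, the role names and the helper roles_by_party are assumptions made for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The three main roles in relation to a ship, as listed above.
ROLES = ("owner", "operator", "employer")


@dataclass(frozen=True)
class Party:
    party_id: str
    name: str


@dataclass
class Ship:
    name: str
    roles: dict = field(default_factory=dict)  # role name -> Party

    def assign(self, role, party):
        """Attach a party to one of the three roles for this ship."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.roles[role] = party


def roles_by_party(ships):
    """Collect, per party, every (ship, role) pair that party holds."""
    result = defaultdict(list)
    for ship in ships:
        for role, party in ship.roles.items():
            result[party.party_id].append((ship.name, role))
    return dict(result)
```

A view like roles_by_party only gives one picture of a big company if every role record points to the same party identifier, which is exactly the golden record that was missing.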

Business Directory Match: Global versus Local

6th October 2010 (updated 29th May 2012) | Henrik Gabs Liliendahl

When doing data quality improvement in business-to-business party master data an often used shortcut is matching your portfolio of business customers with a business directory and preferably picking new customers from the directory in the future.

If you are doing business in more than one country, you will have to consider which business directory to use: either engaging with a local business directory for each country or engaging with a single business directory covering all countries in question.

There are pros and cons.

One subject is conformity. I have met this issue a couple of times. A business directory covering many countries will have a standardized way of formatting the different elements like a postal address, whereas a local (national) business directory will use best practice for the particular country.

An example from my home country Denmark:

The Dun & Bradstreet WorldBase is a business directory holding 170 million business entities from all over the world. A Danish street address is formatted like this:

Address Line 1 = Hovedgaden 12 A, 4. th

Observe that Denmark belongs to that half of the earth where house numbers are written after the street name.

In a local business directory (based on the public registry) you will be able to get this format:

Street name = Hovedgaden

Street code = 202 4321

House number = 012A

Floor = 04

Side/door = TH

Here you get an atomized address with metadata for the atomized elements and the unique address coding used in Denmark.
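
Going from the atomized form to the single-line form is the easy direction. Below is a minimal Python sketch; the function name format_address_line and the formatting rules (strip leading zeros, space before the house letter, lowercase side/door) are assumptions derived only from the one example above, not the directory's documented rules.

```python
import re


def format_address_line(street_name, house_number, floor="", side=""):
    """Compose one address line from atomized Danish address elements."""
    match = re.fullmatch(r"0*(\d+)([A-Za-z]?)", house_number.strip())
    if match is None:
        raise ValueError(f"unexpected house number: {house_number!r}")
    number, letter = match.groups()

    line = f"{street_name.strip()} {number}"
    if letter:
        line += f" {letter.upper()}"

    unit = []
    floor = floor.strip()
    if floor:
        # Numeric floors lose their leading zeros; other codes pass through.
        unit.append(f"{int(floor)}." if floor.isdigit() else f"{floor.lower()}.")
    side = side.strip()
    if side:
        unit.append(side.lower())
    if unit:
        line += ", " + " ".join(unit)
    return line


print(format_address_line("Hovedgaden", "012A", "04", "TH"))
# prints: Hovedgaden 12 A, 4. th
```

The opposite direction, parsing a free-form address line into atomized elements, is much harder, which is one argument for keeping the atomized form when a local directory offers it.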

Out-of-Africa

30th August 2010 (updated 27th March 2012) | Henrik Gabs Liliendahl | 4 Comments

Besides being a memoir by Karen Blixen (or the literary double Isak Dinesen), Out-of-Africa is a hypothesis about the origin of the modern human (Homo Sapiens). Of course there is a competing scientific hypothesis called Multiregional Origin of Modern Humans. Besides that there are of course religious beliefs.

The Out-of-Africa hypothesis suggests that modern humans emerged in Africa 150,000 years ago or so. A small group migrated to Eurasia about 60,000 years ago. Some made it across the Bering Strait to America maybe 40,000 years ago or maybe 15,000 years ago. The Vikings said hello to the Native Americans 1,000 years ago, but cross Atlantic movement first gained pace from 500 years ago, when Columbus discovered America again again.

½ year ago (or so) I wrote a blog post called Create Table Homo_Sapiens. The comment follow-up added to the nerdish angle by discussing subjects such as mutating tables versus intelligent design and MAX(GEEK) counting.

But on the serious side comments also touched the intended subject about making data models reflect real world individuals.

Tables with persons are the most common entity type in databases around. As in the Out-of-Africa hypothesis, there could have been a single, common, global structure of origin. But that is not the way of the world. Some of the basic differences practiced in modeling the person entity are:

Cultural diversity: Names, addresses, national ID’s and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where they are designed.
Intended purpose of use: Person master data are often stored in tables made for specific purposes like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
“Impersonal” use: Person data is often stored in the same table as other party master types such as business entities, projects, households et cetera.

Many, many data quality struggles around the world are caused by how we have modeled real world – old world and new world – individuals.

Same Same But Different

21st August 2010 (updated 15th December 2010) | Henrik Gabs Liliendahl | 7 Comments

The two most common master data types are:

Party master data (customers, prospects, suppliers and other business partners)
Product master data

When working with data quality within master data management you may of course encounter some similarities between these two master data types, but you will certainly also meet a range of differences.

The basic activities such as standardization, consolidation and hierarchy building are the same.

Some of the differences I have learned are:

Multi-cultural issues:

Party master data is often stored in a single global format but should be transformed to embrace multi-cultural diversities.
Product master data may have multi-cultural issues but should be transformed into a single global format (of course embracing multi-language hierarchies and so on).

External reference data available:

For party master data the possibilities for real world alignment with external data sources are plenty.
For product master data the possibilities for real world alignment with external data sources are few.

Industry specific requirements:

Requirements for party master data quality are pretty much the same across industries with few variations as B2B (corporate customers) or B2C (private customers) or both being the most prominent.
Requirements for product master data quality vary tremendously across different industries.

Your say:

What are your examples of (similarities and) differences between party master data quality and product master data quality?

What are they doing?

19th August 2010 | Henrik Gabs Liliendahl | 12 Comments

A core attribute in customer master data when dealing with business entities is the industry vertical of your customers/prospects (or Line-of-Business or market segment or whatever metadata name you like).

When handling this particular data element you will come across many of the classic choices in data and information management.

Unstructured versus structured

Many early CRM (Customer Relationship Management) implementations offered a free text field for the industry vertical. While this approach may have been good for the free flow in data entry, it has of course created havoc when business intelligence was applied to the CRM data. Countless cleansing projects have been done (and are going on) in order to fix this basic mistake.

Most data entry forms today that have an industry vertical field offer a value list to choose from.

Your list versus an external standard

When having a value list it may be a list of your own creation or be based on an external standard list, for example SIC or NACE codes.

Having a list of your own tends to fulfill the data quality principle of fit for purpose of use while an external standard tends to fulfill the data quality principle of reflecting the real world construct.

The main weaknesses of a list of your own are that it requires continuous manual maintenance and may cause conflicts. Deep down in a discussion on the Initiate MDM blog Julian Schwarzenbach offered a good example saying:

“I have also come across ‘flip-flop’ data – which is typically subjective data where two users cannot agree what the correct value is and it keeps getting changed between two values. This could be the classification of a customer by market sector where two different territories are reflecting different capabilities in their territories.” – Link here.

The main weaknesses of an external standard are that it seldom offers the granularity you need and that, for global data, the different standards (SIC versions and different national NACE implementations and others) are a pain in the…

One versus several values

Many companies have more than one distinct activity. Catching only one (the primary) value for each company is keeping it simple, stupid. Having more than one value in relevant cases is adding complexity but may lead to better decisions.

Mixed Identities

5th July 2010 | Henrik Gabs Liliendahl | 4 Comments

A frequent challenge when building a customer master data hub is dealing with incoming records from operational systems where the data in one record belongs to several real world entities.

One situation may be that a name contains two (or more) real world names. This situation was discussed in the post Splitting names.

Another situation may be that:

The name belongs to real world entity X
The address belongs to real world entity Y
The national identification number belongs to real world entity Z

Fortunately most cases only have 2 different real world representations like X and Y or Y and Z.

An example I have encountered often is when a company delivers a service through another organization. Then you may have:

The name of the 3rd party organization in the name column(s)
The address of the (private) end user in the address columns

Or, as I remember having seen once:

The name of the (private) end user in the name column(s)
The address of the (private) end user in the address columns
The company national identification number of the 3rd party organization in the national ID column

Of course the root cause solution to this will be a better (and perhaps more complex) way of gathering master data in the operational systems. But most companies have old and not so easily changeable systems running core business activities. Swapping to new systems in a rush isn’t something just done either. Also, data gathering may take place outside your company, making the data governance much more political.

A solution downstream at the data matching gates of the master data hub may be to facilitate complex hierarchy building.

Oftentimes the outcome will be that the single customer view in the master data hub is challenged from the start, because the data, as perceived in the operational system, is fit for its intended purpose of use.
