Master Data – Page 50 – Liliendahl on Data Quality

B2C versus B2B Data Quality

8th June 2011Henrik Gabs Liliendahl8 Comments

The data quality issues in doing business with private consumers (business-to-consumer = B2C) and doing business with other business’s (business-to-business = B2B) have a lot of similar challenges but also differs in a lot of ways.

Some of my experiences (and thoughts) related to different master data domains are:

Customer master data

In B2C the number of customers, prospects and leads is usually high and characterized by relatively few interactions with each entity. In B2B you usually have a relatively small number of customers with a high number of interactions.

One of the most automated activities in data quality improvement is matching master data records with information about customers. Many of the examples we see in marketing material, research documents, blog posts and so on is about matching in the B2C realm. This is natural since the high number of records typically with a low attached value calls for automation.

Data matching in the B2B realm is indeed more complex due to numerous challenges like less standardized names of companies and typically more options in what constitutes a single customer. The high value attached to each customer also makes the risk of mistakes a showstopper for too much automation.

So in B2B we see an increasing adaption of creating workflows that insures data quality during data capture often by exploiting external reference data which also in general are more available related to business entities.

Location master data

The location of B2C customers means a lot. Accurate and timely delivery addresses for everything from direct mails to bringing goods to the premises are essential. Location data are used to recognize household relations, assigning demographic stereotypes and in many cases calculating fees of different kind. I had a near disaster experience with a really bad address in my early career.

Even though location data for B2B activities theoretically is just as important, I have often seen that a little less precision is fit for purpose or anyway lower prioritized than more pressing issues.

Product master data

Theoretically there should be no difference between B2C and B2B here, but I guess there is in practice?

The most interesting aspect is probably the multi-domain aspect examining the relations between customers and products.

I had some experiences some years ago with the B2B realm as described in the post What is Multi-Domain MDM?: 1,000 B2B customers buying 1,000 different finished products can be a quite complicated data quality operation.

Within the B2C realm the most predominant multi-domain data quality issues I have met is related to analytics. As discussed in the post Customer/Product Matrix Management it is about typifying your customers correctly and categorizing your products adequately at the same time.

A Business Rule and a Missing Master Data Hub

28th April 2011Henrik Gabs LiliendahlLeave a comment

It seems that the United States of America has a problem with the business rule saying you have to be born in the country to become president and a missing citizen master data hub telling about who’s born in the country.

This is an aspect of a previous blog post called Did They Put a Man on the Moon.

Single Company View

27th April 2011Henrik Gabs Liliendahl2 Comments

Getting a single customer view in business-to-business (B2B) operations isn’t straight forward. Besides all the fuzz about agreeing on a common definition of a customer within each enterprise usually revolving around fitting multiple purposes of use, we also have complexities in real world alignment.

One Number Utopia

Back in the 80’s I worked as a secretary for the committee that prepared a single registry for companies in Denmark. This practice has been live for many years now.

But in most other countries there are several different public registries for companies resulting in multiple numbering systems.

Within the European Union there is a common registry embracing VAT numbers from all member states. The standard format is the two letter ISO country code followed by the different formatted VAT number in each country – some with both digits and letters.

The DUNS-number used by Dun & Bradstreet is the closest we get to a world-wide unique company numbering system.

2-Tier Reality

The common structure of a company is that you have a legal entity occupying one or several addresses.

The French company numbering system is a good example of how this is modeled. You have two numbers:

SIREN is a 9-digit number for each legal entity (on the head quarter address).
SIRET is a 14-digit (9 + 5) number for each business location.

This model is good for companies with several locations but strange for single location companies.

Treacherous Family Trees (and Restaurants)

The need for hierarchy management is obvious when it comes to handling data about customers that belongs to a global enterprise.

Company family trees are useful but treacherous. A mother and a daughter may be very close connected with lots of shared services or it may be a strictly matter of ownership with no operational ties at all.

Take McDonald’s as a not perfectly simple (nor simply perfect) example. A McDonald’s restaurant is operated by a franchisee, an affiliate, or the corporation itself. I’m lovin’ modeling it.

Inside India

7th April 201111th April 2011Henrik Gabs LiliendahlLeave a comment

This is the second post in a series of short blog posts focusing on data quality related to different countries around the world. I am not aiming at presenting a single version of the full truth but rather presenting a few random observations that I hope someone living in or with knowledge about the country are able to clarify in a comment.

Cultural Diversity

India‘s culture is marked by a high degree of syncretism and cultural pluralism. Every state and union territory has its own official languages, and the constitution also recognizes 21 languages.

National Identification Number for 1.2 Billion People

The government of India has initiated a program for assigning a unique citizen ID for the over 1.2 billion people living in India. The program called Aadhaar is the largest of that kind in the world.

A System Integration Superpower

Tata, Satyam, Infosys, Wipro is just some of the many mega system integrators within master data management and data quality with headquarters in India. Add to that that companies like Cognizant and many others have most of their professionals based in India.

Does One Size Fit Anyone?

5th April 2011Henrik Gabs Liliendahl2 Comments

Following up on a recent post about data silos I have been thinking (and remembering) a bit about the idea that one company can have all master data stored in a single master data hub.

Supply Chain Musings

If you for example look at a manufacturer the procurement of raw materials is of course an important business process.

Besides purchasing raw materials the manufacturer also buys machinery, spare parts for the machinery and maintenance services for the machinery.

Like everyone else the manufacturer also buys office supplies – including rare stuff as data quality tools and master data management consultancy.

If you look at the vendor table in such a company the number of “supporting suppliers” are much higher than the number of the essential suppliers of raw materials. The business processes, data structures and data quality metrics for on-boarding and maintaining supplier data and product data are “same same but very different” for these groups of suppliers and the product data involved.

Supply Chain Centric Selling

I remember at one client in manufacturing a bi-function in procurement was selling bi-products from the production to a completely different audience than the customers for the finished products. They had a wonderful multi-domain data silo for that.

Hierarchical Customer Relations

A manufacturer may have a golden business rule saying that all sales of finished products go through channel partners. That will typically mean a modest number of customers in the basic definition being someone who pays you. Here you typically need a complex data structure and advanced workflows for business-to-business (B2B) customer relationship management.

Your channel partners will then have customers being either consumers (B2B2C) or business users within a wider range of companies. I have noticed an increasing interest in keeping some kind of track of the interaction with end users of your products, and I guess embracing social media will only add to that trend. The business processes, data structures and data quality metrics for doing that are “same same but very different” from your basic customer relationship management.

Conclusion

The above musings are revolved around manufacturing companies, but I have met similar ranges of primary and secondary constructs related to master data management in all other industry verticals.

So, can all master data in a given company be handled in a single master data hub?

I think it’s possible, but it has to be an extremely flexible hub either having a lot of different built-in functionality or being open for integration with external services.

Holistic Accuracy

29th March 2011Henrik Gabs LiliendahlLeave a comment

In community economics you have two terms called

Partitive accuracy and
Holistic accuracy

In short, partitive accuracy is the accuracy of a single measure being part of a model while holistic accuracy is the accuracy of the model structure and its use. More information here.

I find these terms being very useful in data quality and master data management as well.

The distinction between partitive accuracy and holistic accuracy resembles the distinction between data quality and information quality.

One problem with the term information quality is that it implies a certain context of use, which makes it hard to prepare data for having high data quality for multiple uses other than assuring the accuracy of the single data elements – being similar to the term partitive accuracy.

One clue for assuring better information quality is looking at the model structure of data – being similar to the term holistic accuracy. Here I am thinking beyond traditional data modeling, which is anchored in the technical world, and into how end users of master data hubs are able to build structures of data (with partitive accuracy) that fits the daily business use.

Examples of such holistic information capabilities in master data management will be building flexible product hierarchies and hierarchies of party master data that at the same time reflects hierarchies in the real world as households and company family trees and hierarchies of related accounts and addresses used within the enterprise.

While a single data element as an address component like a postal code may be partitive accurate, the holistic accuracy is seen as how data elements contribute to a holistic accuracy as a part of a data structure that fits multiple purposes of use.

Happy Uniqueness

27th March 20118th May 2012Henrik Gabs LiliendahlLeave a comment

When making the baseline for customer data in a new master data management hub you often involve heavy data matching in order to de-duplicate the current stock of customer master data, so you so to speak start with a cleansed duplicate free set of data.

I have been involved in such a process many times, and the result has never been free of duplicates. For two reasons:

Even with the best data matching tool and the best external reference data available you obviously can’t settle all real world alignments with the confidence needed and manual verification is costly and slowly.
In order to make data fit for the business purposes duplicates are required for a lot of good reasons.

Being able to store the full story from the result of the data matching efforts is what makes me, and the database, most happy.

The notion of a “golden record” is often not in fact a single record but a hierarchical structure that reflects both the real world entity as far as we can get and the instances of this real world entity in a form that are suitable for different business processes.

Some of the tricky constructions that exist in the real world and are usual suspects for multiple instances of the same real world entity are described in the blog posts:

The reasons for having business rules leading to multiple versions of the truth are discussed in the posts:

I’m looking forward to yet a party master data hub migration next week under the above conditions.

Hierarchical Completeness

22nd March 2011Henrik Gabs Liliendahl7 Comments

A common technique used when assessing data quality is data profiling. For example you may count different measures as number of fields in a table that have null values or blank values, distribution of filled length of a certain field, average values, highest values, lowest values and so on.

If we look at the most prominent entity types in master data management being customers and products you may certainly also profile your customer tables and product tables and indeed many data profiling tutorials use these common sort of tables as examples.

However, in real life profiling an entire customer table or product table will often be quite meaningless. You need to dig into the hierarchies in these data domains to get meaningful measures for your data quality assessment.

Customer master data

In profiling customer master data you must consider the different types of party master data as business entities, department entities, consumer entities and contact entities, as the demands for completeness will be different for each type. If your raw data don’t have a solid categorization in place, a prerequisite for data profiling will often be to make such a categorization before going any further.

If your customer data model isn’t too simple, as explained in post A Place in Time, your location data (like shipping addresses, billing addresses, visiting addresses) will be separated from your customer naming and identification data. This hierarchical structure must be considered in your data profiling.

For international customer data there will also be different demands and possibilities for completeness of customer data elements.

Depending on your industry and way of doing business there may also be different demands for customer data related to different industry verticals, demographic groups and data sourced in different channels. However this may be a slippery ground, as current and not at least future requirements for multiple uses of the same master data may change the picture.

Product master data

For most businesses the requirements for completeness and other data profiling measures will be very different depending on the product type.

Some requirements will only apply to a small range of products; other requirements apply to a broader range of products.

All in all the data profiling requirements is an integrated part of hierarchy management for product master data which make a very strong case for having data profiling capabilities implemented as part of a product information management (PIM) solution.

Multi-Domain Master Data Management

For master data management solutions embracing both customer data integration (CDI) and product information management (PIM) integrated capabilities for profiling customer master data, location master data and product master data as part of hierarchy management makes a lot of sense.

As improving data quality isn’t a one-off activity but a continuous program, so is the part being measuring the completeness of your master data of any kind.

Non-Obvious Entity Relationship Awareness

16th March 201116th March 2011Henrik Gabs Liliendahl2 Comments

In a recent post here on this blog it was discussed: What is Identity Resolution?

One angle was the interchangeable use of the terms “Identity Resolution” and “Entity Resolution”. These terms can be seen as truly interchangeable, as that “Identity Resolution” is more advanced than “Entity Resolution” or as (my suggestion) that “Identity Resolution” is merely related to party master data, but “Entity Resolution” can be about all master data domains as parties, locations and products.

Another term sometimes used in this realm is “Non-Obvious Relationship Awareness”. Also this term is merely related to finding relationships between parties, for example individuals at a casino that seems to do better than the croupiers. Here’s a link to a (rather old) O’Reilly Radar post on Non-Obvious Relationship Awareness.

Going Multi-Domain

So “Non-Obvious Entity Relationship Awareness” could be about finding these hidden relationships in a multi-domain master data scope.

An example could be non-obvious relationships in a customer/product matrix.

The data supporting this discovery will actually not be found in the master data itself, but in transaction data probably being in an Enterprise Data Warehouse (EDW). But a multi-domain master data management platform will be needed to support the complex hierarchies and categorizations needed to make the discovery.

One technical aspect of discovering such non-obvious relationships is how chains of keys are stored in the multi-domain master data hub.

Customer Master Data

The transactions or sums hereof in the data warehouse will have keys referencing customer accounts. These accounts can be stored in staging areas in the master data hub with references to a golden record for each individual or company in the real world. Depending on the identity resolution available the golden records will have golden relations to each other as they are forming hierarchies of households, company family trees, contacts within companies and their movements between companies and so on.

My guess as described in the post Who is working where doing what? is that this will increasingly include social media data.

Product Master Data

Some of the same transactions or sums hereof in the data warehouse will have keys referencing products. These products will exist in the master data hub as members of various hierarchies with different categorizations.

My guess is that future developments in this field will further embrace not just your own products but also competitor products and market data available in the cloud all attached to your hierarchies and categorizations.

Foreign Affairs

11th March 201111th August 2011Henrik Gabs Liliendahl2 Comments

There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in.

The same phenomenon is often seen in data management.

I mentioned United States centricity as a minor criticism in my recent book review about the excellent book “Master Data Management and Data Governance”.

An example from the book is this statement:

“It is important to differentiate between U.S. domestic addresses and international addresses. This distinction is important for U.S.-centric MDM solutions because U.S. domestic addresses are normally better defined and therefore can be processed in a more automatic fashion, while international addresses require more manual intervention.”

The same fact could be expressed by saying:

“It is important to differentiate between Danish domestic addresses and international addresses. This distinction is important for Danish-centric MDM solutions because Danish domestic addresses are normally better defined and therefore can be processed in a more automatic fashion, while international addresses require more manual intervention.”

Only, the better formatted address in the first case is the messy address in the last case, and the better formatted address in the last case is the messy address in the first case.

If your MDM scope is country-centric it is sensible to concentrate on automation related to that country.

If your MDM scope is international there are two options:

The easy way: The one size fits all option. This is a moderate investment, but also, it only yields moderate results in terms of automation and data quality.
The hard way: You have to implement specialized automation and investigate best external reference data for each country. I made a Danish-centric post on that last year here.

	Henrik Gabs Lilienda… on Balancing the Business Partner…
	Jeppe Thing Sørensen on Balancing the Business Partner…
	peolsolutions on MDM, Cloud, SaaS, PaaS, IaaS a…
	Henrik Gabs Lilienda… on Is the Holiday Season called C…
	Michael D. on Is the Holiday Season called C…
	Jay Ram on The Disruptive MDM List is…
	Henrik Gabs Lilienda… on The Intersection of Data Obser…
	Shanker on The Intersection of Data Obser…
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on Data Matching Efficiency
	Bhavani Shanker on Data Matching Efficiency
	Henrik Gabs Lilienda… on From Platforms to Ecosyst…
	Michael Fieg on From Platforms to Ecosyst…
	From Platforms to Ec… on What is Collaborative Product…
	From Platforms to Ec… on MDM and Knowledge Graph