Happy Uniqueness

When making the baseline for customer data in a new master data management hub, you often involve heavy data matching in order to de-duplicate the current stock of customer master data, so that you start, so to speak, with a cleansed, duplicate-free set of data.

I have been involved in such a process many times, and the result has never been free of duplicates. For two reasons:

  • Even with the best data matching tool and the best external reference data available, you obviously can’t settle all real world alignments with the confidence needed, and manual verification is costly and slow.
  • In order to make data fit for the business purposes, duplicates are required for a lot of good reasons.

Being able to store the full story from the result of the data matching efforts is what makes me, and the database, most happy.

The notion of a “golden record” is often not in fact a single record but a hierarchical structure that reflects both the real world entity, as far as we can get, and the instances of this real world entity in forms that are suitable for different business processes.
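One way to picture such a hierarchical golden record is sketched below in Python. It is only an illustration: the class names (`RealWorldEntity`, `BusinessInstance`, `SourceRecord`), the field names and the sample systems are hypothetical and not taken from any particular master data hub.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceRecord:
    """A row as it exists in a source system (hypothetical fields)."""
    system: str
    record_id: str
    name: str


@dataclass
class BusinessInstance:
    """A version of the entity kept on purpose for one business process."""
    purpose: str
    records: List[SourceRecord] = field(default_factory=list)


@dataclass
class RealWorldEntity:
    """Top of the hierarchy: the real world party, as far as we can tell."""
    entity_id: str
    instances: List[BusinessInstance] = field(default_factory=list)

    def all_records(self) -> List[SourceRecord]:
        # Flatten the hierarchy back to the source rows it was built from.
        return [rec for inst in self.instances for rec in inst.records]


golden = RealWorldEntity(
    entity_id="party-1",
    instances=[
        BusinessInstance("billing", [SourceRecord("erp", "1001", "Acme Corporation")]),
        BusinessInstance("sales", [
            SourceRecord("crm", "A-17", "Acme Corp."),
            SourceRecord("crm", "A-42", "ACME Corporation"),
        ]),
    ],
)
```

Nothing is thrown away here: the duplicates that a business process needs stay as instances, while the link to the one real world entity is kept at the top.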

Some of the tricky constructions that exist in the real world and are usual suspects for multiple instances of the same real world entity are described in the blog posts:

The reasons for having business rules leading to multiple versions of the truth are discussed in the posts:

I’m looking forward to yet another party master data hub migration next week under the above conditions.

Hierarchical Completeness

A common technique used when assessing data quality is data profiling. For example, you may count different measures such as the number of fields in a table that have null values or blank values, the distribution of filled length of a certain field, average values, highest values, lowest values and so on.
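A minimal sketch of such measures in plain Python could look like the following. The function names, the row layout (a list of dictionaries) and the sample field names `name` and `credit_limit` are assumptions made for illustration only.

```python
from collections import Counter


def is_blank(value):
    # A blank is a string that is empty or contains only whitespace.
    return isinstance(value, str) and value.strip() == ""


def profile_field(rows, field_name):
    """Return simple profiling measures for one field of a list of dict rows."""
    values = [row.get(field_name) for row in rows]
    filled = [v for v in values if v is not None and not is_blank(v)]
    return {
        "rows": len(values),
        "nulls": sum(1 for v in values if v is None),
        "blanks": sum(1 for v in values if is_blank(v)),
        # Distribution of filled length: length -> number of values.
        "length_distribution": dict(Counter(len(str(v)) for v in filled)),
    }


def profile_numeric(rows, field_name):
    """Lowest, highest and average value, ignoring missing values."""
    numbers = [row[field_name] for row in rows if row.get(field_name) is not None]
    if not numbers:
        return None
    return {"min": min(numbers), "max": max(numbers), "avg": sum(numbers) / len(numbers)}


sample_rows = [
    {"name": "Acme Corporation", "credit_limit": 1000},
    {"name": "", "credit_limit": 3000},
    {"name": None, "credit_limit": None},
    {"name": "Smith", "credit_limit": 2000},
]
name_profile = profile_field(sample_rows, "name")
limit_profile = profile_numeric(sample_rows, "credit_limit")
```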

If we look at the most prominent entity types in master data management, being customers and products, you may certainly also profile your customer tables and product tables, and indeed many data profiling tutorials use these common sorts of tables as examples.

However, in real life profiling an entire customer table or product table will often be quite meaningless. You need to dig into the hierarchies in these data domains to get meaningful measures for your data quality assessment.

Customer master data

In profiling customer master data you must consider the different types of party master data, such as business entities, department entities, consumer entities and contact entities, as the demands for completeness will be different for each type. If your raw data don’t have a solid categorization in place, a prerequisite for data profiling will often be to make such a categorization before going any further.
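As a rough sketch of what type-specific completeness could mean, the Python below measures completeness per party type against a different set of required fields for each type. The required fields, the `party_type` column and the sample rows are all hypothetical; real demands will depend on your business.

```python
# Hypothetical completeness demands per party type.
REQUIRED_FIELDS = {
    "business": ["name", "registration_id"],
    "department": ["name", "parent_business_id"],
    "consumer": ["name", "birth_year"],
    "contact": ["name", "email"],
}


def completeness_by_type(rows):
    """Share of required fields that are filled, calculated per party type."""
    totals = {}
    for row in rows:
        party_type = row["party_type"]
        required = REQUIRED_FIELDS[party_type]
        filled = sum(1 for f in required if row.get(f) not in (None, ""))
        stats = totals.setdefault(party_type, [0, 0])
        stats[0] += filled
        stats[1] += len(required)
    return {t: filled / possible for t, (filled, possible) in totals.items()}


sample_parties = [
    {"party_type": "business", "name": "Acme Corporation", "registration_id": "12345678"},
    {"party_type": "business", "name": "Anytown Municipality", "registration_id": ""},
    {"party_type": "consumer", "name": "John Smith", "birth_year": None},
    {"party_type": "contact", "name": "Mary Smith", "email": "mary@example.com"},
]
completeness = completeness_by_type(sample_parties)
```

A single completeness figure for the whole table would hide that, in this made-up sample, the consumer rows are the weak spot.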

If your customer data model isn’t too simple, as explained in the post A Place in Time, your location data (like shipping addresses, billing addresses, visiting addresses) will be separated from your customer naming and identification data. This hierarchical structure must be considered in your data profiling.

For international customer data there will also be different demands and possibilities for completeness of customer data elements.

Depending on your industry and way of doing business, there may also be different demands for customer data related to different industry verticals, demographic groups and data sourced in different channels. However, this may be slippery ground, as current and not least future requirements for multiple uses of the same master data may change the picture.

Product master data

For most businesses the requirements for completeness and other data profiling measures will be very different depending on the product type.

Some requirements will only apply to a small range of products; other requirements apply to a broader range of products.

All in all, the data profiling requirements are an integrated part of hierarchy management for product master data, which makes a very strong case for having data profiling capabilities implemented as part of a product information management (PIM) solution.

Multi-Domain Master Data Management

For master data management solutions embracing both customer data integration (CDI) and product information management (PIM), integrated capabilities for profiling customer master data, location master data and product master data as part of hierarchy management make a lot of sense.

Just as improving data quality isn’t a one-off activity but a continuous program, so is measuring the completeness of your master data of any kind.

Fuzzy Hierarchy Management

When evaluating results from automated data matching, your goal is typically to find false positives and false negatives, that is, entities that are matched but shouldn’t be (false positives) and entities that are not matched but should have been (false negatives).
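In code, that evaluation boils down to comparing the pairs the matching tool produced with a set of manually verified pairs. The sketch below assumes such a verified set exists; the function name and the record identifiers are made up for the example.

```python
def evaluate_matches(predicted_pairs, verified_pairs):
    """Compare automated match pairs with manually verified match pairs."""
    # frozenset makes ("A", "B") and ("B", "A") count as the same pair.
    predicted = {frozenset(pair) for pair in predicted_pairs}
    verified = {frozenset(pair) for pair in verified_pairs}
    return {
        "true_positives": predicted & verified,
        "false_positives": predicted - verified,   # matched, but shouldn't be
        "false_negatives": verified - predicted,   # not matched, but should be
    }


result = evaluate_matches(
    predicted_pairs=[("A", "B"), ("C", "D")],
    verified_pairs=[("B", "A"), ("E", "F")],
)
```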

However, the fuzziness often used in the data matching process also applies to the evaluation of the results, as for many dubious results it isn’t a question of whether the matched database rows reflect the same real world entity, but more a question of whether the matched (or not matched) database rows reflect different members of a real world hierarchy.

Example 1:

John Smith on 1 Main Street in Anytown
Mary & John Smith on 1 Main Str in Anytown

Example 2:

Anytown Municipality, Technical Dept
Municipality of Anytown

Example 3:

Acme Corporation, Anytown
Acme Corporation, Anywhere

All three examples above may be considered a false positive if matched and a false negative if not matched.

You may say that it depends on the purpose of use, which is true.

But if we are talking master data management, we will probably have to encompass multiple requirements where we simultaneously need the match and don’t want the match, which is why we need to be able to resolve and store the results from fuzzy data matching in hierarchies.
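A small Python sketch of that idea: instead of storing a yes/no match, store the kind of relationship found and let each purpose decide which relationships count as a match. The relation labels and the purposes are assumptions chosen to fit the three examples above, not an established vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchLink:
    left: str
    right: str
    relation: str  # hypothetical label for the kind of hierarchy relationship


LINKS = [
    MatchLink("John Smith, 1 Main Street, Anytown",
              "Mary & John Smith, 1 Main Str, Anytown", "same_household"),
    MatchLink("Anytown Municipality, Technical Dept",
              "Municipality of Anytown", "department_of"),
    MatchLink("Acme Corporation, Anytown",
              "Acme Corporation, Anywhere", "same_company_other_site"),
]

# Which relations count as "a match" depends on the purpose of use.
RELATIONS_PER_PURPOSE = {
    "one_mailing_per_household": {"same_entity", "same_household"},
    "group_level_reporting": {"same_entity", "department_of", "same_company_other_site"},
    "delivery_per_site": {"same_entity"},
}


def matches_for(purpose, links=LINKS):
    """Return the stored links that should be treated as matches for a purpose."""
    wanted = RELATIONS_PER_PURPOSE[purpose]
    return [link for link in links if link.relation in wanted]
```

With the links stored this way, the same pair of rows can be a match for one use and a non-match for another without rerunning the data matching.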

Single Business Partner View

If you search Google for “single customer view” you’ll get over 20,000 hits. If you search for “single business partner view” you’ll get zero – until I just posted this blog post.

Some time ago I wrote about getting a 360° Business Partner View, elaborating on extending the 360° Customer View or Single Customer View (SCV) to embrace all sorts of party master data managed within the organization.

In fact, at least as many similar techniques are shared between

  • managing supplier master data and business-to-business (B2B) customer master data

as there is between

  • managing business-to-business (B2B) customer master data and business-to-consumer (B2C) customer master data.

If you look at Customer Relationship Management (CRM) systems, almost every package is aimed at managing B2B data, as the data model and the functionality support real world B2B structures and how the sales force and other employees interact with B2B customers and prospects.

Interacting with B2C customers and prospects is much more diverse and often supported by operational systems specialized for the industry in question like solutions for financial services, healthcare and so on.

A business partner is a party acting in the role of customer, prospect, supplier, reseller, distributor, agent or other forms of partnership. Sometimes the same party is acting in several roles at the same time, thus potentially being both on the Sell–side and Buy-side of Master Data Quality management.
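As a minimal illustration in Python, a party can carry a set of roles, and the parties on both sides fall out of a simple set intersection. Only customer and prospect (sell-side) and supplier (buy-side) are classified here; the party names and the data layout are invented for the example.

```python
SELL_SIDE_ROLES = {"customer", "prospect"}
BUY_SIDE_ROLES = {"supplier"}


def parties_on_both_sides(roles_by_party):
    """Return the parties holding at least one sell-side and one buy-side role."""
    return sorted(
        party
        for party, roles in roles_by_party.items()
        if roles & SELL_SIDE_ROLES and roles & BUY_SIDE_ROLES
    )


# One party record with several roles, rather than one record per role.
roles_by_party = {
    "Acme Corporation": {"customer", "supplier"},
    "Anytown Municipality": {"customer"},
    "Smith Consulting": {"supplier", "prospect", "agent"},
}
both = parties_on_both_sides(roles_by_party)
```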

Just as sell-side and buy-side have intersections within party master data, in some industries we may also go deeper into identity resolution and find intersections between B2B entities and B2C entities. I’ve described these matters in the post So, how about SOHO homes. The business case is that some products in some industries are aimed at the households of business owners and the small businesses at the same time. This is for example true for industries such as banking, insurance, telco, real estate and law.

All in all achieving a single view of business partners is a task going beyond traditional customer data integration (CDI) and stretching into areas traditionally belonging to Product Information Management (PIM). This is a business case for multi-domain master data management.

Customer Product Matrix Management

A customer/product matrix is a way of describing the relationships between customer types and product types/attributes.  


Note: Please find some data quality related product descriptions in the post Data Quality and World Food.

Filling out the matrix may be based on prejudices, gut feelings, assumptions, surveys, focus groups or data.

If we go for data we may do this by collecting available historical data related to sales and inquiries made by persons belonging to each customer type regarding products belonging to each product type.  

To do that correctly we need two kinds of master data management and data quality assurance in place:

  • Customer Data Integration (CDI) for assigning the accurate customer type in the real world related to the uniquely identified person in transactions coming from all sources – here based on location master data.
  • Product Information Management (PIM) for categorizing the relevant fit for purpose product type.
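Once both kinds of master data are in place, filling out the matrix from data is little more than a count. The Python sketch below assumes two lookups already exist, one from customer to customer type and one from product to product type; the identifiers and type labels are placeholders.

```python
from collections import Counter


def build_matrix(transactions, customer_type_of, product_type_of):
    """Count transactions per (customer type, product type) cell."""
    matrix = Counter()
    for customer_id, product_id in transactions:
        cell = (customer_type_of[customer_id], product_type_of[product_id])
        matrix[cell] += 1
    return matrix


# Hypothetical results of CDI (customer types) and PIM (product types).
customer_type_of = {"c1": "urban", "c2": "rural", "c3": "urban"}
product_type_of = {"p1": "type A", "p2": "type B"}

# Historical sales or inquiries as (customer, product) pairs.
transactions = [("c1", "p1"), ("c3", "p1"), ("c2", "p2"), ("c1", "p2")]
matrix = build_matrix(transactions, customer_type_of, product_type_of)
```

The count is only as good as the two lookups: a duplicated customer or a wrongly categorized product ends up in the wrong cell.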

This reminds me of multi-domain master data management. Customer master data (or shall we say party master data), product master data and location master data used to figure out how to do business. I like it – both the master data management part and the mentioned product types.

Where is the Business?

In technology enabled disciplines we often like to divide an organization into two distinct parts being IT (Information Technology) and “the business”.

I am aware that we do that to emphasize that our solutions have to be business-centric as opposed to technology-centric. We mustn’t fall into the trap of discussing technology too early, and certainly not selecting certain technology brands as the first step of our solutions.

A problem, however, is where to find “the business” in an organization. The top management surely represents all of the business (including the IT part of the business). But in order to find the so-called subject matter experts we are looking down the levels in the organization, where people don’t belong to “the business” but to sales, marketing, customer service, purchasing, production, human resources, finance and so on.

Some technology enabled disciplines belong to a certain department. But disciplines such as (enterprise wide) data quality and master data management are supposed to support most departments. The business. So where do we find the business? And who are we by the way?

Call them?

Assuming it doesn’t matter who we are: Let’s go find “the business”. I guess it doesn’t help calling the reception and asking them to put us through to “the business”. Actually, the manned reception probably doesn’t exist today. And it would be surprising to get a machine asking:

  • Do you want to speak with IT? Press 1.
  • Do you want to speak with “the business”? Press 2.

If we are in my home country Denmark we also have a linguistic issue. If I ask Google to translate “the business” from English to Danish I get the word “forretningen”. If I ask Google to translate “forretningen” from Danish back to English I get the word “shop”. So calling “forretningen” will probably get me to the shop floor. Not a bad place, a true gemba, but maybe not the only one.

Everyone belongs to “the business”

In data quality and master data management there is a question used all over to exemplify a common challenge within these disciplines.

The question is: What is a customer?

The challenge is that people from different departments will have different definitions. Marketing defines a customer one way, sales tends to do it a bit differently, finance sees it in yet another way and production has its own viewpoint. And the stereotypical IT guy defines a customer as a row in the customer table.

So now we are asking for Alexander the Great from “the business” to come cutting the Gordian Knot.

That is probably not going to happen.

More likely someone from any business unit will be able to negotiate a proper conceptual solution covering all requirements from the different business units. And from what I see around, it may often be someone whose human resource master data record is related to the IT part of the business. Or was. The main point is having a holistic view of the business where everyone belongs.

Electronic Data Processing

A comment on my last blog post took me back to the days when I started working with Information Technology (IT). At that time our métier actually wasn’t called IT but EDP (Electronic Data Processing) – at least that was the case in my home country Denmark where we used the local TLA being EDB (Elektronisk Data Behandling).

I have earlier touched on the long-standing discussion about whether “data quality” should be rebranded as “information quality”, for example in the post called new blog name, as this would also require a new name for this blog.

The words data and information are indeed used quite randomly. In MDM (Master Data Management) we have two main domains, being Customer Data Integration (CDI) and Product Information Management (PIM). I wonder if customer data is old school and product information is new school?

A Place in Time

I remember when I had the first chemistry lesson in high school our teacher told us that we should forget all about the chemistry we had learned in primary school, because this was too simple a model, not reflecting how the real world of chemistry actually works.

Since I started working with data quality and master data management I have had a pet peeve in data modeling, namely what is probably the most common example used in data modeling: the classic customer table. Example from a SQL tutorial here:

Compared to how the real world works this example has some diversity flaws, like:

  • state code as a key to a state table will only work with one country (the United States)
  • zipcode is a United States-only term, as opposed to the more generic “Postal Code”
  • fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
  • The lengths of the state, zipcode and most other fields are obviously too small almost anywhere

More seriously we have:

  • fname and lname (First name and Last name) and probably also phone should belong to a separate party entity acting as a contact related to the company
  • company name should belong to a separate party entity acting in the role of customer
  • address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company

Now I know this is just a simple example from a tutorial, where you should not confuse the reader by adding too much complexity. Agreed.

However, many home-grown solutions in business life and even many commercial ready-made applications use that kind of data model to describe one of the most important business entities, being our customers.

It may be that such a model does fit the purpose of use in some operations. Sometimes yes, sometimes no. But when reusing data from such a model at enterprise level and when adding business intelligence, you are in big trouble. That is why we need master data hubs and why we need to transform data coming into the master data hub.

From such a customer record we don’t create just one golden record. We make or link several different related multi-domain entities, such as:

  • The contact as a person in our party domain – maybe we knew her before
  • The company in our party domain – maybe we knew the sister as a supplier before
  • The address in our place (location) domain – maybe we knew that address as a place in time before
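A rough Python sketch of that transformation is shown below. The input keys follow the field names mentioned above (fname, lname, phone, address1, address2, city, state, zipcode); the key `company`, the output layout and the function name are assumptions, and the step of checking whether each entity is already known in the hub is deliberately left out.

```python
def split_customer_row(row):
    """Split one flat customer row into a contact, a company and a place."""
    contact = {
        "domain": "party",
        "role": "contact",
        # Kept as an ordered list, since first/last is not a universal order.
        "name_parts": [row["fname"], row["lname"]],
        "phone": row["phone"],
    }
    company = {
        "domain": "party",
        "role": "customer",
        "name": row["company"],  # assumed column name for the company name
    }
    place = {
        "domain": "place",
        "address_lines": [line for line in (row["address1"], row["address2"]) if line],
        "city": row["city"],
        "region": row["state"],
        "postal_code": row["zipcode"],
    }
    return contact, company, place


flat_row = {
    "fname": "Mary", "lname": "Smith", "phone": "555-0100",
    "company": "Acme Corporation",
    "address1": "1 Main Street", "address2": "",
    "city": "Anytown", "state": "NY", "zipcode": "12345",
}
contact, company, place = split_customer_row(flat_row)
```

In a hub, each of the three results would next be matched against what is already stored, which is where the "maybe we knew her before" part comes in.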

My 2011 To Do List

These days are classic times for predicting something about next year in a blog post. This year I will make some egocentric predictions about what I am going to do next year. Fortunately I think these activities are pretty representative for the trends in the data quality realm.

My three most important challenges in working with data and information quality improvement and master data management will be:

Multi-Domain Master Data Quality

There are some different disciplines and product offerings around, such as:

  • Data Quality tools
  • Customer Data Integration (CDI) solutions
  • Product Information Management (PIM) platforms

These disciplines and the related software packages used to solve the challenges are constantly maturing and expanding to embrace the problems as a whole.

Find more about the subject in my posts on Multi-Domain MDM.

Exploiting rich external reference data sources in the cloud

Working with external reference sources as a means to improve data quality has been a focus area of mine for many years.

Recent developments in governments releasing rich sources of data will help with availability here, but new challenges will also arise, like working with conformity across data sources coming from many different countries in many different ways.

Much of the activity here will happen in the cloud.

See my take on the subject on the page Data Quality 3.0 and read about a concrete implementation in instant Data Quality.

Downstream data cleansing

Despite constant improvements in data quality tools and master data management solutions moving us from batch cleansing downstream to upstream prevention, there will still be lots of reasons for doing downstream cleansing projects.

Here are the top 5 reasons.

I expect to be involved in at least one of each type next year.

Sell–side vs Buy-side Master Data Quality

The two most prominent domains in master data management and related data quality improvement are:

  • Party master data and
  • Product master data

Party Master Data

Most of the talk about party master data is about customer master data (including prospect master data). This discipline is often called Customer Data Integration (CDI). Customer data is the sell-side of party master data. The organizations with the biggest pains in this area are mostly organizations with many customers (and prospects). The largest volumes of customer data are related to business-to-consumer (B2C) activities, but certainly we also see many grown customer databases in the business-to-business (B2B) realm.

The buy-side of party master data is supplier data. Fewer organizations have grown supplier databases, but surely big firms with many different departments and subsidiaries have supplier master data issues like the ones we see on the sell-side.

Also many organizations have a surprisingly large intersection of the same parties being both on the sell-side and on the buy-side. I have touched that subject in the post: 360° Business Partner View.

Product Master Data

Product Information Management (PIM) also has a sell-side and a buy-side. Also here the pains grow with the numbers. In contrast to party master data, high sell-side numbers are rarer than high buy-side numbers for product master data.

We often see high sell-side numbers of products at retailers, where the same product is also buy-side at the same time, but where we maybe don’t have the same requirements for entity resolution. Most organizations don’t have such big issues (like problems with uniqueness) with the products they produce themselves.

Otherwise, a high number of buy-side products is not so much related to buying raw materials as it is to buying things such as spare parts and all kinds of small equipment and assets (with software licenses being the closest to herding cats, I guess).

Multi-Domain Master Data Management

With multi-domain master data management there is of course a connection between sell-side party master data and sell-side product master data with opportunities in analyzing to whom we sell what and discovering cross selling openings and so on.

On the buy-side there is great potential in looking into where we buy similar things from, looking into discount possibilities and so on.

Same same but different

A while ago I wrote a blog post about similarities and differences between party master data quality and product master data quality called Same Same But Different.

Besides the differences between party master data and product master data, I also find we have differences between sell-side and buy-side, making it four different but somewhat similar and connected disciplines in master data management and data quality improvement.
