My 2011 To Do List

These days are classic times for predicting something about next year in a blog post. This year I will make some egocentric predictions about what I am going to do next year. Fortunately I think these activities are pretty representative of the trends in the data quality realm.

My three most important challenges in working with data and information quality improvement and master data management will be:

Multi-Domain Master Data Quality

There are several different disciplines and product offerings around, such as:

  • Data Quality tools
  • Customer Data Integration (CDI) solutions
  • Product Information Management (PIM) platforms

These disciplines and the related software packages used to solve the challenges are constantly maturing and expanding to embrace the problems as a whole.

Find more about the subject in my posts on Multi-Domain MDM.

Exploiting rich external reference data sources in the cloud

Working with external reference sources as a means to improve data quality has been a focus area of mine for many years.

Recent developments in governments releasing rich sources of data will help with availability here, but new challenges will also arise, like working with conformity across data sources coming from many different countries in many different ways.

Much of the activity here will happen in the cloud.

See my take on the subject on the page Data Quality 3.0 and read about a concrete implementation in instant Data Quality.

Downstream data cleansing

Despite constant improvements in data quality tools and master data management solutions moving us from downstream batch cleansing to upstream prevention, there will still be lots of reasons for doing downstream cleansing projects.

Here are the top 5 reasons.

I expect to be involved in at least one of each type next year.


Sell-side vs Buy-side Master Data Quality

The two most prominent domains in master data management and related data quality improvement are:

  • Party master data and
  • Product master data

Party Master Data

Most of the talk about party master data is about customer master data (including prospect master data). This discipline is often called Customer Data Integration (CDI). Customer data is the sell-side of party master data. The organizations with the biggest pains in this area are mostly organizations with many customers (and prospects). The largest volumes of customer data are related to business-to-consumer (B2C) activities, but we certainly also see many large customer databases in the business-to-business (B2B) realm.

The buy-side of party master data is supplier data. Fewer organizations have large supplier databases, but surely big firms with many different departments and subsidiaries have supplier master data issues like the ones we see on the sell-side.

Also, many organizations have a surprisingly large intersection of parties that are on both the sell-side and the buy-side. I have touched on that subject in the post: 360° Business Partner View.

Product Master Data

Product Information Management (PIM) also has a sell-side and a buy-side. Here, too, the pains grow with the numbers. In contrast to party master data, high sell-side numbers are rarer than high buy-side numbers with product master data.

We often see a high number of sell-side products at retailers, where the same product is also buy-side at the same time, but where we may not have the same requirements for entity resolution. Most organizations don’t have big issues (like problems with uniqueness) with products they produce themselves.

Otherwise, a high number of buy-side products is not so much related to buying raw materials as it is to buying things such as spare parts and all kinds of small equipment and assets (with software licenses being the closest to herding cats, I guess).

Multi-Domain Master Data Management

With multi-domain master data management there is of course a connection between sell-side party master data and sell-side product master data, with opportunities in analyzing to whom we sell what, discovering cross-selling openings and so on.

On the buy-side there is great potential in looking into where we buy similar things from, looking into discount possibilities and so on.

Same same but different

A while ago I wrote a blog post about similarities and differences between party master data quality and product master data quality called Same Same But Different.

Besides the differences between party master data and product master data, I also find we have differences between sell-side and buy-side, making it four different but somewhat similar and connected disciplines in master data management and data quality improvement.


Storing a Single Version of the Truth

An ever recurring subject in the data quality and master data management (MDM) realms is whether we can establish a single version of the truth.

The most prominent example is whether an enterprise can implement and maintain a single version of the truth about business partners being customers, prospects, suppliers and so on.

In the quest for establishing that (fully reachable or not) single version of the truth we use identity resolution techniques such as data matching, and we exploit ever increasing sources of external reference data.

However, whatever is possible in aiming for that (fully reachable or not) single version of the truth, I am often limited by the practical possibilities for storing it.

In storing party master data (and other kinds of data) we may consider these three different ways:

Flat files

This “Keep It Simple, Stupid” way of storing data has been in steady retreat. It is still common, however, and new inventions of big flat file structures of data are emerging.

Also, many external sources of reference data are still flat file like, and the overwhelming choice for exchanging reference and master data is doing it by flat files.

Despite lots of workaround solutions for storing the complex links of the real world in flat files, we basically end up using very simplified representations of the real world (and the truth derived) in those flat files.

Relational databases

Most Customer Relationship Management (CRM) systems are based on a relational data model. However, the model is mostly quite basic regarding master data structures, so it is not straightforward to reflect the most common hierarchical structures of the real world, such as company family trees, contacts working for several accounts and individuals forming a household.

Master Data Management hubs are of course built for storing exactly these hierarchical kinds of structures. Common challenges here are that there is often no point in doing that as long as the surrounding applications can’t follow, and that you may often restrict your use to a simplified model anyway, like an industry model.
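To make the kind of structure a bit more concrete, here is a minimal sketch of a party model in a relational database where relationships live in their own table, so a contact can work for several accounts and companies can form a family tree. The table names, column names, role names and sample parties are all my own assumptions for illustration and are not taken from any particular CRM system or MDM hub.

```python
import sqlite3


def build_party_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical party tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE party (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            kind TEXT NOT NULL  -- 'organization', 'person' or 'household'
        );
        CREATE TABLE party_relationship (
            from_party INTEGER NOT NULL REFERENCES party(id),
            to_party   INTEGER NOT NULL REFERENCES party(id),
            role       TEXT NOT NULL,
            PRIMARY KEY (from_party, to_party, role)
        );
    """)
    # Invented sample parties: a small company family tree, one person, one household.
    conn.executemany("INSERT INTO party VALUES (?, ?, ?)", [
        (1, "Parent Holding", "organization"),
        (2, "Subsidiary North", "organization"),
        (3, "Subsidiary South", "organization"),
        (4, "Jane Doe", "person"),
        (5, "Doe household", "household"),
    ])
    # The same person is a contact at two accounts and a member of a household.
    conn.executemany("INSERT INTO party_relationship VALUES (?, ?, ?)", [
        (2, 1, "subsidiary_of"),
        (3, 1, "subsidiary_of"),
        (4, 2, "contact_at"),
        (4, 3, "contact_at"),
        (4, 5, "member_of"),
    ])
    return conn


def related_parties(conn: sqlite3.Connection, party_id: int, role: str) -> list:
    """Return the names of the parties that party_id points to with the given role."""
    rows = conn.execute(
        "SELECT p.name FROM party_relationship r "
        "JOIN party p ON p.id = r.to_party "
        "WHERE r.from_party = ? AND r.role = ? ORDER BY p.name",
        (party_id, role),
    )
    return [name for (name,) in rows]
```

Calling related_parties(build_party_db(), 4, "contact_at") lists both subsidiaries for the same contact, which is exactly what a basic account-with-contacts model typically cannot express.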

Neural networks

The relations between parties in the real world are in fact not truly hierarchical. That is why we look into the inspiration from the network of biological neurons.

Doing that is an option I have heard about for many years, but I am still waiting to meet it as a concrete choice when delivering a single version of the truth.


Entity Revolution vs Entity Evolution

Entity resolution is the discipline of uniquely identifying your master data records, typically being those holding data about customers, products and locations. Entity resolution is closely related to the concept of a single version of the truth.

Questions to be asked during entity resolution are like these ones:

  • Is a given customer master data record representing a real world person or organization?
  • Is a person acting as a private customer and a small business owner going to be seen as the same?
  • Is a product coming from supplier A going to be identified as the same as the same product coming from supplier B?
  • Is the geocode for the center of a parcel the same place as the geocode of where the parcel is bordering a public road?

We may come a long way in automating entity resolution by using advanced data matching and exploiting rich sources of external reference data. We may also be able to handle the complex structures of the real world by using sophisticated hierarchy management, and thereby make an entity revolution in our databases.

But I am often faced with the fact that most organizations don’t want an entity revolution. There are always plenty of good reasons why different frequent business processes don’t require full entity resolution and will only be complicated by having it (unless drastically reengineered). The tangible immediate negative business impact of an entity revolution trumps the softer positive improvement in business insight from such a revolution.

Therefore we are mostly making entity evolutions balancing the current business requirements with the distant ideal of a single version of the truth.
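To give an idea of what data matching looks like at its very simplest, here is a small sketch that compares two party names after light normalization. It uses the SequenceMatcher class from the Python standard library as a stand-in for the far more advanced matching found in real tools. The function names and the 0.85 threshold are assumptions of mine, and a real threshold would have to be tuned to the business requirements discussed above.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lower-case the name, drop punctuation and collapse repeated whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(without_punctuation.split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two party names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two names as a probable match; the threshold is an assumed default."""
    return name_similarity(a, b) >= threshold
```

With this, "Hovedgaden Trading A/S" and "HOVEDGADEN TRADING AS." normalize to the same string and are flagged as a probable match. Whether the two records should then actually be merged is the business question of revolution versus evolution.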


Bilateral Master Data Management

There is an issue I have come across over and over again when creating a master data hub, making a golden copy, establishing a single version of the truth or whatever we like to call it. The issue is about the scope of data sources.

Basically you take (practically) all the master data sources from within your organization and consolidate these data. Often you match with external sources such as business directories and so on. But what you often miss is the master data operated by your partners. These are partners like:

  • Your suppliers of products, be that raw materials or finished products for resale
  • Your sales agents and distributors
  • Your service providers, such as direct marketing agencies and factoring partners

These partners are part of your business processes and they often create and consume master data which are only shared with you in a limited way via some form of interface.

I know that even handling master data from within most organizations is a complex issue. Integrating with external reference data doesn’t add simplicity. But without embracing the master data life at your partners, the hub isn’t complete; the copy is only made of plated gold and the single version of the truth isn’t the only truth.

My guess is that many master data programs in the future will extend to embrace internal (private) data, as well as external (public) data and bilateral data as described on the page about Data Quality 3.0.


Golden Copy Musings

In a recent blog post by Jim Harris called Data Quality is not an Act, it is a Habit the term “golden mean” was mentioned.   

As I commented, mentioning the “golden mean” made me think about the terms “golden copy” and “golden record”, which are often used in data quality improvement and master data management.

In using these terms I think we are mostly aiming at achieving extreme uniqueness. But we should rather go for symmetry, proportion, and harmony.

The golden copy subject is very timely for me, as this weekend I am overseeing the execution of the automated processes that create a baseline for a golden copy of party master data at a franchise operator for a major brand in car rental.

In car rental you are dealing with many different party types. You have companies as customers and prospects and you have individuals being contacts at the companies, employees using the cars rented by the companies and individuals being private renters. A real world person may have several of these roles. Besides that we have cases of mixed identities.

During a series of workshops we have worked on defining the rules for merge and survivorship in the golden copy. Though we may be able to go for extreme uniqueness in identifying real world companies and persons, this may not necessarily serve the business needs or, like it or not, be capable of being related back into the core systems used in daily business.

Therefore this golden copy is based on a beautiful golden mean exposing symmetry, proportion, and harmony.
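A merge and survivorship rule of the kind mentioned above could, in its simplest form, be sketched like this. This is not the rule set from the workshops. It is an assumed rule ("take the newest non-empty value per field") with made-up field names, and it keeps the source record ids so the golden record can still be related back to the core systems.

```python
def build_golden_record(records: list, fields: list) -> dict:
    """Merge duplicate records with an assumed rule: newest non-empty value wins.

    Each record is a dict with an 'id', an ISO-formatted 'updated' date and
    the data fields; these key names are hypothetical.
    """
    # Newest record first, so its values get the first chance to survive.
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        golden[field] = next(
            (r[field] for r in newest_first if r.get(field)), None
        )
    # Keep the link back to every source record in the core systems.
    golden["source_ids"] = sorted(r["id"] for r in records)
    return golden
```

The design choice worth noting is the source_ids list: the merged record does not replace the originals, it only points at them, which is one way of keeping the golden copy usable in daily business.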


Magic Quadrant Diversity

The Magic Quadrants from Gartner Inc. rank the tool vendors within a lot of different IT disciplines. Related to my work, the quadrants for data quality tools and master data management are the most interesting ones.

However, the quadrants examine the vendors in a global scope. But, how are the vendors doing in my country?

I tried to look up a few of the vendors in a local business directory for Denmark provided (free to use on the web) by the local Experian branch.

DataFlux

First up is DataFlux, the (according to Gartner) leading data quality tool vendor.

Result: No hits.

Knowing that DataFlux is owned by SAS Institute will however, with a bit of patience, finally bring you to information about the DataFlux product deep down on the SAS local website.

PS: Though SAS is better known here as the main airline (Scandinavian Airlines System), SAS Institute is actually very successful in Denmark, having a much larger part of the Business Intelligence market here than in most other places.

Informatica

Next up is Informatica, a well positioned company in both the quadrant for data quality tools and customer master data management.

Result: No hits.

Here you have to know that Informatica is represented in the Nordic area by a company called Affecto. You will find information about the Informatica products deep down on the Affecto website – along with the competing product FirstLogic owned by Business Objects (owned by SAP) also historically represented by Affecto.

Stibo Systems

Stibo Systems may not be as well known as the two above, but is tailing the mega vendors in the quadrant for Product Master Data Management, as mentioned recently in a blog post by Dan Power.

Result: Hit:

They are here with over 500 employees, at least in the legal entity called Stibo, where Stibo Systems is an alternate name and brand. And I’m not kidding; I visited them last month at the impressive headquarters near Århus (the second largest city in Denmark).


Big Trouble with Big Names

An often seen issue in party master data management is handling information about your most active customers, suppliers and other roles of interest. These are often big companies with many faces.

I remember meeting that problem way back in the 80’s when I was designing a solution for the Danish Maritime Authorities.  

In relation to a ship there are three different main roles:

  • The owner of the ship, who has some legal rights and obligations
  • The operator of the ship, who has responsibilities regarding the seaworthiness of the ship
  • The employer, who has responsibilities regarding the seamen onboard the ship

Sometimes these roles don’t belong to the same company (or person) for a given ship. That real world reality was modeled all right. But even if it is practically the same company, it materializes very differently in each role. I remember this was certainly the case with the biggest ship-owner in Denmark (and also by far the biggest company in Denmark), the A.P. Moller – Maersk Group.

We really didn’t make a golden record for that golden company in my time on the project.


Business Directory Match: Global versus Local

When doing data quality improvement in business-to-business party master data, an often used shortcut is matching your portfolio of business customers with a business directory and preferably picking new customers from the directory in the future.

If you are doing business in more than one country, you will have to consider which business directory to use: engaging with a local business directory for each country, or engaging with a single business directory covering all the countries in question.

There are pros and cons.

One subject is conformity. I have met this issue a couple of times. A business directory covering many countries will have a standardized way of formatting the different elements like a postal address, whereas a local (national) business directory will use best practice for the particular country.

An example from my home country Denmark:

The Dun & Bradstreet WorldBase is a business directory holding 170 million business entities from all over the world. A Danish street address is formatted like this:

Address Line 1 = Hovedgaden 12 A, 4. th

Observe that Denmark belongs to that half of the earth where house numbers are written after the street name.

In a local business directory (based on the public registry) you will be able to get this format:

Street name = Hovedgaden
Street code = 202 4321
House number = 012A
Floor = 04
Side/door = TH

Here you get an atomized address with metadata for the atomized elements and the unique address coding used in Denmark.
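Going from the single address line to the atomized elements can be sketched as below. This is a simplified illustration that assumes the line always follows the pattern "street, number, optional letter, optional floor and side/door". Real Danish addresses come in more variants than this pattern covers. The function and key names are my own. The street code cannot be derived from the text at all and would need a lookup against the public registry, so it is left out.

```python
import re
from typing import Optional

# Assumed pattern: "<street> <number> <letter>, <floor>. <side/door>",
# where everything after the number is optional.
ADDRESS_LINE = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+)\s*(?P<letter>[A-Za-z])?"
    r"(?:,\s*(?P<floor>\d+)\.?\s*(?P<side>[A-Za-z0-9]+)?)?\s*$"
)


def atomize_address(line: str) -> Optional[dict]:
    """Split a one-line street address into atomized elements, or return None."""
    match = ADDRESS_LINE.match(line.strip())
    if match is None:
        return None
    letter = (match.group("letter") or "").upper()
    floor = match.group("floor")
    side = match.group("side")
    return {
        "street_name": match.group("street"),
        # Zero-padded to mirror the local directory format shown above.
        "house_number": f"{int(match.group('number')):03d}{letter}",
        "floor": f"{int(floor):02d}" if floor is not None else None,
        "side_door": side.upper() if side is not None else None,
    }
```

Feeding it the Dun & Bradstreet style line "Hovedgaden 12 A, 4. th" gives street name Hovedgaden, house number 012A, floor 04 and side/door TH, matching the local format apart from the missing street code.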


What is Multi-Domain MDM?

Doing master data management with several different entity types is most often seen as the federated discipline of handling Customer Data Integration (CDI) and Product Information Management (PIM) with the same software brand.

And sure, doing this (including making that software) is a challenge as there are basic differences between the two disciplines as discussed in the post Same Same But Different.

But doing both well at the same time is only a starting point. Making business value from the intersection between the two disciplines is the real challenge.

I learned that 20 years ago when I started a new client relationship (which was also before MDM, CDI and PIM were household TLAs).

The client’s headquarters was in the southern outskirts of Copenhagen, so on a good summer day I could go there on my bike. They imported otherwise wasted peels from oranges grown in the endless South American citrus plantations to be used for our morning juice and otherwise useless seaweed harvested in the hot waters around the countless Philippine islands.

Along with a few other raw materials the peels and seaweed were made into approximately a hundred different semi-finished products. Based on customer orders these were blended into not much more than a thousand different defined finished products being valuable ingredients for food and pharmaceutical production.

The number of different customers was also modest, as I remember not much more than a thousand different worldwide customers.

So, managing 1,000 different customers buying 1,000 different products shouldn’t be much of an MDM case. Of course, customer data management with globally diverse entities had its challenges, and not least product information handling, with rising regulatory demands in the food and pharmaceutical segment, wasn’t a walkover either.

But some big hurdles were surely in the intersection between customer master data and product master data, and solving the issues almost always involved data quality related to core transactions referencing the entities described in the master data.
