B2C versus B2B Data Quality

The data quality issues in doing business with private consumers (business-to-consumer = B2C) and doing business with other businesses (business-to-business = B2B) share many challenges but also differ in many ways.

Some of my experiences (and thoughts) related to different master data domains are:

Customer master data

In B2C the number of customers, prospects and leads is usually high and characterized by relatively few interactions with each entity.  In B2B you usually have a relatively small number of customers with a high number of interactions.

One of the most automated activities in data quality improvement is matching master data records with information about customers. Many of the examples we see in marketing material, research documents, blog posts and so on are about matching in the B2C realm. This is natural, since the high number of records, typically with a low attached value, calls for automation.

Data matching in the B2B realm is indeed more complex due to numerous challenges, like less standardized company names and typically more options in what constitutes a single customer. The high value attached to each customer also makes the risk of mistakes a showstopper for too much automation.
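To make the automation trade-off concrete, here is a minimal sketch of B2B name matching using only the standard library. The legal-form list, the score thresholds and the three-way outcome are illustrative assumptions, not a recommended production setup; the point is the middle band that routes borderline pairs to manual review instead of auto-matching high-value accounts.

```python
# A minimal sketch of B2B company name matching; thresholds and the
# legal-form list are assumptions, and real solutions would use a
# specialized matching engine rather than plain string similarity.
from difflib import SequenceMatcher

# Legal forms inflate name variation without identifying the company.
LEGAL_FORMS = {"inc", "ltd", "llc", "gmbh", "a/s", "ab", "plc", "corp"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation and legal-form tokens."""
    tokens = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)

def match_score(a: str, b: str) -> float:
    """Similarity between normalized names, 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def classify(a: str, b: str, auto=0.95, review=0.80) -> str:
    """High scores auto-match; a middle band goes to manual review,
    reflecting that mistakes on high-value B2B accounts are costly."""
    score = match_score(a, b)
    if score >= auto:
        return "auto-match"
    if score >= review:
        return "manual review"
    return "no match"
```

For example, `classify("Acme Ltd.", "ACME Inc.")` auto-matches because both normalize to "acme", while two unrelated names fall through to "no match".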

So in B2B we see an increasing adoption of workflows that ensure data quality during data capture, often by exploiting external reference data, which in general is also more available for business entities.
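A capture-time workflow of that kind might look like the sketch below. The in-memory registry is a stand-in for a real external reference source such as a national company register; the lookup, the registration number format and the field names are all assumptions for illustration.

```python
# A sketch of capture-time validation against external reference data.
# BUSINESS_REGISTRY stands in for a real registry service; the field
# names and sample content are assumptions.

BUSINESS_REGISTRY = {
    "12345678": {"name": "Acme A/S", "city": "Copenhagen"},
}

def onboard_company(reg_no: str, entered_name: str) -> dict:
    """Validate the registration number at entry time and prefer the
    registry's standardized name over the manually entered one."""
    record = BUSINESS_REGISTRY.get(reg_no)
    if record is None:
        # Reject at capture rather than cleanse downstream later.
        raise ValueError(f"Unknown registration number: {reg_no}")
    return {"reg_no": reg_no, "name": record["name"],
            "entered_name": entered_name, "city": record["city"]}
```

The design choice is that the registry's standardized name wins at capture, so downstream systems never see the free-text variant as the primary name.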

Location master data

The location of B2C customers means a lot. Accurate and timely delivery addresses are essential for everything from direct mail to bringing goods to the premises. Location data is used to recognize household relations, assign demographic stereotypes and, in many cases, calculate fees of different kinds. I had a near disaster experience with a really bad address early in my career.

Even though location data is theoretically just as important for B2B activities, I have often seen that a little less precision is fit for purpose, or is anyway given lower priority than more pressing issues.

Product master data

Theoretically there should be no difference between B2C and B2B here, but I guess there is in practice?

The most interesting aspect is probably the multi-domain aspect examining the relations between customers and products.   

I had some experiences some years ago with the B2B realm as described in the post What is Multi-Domain MDM?: 1,000 B2B customers buying 1,000 different finished products can be a quite complicated data quality operation.

Within the B2C realm the most predominant multi-domain data quality issues I have met are related to analytics. As discussed in the post Customer/Product Matrix Management it is about typifying your customers correctly and categorizing your products adequately at the same time.

Holistic Accuracy

In community economics you have two terms called

  • Partitive accuracy and
  • Holistic accuracy

In short, partitive accuracy is the accuracy of a single measure being part of a model while holistic accuracy is the accuracy of the model structure and its use. More information here.

I find these terms being very useful in data quality and master data management as well.

The distinction between partitive accuracy and holistic accuracy resembles the distinction between data quality and information quality.

One problem with the term information quality is that it implies a certain context of use. That makes it hard to prepare data to have high quality for multiple uses beyond assuring the accuracy of the single data elements – similar to the term partitive accuracy.

One clue to assuring better information quality is looking at the model structure of data – similar to the term holistic accuracy. Here I am thinking beyond traditional data modeling, which is anchored in the technical world, and into how end users of master data hubs are able to build structures of data (with partitive accuracy) that fit the daily business use.

Examples of such holistic information capabilities in master data management are building flexible product hierarchies and hierarchies of party master data that at the same time reflect real world hierarchies, such as households and company family trees, and hierarchies of related accounts and addresses used within the enterprise.
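A company family tree of that kind can be sketched as a simple parent-linked structure. The sample parties and field names are invented for illustration; the point is that each record may be accurate on its own (partitive), while the value of rolling up all accounts under a top parent comes from the structure (holistic).

```python
# A sketch of a party hierarchy: each node links to its parent, and
# descendants() rolls up everything below a node in the family tree.
# The sample tree and field names are illustrative assumptions.

parties = {
    "P1": {"name": "Global Corp", "parent": None},
    "P2": {"name": "Global Corp Denmark", "parent": "P1"},
    "P3": {"name": "Global Corp Sweden", "parent": "P1"},
    "P4": {"name": "Copenhagen Branch", "parent": "P2"},
}

def descendants(party_id: str) -> list[str]:
    """All parties below a node in the company family tree."""
    children = [p for p, rec in parties.items() if rec["parent"] == party_id]
    result = []
    for child in children:
        result.append(child)
        result.extend(descendants(child))
    return result
```

Rolling up from "P1" returns every subsidiary and branch, which is the kind of structure a single-record view of accuracy cannot express.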

While a single data element, such as an address component like a postal code, may be partitively accurate, holistic accuracy is seen in how the data elements contribute, as part of a data structure, to fitting multiple purposes of use.

Pick Any Two

The project triangle expresses the dilemma that you probably want your project to be good, fast and cheap, but in practice you are only able to prioritize two of these three desirable options. In short:

Good, fast, cheap – pick any two

The pick-any-two-among-three theme can be related to a lot of other activities, stating three terms of which only two can be combined in real life.

So what could be the pick any two among three themes for data quality?

Of course the good, fast, cheap dilemma also goes for data quality projects. But as data quality management isn’t just a project but an ongoing program, what else?

I have one suggestion:

Fit for purpose, real world alignment, fix it as we go – pick any two

The term “fit for purpose” has become more or less synonymous with “high quality data” and is thus chosen here to express the good angle of data quality.

Some data, especially what we call master data, is used for multiple purposes within an organization. Therefore some kind of real world alignment is often used as a fast track to improving data quality, where you don’t spend time analyzing how data may fit multiple purposes at the same time in your organization. Real world alignment may also fulfill future requirements regardless of the current purposes of use.

Managing data that is both fit for multiple purposes and aligned with the real world is not something you do cheaply by fixing it as you go. You may pick any two options in these combinations:

  • Make some data fit for purpose by fixing it as the pains show up.
  • Align data with the real world typically by exploiting external reference data as the prices go down.
  • Lay out a thorough plan for having fit for multiple-purpose data aligned with the real world.

Miracle Food for Thought

We all know the headlines in the media about food and drink and your health. One day something is healthy, the next day it will kill you. You are struck with horror when you learn that even a single drop of alcohol will harm your body until you are relieved by the wise words saying that a glass (or two) of red wine a day keeps the doctor away.

These misleading, exaggerated and contradictory headlines are now documented in a report called Miracle Food, Myth and the Media.

It’s the same with data quality, isn’t it?

Sometimes some data is fit for purpose. At another time, at another place, the very same data is rubbish.

As an excerpt from the Miracle Food report says:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoon of salt.

Bon appétit.

Customer Product Matrix Management

A customer/product matrix is a way of describing the relationships between customer types and product types/attributes.  


Note: Please find some data quality related product descriptions in the post Data Quality and World Food.

Filling out the matrix may be based on prejudices, gut feelings, assumptions, surveys, focus groups or data.

If we go for data we may do this by collecting available historical data related to sales and inquiries made by persons belonging to each customer type regarding products belonging to each product type.  

In doing that correctly we need two kinds of master data management and data quality assurance in place:

  • Customer Data Integration (CDI) for assigning the accurate customer type in the real world related to the uniquely identified person in transactions coming from all sources – here based on location master data.
  • Product Information Management (PIM) for categorizing the relevant fit for purpose product type.
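The matrix-filling step itself can be sketched as a simple aggregation over transactions. The customer-type and product-type values below are invented examples; in practice they would be assigned by the CDI and PIM processes just described, and the matrix would be fed from far richer sales and inquiry data.

```python
# A minimal sketch of filling a customer/product matrix from
# transaction rows; the type assignments would come from CDI (customer
# type) and PIM (product type), here they are simply given in the rows.
from collections import Counter

transactions = [
    {"customer_type": "family", "product_type": "groceries"},
    {"customer_type": "family", "product_type": "groceries"},
    {"customer_type": "single", "product_type": "ready meals"},
]

def build_matrix(rows):
    """Count sales per (customer type, product type) cell."""
    return Counter((r["customer_type"], r["product_type"]) for r in rows)

matrix = build_matrix(transactions)
# matrix[("family", "groceries")] is 2
```

If either the customer typing or the product categorization is wrong, the counts land in the wrong cells, which is exactly why both master data domains must be assured at the same time.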

This reminds me of multi-domain master data management. Customer master data (or shall we say party master data), product master data and location master data used to figure out how to do business. I like it – both the master data management part and the mentioned product types.

Fitness Data

About a month ago I wrote about how my personal data was on-boarded in the local fitness club in the post called Right the First Time.

Since then I have actually succeeded in visiting the gym twice a week and used the amazing technology necessary to get me in action.

As a complete data geek I of course use the full TV screen on the machine not to watch TV but to display the full dashboard with key performance indicators related to my workout. These include:

  • Time done / remaining
  • Pulse with red alert when I’m over the healthy threshold for my age
  • Distance I would have gone if I wasn’t in the same fixed position
  • Calories burned

As with many data presentations we here have a mix of hard facts, like the time done, and then some assumed figures like calories burned. The machine doesn’t really measure the actual accurate burning but calculates the assumed burning as a function of power level, speed, my weight and age.  
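The machine's exact formula is unknown to me, but a common approximation of that kind uses MET values (metabolic equivalents): kcal per minute = MET × body weight in kg × 3.5 / 200. The MET value to plug in for a given power level and speed is an assumption, which is precisely why the figure on the dashboard is assumed rather than measured.

```python
# A sketch of the assumed-calories calculation using the common MET
# approximation: kcal/min = MET * weight_kg * 3.5 / 200. The MET value
# chosen per power level/speed is an assumption, not a measurement.

def calories_burned(met: float, weight_kg: float, minutes: float) -> float:
    """Estimated burn: depends entirely on the assumed MET value."""
    return met * weight_kg * 3.5 / 200 * minutes
```

For example, 30 minutes at an assumed MET of 8 for an 80 kg rider gives 336 kcal: a precise-looking number built on an estimate.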

It’s actually a question whether I really want to know about the calories burned. My conclusion is yes. The time done is wasted anyway, the high pulse doesn’t last and the distance is virtual. So the calories burned fit the purpose of use. It keeps me going.

Survival of the Fit Enough

When working with data quality and master data management at the same time you are constantly met with the challenge that data quality is most often defined as data being fit for the purpose of use, but master data management is about using the same data for multiple purposes at the same time.

Finding the right solution to such a challenge within an organization isn’t easy, because, despite all good intentions, it is difficult to find someone in the business with an overall answer to those kinds of problems, as explained in the blog post by David Loshin called Communications Gap? Or is there a Gap between Chasms?

An often used principle for overcoming these issues may (with a nod to Darwin) be seen as “survival of the fittest”. You negotiate some survivorship rules between “competing” data providers and consumers, and then the data that is the fittest as measured by these rules wins. All other data gets the KISS of death. Most such survivorship rules are indeed simple, often based on a single dimension such as timeliness, completeness or provenance.
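A single-dimension survivorship rule of that kind can be sketched in a few lines. The duplicate records and field names below are invented for illustration; the rule shown is pure timeliness, where the most recently updated record wins.

```python
# A sketch of a single-dimension survivorship rule: among duplicate
# records the most recently updated one wins. The sample records and
# field names are illustrative assumptions.
from datetime import date

duplicates = [
    {"source": "crm", "name": "Acme Ltd", "updated": date(2011, 3, 1)},
    {"source": "erp", "name": "ACME Limited", "updated": date(2012, 6, 5)},
]

def survivor(records):
    """Timeliness rule: the latest 'updated' survives;
    the rest get the KISS of death."""
    return max(records, key=lambda r: r["updated"])
```

Swapping the key function for a completeness score or a source ranking gives the other common single-dimension rules without changing the shape of the code.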

Recently it has been suggested that the phrase “survival of the fittest” in evolutionary theory be changed to “survival of the fit enough”, because it seems that many times specimens haven’t competed but instead found a way into empty alternate spaces.

It seems that master data management and related data quality is going that way too. Data that is fit enough will survive in the master data hub in alternate spaces where the single source of truth exists in perfect symbiosis with multiple realities.

Right the First Time

Since I have just relocated (and we have just passed the new year resolution point) I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think her main competency rightfully is being a splendid fitness instructor.

What she did was ask for my citizen ID card and take the data from there. A little less privacy yes, but surely a lot better for data quality – or data fitness (credit Frank Harland) you might say.

Superb Bad Data

When working with data and information quality we often use words such as rubbish, poor, bad and other negative words to describe data that needs to be enhanced in order to achieve better data quality. However, what is bad may have been good in the context where a particular set of data originated.

Right now I have some fun with author names.

An example of good and bad could be with an author I have used several times on this blog, namely the late fairy tale writer called in full name:

Hans Christian Andersen

When gazing through data you will meet his name represented this way:

Andersen, Hans Christian

This representation is fit for the purpose of use, for example when looking for a book by this author at a library, where the fictional books are sorted by the surname of the author.

The question is then: Do you want to have the one representation, the other representation or both?
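One answer is to store one canonical form and derive the others on demand. The sketch below does that with a naive split, which assumes the last token is the surname; that assumption fails for multi-word surnames, so real name parsing is considerably harder than this.

```python
# A sketch of deriving name representations from one stored form.
# The naive split (last token = surname) is an assumption that breaks
# on multi-word surnames; real name parsing needs reference data.

def to_sort_form(name: str) -> str:
    """'Hans Christian Andersen' -> 'Andersen, Hans Christian'"""
    *given, surname = name.split()
    return f"{surname}, {' '.join(given)}"

def to_initials_form(name: str) -> str:
    """'Hans Christian Andersen' -> 'H. C. Andersen'"""
    *given, surname = name.split()
    return " ".join(f"{g[0]}." for g in given) + f" {surname}"
```

Deriving forms from one canonical record avoids storing conflicting duplicates, at the cost of the parsing assumptions made above.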

You may also meet his name in another form in another field than the name field. For example there is a main street in Copenhagen called:

H. C. Andersens Boulevard

This is the real world name of the street, holding a common form of the author's name with only initials.

Top 5 Reasons for Downstream Cleansing

I guess every data and information quality professional agrees that when fighting bad data upstream prevention is better than downstream cleansing.

Nevertheless, most work in fighting bad data quality is done as downstream cleansing, and not least the deployment of data quality tools is made downstream, where tools outperform manual work in heavy duty data profiling and data matching, as explained in the post Data Quality Tools Revealed.

In my experience the top 5 reasons for doing downstream cleansing are:

1) Upstream prevention wasn’t done

This is an obvious one. By the time you decide to do something about bad data quality the right way – finding the root causes, improving business processes, affecting people's attitudes, building a data quality firewall and all that jazz – you still have to do something about the bad data already in the databases.

2) New purposes show up

Data quality is said to be about data being fit for purpose and meeting the business requirements. But new purposes will show up and new requirements have to be met in an ever-changing business environment. Therefore you will have to deal with Unpredictable Inaccuracy.

3) Dealing with external born data

Upstream isn’t necessarily in your company, as data in many cases is entered Outside Your Jurisdiction.

4) A merger/acquisition strikes

When data from two organizations that have had different requirements and data governance maturity is to be merged, something has to be done. Some of the challenges are explained in the post Merging Customer Master Data.

5) Migration happens

Moving data from an old system to a new system is a good chance to do something about poor data quality and start over the right way, and oftentimes you can't even migrate some data without improving the data quality. You only have to figure out when to cleanse in data migration.
