Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that a “single source of truth” is a nice term, but not what data quality and master data management are really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those, including me, who think that an emphasis on real world alignment leads to better data and information quality than a focus on fitness for multiple different purposes of use, it is acknowledged that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity often disagree.

When working on improving data quality in party master data, which is the master data domain where issues are most common, you encounter the same issues over and over again, such as:

  • Many organizations have a considerable overlap of real world entities that are a customer and a supplier at the same time. When expanding to other party roles this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private are mixed, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in non-profits, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are many good fit for purpose of use reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible, and not always desirable, single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit for purpose of use master data.
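A minimal sketch of what such a model could look like is shown here. It assumes a three-level shape: one golden record per real world party, related downwards to external reference entries and to internal, purpose-specific master records. All class names, field names and identifiers are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntity:
    """A real world aligned entry from an external reference source (hypothetical shape)."""
    source: str        # e.g. a business directory or an address registry
    external_id: str
    name: str


@dataclass
class MasterRecord:
    """An internal record kept fit for one purpose of use (hypothetical shape)."""
    system: str        # e.g. "CRM" or "ERP"
    local_id: str
    role: str          # e.g. "customer" or "supplier"
    name: str


@dataclass
class GoldenRecord:
    """Top of the hierarchy: one real world party with golden relations downwards."""
    party_key: str
    references: list[ReferenceEntity] = field(default_factory=list)
    masters: list[MasterRecord] = field(default_factory=list)

    def roles(self) -> set[str]:
        # All roles this party plays across the internal applications.
        return {m.role for m in self.masters}


# One real world company, known in one reference source and two applications.
golden = GoldenRecord(party_key="P-1")
golden.references.append(ReferenceEntity("business-directory", "REG-123", "Acme Ltd"))
golden.masters.append(MasterRecord("CRM", "C-17", "customer", "ACME"))
golden.masters.append(MasterRecord("ERP", "V-04", "supplier", "Acme Limited"))

print(sorted(golden.roles()))  # ['customer', 'supplier']
```

The point of the sketch is that the deliberately duplicated internal records are kept as they are, while the relations to the golden record carry the knowledge that they describe the same real world party.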

Right now I’m having an exciting time doing just that as described in the post Doing MDM in the Cloud.


Data that is not aligned with the real world usually provides bad information

The shortcomings of data being fit for some purpose of use compared to data that is aligned with the real world are a recurring topic on this blog, most recently in the post “Fitness for Use” is Dead.

Today I had a reminder of that when waiting for baggage at Copenhagen Airport.

There is an information screen telling when your baggage will start rolling in. What actually seems to happen is that a fixed time is assigned to every flight and then the screen starts counting down the minutes. Most baggage then starts rolling in (and this is shown on the screen) before zero minutes is reached. If, as with my flight, zero minutes is reached without delivery, the information screen shows that the baggage from this flight is delayed, but not by how long.

So, the information provided is when you could expect your baggage, probably according to some service level goal. OK, fit for that purpose. But in fact that doesn’t help you much as a passenger, and it doesn’t help at all when that goal isn’t reached.

End of rant.


“Fitness for Use” is Dead

The definition of data quality as being “fitness for use” is challenged. “Real world alignment” or similar expressions are gaining traction.

Back in May Malcolm Chisholm posted a tweet about the shortcomings of the “fitness for use” definition, reported here on the blog in the post The Problem with Multiple Purposes of Use.

Last week the tweet was elaborated in the Information Management article called Data Quality is Not Fitness for Use. Today Jim Harris has a follow-up post called Data and its Relationships with Quality.

When working with data quality in the domain with by far the most data quality issues, namely contact data (customer, supplier, employee and other party master data), I have many times experienced that making data fit for more than a single purpose of use is almost always about better real world alignment. Having data that actually represents what it purports to represent always helps with making data fit for use, even with more than one purpose of use.

In the contact data realm that means, for example:

  • Getting a standardized address at contact data entry makes it possible to easily link to sources with geo codes, property information and other location data for multiple purposes.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources, making data fit for many use cases.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.
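The enrichment described in the second bullet can be sketched as a simple keyed lookup. This is only an illustration under stated assumptions: the function `enrich`, the field `registration_number` and both reference sources are hypothetical, and the rule that the first source wins on conflicting fields is an arbitrary choice made for the example.

```python
def enrich(record: dict, reference_sources: list[dict]) -> dict:
    """Add fields from reference sources keyed by registration number.

    Values captured at data entry are never overwritten, and earlier
    sources take precedence over later ones (an assumed rule).
    """
    enriched = dict(record)
    reg_no = record.get("registration_number")
    if reg_no is None:
        return enriched  # no real world key captured, so nothing to link to
    for source in reference_sources:
        for key, value in source.get(reg_no, {}).items():
            enriched.setdefault(key, value)
    return enriched


# Hypothetical reference data held in a public and a commercial source.
public_registry = {"REG-123": {"legal_name": "Acme Ltd", "industry": "Manufacturing"}}
commercial_source = {"REG-123": {"employees": 42, "industry": "Wholesale"}}

entered = {"name": "Acme", "registration_number": "REG-123"}
result = enrich(entered, [public_registry, commercial_source])
```

Without the registration number captured at data entry, the same enrichment would require fuzzy matching on names and addresses, which is exactly where uniqueness and consistency issues tend to creep in.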

Also, making data real world aligned from the start is a big help when maintaining data as the real world will change over time.

Data quality tools will in my eyes also have to adapt to this trend, as discussed with Gartner in the post Quality of Data behind the Data Quality Magic Quadrant.


The Problem with Multiple Purposes of Use

Today I noticed this tweet by Malcolm Chisholm:

I agree.

The problem with the “fitness for use” or “fit for the purpose of use” definition of data quality has been a recurring subject on this blog, starting with the post Fit for What Purpose? and most recently in the post Inaccurately Accurate, which discusses the data quality of the British electoral roll seen both from a strict electoral point of view and from the point of view of external use of the electoral roll.

The problem with “fitness for use” becomes clear when data quality has to be addressed within master data management. Master data has, by definition so to speak, many uses.

My thesis is that there is a breakeven point when including more and more purposes where it will be less cumbersome to reflect the real world object rather than trying to align all known purposes.

Today Jim Harris made an (as ever) excellent post related to how data actually represents what it purports to represent – now and tomorrow too. Find the post called Syncing versus Streaming on the Data Roundtable.


Inaccurately Accurate

The public administrative practice for keeping track of the citizens within a country differs considerably between my former country of residence, Denmark, and my current country of residence, the United Kingdom.

In Denmark there is an all-purpose citizen registry where you are registered “once and for all” seconds after you are born as told in the post Citizen ID within Seconds.

In the United Kingdom there are separate registries for different purposes. For example there is a registry dealing with your health care master data and there is a registry, called the electoral roll, dealing with your master data as a voter.

Today I was reading a recent report about data quality within the British electoral roll. The report is called Great Britain’s electoral registers 2011.

The report revolves around two data quality dimensions: accuracy and completeness.

In doing so, these two bespoke definitions are used:

There is a note about accuracy saying:


This is a very interesting precision, so to speak. Having fitness for the purpose of use is indeed the most common approach to data quality.

This does of course create issues when such data are used for other purposes. For example credit risk agencies here in the UK use appearance on the electoral roll as a parameter for their assessment of credit risk related to individuals.

Surely, often there isn’t a single source of the truth as pondered in the post The Big ABC of Reference Data.

However, this mustn’t make us stop in the search for getting high quality data. We just have to realize that we may look in different places in order to mash up a best picture of the real world as explained in the post Reference Data at Work in the Cloud.  


Yin and Yang Data Quality

The old Chinese concept of yin and yang, or simply yīnyáng, is used to describe how polar opposites or seemingly contrary forces are interconnected and interdependent in the natural world. The concept is probably best known materialized as sweet and sour sauce.

Lately we had a debate in the data quality community on social media about whether data quality is a journey or a destination, nicely summarized by Jim Harris in the post Quo Vadimus. I guess the prevailing sentiment is that it is kind of both a journey and a destination.

We also have the good old question about whether data are of high quality if they are “fit for the purpose of use” or “aligned with the real world”. Sometimes these benchmarks go in opposite directions, and we would like to fulfill both goals at the same time.

The Data Quality discipline is tormented by belonging to both the business side and the technology side of practice. These sides are often regarded as contrary, but in my experience we get the best sauce by having both sides represented.

And oh yes, do we actually have to call it by one of two diametrically different terms, Data Quality or Information Quality? Bon appétit.


Fit for repurposing

Reading a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing I was, perhaps a bit off topic, inspired to pose the question of whether data are of high quality if they are:

  • Fit for the purpose of use
  • Fit for repurposing

The first definition has been around for many years and has been adopted by many data quality practitioners. I have however often encountered situations where the reuse of data for purposes other than the original one has raised data quality issues with otherwise clean data. One of the first pieces on my own blog discussed that challenge in a post called Fit for what purpose?

Not least within master data management, where data are maintained for multiple uses, this problem is very common.

Data in a master data hub may either:

  • Be entered directly into the hub where multiple uses are handled
  • Be loaded from other sources where data capture was done

In the latter case the data governance necessary to ensure fitness for multiple uses must extend to the data capture in these sources.

Now, if repurposing is seen as a future not yet discovered purpose of use, what can you then do to ensure that data today are fit for future repurposing?

The only answer is probably real world alignment, as discussed here on a page called Data Quality 3.0. Make sure your data reflect the real world as closely as possible when captured, and make sure data can be maintained in order to keep that alignment. And make sure this is done and facilitated where data are entered.


Single Customer Hierarchy View

One of the things I do over and over again as part of my work is data matching.

There is a clear tendency that the goal of data matching efforts is increasingly a master data consolidation taking place before the launch of a master data management (MDM) solution. Such a goal makes the data matching requirements considerably more complex than if the goal is a one-shot deduplication before a direct marketing campaign.

Hierarchy Management

In the post Fuzzy Hierarchy Management I described how requirements for multiple purposes of use of customer master data make the terms false positive and false negative fuzzy.

As I like to think of a customer as a party role there are essentially two kinds of hierarchies to be aware of:

  • The hierarchies the involved party belongs to in the real world. This is for example an individual person seen as belonging to a household, or a company belonging at a place in a company family tree.
  • The hierarchies of customer roles as seen in different business functions and by different departments. For example, two billing entities may belong to the same account in a CRM system in one case, while in another case two CRM accounts have the same billing entity.

The first type of hierarchy shouldn’t be seen differently between enterprises. You should reach the very same result in data matching regardless of what your organization is doing. It may however be true that your business rules and the regulatory requirements applying to your industry and geography narrow down the need for exploration.

In the latter case we must of course examine the purpose of use for the customer master data within the organization.
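The difference between the two kinds of hierarchy can be sketched with two small structures. This is a minimal illustration, assuming the real world hierarchy is a tree without cycles stored as parent links, and the role hierarchy is a many-to-many mapping between CRM accounts and billing entities. All names and identifiers are hypothetical.

```python
# Real world hierarchy: each party points to its parent (None at the top).
# This tree should look the same regardless of which enterprise stores it.
real_world_parent = {
    "Acme Denmark": "Acme Holding",
    "Acme UK": "Acme Holding",
    "Acme Holding": None,
}

# Role hierarchy: purpose-specific links that differ between organizations.
crm_account_to_billing = {
    "CRM-1": {"BILL-A", "BILL-B"},  # two billing entities under one account
    "CRM-2": {"BILL-C"},
    "CRM-3": {"BILL-C"},            # two accounts sharing one billing entity
}


def ultimate_parent(party: str) -> str:
    """Walk up the company family tree (assumed to be free of cycles)."""
    while real_world_parent.get(party) is not None:
        party = real_world_parent[party]
    return party


def accounts_for_billing(billing_id: str) -> set[str]:
    """Find every CRM account that uses a given billing entity."""
    return {
        account
        for account, billing_ids in crm_account_to_billing.items()
        if billing_id in billing_ids
    }
```

Here `ultimate_parent("Acme UK")` resolves to the top of the family tree, while `accounts_for_billing("BILL-C")` returns both accounts that share that billing entity. Only the second structure depends on how a given organization has set up its business functions.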

Single Customer View

It is in my experience much easier to solve the second case when the first case is solved. This approach was evaluated in the post Lean MDM.

The same approach also applies to continuous prevention of data quality issues as part of an MDM solution. Aligning with the real world and its hierarchies as part of the data capture makes solving the customer roles as seen in different business functions and by different departments much easier. The benefits of doing this are explained in the post instant Data Quality.

It is often said that a “single customer view” is an illusion. I guess it is. First of all the term “single customer view” is a vision, but a vision worth striving for. Secondly customers come in hierarchies. Managing and reflecting these hierarchies is a very important aspect of master data management. Therefore a “single customer view” often ends up as a “single customer hierarchy view”.


Party On

The most frequent data domain addressed in data quality improvement and master data management is parties.

Some of the issues related to parties that keep on creating difficulties are:

  • Party roles
  • International diversity
  • Real world alignment

Party roles

Party data management is often labeled as customer data management or customer data integration (CDI).

Indeed, customers are the lifeblood of any enterprise, even if we refer to those who benefit from our services as citizens, patients, clients or whatever term is in use in different industries.

But the full information chain within any organization also includes many other party roles as explained in the post 360° Business Partner View. Some parties are suppliers, channel partners and employees. Some parties play more than one role at the same time.
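One way to make room for parties playing more than one role is to keep the role as an assignment separate from the party itself. The sketch below assumes role assignments arrive as simple (party, role) pairs. The identifiers and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical role assignments gathered from different applications.
role_assignments = [
    ("P-1", "customer"),
    ("P-1", "supplier"),
    ("P-2", "employee"),
    ("P-3", "customer"),
    ("P-3", "channel partner"),
    ("P-4", "customer"),
]


def roles_by_party(assignments):
    """Group roles under the party that plays them."""
    roles = defaultdict(set)
    for party_id, role in assignments:
        roles[party_id].add(role)
    return dict(roles)


def multi_role_parties(assignments):
    """Return the parties that play more than one role at the same time."""
    grouped = roles_by_party(assignments)
    return sorted(party for party, roles in grouped.items() if len(roles) > 1)


print(multi_role_parties(role_assignments))  # ['P-1', 'P-3']
```

Modeling the party once and the roles as relations is what makes a 360° view possible, since the overlap between for example customers and suppliers then becomes a simple query rather than a matching exercise.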

The classic question “what is a customer?” is of course important to answer in your master data management and data quality journey. But in my eyes there are a lot of things to be solved in party data management that don’t need to wait for the answer to that question, which anyway won’t be as simple as cutting the Gordian Knot, as said in the post Where is the Business.

International diversity

As discussed in the post The Tower of Babel, more and more organizations are faced with multi-cultural issues in data quality improvement within party data management.

Whether and when an organization has to deal with international issues of course depends on whether and to what degree that organization is domestic or active internationally. Even so, in some countries like Switzerland and Belgium, which have several official languages, the multi-cultural topic is mandatory. Typically in large countries companies grow big before looking abroad, while in smaller countries, like my home country Denmark, even many fairly small companies must address international issues with data quality.

However, as Karen Lopez recently pondered in the post Data Quality in The Wild, Some Where …, actually everyone, even in the United States, has some international data somewhere looking very strange if not addressed properly.

Real world alignment

I often say that real world alignment, sometimes as opposed to the common definition of data quality as being fit for purpose, is the short cut to getting data quality right related to party master data.

It is however not a straightforward short cut. There are multiple challenges connected with getting your business-to-business (B2B) records aligned with the real world, as discussed in the post Single Company View. When it comes to business-to-consumer (B2C) or government-to-citizen (G2C), I think the dear people who sometimes comment on this blog did a fine job of balancing mutating tables and intelligent design in the post Create Table Homo_Sapiens.


When a Cloudburst Hit

Some days ago Copenhagen was hit by the most powerful cloudburst ever measured here.

More powerful cloudbursts may be usual in warmer regions on the earth, but this one was very unusual at 55 degrees north.

Fortunately there was only material damage, but the material damage was very extensive. When you take a closer look you may divide the underground constructions into two categories.

The first category is facilities constructed with the immediate purpose of use in mind. Many of these facilities are still out of operation.

The second category is facilities constructed with the immediate purpose of use in mind but also designed to resist heavy pouring rain. These facilities kept working during the cloudburst. One example is the metro. If the metro had been constructed for only the immediate purpose of use, which is running trains below ground, it would have been flooded within minutes, with the risk of lost lives and a standstill for months.

We have the same situation in data management. Things may seem just fine if data are fit for the immediate purpose of use. But when a sudden change in conditions hits, then you know about data quality.
