Unmaintainability

Following up on my post about word quality, and inspired by a blog post by Joyce Norris-Montanari called “Things That Don’t Work So Well – Doing Analytics Before Their Time” in which the word “unmaintainable” is used, I want to challenge my English spell checker even further with the rare, apparently not really existing, word but frequent issue of unmaintainability.

I have previously pondered on this blog that you can’t expect everything to be just fine from this day forward simply because you got it Right the First Time. Things change.

This argument is about the data as plain data.

But there is also a maintainability (this is apparently a real word) issue around how we store data. I have many times conducted data quality exercises such as deduplication and matching with, and enriching from, external reference data in order to reach a single version of the truth as far as it goes.

An often encountered problem is that this kind of data processing can get us somewhere close to a single version of the truth. But then there is a huge obstacle: you can’t get these great results back into the daily databases without destroying some of the correctness, because the data structures don’t allow you to do that.

Such unmaintainability is in my eyes a good argument for looking into master data management platforms that allow you to maintain your master data with the complexity that supports the business rules that make your company more competitive.

The 20 Million Rupees Question

Here we go again. The same old question: “What is the definition of customer?” Lately, Informatica (a data quality, master data management and data integration firm) has hired David Loshin to find out – starting with the blog post The Most Dangerous Question to Ask Data Professionals.

In short, my take is that this question in practice has two major implications for data quality and master data management, but in theory it should only have one:

  • The first one is real world alignment. In theory, real world alignment is independent of the definition of a customer, as it is about the party behind the customer.
  • The second is party roles. It’s actually here we can have an endless discussion.

In practice we of course mix things up as discussed in the post Entity Revolution vs Entity Evolution.

And Now for Something Completely Different

Instead of saying that “What is the definition of customer?” is the million dollar question, it’s probably more like the 20 million rupees question, as most data management these days is taking place in India.

The amount of money is taken from the film Slumdog Millionaire, where 20 million rupees is the top prize in the local version of “Who Wants to Be a Millionaire?” (Kaun Banega Crorepati), which by the way has the same jingle and graphics as everywhere else in the world.

And oh, how much is 20 million rupees? It’s nearly half a million US dollars or 300.000 euro (with a dot as thousand separator). But a lot in buying power for a local customer. Exactly 2 crores (2,00,00,000 rupees).
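The Indian grouping style, by the way, can trip up naive number formatting: it groups the last three digits and then pairs of digits, rather than groups of three all the way. A minimal sketch of this (the `format_indian` helper is my own illustration, not taken from any particular library):

```python
def format_indian(n: int) -> str:
    """Format an integer with Indian digit grouping: the last three
    digits form one group, then groups of two, e.g. 2,00,00,000."""
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])

print(format_indian(20_000_000))  # 2,00,00,000 – i.e. 2 crore
```

So the same amount renders as 20,000,000 in Western grouping and 2,00,00,000 in Indian grouping – one more representation issue to keep in mind for international master data.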

Party on.

Five Moments of Truth

Within Customer Relationship Management (CRM) and related Master Data Management (MDM) the party behind the business-to-business (B2B) customer is an important entity.

It is often said that data capture is the most important moment, where it is essential to get data quality right. However, with a complex entity such as a B2B customer there are of course several moments of truth within the life cycle of such an entity.

These are probably the five most important ones:

  • A lead is born
  • Engaging a prospect
  • One more customer
  • Churn happens
  • Win-Back happiness

A lead is born

Leads are born in many different ways: a business card obtained from a little chit-chat at a conference, buying a list of leads or even an engagement in social media as the new way of doing things.

One of the most important things to do when capturing the data at this point is checking whether you already have the party somewhere in the customer life cycle or maybe even in other party roles, as examined in the post 360° Business Partner View.
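As a rough illustration of such a duplicate check, here is a minimal fuzzy name comparison using only Python’s standard library (the party names, the threshold and the `find_candidates` helper are made up for the example; real data matching tools use far more sophisticated techniques than plain string similarity):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude name similarity on case- and whitespace-normalized strings."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Hypothetical parties already known in the customer life cycle
existing_parties = ["ACME Corporation", "Globex Inc.", "Initech Ltd."]

def find_candidates(new_lead: str, threshold: float = 0.8):
    """Return existing parties that look like the newly captured lead."""
    return [p for p in existing_parties if similarity(new_lead, p) >= threshold]

print(find_candidates("acme   corporation"))  # ['ACME Corporation']
```

Even this toy version makes the point: the check has to happen at capture time, before the lead becomes yet another duplicate party record.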

Engaging a prospect

When a lead is qualified as a new prospect, you typically engage in a one-to-one dialogue, and this process includes capturing more data.

Such new data may include adding a visiting address to the first captured mailing address or vice versa, and expanding the collection of firmographic data.

As explained in the post What are they doing? there are a lot of data quality issues in capturing such data, including:

  • Unstructured versus structured data
  • Internal versus external reference data
  • One versus several values

One more customer

After a successful sales process a new customer can be added to the customer list, often with more data being captured, such as a billing address and credit risk information like credit limit and terms of payment.

This is the point where many party entities are split into data silos. Maybe the current customer master data lives on in the CRM system while new customer data is reentered and enriched in an ERP system and perhaps other business applications.

Keeping these data silos aligned is the classic customer master data challenge as discussed in the post Boiling Data Silos.

Churn happens

There are actually two kinds of churn (loss of customers):

  • A customer stops a subscription or a service contract, or tells you that further buying will be at your competitors or that there is no further need for the products and services in question
  • A customer dissolves

Sometimes you don’t even discover the latter one. So your data isn’t very useful or valuable if you don’t practice Ongoing Data Maintenance.

Win-Back happiness

In the first kind of churn you may work hard (or be lucky) and win back the customer.

Be sure to build on the data from the first engagement and not start from scratch again capturing master data and history. Avoiding this addresses some of the 55 reasons to improve data quality related to party master data uniqueness.

How long is a Marathon?

Many large cities around the world have a yearly marathon event. Today it’s Copenhagen (and possibly other cities too).

The marathon distance today is 42,195 kilometers (if I use a comma as decimal point), which equals 26 miles and 385 yards, or 26.22 miles (if I use a dot as decimal point).

So even if we agree about the distance today, we might represent that distance in various ways. The distance has, however, varied through history, as seen in the table with the lengths of the Olympic marathons.
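A small sketch of both points – the same real-world distance converted between unit systems and then rendered with different decimal separators (this is just an illustration; the constants are the standard yard-to-meter definition):

```python
# The official marathon distance: 26 miles and 385 yards
YARD_IN_METERS = 0.9144           # exact, by international definition
yards = 26 * 1760 + 385           # 46,145 yards in total
km = yards * YARD_IN_METERS / 1000

print(round(km, 3))               # 42.195

# The same value rendered with different decimal separators
print(f"{km:.3f}")                      # 42.195 (dot as decimal point)
print(f"{km:.3f}".replace(".", ","))    # 42,195 (comma as decimal point)
```

One distance, at least three string representations – a tiny reminder that storing the measurement and formatting it for display are separate concerns.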

What about real world alignment?

Well, if the Greek runner called Pheidippides (sometimes spelled Phidippides or Philippides) took the long but flat southern route from Marathon to Athens, it would have been around 42 kilometers. If he took the shorter but steeper northern route, it would only have been around 35 kilometers.

What about me? Oh, I’ll go for 42,195 kilometers – on the bike.   

Notes about the North Pole

This is the seventh post in a series of short blog posts focusing on data quality related to different countries around the world. However, today we will be at a place not belonging to any country (so far) and only reachable on foot because it is in the middle of an ocean covered by ice (so far).

Who lives on the North Pole?

Obviously no one – except that according to tradition in some Western countries the North Pole is described as the residence of Santa Claus. Actually, Canada Post has assigned the postal code “H0H 0H0” to the North Pole. So it’s a good data quality question whether “H0H 0H0” is a valid Canadian postal code.
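At the format level, at least, it passes. Canadian postal codes follow the pattern letter-digit-letter, space, digit-letter-digit, where the letters D, F, I, O, Q and U are never used, and W and Z additionally do not appear in the first position. A sketch of such a format check (this validates the pattern only – it does not prove the code is actually assigned to a deliverable address):

```python
import re

# Letter-digit-letter, space, digit-letter-digit;
# D, F, I, O, Q, U excluded everywhere, W and Z also excluded up front.
POSTAL_CODE = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] \d[ABCEGHJ-NPRSTV-Z]\d$"
)

def is_valid_format(code: str) -> bool:
    """Check whether a string matches the Canadian postal code format."""
    return POSTAL_CODE.fullmatch(code.strip().upper()) is not None

print(is_valid_format("H0H 0H0"))  # True
```

So “H0H 0H0” is a well-formed Canadian postal code – whether your validation accepts Santa as a customer is another matter.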

Also, Santa Claus may have several other residences, as the Finns claim the correct address is “Santa Claus Village, FIN-96930 Arctic Circle, Finland” and in Denmark we believe the correct address of Santa Claus to be “Box 1615, DK-3900 Nuuk, Greenland”.

If you are interested in identity resolution covering multiple countries, there is a discussion going on in the LinkedIn Data Matching Group.

Where is the North Pole?

The latitude is 90° – but there is no longitude. So if you don’t accept null in the longitude attribute of your geocodes, you might get a data quality issue when Santa Claus becomes a customer and you believe Canada Post is the only single version of the truth.
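A validation rule that handles this edge case might look like the following sketch (the `validate_geocode` function is my own illustration of the principle, not any standard API):

```python
from typing import Optional

def validate_geocode(lat: float, lon: Optional[float]) -> bool:
    """Accept a missing longitude only at the poles, where longitude
    is genuinely undefined; everywhere else both coordinates are required."""
    if not -90 <= lat <= 90:
        return False
    if lon is None:
        return abs(lat) == 90   # only valid at the North or South Pole
    return -180 <= lon <= 180

print(validate_geocode(90.0, None))   # True  – the North Pole
print(validate_geocode(55.7, None))   # False – longitude required here
```

In other words: a blanket NOT NULL constraint on longitude is a business rule that the real world occasionally breaks.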

Quotes not originally about Data Quality

Yesterday I was looking for some quotations for a data quality presentation.

I stumbled upon these by Niels Bohr:

An expert is a person who has made all the mistakes which can be made in a very narrow field

I found that this quote is most often used this way:

“An expert is a man who has made all the mistakes which can be made in a very narrow field”.

I am pretty sure Bohr said person – not man. There are just as many female experts as male experts around.

And indeed: Learning from mistakes is the path to expertise in data quality.

There are two sorts of truth: trivialities, where opposites are obviously absurd, and profound truths, recognized by the fact that the opposite is also a profound truth

Bohr was into quantum mechanics. I think data quality is very much like quantum mechanics. Sometimes there is a simple single version of the truth; sometimes there are several great versions of a complex truth.

Anyone who is not shocked by quantum theory has not understood it

Anyone who is not shocked by the actual quality of data has probably not measured it (yet).

A Business Rule and a Missing Master Data Hub

It seems that the United States of America has a problem with the business rule saying you have to be born in the country to become president, and a missing citizen master data hub recording who was born in the country.

This is an aspect of a previous blog post called Did They Put a Man on the Moon.

Does One Size Fit Anyone?

Following up on a recent post about data silos, I have been thinking (and remembering) a bit about the idea that one company can have all master data stored in a single master data hub.

Supply Chain Musings

If you, for example, look at a manufacturer, the procurement of raw materials is of course an important business process.

Besides purchasing raw materials the manufacturer also buys machinery, spare parts for the machinery and maintenance services for the machinery.

Like everyone else the manufacturer also buys office supplies – including rare stuff such as data quality tools and master data management consultancy.

If you look at the vendor table in such a company, the number of “supporting suppliers” is much higher than the number of essential suppliers of raw materials. The business processes, data structures and data quality metrics for onboarding and maintaining supplier data and product data are “same same but very different” for these groups of suppliers and the product data involved.

Supply Chain Centric Selling

I remember at one client in manufacturing a side function in procurement was selling by-products from the production to a completely different audience than the customers for the finished products. They had a wonderful multi-domain data silo for that.

Hierarchical Customer Relations

A manufacturer may have a golden business rule saying that all sales of finished products go through channel partners. That will typically mean a modest number of customers in the basic definition being someone who pays you. Here you typically need a complex data structure and advanced workflows for business-to-business (B2B) customer relationship management.

Your channel partners will then have customers being either consumers (B2B2C) or business users within a wider range of companies. I have noticed an increasing interest in keeping some kind of track of the interaction with end users of your products, and I guess embracing social media will only add to that trend. The business processes, data structures and data quality metrics for doing that are “same same but very different” from your basic customer relationship management.

Conclusion

The above musings revolve around manufacturing companies, but I have met similar ranges of primary and secondary constructs related to master data management in all other industry verticals.

So, can all master data in a given company be handled in a single master data hub?

I think it’s possible, but it has to be an extremely flexible hub either having a lot of different built-in functionality or being open for integration with external services.

Using X Factor in Data Quality

Lately I have been experimenting with the X Factor (or Idol) approach to data quality – and I must say, with very promising results.

The basic idea with the X Factor approach to data quality is that it is not about accuracy of data, but all about data appeal.

Data appeal is initially measured by a panel of judges in a data audition. Usually you have 3 or 4 judges, where at least one judge is unbelievably nice and friendly and at least one judge is extremely rude (aka honest). After a following bootcamp the surviving data records are knocked out one by one by the users until we have a golden record as the winner. A secret data steward usually hosts the show.

The great thing about the X Factor approach is that the so called “xingle version of the truth” doesn’t last very long. Soon we will have a new season where data is going through the same process again with a completely new golden record as the winner.

Wonder about what Simon says?   

Boiling Data Silos

Yesterday there were some blog posts dealing with data silos.

Graham Rhind posted: Data silos – learn to live with them.

Rob Karel posted: Stop trying to put a monetary value on data – it’s the wrong path. Though not the main subject, there was a remark saying: “Attempting to boil the ocean and trying to solve Customer, Product, or Financial data for all processes and decisions across the whole organization is too big an effort destined to fail before it starts”.

Mark Montgomery made a comment on Rob’s post saying: “I also have trouble with the boil the ocean metaphor, which is used too often these days to justify all kinds of protectionist policies in the enterprise. You can’t have it both ways in the enterprise– either you have data silos or you don’t, and I argue that increasingly the world cannot afford them, albeit in highly secure formats in most situations”.

I guess we have to go for the golden mean on this one too. We shouldn’t accept data silos, but we must expect them. We could go for eliminating them, probably not in one big bang, but slice by slice as we climb up the levels of an information maturity model.

I would definitely expect to see fewer and smaller data silos at the top level of an information maturity model than at the bottom level of a data quality immaturity model.
