Turning a Blind Eye to Data Quality

The idiom turning a blind eye originates from the Battle of Copenhagen, where Admiral Nelson ignored a signal giving permission to withdraw by raising the telescope to his blind eye and saying: “I really do not see the signal”.

Nelson went on to win the battle.

As a data quality practitioner you are often amazed by how enterprises turn a blind eye to data quality challenges and, despite horrible data quality conditions, keep on and win the battle by growing into successful businesses.

The evidence about how poor data quality is costing enterprises huge sums has been out there for a long time. But business successes are made over and again despite bad data. There may be casualties, but the business goals are met anyway. So, poor data quality is just something that makes the fight harder, not impossible.

I guess we have to change the messaging about data quality improvement away from the doomsday prophecies, which make decision makers turn a blind eye to data quality challenges, and be more specific about maybe smaller but tangible wins where data quality improvement and business efficiency go hand in hand.


Book Review: Cervo and Allen on MDM in Practice

Master Data Management is becoming increasingly popular, and so is writing books about Master Data Management.

Last month Dalton Cervo and Mark Allen published their contribution to the book selection. The book is called “Master Data Management in Practice: Achieving True Customer MDM”.

As disclosed in the first part of the title, the book emphasizes the practical aspects of implementing and maintaining Master Data Management and, as disclosed in the second part of the title, the book focuses on customer MDM, which, until now, is the most frequent and proven domain in MDM.

In my opinion the book has succeeded very well in keeping a practical view on MDM. And I think that limiting the focus to customer MDM supports the understanding of the issues discussed in a good way, though, as the authors also recognize in the final part, multi-domain MDM is becoming a trend.

Mastering customer master data is a huge subject area. In my eyes this book addresses all the important topics with a good balance, both in the sense of embracing business and technology angles with equal weight and in presenting the issues in neither a too simple nor a too complex way.

I like how the authors are addressing the ROI question by saying: “Attempts to try to calculate and project ROI will be swag at best and probably miss the central point that MDM is really an evolving business practice that is necessary to better manage your data, and not a specific project with a specific expectation and time-based outcome that can be calculated up front”.

In the final summary the authors say: “The journey through MDM is a constantly learning, churning and maturing experience. Hopefully, we have contributed with enough insight to make your job easier”. Yep, Dalton and Mark, you have done that.


Data Quality as Competitive Advantage

I always wanted to write the above headline, but unfortunately one of the hardest things to do is documenting the direct link between data quality improvement and competitive advantage. Apart from the classic calculation of the cost of returned direct mails, most other examples rest on circumstantial evidence; there is no smoking gun.

Then yesterday I stumbled upon an example with a different angle. A travel company issued a press release saying that strict new rules require that the name on your flight ticket is spelled exactly the same and holds the same name elements as in your passport. So if you made a typo or missed a middle name in your self-registration, you have to make a correction. Traditional travel companies do that for free, but low-cost airlines may charge up to 100 Euros (often more than the original ticket price) for making the correction.

So traditional travel companies gain a competitive advantage by providing better data quality – and the low-cost airlines make a profit from bad data quality.


The Value of Used Data

Motivated by a comment from Larry Dubov on the Data Quality ROI page on this blog I looked up the term Information Economics on Wikipedia.

When discussing information quality, a frequent subject is whether we can compare quality in manufacturing (and the related methodology) with information and data quality. The predominant argument against this comparison is that raw data can be reused multiple times while raw materials can’t.

Information Economics circles around that difference as well.

The value of data is very much dependent on how the data is being used, and in many cases the value increases with the number of times the data is used.

Data quality will probably increase with multiple uses, as the accuracy and timeliness are probed with each use, a new conformity requirement may be discovered and the completeness may be expanded.

The usefulness of data (as information) may also be increased by each new use as new relations to other pieces of data are recorded.

In my eyes the value of (used) data relies very much on how well you are able to capture the feedback from how data is used in business processes. This is actually the same approach as in continuous quality improvement (Kaizen) in manufacturing, except that there the improvement only benefits the next goods to be produced. In data management we have the chance to improve the quality and value of data that has already been used.


All that glisters is not gold

As William (not Bill) Shakespeare wrote in the play The Merchant of Venice:

All that glisters is not gold;
Often have you heard that told

I was reminded of that phrase when commenting on a comment from John Owens in my recent post called Non-Obvious Entity Relationship Awareness.

Loraine Lawson wrote a piece on IT Business Edge yesterday called Adding Common Sense to Data Quality. That post relates to a post by Phil Simon on Mike 2.0 called Data Error Inequality. That post relates to a post on this blog called Pick Any Two.

Anyway, one learning from all this glistering relationship fuss is that when looking for return on investment (gold) in data quality improvement and master data management perfection, I agree with adding some common sense.

Actually, one of the first posts on this blog was Data Quality and Common Sense.


Miracle Food for Thought

We all know the headlines in the media about food and drink and your health. One day something is healthy, the next day it will kill you. You are struck with horror when you learn that even a single drop of alcohol will harm your body until you are relieved by the wise words saying that a glass (or two) of red wine a day keeps the doctor away.

These misleading, exaggerated and contradictory headlines are now documented in a report called Miracle Food, Myth and the Media.

It’s the same with data quality, isn’t it?

Sometimes some data are fit for purpose. At another time, in another place, the very same data are rubbish.

As an excerpt from the Miracle Food report says:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoon of salt.

Bon appétit.


Multi-Channel Data Quality

When I hear terms like multi-channel marketing, multi-channel retailing, multi-channel publishing and other multi-channel things, I can’t resist thinking that there must also be a term called multi-channel data quality.

Indeed we are getting more and more channels where we do business. It stretches from the good old brick-and-mortar offline shop over eCommerce to the latest online touch points such as mobile devices and social media.

Our data quality is challenged as the way of the world changes. Customer master data is coming from these disparate channels with various purposes and in divergent formats. Product master data is exposed through these channels in different ways.

We have to balance our business processes between a unique single customer view and a unified product information basis on one side and the diverse business needs within each channel on the other.

Some customer data may be complete and timely in one channel but deficient and out of date in another channel. Some product data may be useful here but inaccurate there.

I think the multi-channel things make yet another business case for multi-domain (or multi-entity) master data management. Even if it is hard to predict the return on investment for the related data quality and master data management initiatives, I think it is easy to foresee the consequences of doing nothing.


The Value of Free Address Data

In yesterday’s blog post I wrote about Free and Open Sources of Reference Data. As mentioned, we have had some discussions in my home country Denmark about fees for access to public sector data.

However, since 2002 basic Danish public sector data about addresses has been available free of charge. This summer a report about the benefits from this practice was released. Link in Danish here.

I’ll quote the key findings:

  • The direct economic gains for the Danish community in the last five years, 2005-2009, are approximately 471 million DKK (63 million EUR). The total cost until 2009 has been about 15 million DKK (2 million EUR).
  • Approximately 30% of the profits are made in the public sector and approximately 70% by private actors.

I think this is a fine example of the win-win situation we’ll get when sharing data between public sector and private sector.


Returns from Investing in a Data Quality Tool

The classic data quality business case is avoiding sending promotion letters and printed materials to duplicate prospects and customers.

Even as e-commerce moves forward and more complex data quality business cases, such as those related to multi-purpose master data management, become more important, I would like to take a look at the classic business case by examining some different choices of data quality tool.

As you may be used to all different kinds of currencies, such as EUR, USD, AUD, GBP and so on, I will use the fictitious currency SSB (Simple Stupid Bananas).

Let’s say we have a direct marketing campaign with these facts:

  • 100,000 names and addresses, ½ of them also with phone number
  • Cost per mail is 3 SSB
  • Response is 4,500 orders with an average profit of 100 SSB

From investigating a sample we know that 10% of the names and addresses are duplicates with slightly different spellings.

So from these figures we know that the cost of a false negative (a not found actual duplicate) is 3 SSB. The savings from a true positive are then also 3 SSB.

The cost of a false positive (a found duplicate that actually isn’t a duplicate) is a possible missing order worth: 4,500 / (100,000 * 90 %) * 100 SSB = 5 SSB.
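The unit costs above can be sketched in a few lines of Python (a minimal sketch using the campaign facts from the post; SSB is of course the fictitious currency):

```python
# Campaign facts from the example above (amounts in the fictitious SSB currency)
mail_count = 100_000        # names and addresses in the campaign
cost_per_mail = 3           # SSB per mail sent
orders = 4_500              # responses to the campaign
profit_per_order = 100      # SSB average profit per order
duplicate_rate = 0.10       # 10% duplicates found in the sample

# A false negative (a missed actual duplicate) costs one wasted mail,
# so catching a true positive saves the same amount.
false_negative_cost = cost_per_mail     # 3 SSB
true_positive_saving = cost_per_mail    # 3 SSB

# A false positive removes a genuine prospect, losing the expected
# profit per unique recipient: orders / unique recipients * profit.
unique_recipients = mail_count * (1 - duplicate_rate)   # 90,000
false_positive_cost = orders / unique_recipients * profit_per_order

print(false_positive_cost)  # 5.0 SSB
```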

Now let’s examine 3 options for tools for finding duplicates:

A: We already have Excel

B: Buying the leader of the pack data quality tool

C: Buying an algorithm-based dedupe tool

A: We already have Excel

You may first sort 100,000 rows by address and look for duplicates this way. Say you find 2,000 duplicates. Then sort 98,000 rows by surname and look for duplicates. Say you find 1,000 duplicates. Then sort 97,000 rows by given name. Say you find 1,000 duplicates. Finally, sort 48,000 rows by phone number. Say you find 1,000 duplicates.

If a person can look for duplicates in 1,000 rows per hour (without making false positives), we will browse a total of 343,000 sorted rows in 343 hours.

Say you hire a student for that and have the Subject Matter Expert explaining, controlling and verifying the process, spending 15 hours.

Costs are:

  • 343 student hours at 15 SSB each = 5,145 SSB
  • 15 SME hours at 50 SSB each = 750 SSB

Total costs are 5,895 SSB.

Total savings are 5,000 true positives at 3 SSB each = 15,000 SSB, making a positive ROI of 9,105 SSB in each campaign.

The only thing is that it will take one student more than two months (without quitting) to do the job.
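The Excel workload math can be sketched like this (a sketch following the pass-by-pass numbers above):

```python
# Option A: manual dedupe in Excel, following the numbers in the post.
# Each pass sorts the remaining rows on one column and scans for duplicates.
passes = [
    ("address",    100_000, 2_000),
    ("surname",     98_000, 1_000),
    ("given name",  97_000, 1_000),
    ("phone",       48_000, 1_000),  # only half the rows have a phone number
]

rows_browsed = sum(rows for _, rows, _ in passes)        # 343,000 rows
duplicates_found = sum(found for _, _, found in passes)  # 5,000 duplicates

rows_per_hour = 1_000
student_hours = rows_browsed / rows_per_hour             # 343 hours
student_rate, sme_rate = 15, 50                          # SSB per hour
sme_hours = 15

total_cost = student_hours * student_rate + sme_hours * sme_rate  # 5,895 SSB
savings = duplicates_found * 3                                    # 15,000 SSB
roi_per_campaign = savings - total_cost                           # 9,105 SSB
```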

B: Buying the leader of the pack data quality tool

Such a tool may have all kinds of data quality monitoring features, may be integrated smoothly with ETL functionality and so on. For data matching it may use so-called match codes. Doing that, we may expect that the tool will find 7,500 duplicates, where 7,000 are true positives and 500 are false positives.

Costs may be:

  • Tool license fee is 50,000 SSB
  • Training fee is 7,000 SSB
  • 80 hours of external consultancy at 125 SSB each = 10,000 SSB
  • 60 IT hours for training and installation at 50 SSB each = 3,000 SSB
  • 100 SME hours for training and configuration at 50 SSB each = 5,000 SSB

Total costs are 75,000 SSB.

Savings per campaign are 7,000 * 3 SSB – 500 * 5 SSB = 18,500 SSB.

A positive ROI will show up after the 5th campaign.

C: Buying an algorithm-based dedupe tool

By using algorithm-based data matching, such a tool may, depending on the threshold setting, find 9,100 duplicates, where 9,000 are true positives and 100 are false positives.

Costs may be:

  • Tool license fee is 5,000 SSB
  • 8 hours of external consultancy for a workshop at 125 SSB each = 1,000 SSB
  • 15 SME hours for training, configuration and pushing the button at 50 SSB each = 750 SSB

Total costs are 6,750 SSB.

Savings per campaign are 9,000 * 3 SSB – 100 * 5 SSB = 26,500 SSB.

A remarkable ROI will show up in the 1st campaign.
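Putting the three options side by side, the break-even campaign can be sketched as follows (an illustrative sketch of the figures above; note that option A's cost recurs with each campaign, while B and C are mainly one-off tool costs):

```python
import math

# Value per classified duplicate, from the unit economics above (in SSB)
TP_SAVING = 3   # one saved mail per true positive
FP_COST = 5     # lost expected profit per false positive

# Option A's cost is per campaign; B's and C's are up-front tool costs.
options = {
    "A: Excel (cost recurs per campaign)": {"cost": 5_895,  "tp": 5_000, "fp": 0},
    "B: leader-of-the-pack tool":          {"cost": 75_000, "tp": 7_000, "fp": 500},
    "C: algorithm-based dedupe tool":      {"cost": 6_750,  "tp": 9_000, "fp": 100},
}

for name, o in options.items():
    savings = o["tp"] * TP_SAVING - o["fp"] * FP_COST
    breakeven = math.ceil(o["cost"] / savings)  # first campaign with positive cumulative ROI
    print(f"{name}: saves {savings:,} SSB per campaign, "
          f"positive ROI from campaign {breakeven}")
```

Run as is, it confirms the post's conclusions: 15,000, 18,500 and 26,500 SSB saved per campaign, with break-even in campaign 1, 5 and 1 respectively.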
