Warning about warnings

In the two months I have lived where I live now, I have seen as many “wet floor” and “slippery ground” warnings as I had seen in my entire life until then. And I’m not that young.

The sheer number of these warnings everywhere makes me think that the message is: “Yes, we know that you may slip and hurt yourself. Actually, we don’t care and we don’t intend to do anything about it. But at least now you can’t say that we didn’t warn you.”

It also makes me think about what is being done about poor data quality all over. There are lots of warnings out there and plenty of methods and methodologies available for measuring bad data. But when it comes to actually doing something to solve the problems, well, warning signs seem to be the preferred remedy.

I’m as guilty as anyone else I guess. I have even proposed a data quality immaturity model once.

Doing something about “wet floor” and “slippery ground” often involves a short-term workaround and a long-term solution. And actually, a “wet floor” is often due to a recent cleaning action.

A common saying is: “Don’t Bring Me Problems—Bring Me Solutions!”.

Let’s try to put up fewer warning signs and work on having less slippery ground including immediately after a cleaning action.


Who Is Not Using Data Quality Magic?

The other day the latest Gartner Magic Quadrant for Data Quality Tools was released.

If you are interested in knowing what it says, it’s normally possible to download a copy from the leading vendors’ websites.

Among the information in the paper you will find estimated numbers of customers who have purchased the tools from the vendors included in the quadrant.

If you sum up these numbers, an estimated 16,540 organizations worldwide are customers of an included vendor.

So, if I matched that compiled customer list against the Dun & Bradstreet WorldBase, which holds at least 100 million active business entities worldwide, I would get a group of at least 99,983,460 companies that are not using magical data quality tools.

And that figure is probably even too low, since customers who have more than one vendor are counted twice in the sum.

Anyway, what do all the others do then?

Well, of course the overwhelming majority of companies are simply too small to have any chance of investing in a data quality tool from a vendor that made it into the quadrant.

The quadrant also lists a range of other data quality tool vendors, typically operating locally around the world. These vendors also have customers, probably more in number, but not of the size of the companies that choose a vendor in the quadrant.

A lot of data quality technology is also used by service providers, who either use a tool from a data quality tool vendor or have built a homegrown solution. So a lot of companies benefit from such services when having large numbers of data records standardized, deduplicated and enriched.

Then we must not forget that technology doesn’t solve all your data quality issues, as stated by DataQualityPro founder Dylan Jones in a recent post on a data quality forum operated by the (according to Gartner) leading data quality tool vendor. The post is called Finding the Passion for Data Quality.

My take is that it’s totally true that data quality tools don’t solve most of your data quality issues, but the issues they do address, typically data profiling and data matching, are hard to solve without a tool. So there is still a huge market out there, currently covered by the true leader in the data quality market: Laissez-Faire.


What’s best: Safe or sorry?

As I have now moved much closer to downtown, I have also changed my car accordingly, so two months ago I squeezed myself into a brand new city car, the Fiat Nuova Cinquecento.

(Un)fortunately the car dealer’s service department called the other day and said a part of the engine had to be replaced because there could be a problem with it. The manufacturer must have calculated that it’s cheaper (and maybe a better customer experience) to be proactive rather than reactive and deal with the problem only if it should occur in my car later.

(Un)fortunately that’s not the way we usually do it with possible data problems. So, back to work again. Someone’s direct marketing data just crashed in the middle of a campaign.    


Multi-Channel Data Quality

When I hear terms like multi-channel marketing, multi-channel retailing, multi-channel publishing and other multi-channel things, I can’t resist thinking that there must also be a term called multi-channel data quality.

Indeed, we are getting more and more channels where we do business. They stretch from the good old brick-and-mortar offline shop through eCommerce to the latest online touch points such as mobile devices and social media.

Our data quality is challenged as the way of the world changes. Customer master data comes in from these disparate channels with various purposes and in divergent formats. Product master data is exposed through these channels in different ways.

We have to balance our business processes between having a single unique customer view and a unified product information basis on the one hand, and the diverse business needs within each channel on the other.

Some customer data may be complete and timely in one channel but deficient and out of date in another channel. Some product data may be useful here but inaccurate there.

I think these multi-channel things make yet another business case for multi-domain (or multi-entity) master data management. Even if it is hard to predict the return on investment for the related data quality and master data management initiatives, I think it is easy to foresee the consequences of doing nothing.


Seeing Is Believing

One of my regular activities as a practice manager at a data quality tool vendor is making what we call a “Test Report”.

Such a “Test Report” is a preferable presale activity regardless of whether we are up against a competitor or against the option of doing nothing (or no more) to improve data quality. In the latter case, I usually name our competitor “Laissez-Faire”.

Most of the test reports I do revolve around the most frequent data quality issue: duplicates in party master data – names and addresses.

Looking at what an advanced data matching tool can do with your customer master data and other business partner registries is often the decisive factor for choosing to implement the tool.

I like to do the test with a full extract of all current party master data.

A “Test Report” has two major outcomes:

  • Quantifying the estimated number of different types of duplicates, which is the basis for calculating expected Return on Investment for implementing such an advanced data matching tool.
  • Qualifying both some typical and some special examples in order to point out the tuning efforts needed both for an initial match and for the recommended ongoing prevention.
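
The quantification step above can be illustrated with a minimal sketch. This is not the vendor’s tool, just a toy fuzzy matcher over invented party master data records, using a simple string similarity and an arbitrarily chosen threshold:

```python
from difflib import SequenceMatcher

# Invented party master data records: (name, address).
records = [
    ("Acme Corporation", "12 High Street"),
    ("ACME Corp.", "12 High St"),
    ("Beta Industries", "7 Low Road"),
    ("Beta Industries Ltd", "7 Low Road"),
    ("Gamma Trading", "3 Mid Lane"),
]

def similarity(a, b):
    """Crude fuzzy similarity on lowercased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicate_candidates(recs, threshold=0.7):
    """Return pairs whose averaged name/address similarity
    meets the (hypothetical) threshold."""
    pairs = []
    for i in range(len(recs)):
        for j in range(i + 1, len(recs)):
            score = (similarity(recs[i][0], recs[j][0]) +
                     similarity(recs[i][1], recs[j][1])) / 2
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

# Counting candidate pairs per score band is the basis for estimating
# the number of duplicates and thereby the expected return on investment.
candidates = find_duplicate_candidates(records)
```

A real matching tool adds standardization, phonetic keys and far more refined rules, but counting candidates by score band like this is essentially what the quantification in a Test Report boils down to.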

When participating in follow-up meetings I have found that discussions about what a tool can do (and cannot do) are much more sensible when backed up by concrete numbers and concrete examples from your particular data.


The Next Level

A quote about data quality from Thomas Redman says:

“It is a waste of effort to improve the quality (accuracy) of data no one ever uses.”

I learned the quote from Jim Harris, who mentioned it most recently in his post: DQ-Tip: “There is no point in monitoring data quality…”

In a comment, Phil Simon said: “I love that. I’m jealous that I didn’t think of something so smart.”

I’m guessing Phil was being somewhat ironic. If so, I can see why. The statement seems pretty obvious, and at first glance you can’t imagine anyone taking the opposite stance: let’s cleanse some data no one ever uses.

I also think it was meant as obvious in Redman’s book Data Driven.

Well, taking it to the next level I can think of the following elaboration:

  1. If you find some data that no one ever uses, you should not only avoid improving the quality of that data, you should actually delete the data and make sure that no one spends time and resources on entering or importing the same data in the future.
  2. That is, unless the reason no one ever uses the data is that its quality is poor. Then you must compare the benefits of improving the data against the costs of doing so. If the costs are bigger, proceed with point 1. If the benefits are bigger, go to point 3.
  3. It is not a waste of effort to improve the quality of some data no one ever uses.
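
The three points above can be sketched as a small decision procedure. The inputs are of course hypothetical simplifications of real assessments:

```python
def next_step(data_is_used, quality_is_poor, benefit=0, cost=0):
    """Decision sketch for data no one uses:
    point 1 - delete unused data and stop collecting it,
    point 2 - unless it is unused *because* quality is poor,
              in which case weigh benefits against costs,
    point 3 - improve the quality when benefits outweigh costs."""
    if data_is_used:
        return "keep and maintain"
    if not quality_is_poor or cost > benefit:
        return "delete and stop collecting"  # point 1 (possibly via point 2)
    return "improve the quality"             # point 3
```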


Big Time ROI in Identity Resolution

Yesterday I had the chance to make a preliminary assessment of the data quality in one of the local databases holding information about entities involved in carbon trade activities. It is believed that up to 90 percent of the market activity may have been fraudulent with criminals pocketing 5 billion Euros. There is a description of the scam here from telegraph.co.uk.

Most of my work with data matching is aimed at finding duplicates. In doing this you must avoid so-called false positives, so that you don’t end up merging information about two different real-world entities. But when doing identity resolution for several reasons, including preventing fraud and scams, you may be interested in finding connections between entities that are not supposed to be connected at all.

The results from making such connections in the carbon trade database were quite astonishing. Here is an example where I have changed the names, addresses, e-mails and phone numbers, but such a pattern was found in several cases:

Here we have an example of a group of entities where the name, address, e-mail or phone is shared in a way that doesn’t seem natural.
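
A minimal sketch of how such unnatural sharing can be flagged automatically. The records, and the choice of e-mail and phone as connection keys, are invented for illustration:

```python
from collections import defaultdict

# Invented trading-account registrations.
entities = [
    {"id": 1, "name": "Alpha Trading", "email": "info@alpha.example", "phone": "555-0101"},
    {"id": 2, "name": "Beta Carbon",   "email": "info@alpha.example", "phone": "555-0202"},
    {"id": 3, "name": "Gamma Credits", "email": "gamma@example.com",  "phone": "555-0202"},
    {"id": 4, "name": "Delta Offsets", "email": "delta@example.com",  "phone": "555-0404"},
]

def shared_attribute_groups(ents, keys=("email", "phone")):
    """Index each attribute value to the entity ids using it, and
    return the values shared by supposedly unrelated entities."""
    index = defaultdict(set)
    for e in ents:
        for k in keys:
            index[(k, e[k])].add(e["id"])
    return {attr: ids for attr, ids in index.items() if len(ids) > 1}

# Entities 1 and 2 share an e-mail, and 2 and 3 share a phone number:
# exactly the kind of chain that doesn't seem natural.
suspicious = shared_attribute_groups(entities)
```

Exact-value sharing is the simplest case; a real identity resolution setup would also apply fuzzy matching so that near-identical names and addresses form connections too.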

My involvement in the carbon trade scam was initiated by a blog post yesterday by my colleague Jan Erik Ingvaldsen, based on the story that journalists, merely by eyeballing the database, had found addresses that simply don’t exist.

So the question is whether the authorities might have avoided losing 5 billion taxpayer Euros if some identity resolution, including automated fuzzy connection checks and real-world checks, had been implemented. I know that everyone is so much more enlightened about what could have been done once the scam has been discovered, but I actually think that there may be a lot of other billions of Euros (Pounds, Dollars, Rupees) to avoid losing out there by doing some decent identity resolution.


Data Quality Milestones

I have a page on this blog with the heading “Data Quality 2.0”. The page is about what, in my opinion, the near future will bring in the data quality industry. In recent days there have been some comments on the topic. My current summing up on the subject is this:

The Data Quality X.X labels are merely maturity milestones, where:

Data Quality 0.0 may be seen as a Laissez-faire state where nothing is done.

Data Quality 1.0 may be seen as projects for improving downstream data quality, typically using batch cleansing with nationally oriented techniques, in order to make data fit for purpose.

Data Quality 2.0 may be seen as agile implementation of enterprise-wide and small business data quality prevention upstream, using multi-cultural combined techniques and exploiting cloud-based reference data, in order to keep data fit for multiple purposes.