Laissez faire – Liliendahl on Data Quality

Warning about warnings

30th November 2011 | Henrik Gabs Liliendahl | 3 Comments

In the two months I have lived where I live now, I have seen as many warnings about “wet floor” and “slippery ground” as I had until then in my entire life. And I’m not that young.

The sheer number of these warnings makes me think that the message is: “Yes, we know that you may slip and hurt yourself. Actually we don’t care and we don’t intend to do anything about it. But at least now you can’t say that we didn’t warn you”.

It also makes me think about what is being done about poor data quality everywhere. There are lots of warnings out there and lots of methods available for measuring bad data. But when it comes to actually doing something to solve the problems, well, warning signs seem to be the preferred remedy.

I’m as guilty as anyone else, I guess. I once even proposed a data quality immaturity model.

Doing something about “wet floor” and “slippery ground” often has a short-term workaround and a long-term solution. And actually “wet floor” is often due to a recent cleaning action.

A common saying is: “Don’t Bring Me Problems—Bring Me Solutions!”.

Let’s try to put up fewer warning signs and work on having less slippery ground, including immediately after a cleaning action.

Who Is Not Using Data Quality Magic?

2nd August 2011 | Henrik Gabs Liliendahl | 1 Comment

The other day the latest Gartner Magic Quadrant for Data Quality Tools was released.

If you are interested in knowing what it says, it’s normally possible to download a copy from one of the leading vendors’ websites.

Among the information in the paper you will find some estimated numbers of customers who have purchased the tools from the vendors included in the quadrant.

If you sum up these numbers, an estimated 16,540 organizations worldwide are customers of an included vendor.

So, if I matched that compiled customer list with the Dun & Bradstreet WorldBase, which holds at least 100 million active business entities worldwide, I would have a group of at least 99,983,460 companies that are not using magical data quality tools.

And that count probably ignores the fact that some customers have more than one vendor. Any such overlap means fewer distinct customers, and so even more companies without a tool.
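The subtraction above can be checked with a few lines of Python. This is only a sketch: the two totals are the figures quoted in the post, while the `overlap` parameter and the value passed to it are hypothetical, added to show which way multi-vendor customers would push the result.

```python
# Figures quoted in the post.
WORLDBASE_ENTITIES = 100_000_000  # "at least 100 million" active business entities
QUADRANT_CUSTOMERS = 16_540       # sum of the vendors' estimated customer counts


def non_users(total: int, customer_count: int, overlap: int = 0) -> int:
    """Entities left over once distinct tool customers are removed.

    `overlap` is a hypothetical number of customer entries that repeat an
    organization already counted under another vendor.
    """
    distinct_customers = customer_count - overlap
    return total - distinct_customers


print(non_users(WORLDBASE_ENTITIES, QUADRANT_CUSTOMERS))
# Hypothetical overlap of 1,000 entries: the group of non-users grows.
print(non_users(WORLDBASE_ENTITIES, QUADRANT_CUSTOMERS, overlap=1_000))
```

With no overlap the function returns the 99,983,460 stated above; any positive overlap only raises it.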

Anyway, what do all the others do then?

Well, of course the overwhelming number of companies will be too small to have any chance of investing in a data quality tool from a vendor that made it to the quadrant.

The quadrant also lists a range of other vendors of data quality tools, typically operating locally around the world. These vendors also have customers, probably more of them in number, but not of the size of the companies that choose a vendor in the quadrant.

A lot of data quality technology is also used by service providers who either use a tool from a data quality tool vendor or have built a homegrown solution. So a lot of companies benefit from such services when processing large numbers of data records to be standardized, deduplicated and enriched.

Then we must not forget that technology doesn’t solve all your data quality issues, as stated by Dylan Jones, the founder of DataQualityPro, in a recent post on a data quality forum operated by the (according to Gartner) leading data quality tool vendor. The post is called Finding the Passion for Data Quality.

My take is that it’s totally true that data quality tools don’t solve most of your data quality issues, but the issues they do address, typically data profiling and data matching, are hard to solve without a tool. So there is still a huge market out there currently covered by the true leader in the data quality market: Laissez-Faire.

What’s best: Safe or sorry?

15th June 2011 | Henrik Gabs Liliendahl | 5 Comments

As I have now moved much closer to downtown, I have also changed my car accordingly, so two months ago I squeezed myself into a brand new city car, the Fiat Nuova Cinquecento.

(Un)fortunately the car dealer’s service department called the other day and said some part of the motor had to be replaced because there could be a problem with that part. The manufacturer must have calculated that it’s cheaper (and maybe a better customer experience) to be proactive rather than to be reactive and deal with the problem if it should occur with my car later.

(Un)fortunately that’s not the way we usually do it with possible data problems. So, back to work again. Someone’s direct marketing data just crashed in the middle of a campaign.

Multi-Channel Data Quality

8th February 2011 | 7th March 2011 | Henrik Gabs Liliendahl | 4 Comments

When I hear terms such as multi-channel marketing, multi-channel retailing, multi-channel publishing and other multi-channel things, I can’t resist thinking that there must also be a term called multi-channel data quality.

Indeed we are getting more and more channels where we do business. They stretch from the good old brick and mortar offline shop through eCommerce to the latest online touch points such as mobile devices and social media.

Our data quality is challenged by the way the world changes. Customer master data is coming from these disparate channels with various purposes and in divergent formats. Product master data is exposed through these channels in different ways.

We have to balance our business processes between, on the one hand, having a unique single customer view and a unified product information basis and, on the other, the diverse business needs within each channel.

Some customer data may be complete and timely in one channel but deficient and out of date in another channel. Some product data may be useful here but inaccurate there.

I think the multi-channel things make yet another business case for multi-domain (or multi-entity) master data management. Even if it is hard to predict the return on investment for the related data quality and master data management initiatives, I think it is easy to foresee the consequences of doing nothing.

Seeing Is Believing

1st July 2010 | Henrik Gabs Liliendahl

One of my regular activities as a practice manager at a data quality tool vendor is making what we call a “Test Report”.

Such a “Test Report” is a preferable presale activity regardless of whether we are up against a competitor or the option of doing nothing (or no more) to improve data quality. In the latter case I usually name our competitor “Laissez-Faire”.

Most of the test reports I do revolve around the most frequent data quality issue: duplicates in party master data – names and addresses.

Looking at what an advanced data matching tool can do with your customer master data and other business partner registries is often the decisive factor for choosing to implement the tool.

I like to do the test with a full extract of all current party master data.

A “Test Report” has two major outcomes:

Quantifying the estimated number of different types of duplicates, which is the basis for calculating expected Return on Investment for implementing such an advanced data matching tool.
Qualifying both some typical and some special examples in order to point at the tuning efforts needed both for an initial match and the recommended ongoing prevention.
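The two outcomes above can be illustrated with a minimal Python sketch. It is not the vendor's matching tool: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for advanced matching, and the records, the `normalize` rule and the 0.8 threshold are all assumptions made for illustration. It counts candidate duplicate pairs by type (quantifying) and keeps the fuzzy pairs as examples to inspect (qualifying).

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lower-case name and address and strip simple punctuation (assumed rule)."""
    name, address = record
    text = f"{name} {address}".lower().replace(",", " ").replace(".", " ")
    return " ".join(text.split())


def similarity(a, b):
    """Character-based similarity between two normalized records, 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify_pairs(records, fuzzy=0.8):
    """Count exact and fuzzy candidate pairs; return fuzzy pairs as examples."""
    counts = {"exact": 0, "fuzzy": 0}
    examples = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if normalize(a) == normalize(b):
            counts["exact"] += 1
        elif score >= fuzzy:  # threshold is an assumption and needs tuning
            counts["fuzzy"] += 1
            examples.append((i, j, round(score, 2)))
    return counts, examples


# Hypothetical party master data: (name, address).
records = [
    ("Acme Ltd", "1 Main Street, Springfield"),
    ("ACME Ltd.", "1 Main Street, Springfield"),
    ("Acme Limited", "1 Main St, Springfield"),
    ("Bolt & Nut Co", "22 Harbour Road, Portville"),
]

counts, examples = classify_pairs(records)
print(counts)
print(examples)
```

Comparing every pair is quadratic in the number of records, so a full extract of party master data would need some form of blocking or indexing before a comparison like this becomes practical.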

When participating in follow-up meetings I have found that discussions around what a tool can do (and not do) are much more sensible when backed up by concrete numbers and concrete examples from your particular data.

Big Time ROI in Identity Resolution

8th May 2010 | 5th May 2012 | Henrik Gabs Liliendahl | 8 Comments

Yesterday I had the chance to make a preliminary assessment of the data quality in one of the local databases holding information about entities involved in carbon trade activities. It is believed that up to 90 percent of the market activity may have been fraudulent with criminals pocketing 5 billion Euros. There is a description of the scam here from telegraph.co.uk.

Most of my work with data matching is aimed at finding duplicates. In doing this you must avoid finding so-called false positives, so you don’t end up merging information about two different real world entities. But when doing identity resolution for other reasons, including preventing fraud and scams, you may be interested in finding connections between entities that are not supposed to be connected at all.

The result from making such connections in the carbon trade database was quite astonishing. Here is an example where I have changed the names, addresses, e-mails and phones, but such a pattern was found in several cases:

Here we have an example of a group of entities where the name, address, e-mail or phone is shared in a way that doesn’t seem natural.
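A minimal Python sketch of this kind of connection check follows. It is not the analysis run on the carbon trade database: the entities, their values and the function name `connected_groups` are hypothetical, and it only links exact matches after trimming and lower-casing, where a real check would add fuzzy comparison. Entities sharing any name, address, e-mail or phone are joined with a union-find structure, so indirect chains of sharing also end up in one group.

```python
from collections import defaultdict


def connected_groups(entities):
    """Group entity ids that share any attribute value, directly or via a chain."""
    parent = {eid: eid for eid in entities}

    def find(x):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first entity id carrying it
    for eid, attrs in entities.items():
        for field, value in attrs.items():
            if not value:
                continue  # empty values must not connect entities
            key = (field, value.strip().lower())
            if key in first_seen:
                union(first_seen[key], eid)
            else:
                first_seen[key] = eid

    groups = defaultdict(list)
    for eid in entities:
        groups[find(eid)].append(eid)
    # Only groups with more than one member are of interest.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical registry entries; all names and contact details are made up.
entities = {
    "E1": {"name": "Green Air Trading", "address": "10 Example Road",
           "email": "info@greenair.example", "phone": "555-0101"},
    "E2": {"name": "Carbon Partners", "address": "10 Example Road",
           "email": "sales@carbonpartners.example", "phone": "555-0102"},
    "E3": {"name": "Blue Sky Credits", "address": "7 Sample Lane",
           "email": "sales@carbonpartners.example", "phone": "555-0103"},
    "E4": {"name": "Nordic Offsets", "address": "3 Placeholder Street",
           "email": "contact@nordicoffsets.example", "phone": "555-0103"},
    "E5": {"name": "Unrelated Ltd", "address": "99 Other Avenue",
           "email": "hello@unrelated.example", "phone": "555-0199"},
}

print(connected_groups(entities))
```

In this made-up data E1 and E2 share an address, E2 and E3 an e-mail and E3 and E4 a phone number, so all four land in one group even though E1 and E4 have nothing in common directly.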

My involvement in the carbon trade scam was initiated by a blog post yesterday by my colleague Jan Erik Ingvaldsen, based on the story that journalists, merely by looking through the database, had found addresses that simply don’t exist.

So the question is whether authorities might have avoided losing 5 billion taxpayer Euros if some identity resolution, including automated fuzzy connection checks and real world checks, had been implemented. I know that it is always much easier to see what could have been done once the scam is discovered, but I actually think that there may be a lot of other billions of Euros (Pounds, Dollars, Rupees) to avoid losing out there by doing some decent identity resolution.

Data Quality Milestones

25th July 2009 | 5th January 2011 | Henrik Gabs Liliendahl | 2 Comments

I have a page on this blog with the heading “Data Quality 2.0”. The page is about what the near future in my opinion will bring in the data quality industry. In recent days there were some comments on the topic. My current summing up on the subject is this:

Data Quality X.X are merely maturity milestones where:

Data Quality 0.0 may be seen as a Laissez-faire state where nothing is done.

Data Quality 1.0 may be seen as projects for improving downstream data quality, typically using batch cleansing with nationally oriented techniques in order to make data fit for purpose.

Data Quality 2.0 may be seen as agile implementation of enterprise-wide and small business data quality upstream prevention, using multi-cultural combined techniques exploiting cloud-based reference data in order to maintain data fit for multiple purposes.