Whether Weather Forecasting or Not

Predicting ROI from a data quality program (and many other business initiatives) is like predicting the weather. You can probably guess whether it is going to be good or bad, but most often you can’t predict exactly how good or bad it will actually turn out.

The chances of predicting the weather correctly vary with the time of year and your location. I have the pleasure of living in a place (Denmark) where the weather is pretty unpredictable.

Well, winter is usually cold and summer is warm.

We also know that if we have easterly winds coming in from the Russian Steppe during winter, it turns very cold. In summer that wind will make beautiful hot sunny days. Westerly winds in the winter coming in from the Atlantic Ocean mean temperatures above freezing. In summer that wind often has some chill and rain with it.

But these are the main scenarios. Between those rough generalizations there is a myriad of factors, events and not fully understood processes that make weather forecasting a chaotic discipline.

Making business cases for data quality programs has the same challenges. Well, at some spots on the globe (in some parts of the year) you can wake up every morning and be certain that it is going to be a hot sunny day. Likewise, a lot of business activities will without any doubt benefit from better data quality – no further forecasting needed. In other cases it may be uncertain. Here you may rely on previous experiences (case studies by others) and your position. You may outline a business case and you could be right.

This morning at my place the forecast was mostly cloudy but dry. It is damned cloudy and raining a bit.

Big Time ROI in Identity Resolution

Yesterday I had the chance to make a preliminary assessment of the data quality in one of the local databases holding information about entities involved in carbon trade activities. It is believed that up to 90 percent of the market activity may have been fraudulent with criminals pocketing 5 billion Euros. There is a description of the scam here from telegraph.co.uk.

Most of my work with data matching is aimed at finding duplicates. In doing this you must avoid finding so-called false positives, so you don’t end up merging information about two different real world entities. But when doing identity resolution for several reasons, including preventing fraud and scams, you may be interested in finding connections between entities that are not supposed to be connected at all.
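
To make the duplicate side of this concrete, here is a minimal Python sketch, assuming a simple string similarity from the standard library's `difflib`. The function names, the field names and the 0.85 threshold are made up for illustration; real matching tools use far more refined comparisons. The point is only that requiring both name and address to be similar is one way to keep false positives down.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score after trimming and lower-casing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_probable_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Flag two records as probable duplicates.

    Both name and address must reach the (hypothetical) threshold, so that
    two different real world entities sharing only a name are not merged.
    """
    return (
        similarity(rec_a["name"], rec_b["name"]) >= threshold
        and similarity(rec_a["address"], rec_b["address"]) >= threshold
    )
```

A higher threshold gives fewer false positives but misses more true duplicates, so the value has to be tuned against the data at hand.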

The result from making such connections in the carbon trade database was quite astonishing. Here is an example where I have changed the names, addresses, e-mails and phones, but such a pattern was found in several cases:

Here we have an example of a group of entities where the name, address, e-mail or phone is shared in a way that doesn’t seem natural.
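
A connection check of this kind can be sketched in a few lines of Python. This is an illustration only, not the method used on the carbon trade database: the function name `connected_groups`, the field names and the exact-match rule (after trimming and lower-casing) are assumptions, and a real check would use fuzzy comparison as well. The sketch links any two entities that share a name, address, e-mail or phone and then returns the groups that emerge.

```python
from collections import defaultdict


def connected_groups(entities, keys=("name", "address", "email", "phone")):
    """Group entity ids that are linked, directly or indirectly, by a shared value."""
    parent = {e["id"]: e["id"] for e in entities}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first entity id carrying it
    for e in entities:
        for key in keys:
            value = (e.get(key) or "").strip().lower()
            if not value:
                continue  # empty fields must not connect anything
            if (key, value) in first_seen:
                union(first_seen[(key, value)], e["id"])
            else:
                first_seen[(key, value)] = e["id"]

    groups = defaultdict(list)
    for e in entities:
        groups[find(e["id"])].append(e["id"])
    # Only groups with more than one member are interesting here.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Note that the links are transitive: if one entity shares an address with a second and the second shares an e-mail with a third, all three end up in the same group even though the first and third have nothing in common directly.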

My look into the carbon trade scam was prompted by a blog post yesterday by my colleague Jan Erik Ingvaldsen, based on the story that journalists, by merely gazing at the database, had found addresses that simply don’t exist.

So the question is whether authorities might have avoided losing 5 billion taxpayer Euros if some identity resolution, including automated fuzzy connection checks and real world checks, had been implemented. I know that everyone is so much more enlightened about what could have been done once a scam is discovered, but I actually think that there may be many more billions of Euros (Pounds, Dollars, Rupees) out there that could be saved by doing some decent identity resolution.


55 reasons to improve data quality

The business value of data quality improvement is an ever-recurring topic in the realm of data quality.

In the following I will list the first 55 reasons that come to my mind for improving data quality related to the single most frequent data quality issue around, which is duplicates (and unresolved hierarchies) in party master data – names and addresses.

It goes like this:

1.  It’s a waste of money sending the same printed material twice or more times to the same individual consumer.

2.  Allowing the same customer to enter twice or more times for an introduction offer challenges the return on investment in such campaigns.

3.  When measuring churn and win-back two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

4.  Sending the same promotion eMail twice or more times to the same individual consumer looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

5.  It’s probably a waste of money sending the same printed material with presentation and offerings to a household already having a customer.

6.  Assigning different credit terms for two or more unrelated accounts for the same business hierarchy will create uncontrolled financial risk.

7.  When measuring cross selling results two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

8.  When measuring lifetime value two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

9.  It’s probably a waste of money sending the same printed material twice or more times to the same household.

10.  When measuring lifetime value two or more unrelated accounts for the same individual being a consumer and a business owner will produce an incomplete result leading to a wrong decision.

11.  When wanting a 1-1 dialogue two or more unrelated accounts for the same individual consumer will not lead to a 1-1 dialogue.

12.  Having companies represented in two or more unrelated accounts for the same company with a different line-of-business assigned will produce an incomplete segmentation.

13.  When trying to point at your best customers being households in order to find similar households two or more unrelated accounts for the same household will produce an incomplete segmentation.

14.  When measuring cross selling results two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

15.  It’s a waste of money sending printed material with presentation and offerings to an individual consumer already being a customer.

16.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business hierarchy will not lead to a complete 1-1 dialogue.

17.  When measuring lifetime value two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

18.  Assigning different credit terms for two or more unrelated accounts for the same individual consumer will increase financial risk.

19.  When measuring cross selling results two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

20.  When wanting a 1-1 dialogue two or more unrelated accounts for the same household will not lead to a true 1-1 dialogue.

21.  Assigning different credit terms for two or more unrelated accounts for the same business entity could increase financial risk.

22.  Having activities related to companies attached to two or more unrelated accounts for the same company will show an incomplete customer history with the risk of taking damaging actions.

23.  It’s a waste of money and credibility sending printed material with presentation and offerings to an individual business decision maker in a business entity already being a customer.

24.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may miss discount opportunities.

25.  Having companies represented in two or more unrelated accounts for the same company with a different lead source assigned will produce a false measure of marketing and sales performance.

26.  Sending the same promotion eMail or newsletter twice or more times to the same individual business decision maker looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

27.  When measuring churn and win-back two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

28.  Having activities related to influencers attached to two or more unrelated business contact records for the same person will show an incomplete business partner history with the risk of repeating actions already taken.

29.  When buying from a supplier having two or more unrelated accounts despite their belonging to the same business hierarchy you could miss discount opportunities.

30.  Having activities related to households attached to two or more unrelated accounts for the same household will show an incomplete customer history with the risk of taking insufficient actions.

31.  When trying to point at your best customers being individual consumers in order to find similar individuals two or more unrelated accounts for the same individual consumer will produce a wrong segmentation.

32.  Having companies represented in two or more unrelated accounts for the same company with a different address assigned will produce an incomplete segmentation.

33.  When measuring lifetime value two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

34.  Having activities related to decision makers in companies attached to two or more unrelated contacts for the same person will show an incomplete customer contact history with the risk of not taking appropriate actions.

35.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business entity will not lead to a real 1-1 dialogue.

36.  When trying to point at your best customers being companies in order to find similar companies two or more unrelated accounts for the same company will produce a false segmentation.

37.  Maintaining data related to two or more unrelated accounts for the same real world entity will probably be more costly than necessary when exploiting external reference data.

38.  It’s probably a waste of money sending printed material with presentation and offerings to a business entity already being a customer at a higher or lower hierarchy level.

39.  Having individual consumers represented in two or more unrelated accounts for the same individual consumer with a different lead source assigned will produce a wrong measure of marketing and sales performance.

40.  Allowing the same customer to re-enter for an offer already turned down (e.g. credit services) will create unnecessary double validation work.

41.  When measuring churn and win-back two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

42.  When wanting a 1-1 dialogue two or more unrelated accounts for the same individual being a consumer and a business owner will not lead to a sensible 1-1 dialogue.

43.  When measuring cross selling results two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

44.  Having activities related to individual consumers attached to two or more unrelated accounts for the same individual consumer will show an incomplete customer history with the risk of taking wrong actions.

45.  When measuring lifetime value two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

46.  Having activities related to customers attached to two or more unrelated accounts for the same real world entity may lead to different sales representatives working against each other.

47.  Allowing sales representatives to create new accounts for already existing customers may create time-consuming commission disputes.

48.  Having households represented in two or more unrelated accounts for the same household with a different lead source assigned will produce an incomplete measure of marketing and sales performance.

49.  Maintaining data related to two or more unrelated accounts for the same real world entity will consume more manual work than necessary.

50.  When measuring churn and win-back two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

51.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may have multiple unnecessary inventory costs.

52.  It’s a waste of money and credibility sending the same printed material twice or more times to the same individual business decision maker.

53.  When measuring churn and win-back two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

54.  Assigning different credit terms for two or more unrelated accounts for the same household may increase financial risk.

55.  When measuring cross selling results two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.
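
Many of the reasons above come down to the same arithmetic: a measure computed per account differs from the same measure computed per real world entity. As a small Python sketch, with invented account ids, amounts and a hypothetical `account_to_entity` mapping standing in for the result of a deduplication run, here is how the ranking of best customers by lifetime value can flip once duplicates are resolved.

```python
from collections import defaultdict


def lifetime_value(purchases, account_to_entity=None):
    """Sum purchase amounts per account, or per resolved entity if a mapping is given."""
    totals = defaultdict(float)
    for account_id, amount in purchases:
        if account_to_entity:
            # Accounts missing from the mapping are treated as their own entity.
            key = account_to_entity.get(account_id, account_id)
        else:
            key = account_id
        totals[key] += amount
    return dict(totals)


# Invented sample data: accounts A-1 and A-2 belong to the same household.
purchases = [("A-1", 400.0), ("A-2", 350.0), ("B-1", 500.0)]
resolved = {"A-1": "household-A", "A-2": "household-A", "B-1": "household-B"}

per_account = lifetime_value(purchases)           # B-1 looks like the best customer
per_entity = lifetime_value(purchases, resolved)  # household-A is actually the best
```

With unresolved duplicates the top customer is account B-1; with the accounts rolled up, household A comes out ahead. The same roll-up applies to churn, cross selling and credit exposure.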


The Statue of Liberty versus The Little Mermaid

The Statue of Liberty in New York harbor is 46 metres (151 ft) high – 93 metres (305 ft) with foundation and pedestal.

The Little Mermaid sits on a rock in the Copenhagen harbour. The relatively small size of the statue typically surprises tourists visiting for the first time. The Little Mermaid statue is only 1.25 metres (4 ft) high.

Actually most things in Denmark are smaller than in the US – also the size of companies. Of course there are Maersk, Carlsberg and Lego, but most companies there are SMBs (small and medium-sized businesses) in a global sense.

As Graham Rhind points out in his blog http://grcdi.blogspot.com/2009/05/what-about-rest-of-data.html, most literature about data quality is fixated completely on data held in large corporate entities. Statistically the relative number of SMBs is probably close to the same – but having only a few large companies somehow shifts the focus more to the SMBs in my country (and our Nordic neighbours).

This is why I have actually worked with data quality improvement both at SMBs and at large companies.

The most significant differences I have seen are, probably not surprisingly, on the data governance part, where you have to use much more agile (guerrilla) approaches with SMBs.

The technology part is pretty much the same – but ROI is king as ever. With SMBs, results must show up almost immediately; there is no room for months of tuning. Software must be user friendly; there is no room for excessive consultancy.

I recommend that all data quality professionals do an SMB implementation in order to sharpen their skills and tools.
