The classic data quality business case is avoiding sending promotion letters and printed materials to duplicate prospects and customers.
Even as e-commerce moves forward and more complex data quality business cases, such as those related to multi-purpose master data management, become more important, I would like to take a look at the classic business case by examining some different choices for a data quality tool.
As you may be used to all kinds of different currencies, such as EUR, USD, AUD, GBP and so on, I will use the fictitious currency SSB (Simple Stupid Bananas).
Let’s say we have a direct marketing campaign with these facts:
- 100,000 names and addresses, ½ of them also with a phone number
- Cost per mail is 3 SSB
- Response is 4,500 orders with an average profit of 100 SSB
From investigating a sample we know that 10% of the names and addresses are duplicates with slightly different spellings.
So from these figures we know that the cost of a false negative (an actual duplicate that is not found) is 3 SSB. The saving from a true positive is then also 3 SSB.
The cost of a false positive (a found duplicate that actually isn’t a duplicate) is a possibly missed order, worth the expected profit per unique recipient: 4,500 / (100,000 * 90%) * 100 SSB = 5 SSB.
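The unit costs above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the text; the variable and function names are my own choices for illustration.

```python
# Campaign facts from the text; the names are illustrative only.
NAMES = 100_000          # names and addresses mailed
COST_PER_MAIL = 3        # SSB per mail piece
ORDERS = 4_500           # orders received
PROFIT_PER_ORDER = 100   # SSB average profit per order
DUPLICATE_PERCENT = 10   # share of records that are duplicates


def unit_costs():
    """Return (false_negative_cost, true_positive_saving, false_positive_cost) in SSB."""
    # Only the unique recipients can place an order.
    unique_recipients = NAMES * (100 - DUPLICATE_PERCENT) // 100
    # A missed duplicate costs one wasted mail; a found one saves it.
    false_negative_cost = COST_PER_MAIL
    true_positive_saving = COST_PER_MAIL
    # A wrongly removed recipient loses the expected profit per unique recipient.
    false_positive_cost = ORDERS * PROFIT_PER_ORDER / unique_recipients
    return false_negative_cost, true_positive_saving, false_positive_cost


fn_cost, tp_saving, fp_cost = unit_costs()
print(fn_cost, tp_saving, fp_cost)  # 3 3 5.0
```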
Now let’s examine 3 options for tools for finding duplicates:
A: We already have Excel
B: Buying the leader of the pack data quality tool
C: Buying an algorithm-based dedupe tool
A: We already have Excel
You may first sort the 100,000 rows by address and look for duplicates this way. Say you find 2,000 duplicates. Then sort the remaining 98,000 rows by surname and look for duplicates. Say you find 1,000 duplicates. Then sort 97,000 rows by given name. Say you find 1,000 duplicates. Finally sort the 48,000 rows that have a phone number by phone number. Say you find 1,000 duplicates.
If a person can check 1,000 rows per hour for duplicates (without producing false positives), we will browse a total of 343,000 sorted rows in 343 hours.
Say you hire a student for that and have the Subject Matter Expert (SME) spend 15 hours explaining, controlling and verifying the process.
Costs are:
- 343 student hours at 15 SSB each = 5,145 SSB
- 15 SME hours at 50 SSB each = 750 SSB
Total costs are 5,895 SSB.
Total savings are 5,000 true positives at 3 SSB each = 15,000 SSB, making a positive ROI of 9,105 SSB in each campaign.
The only catch is that it will take one student more than 2 months (without quitting) to do the job.
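The Excel figures can be checked with a small sketch like the one below. It only restates the numbers from this scenario; the constant and function names are made up for illustration.

```python
# Figures from the Excel scenario; the names are illustrative only.
ROWS_PER_PASS = [100_000, 98_000, 97_000, 48_000]   # address, surname, given name, phone
FOUND_PER_PASS = [2_000, 1_000, 1_000, 1_000]       # duplicates found in each pass
ROWS_PER_HOUR = 1_000
STUDENT_RATE = 15          # SSB per hour
SME_HOURS = 15
SME_RATE = 50              # SSB per hour
SAVING_PER_DUPLICATE = 3   # SSB


def excel_option():
    """Return (student_hours, total_cost, total_savings, net_return) per campaign."""
    student_hours = sum(ROWS_PER_PASS) // ROWS_PER_HOUR
    total_cost = student_hours * STUDENT_RATE + SME_HOURS * SME_RATE
    total_savings = sum(FOUND_PER_PASS) * SAVING_PER_DUPLICATE
    return student_hours, total_cost, total_savings, total_savings - total_cost


print(excel_option())  # (343, 5895, 15000, 9105)
```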
B: Buying the leader of the pack data quality tool
Such a tool may have all kinds of data quality monitoring features, may be integrated smoothly with ETL functionality and so on. For data matching it may use so-called match codes. Doing that, we may expect the tool to find 7,500 duplicates, of which 7,000 are true positives and 500 are false positives.
Costs may be:
- Tool license fee is 50,000 SSB
- Training fee is 7,000 SSB
- 80 hours external consultancy at 125 SSB each = 10,000 SSB
- 60 IT hours for training and installation at 50 SSB each = 3,000 SSB
- 100 SME hours for training and configuration at 50 SSB each = 5,000 SSB
Total costs are 75,000 SSB.
Savings per campaign are 7,000 * 3 SSB - 500 * 5 SSB = 18,500 SSB.
A positive ROI will show up in the 5th campaign: four campaigns save 74,000 SSB, just short of the 75,000 SSB cost.
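The break-even point can be worked out with a short sketch. It assumes, as the scenario implies, that the tool costs are paid once and that every campaign yields the same savings; the function name is hypothetical.

```python
def first_profitable_campaign(upfront_cost, savings_per_campaign):
    """Return the first campaign at which cumulative savings exceed the upfront cost.

    Assumes whole-number SSB amounts and identical savings in every campaign.
    """
    if savings_per_campaign <= 0:
        raise ValueError("savings per campaign must be positive")
    return upfront_cost // savings_per_campaign + 1


# Option B figures from the text.
upfront = 50_000 + 7_000 + 80 * 125 + 60 * 50 + 100 * 50   # 75,000 SSB
savings = 7_000 * 3 - 500 * 5                               # 18,500 SSB
print(upfront, savings, first_profitable_campaign(upfront, savings))  # 75000 18500 5
```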
C: Buying an algorithm-based dedupe tool
By using algorithm-based data matching, such a tool may, depending on the threshold setting, find 9,100 duplicates, of which 9,000 are true positives and 100 are false positives.
Costs may be:
- Tool license fee is 5,000 SSB
- 8 hours external consultancy for a workshop at 125 SSB each = 1,000 SSB
- 15 SME hours for training, configuration and pushing the button at 50 SSB each = 750 SSB
Total costs are 6,750 SSB.
Savings per campaign are 9,000 * 3 SSB - 100 * 5 SSB = 26,500 SSB.
A remarkable ROI will show up in the 1st campaign.
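To put the three options side by side, here is a rough sketch of the cumulative result over several campaigns. It rests on my reading of the scenarios: the Excel labour cost recurs in every campaign, while the tool costs for B and C are paid once. The names are illustrative only.

```python
# (cost, savings per campaign, does the cost recur every campaign?) in SSB.
# The recurring flag is an assumption based on how the scenarios are described.
OPTIONS = {
    "A: Excel": (5_895, 15_000, True),
    "B: leader of the pack": (75_000, 7_000 * 3 - 500 * 5, False),
    "C: algorithm-based": (6_750, 9_000 * 3 - 100 * 5, False),
}


def cumulative_net(cost, savings, recurring, campaigns):
    """Return cumulative savings minus costs after the given number of campaigns."""
    total_cost = cost * campaigns if recurring else cost
    return savings * campaigns - total_cost


for n in (1, 5, 10):
    print(n, {name: cumulative_net(*opt, n) for name, opt in OPTIONS.items()})
```

Under these assumptions option C is ahead at each of the horizons printed, which matches the conclusion above.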
