Yesterday I had the chance to make a preliminary assessment of the data quality in one of the local databases holding information about entities involved in carbon trade activities. It is believed that up to 90 percent of the market activity may have been fraudulent, with criminals pocketing 5 billion Euros. The Telegraph (telegraph.co.uk) has published a description of the scam.
Most of my work with data matching is aimed at finding duplicates. In doing this you must avoid so-called false positives, so you don't end up merging information about two different real-world entities. But when you do identity resolution for other purposes, including preventing fraud and scams, you may be interested in finding connections between entities that are not supposed to be connected at all.
The result of making such connections in the carbon trade database was quite astonishing. Below is an example in which I have changed the names, addresses, e-mails and phone numbers; the same pattern was found in several cases:
This is a group of entities that share names, addresses, e-mails or phone numbers in a way that doesn't seem natural.
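Finding groups like this can be sketched as a union-find over shared attribute values: any two entities carrying the same name, address, e-mail or phone number end up in the same group, and groups of more than one entity are flagged for review. This is a minimal Python sketch under stated assumptions, not the method actually used in the assessment: the sample records, the field names and the normalize helper are hypothetical, and only exact matches after lowercasing and collapsing whitespace count as a link.

```python
from collections import defaultdict

# Hypothetical sample records; all names, addresses, e-mails and phones are invented.
RECORDS = [
    {"id": 1, "name": "Alpha Trading Ltd", "address": "1 High Street",
     "email": "info@alpha.example", "phone": "555-0101"},
    {"id": 2, "name": "Beta Carbon ApS", "address": "1 High  Street",
     "email": "sales@beta.example", "phone": "555-0202"},
    {"id": 3, "name": "Gamma Energy Ltd", "address": "9 Low Road",
     "email": "Sales@Beta.example", "phone": "555-0303"},
    {"id": 4, "name": "Delta Green Ltd", "address": "7 Park Lane",
     "email": "d@delta.example", "phone": "555-0404"},
]

# Assumed set of attributes that may link two entities.
FIELDS = ("name", "address", "email", "phone")


def normalize(value):
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    return " ".join(value.lower().split())


def find(parent, x):
    # Follow parent pointers to the group root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    # Merge the groups containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def connected_groups(records, fields=FIELDS):
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}  # (field, normalized value) -> id of first record carrying it
    for r in records:
        for f in fields:
            key = (f, normalize(r[f]))
            if key in first_seen:
                union(parent, first_seen[key], r["id"])
            else:
                first_seen[key] = r["id"]
    groups = defaultdict(list)
    for r in records:
        groups[find(parent, r["id"])].append(r["id"])
    # Only groups with more than one entity represent unexpected connections.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


print(connected_groups(RECORDS))  # [[1, 2, 3]]
```

Records 1 and 2 share an address and records 2 and 3 share an e-mail, so all three land in one group even though 1 and 3 have nothing in common directly; that transitive chaining is what makes such patterns stand out.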
My interest in the carbon trade scam was sparked by a blog post yesterday by my colleague Jan Erik Ingvaldsen, based on the story that journalists, merely by looking at the database, had found addresses that simply don't exist.
So the question is whether the authorities could have avoided losing 5 billion Euros of taxpayers' money if some identity resolution, including automated fuzzy connection checks and real-world checks, had been implemented. I know it is so much easier to be wise about what could have been done once the scam has been discovered, but I actually think there may be many other billions of Euros (Pounds, Dollars, Rupees) out there that could be saved with some decent identity resolution.
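An automated fuzzy connection check can be sketched as a pairwise comparison where near-identical values, not just exact ones, count as a link. This minimal Python sketch assumes the standard library's difflib.SequenceMatcher ratio as the similarity measure; that choice, the 0.85 threshold and the sample addresses are hypothetical and would need tuning against real data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records with slightly varying addresses.
RECORDS = [
    {"id": 1, "address": "12 Harbour Street"},
    {"id": 2, "address": "12 Harbor Street"},
    {"id": 3, "address": "99 Mill Lane"},
]


def similarity(a, b):
    # Compare lowercased, whitespace-collapsed strings; ratio() is in [0, 1].
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def fuzzy_connections(records, field, threshold=0.85):
    # Flag every pair of records whose values for `field` are at least
    # `threshold` similar. Pairwise comparison is O(n^2); large databases
    # would need blocking or indexing first.
    pairs = []
    for r1, r2 in combinations(records, 2):
        score = similarity(r1[field], r2[field])
        if score >= threshold:
            pairs.append((r1["id"], r2["id"], round(score, 2)))
    return pairs


print(fuzzy_connections(RECORDS, "address"))
```

An exact comparison would treat "Harbour" and "Harbor" as unrelated, while the fuzzy check links records 1 and 2; the price is a threshold that has to balance false positives against missed connections.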
Excellent post, Henrik.
As you know, one of the most common objections to data cleansing efforts is that they often produce considerable costs without delivering tangible and significant ROI.
One of the ways the ROI of removing duplicates can be measured is the cost savings on redundant postal deliveries, which, although sometimes significant (when duplicate rates are high), doesn't exactly wow executives.
Identity resolution, especially in fraud detection, can not only deliver more significant ROI but also make the need for defect prevention more obvious, using integrated real-time matching services at the point where data originates.
Whether performing duplicate identification or identity resolution, false positives remain a justifiable concern, but the negative impact of false negatives can be much more damaging with identity resolution.
Thanks for providing a great real-world example of this challenging problem.
Thanks for the comment, Jim.
We are often short of success stories about data quality because no one measures what would have happened if we hadn't prevented it. Instead, we can merely present these train wrecks and hope someone learns from them.
In the case described here, you could not have stated beforehand that failing to apply any kind of identity resolution would cost you 5 billion Euros. But I think common sense could have told you that not doing so is like leaving the treasury door open at night.
Henrik, another great story, and a real-world one.
As Jim mentioned, data profiling and identity resolution do have up-front costs and little to show in immediate ROI. While you can see tangible dollars in Jim's example (redundant mailings to the same address), also consider sharing a realistic view of their customer base with the business group.
In banking and retail, being able to see real trends based on real customer bases is analytically powerful. Not to mention fraud, where finding trends in erroneous data using validation techniques (or persistent searches) can avoid a lot of risk in the near future (that is, avoiding the train wreck based on what we have learned before).
Thanks, guys, for sharing. I hope this will spawn a few more entries where we can share, learn and communicate the real value of good DQ.
I only wish that examples like this were used more widely. I have seen far too many people dismiss DQ efforts because they didn't see any benefit.
Henrik – you just proved them wrong.
Garnie and Phil, thanks for the kind words.
This is nice information. Actually, anyone can have only one identity in their life, so keep going in this right and good way.
Thanks, Thomas. I have been following InfoGlide for some years.