Yesterday I had the chance to make a preliminary assessment of the data quality in one of the local databases holding information about entities involved in carbon trade activities. It is believed that up to 90 percent of the market activity may have been fraudulent with criminals pocketing 5 billion Euros. There is a description of the scam here from telegraph.co.uk.
Most of my work with data matching is aimed at finding duplicates. In doing this you must avoid finding so called false positives, so you don’t end up merging information about to different real world entities. But when doing identity resolution for several reasons including preventing fraud and scam you may be interested in finding connections between entities that are not supposed to be connected at all.
The result from making such connections in the carbon trade database was quite astonishing. Here is an example where I have changed the names, addresses, e-mails and phones, but such a pattern was found in several cases:
Here we have an example of a group of entities where the name, address, e-mail or phone is shared in a way that doesn’t seem natural.
My involvement in the carbon trade scam was initiated by a blog post yesterday by my colleague Jan Erik Ingvaldsen based on the story that journalists by merely gazing the database had found addresses that simply doesn’t exist.
So the question is if authorities may have avoided losing 5 billion taxpayer Euros if some identity resolution including automated fuzzy connection checks and real world checks was implemented. I know that you are so much more enlightened on what could have been done when the scam is discovered, but I actually think that there may be a lot of other billions of Euros (Pounds, Dollars, Rupees) to avoid losing out there by making some decent identity resolution.