As reported in the post Crap, Damned Crap, and Big Data, there are data quality issues with big data.
The issue in question is the use of quotes in social data: a famous person apparently said something apparently clever, and whoever posts an update with the quote gets an unusually large number of likes, retweets, +1s and other forms of recognition.
But many quotes weren’t actually said by that famous person. Maybe the words were said by someone else, and in many cases there is no evidence that the famous person ever said them. Some quotes, like the Einstein quote in the Crap post, actually contradict what the person apparently also said.
I have worked a lot with data entry functionality that checks data quality: whether a certain address actually exists, whether a typed-in phone number is valid, or whether an email address will bounce. So I think it’s time to make a quote checker that can be plugged in on LinkedIn, Twitter, Facebook, Google Plus and other social networks.
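To make the idea a bit more concrete, here is a minimal sketch in Python of how such a checker might work, in the same spirit as an address or phone number check at data entry time. Everything in it is an assumption made for illustration: the names `normalize`, `check_quote` and `Verdict` are hypothetical, and the `SOURCED` dictionary is placeholder data standing in for a reference set of quotes with documented sources, which would have to be built or licensed. No social network API is involved here.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    trivially different spellings of the same text compare equal."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


@dataclass
class Verdict:
    status: str  # "verified", "misattributed", or "unverified"
    known_author: Optional[str] = None


def check_quote(quote: str, claimed_author: str,
                sourced_quotes: Dict[str, str]) -> Verdict:
    """Compare a quote and its claimed author against a reference set.

    sourced_quotes maps normalized quote text to the author for whom
    a documented source exists (hypothetical reference data).
    """
    known_author = sourced_quotes.get(normalize(quote))
    if known_author is None:
        # No evidence either way: the quote is simply not in the reference set.
        return Verdict("unverified")
    if normalize(known_author) == normalize(claimed_author):
        return Verdict("verified", known_author)
    # The words are documented, but for someone else.
    return Verdict("misattributed", known_author)


# Placeholder reference data, for illustration only (not real attributions).
SOURCED = {normalize("Measure twice, cut once."): "Example Author A"}

if __name__ == "__main__":
    print(check_quote("Measure twice, cut once.", "Someone Famous", SOURCED))
    print(check_quote("Something never recorded", "Someone Famous", SOURCED))
```

Exact matching after normalization is the simplest possible choice. A real plug-in would probably need fuzzy matching for paraphrased quotes, and the three outcomes mirror the cases above: said by that person, said by someone else, or no evidence at all.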
So, does anyone else out there want to join the project – or has it already been said by someone else?