We often talk about big data as if it were one kind of data, while in fact different sorts of big data need separate approaches to handling, for example, data quality issues.

In the following I will go through some different types of big data and share some observations related to data quality.
Social data
The most mentioned type of big data is, I guess, social data, and the opportunity to listen to Twitter streams and Facebook status updates in order to get better customer insight is an often-stated business case for analyzing big data.
However, anyone who listens to that data will be aware of the tremendous data quality problems in doing so, as told in the post Crap, Damned Crap and Big Data.
Sensor data
Another often-mentioned type of big data is sensor data. As examined in the post Social Data vs Sensor Data, sensor data differ somewhat from social data: their data quality issues are less complex, but they are not entirely free of data quality flaws, as reported in the post Going in the Wrong Direction.
Web logs
Following the clicks of people surfing the internet gives a third type of big data. This kind of big data shares characteristics with both social data and sensor data: like social data, it is human-generated, but like sensor data, it is more fact-oriented.
Big transaction data
Even traditional transaction data in huge volumes are treated as big data, but of course they inherit the same data quality challenges as all transaction data: even though the data are structured, we may have trouble establishing the right relations to the who, what, where and when of the transactions. And that isn't any easier with large volumes.
Big reference data
When reference data grows big, we also meet big complexity. Try, for example, to build a reference data set with all the valid postal addresses in the world. Several standardizing bodies are currently having a hard time creating a common model for that. Learn about other examples of big reference data and the related complexity in the post Big Reference Data Musings.
Rephrase “web logs” as log files in general: your cluster log files, available and searchable.
Hi Henrik, I also think there are other types of data, including video, image (photos, images, scans), sound/recorded media from telephony and media devices, spatial and mapping data, and, I suspect, others depending on the industry.
Thanks vstrien and Ed for commenting.
Kunju Kashalikar also commented in the Big Data Quality LinkedIn group:
“Location data, which seems to be classified under sensor data, could be considered at the intersection of social / sensor.”
All valid points. I wasn’t trying to make a complete classification of big data types, only to point to the very different approaches we need when exploiting big data, here seen from a data quality perspective.
Great blog, Henrik. I like the way you are peeling back the ‘onion’ layers to dig into the Big Data problem. I wonder what your thoughts are on Graph Data Quality?
Thanks, Mike. Graph database data quality is a great subject for a coming post. It links into social MDM as well.