Five Flavors of Big Data

We often talk about big data as if it were one kind of data, while in fact different sorts of big data need separate approaches, for example when handling data quality issues.

In the following I will go through some different types of big data and share some observations related to data quality.

Social data

The most mentioned type of big data is, I guess, social data. The opportunity to listen to Twitter streams and Facebook status updates in order to get better customer insight is an often stated business case for analyzing big data.

However, everyone who listens to those data will be aware of the tremendous data quality problems in doing that, as told in the post Crap, Damned Crap and Big Data.

Sensor data

Another often mentioned type of big data is sensor data. As examined in the post Social Data vs Sensor Data, sensor data are somewhat different from social data, with less complex data quality issues, but they are not at all free of data quality flaws, as reported in the post Going in the Wrong Direction.

Web logs

Following the clicks from people surfing the internet is a third type of big data. This kind of big data shares characteristics with both social data and sensor data: it is human generated like social data but more fact oriented like sensor data.

Big transaction data

Even traditional transaction data in huge volumes are treated as big data, but they of course inherit the same data quality challenges as all transaction data. Even though the data are structured, we may have trouble establishing the right relations to the who, what, where and when in the transactions. And that isn’t easier with large volumes.
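As a minimal sketch of what checking those relations could look like, here is a small Python example. Everything in it is an assumption made for illustration: the reference sets, the field names (customer, product, store, timestamp) and the helper relation_problems are hypothetical and not taken from any real system.

```python
from datetime import datetime

# Hypothetical reference data: known customers (who), products (what), stores (where).
customers = {"C1", "C2"}
products = {"P1", "P2"}
stores = {"S1"}


def relation_problems(tx):
    """Return the dimensions (who, what, where, when) a transaction fails to relate to."""
    problems = []
    if tx.get("customer") not in customers:
        problems.append("who")
    if tx.get("product") not in products:
        problems.append("what")
    if tx.get("store") not in stores:
        problems.append("where")
    try:
        # A missing or malformed timestamp counts as a broken "when".
        datetime.fromisoformat(tx.get("timestamp") or "")
    except ValueError:
        problems.append("when")
    return problems


# Two made-up transactions: one clean, one with several broken relations.
transactions = [
    {"customer": "C1", "product": "P1", "store": "S1", "timestamp": "2013-09-24T17:17:00"},
    {"customer": "C9", "product": "P1", "store": None, "timestamp": "not a date"},
]

for tx in transactions:
    print(relation_problems(tx))
```

With only a handful of records, a check like this is trivial. With huge volumes, the reference sets themselves become large and change over time, which is where the trouble starts.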

Big reference data

When reference data grows big we also meet big complexity. Try for example to build a reference data set with all the valid postal addresses in the world. Several standardizing bodies have a hard time making a common model for that right now. Learn about other examples of big reference data and the related complexity in the post Big Reference Data Musings.

6 thoughts on “Five Flavors of Big Data”

  1. vstrien 24th September 2013 / 17:17

    Rephrase “web logs” as log files in general. Your cluster log files, available, searchable.

  2. Ed Wrazen 26th September 2013 / 15:18

    Hi Henrik, I also think there are other types of data including video, image (photos, images, scans), sound/recorded media from telephony, media devices, spatial, mapping and I suspect others depending on industry.

  3. Henrik Liliendahl Sørensen 26th September 2013 / 17:47

    Thanks vstrien and Ed for commenting.

    On the Big Data Quality LinkedIn group Kunju Kashalikar also commented.

    “Location data which seems to be classified under sensor data , could be considered at the intersection of social / sensor.”

    All valid points and I wasn’t trying to make a complete classification of big data types but only point to the very different approaches we need when exploiting big data here seen in a data quality perspective.

  4. Mike Ferguson 26th September 2013 / 22:53

    Great blog Henrik….I like the way you are peeling back the ‘onion’ layers to dig into the Big Data problem….I wonder what your thoughts are on Graph Data Quality?

  5. Henrik Liliendahl Sørensen 26th September 2013 / 23:26

    Thanks Mike. Graph database data quality is a great subject for a coming post. Links into social MDM as well.
