How important is big data quality?

With the rise of big data, the question of how good that data is, and how much data quality should be taken into account when analyzing it, is raised again and again.

We had a poll in the LinkedIn Big Data Quality group. The results are as shown below:

[Image: poll results, "Big Data Important"]

So, some people consider data quality to be more important for big data than for small data (the data we analyzed before the rise of big data), and some consider it to be less important for big data. But the majority of those who voted (including yours truly) consider the quality of big data to be just as important as the quality of small data has been.

As expressed in some comments, voting “the same” is often an aggregate of some things that are more important and other things that are less important.

Also, some people voted “mu” (wrong question) and explained in the comments that you really can’t compare small data with big data.

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data, as examined in the post Small Data with Big Impact.


4 thoughts on “How important is big data quality?”

  1. Dylan Jones 10th April 2013 / 09:37

    I think to truly leverage Big Data you need to have your existing enterprise data in tip-top shape, you can’t really derive any value from Big Data until you’ve connected back to the core data assets that drive your business – customer, contract, services, locations etc. If that data is bad then it dramatically reduces the benefits of Big Data processing.

  2. Henrik Liliendahl Sørensen 10th April 2013 / 11:40

    Thanks for adding in, Dylan. Jeff Willingham expresses a similar sentiment on Google+ by writing:

    “The comments section included several valid points, and I may have to join that LinkedIn group. I too feel it was the wrong question. The analogy of the Grand Unified Theory makes sense, as a nexus between the two data types is being researched. More often than not, big data does not afford the luxury of structure … answering a specific set of questions. Small data, on the other hand, is often driven by model-assisted design. Both types require quality, yet the methodologies for “cleansing” the data differ. If anything, small data is useful for filtering big data.”

  3. Richard Ordowich 10th April 2013 / 13:26

    If an organization has not established a behavior of quality, data quality big or small will stall. If an organization has not established quality in its product, customer support, business process and IT domains (failures of IT projects continue at the same rate as 20 years ago), the organization is not mature enough to embark on data quality.

    That said, big data doesn’t change the nature of data quality best practices. What is lacking in most organizations is data literacy. Many people working with data are “data pushers”, moving data from one setting to another. Few of these people understand the fundamentals of data such as semantics, ontology and taxonomy. Fewer still appreciate the fact that data is a symbol, not “reality”. Data pushers don’t spend much time considering the quality of the data they use or the data they produce. Because the data is in a computer, it’s “assumed” to be correct. Put data into a spreadsheet and it acquires an aura of accuracy. Put data into a graph and it takes on existential qualities.

  4. Jeff Willingham 10th April 2013 / 16:38

    Actually, the best practices do differ. As a person who has worked with both, roughly 80% of my time is spent preparing the data for analysis … definitely not a pusher. More often than not, you need to treat big data like those obtained through observational studies, whereas small data can be treated with the basic “textbook” approaches used in experimental, scientific studies.
