Small Data with Big Impact

In an ongoing discussion on LinkedIn there are some good points on the question: How important is data quality for big data compared to data quality for small data?

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data.

The small data we are talking about here is first and foremost master data.

Master Data Challenges with Big Data

As with traditional transaction data, master data also describes the who, what, where and when of big data.

If we have issues with completeness, timeliness and uniqueness in our master data, any prediction based on big data matched with master data is going to be as chaotic as a weather forecast.
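As a minimal sketch of what checking those dimensions might look like, the Python snippet below profiles a hypothetical customer master table for completeness and uniqueness; the field names and records are illustrative assumptions, not taken from any real system:

```python
from datetime import date

# Hypothetical customer master records; field names are illustrative.
customers = [
    {"id": 1, "name": "Acme Corp", "country": "DK", "updated": date(2013, 1, 5)},
    {"id": 2, "name": "Acme Corp", "country": "DK", "updated": date(2011, 6, 1)},  # likely duplicate
    {"id": 3, "name": "Bolt Ltd",  "country": None, "updated": date(2013, 3, 1)},  # incomplete
]

def completeness(records, field):
    """Share of records where the given field is populated."""
    return sum(1 for r in records if r.get(field) is not None) / len(records)

def possible_duplicates(records, key_fields):
    """Group record ids that share the same values on the key fields."""
    seen = {}
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        seen.setdefault(key, []).append(r["id"])
    return [ids for ids in seen.values() if len(ids) > 1]

print(completeness(customers, "country"))                    # 2 of 3 populated
print(possible_duplicates(customers, ["name", "country"]))   # ids 1 and 2 collide
```

Real master data management tools go far beyond exact-key grouping (fuzzy matching, survivorship rules), but even this crude profile shows how gaps and duplicates surface before any big data is matched against the table.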

We also need to expand the range of entities embraced by our master data management implementations, as exemplified in the post Social MDM and Future Competitive Intelligence.

Matching Big Data with Master Data

Some of the issues in matching big data with master data I have stumbled upon are:

  • Who: How do we link the real world entities reflected in our traditional systems of record with the real world entities behind who’s talking in systems of engagement? This question was touched upon in the post Making Sense with Social MDM.
  • What: How do we manage our product hierarchies and product descriptions so they fulfill both (different) internal purposes and external usage? More on this in the post Social PIM.
  • Where: How do we identify a given place? If you think this is easy, why not read the post Where is the Spot?
  • When: Date and time come in many formats, and relating events to the wrong schedule may have us Going in the Wrong Direction.

How: You may for example follow this blog. Subscription is in the upper right corner 🙂
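To illustrate the “when” pitfall above, here is a small Python sketch of multi-format date parsing; the format list is an assumption for illustration, and the ambiguous example shows how a day-first versus month-first convention can relate an event to the wrong schedule:

```python
from datetime import datetime

# Illustrative format list — a real system would need many more,
# plus an explicit policy for ambiguous day/month ordering.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %B %Y"]

def parse_date(text):
    """Try each known format in order; return the first match or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("2013-03-28"))   # 2013-03-28
print(parse_date("03/04/2013"))   # 2013-04-03 — day-first wins here;
                                  # a month-first source actually meant 2013-03-04
```

Because the first matching format wins, "03/04/2013" silently becomes 3rd April rather than 4th March — exactly the kind of quiet error that sends downstream predictions in the wrong direction.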


One thought on “Small Data with Big Impact”

  1. Richard Ordowich 28th March 2013 / 13:36

    I was giving some additional thought to “big data”, reading some of the marketing materials (blogs, white papers etc.) as well as some of the more academically inclined research papers and magazines, such as the recent issue of the IEEE publication dedicated to the topic, The Internet of Things.

    The marketing hype has described big data as “Volume, Velocity, Variety, and Veracity”. What’s missing in this description? Validation and Verification. As we create more data, there is more noise, politely referred to as big data. Volume, Velocity, Variety, and Veracity fulfill the age-old cliché “garbage in, garbage out”. I suggest we add to this description Validation, Verification and Certification (VV&C). This concept is not new. It has been around for decades.

    Considering the more recent major contributors of data, such as social media and sensor data, we must also consider numerous other aspects of data quality that are not part of our current toolkit. Consider the sources of social media data, for example. Humans! Inexact, frequently irrational, short memories, subjective creatures. What is the quality of this “source data”?

    Social media inputs are not controlled transactions subject to principles such as The Perfect Order (supply chain) or other quality control processes. Social media are the random thoughts and hallucinations of people in space, time and context. Look at the quality of eyewitness reports of crimes. Not very reliable. That’s social media. These are the same kinds of sources we are now using as data. Imagine yourself defining metadata for social media. It’s already a significant challenge defining metadata for business data, and that is more or less structured.

    Next consider sensor data. What are the quality dimensions of sensor data? Is quality specified in the design of the sensor? Is it specified for the context within which the sensor may be used? The crash of the Air France jet, where both airspeed sensors (pitot tubes) failed, provides a wake-up example of sensor data failing in context.

    Before considering big data we have to consider how data is changing and how those changes will affect our systems, jobs and perceptions about data. We didn’t consider these aspects when we launched into our current venture of data quality, so the tools and techniques we have are limited. Time to think “outside” the system box. It’s a social network of data, not system data. We have to factor human factors into data quality. Let’s profile humans, not data (not in the sense of racial or cultural profiling)!
