The Intersections of Big Data, Data Quality and Master Data Management

Since 2009 this blog has been very much about the intersection between Master Data Management (MDM) and data quality. These two disciplines are closely related, as the vast majority of data quality improvement work revolves around master data, taking slightly different forms depending on whether we are dealing with party master data, product master data, location master data or other master data domains.

In mid-2011 the term big data became more popular than data quality, as reported in the post Data Quality vs Big Data. After the initial euphoria about big data and the focus on its analytical side, the question of big data quality has fortunately gained traction. Apart from the quality of the algorithms used in big data analytics, the quality of the big data itself is a factor to be taken very seriously when deciding to act on the outcomes of big data analytics.

There are questions about the quality of the big data itself, as for example told in the post Crap, Damned Crap, and Big Data. That story is about social data and how crappy those data streams may be. Another prominent flavor of big data is sensor data, where there may also be data quality issues, as in the example mentioned in the post Going in the Wrong Direction.

As examined in the latter example, the quality of big data will in many cases have to be measured by how well it relates to internal master data and external reference data. You may find more examples of that in the post Big Data and Multi-Domain Master Data Management.
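
To make the measuring a bit more concrete, here is a minimal sketch in Python of scoring a feed by how many of its records can be tied to known master data and valid reference data. The data and names used (social_posts, customer_master, country_reference) are entirely made up and only illustrate the principle:

# A minimal sketch (hypothetical data) of scoring a big data feed by how well
# it relates to internal master data and external reference data.

social_posts = [
    {"handle": "@acme_fan", "claimed_customer_id": "C-1001", "country": "DK"},
    {"handle": "@random99", "claimed_customer_id": "C-9999", "country": "XX"},
    {"handle": "@jane_doe", "claimed_customer_id": "C-1002", "country": "SE"},
]

customer_master = {"C-1001", "C-1002", "C-1003"}          # internal master data (customer IDs)
country_reference = {"DK", "SE", "NO", "DE", "UK", "US"}  # external reference data (country codes)

def relatedness_score(posts, master_ids, reference_codes):
    """Share of records that can be tied to known master data and valid reference data."""
    related = sum(
        1 for p in posts
        if p["claimed_customer_id"] in master_ids and p["country"] in reference_codes
    )
    return related / len(posts) if posts else 0.0

print(f"Relatedness score: {relatedness_score(social_posts, customer_master, country_reference):.0%}")
# -> Relatedness score: 67%

A feed scoring low on such a relatedness measure should be handled with care before acting on the analytics built on top of it.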


3 thoughts on “The Intersections of Big Data, Data Quality and Master Data Management”

  1. Dennis Moore 15th March 2014 / 22:48

    Henrik –

    I hope all is well – sorry we missed each other at the Gartner MDM conference.

    Over the past weeks and months I have spoken to quite a large number of customers using or planning projects where MDM enhances Big Data efforts. Data quality tools and techniques have a role to play in transactional data (yes, sometimes sensors are broken), but master data management tools and techniques are quite crucial. How can your data scientist understand a customer’s purchasing behaviors or responses to offers if the customer has different IDs in different systems (e.g., web store and physical store)? MDM connects the various streams of related data so that data science tools (including machine learning algorithms) can find the true patterns in the data. (See the sketch after these comments.)

    Of course, we also have customers using Hadoop for speeding up processing of extremely large batches of data.

    All the best …

    – Dennis
    Informatica

  2. Gauthier Vasseur 19th March 2014 / 23:36

    Greetings Henrik,

    I will definitely quote this post in my next class at Stanford. Last quarter, my class speakers and I pushed in this same direction, and a few students felt disappointed that (Big) Data is not pristine and usable from the start. According to my class speakers from Cloudera, Google, PredPol, Shazam, HortonWorks, DiamondStream and Oracle Team USA, a (Big) Data project is usually 70%-80% data wrangling. And without a proper solution to sustain constant cleansing, enriching, matching, consolidating and governance, every new batch is another batch of manual effort to make the data usable.

  3. Henrik Liliendahl Sørensen 20th March 2014 / 07:39

    Thanks Dennis and Gauthier for adding in.
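
Picking up Dennis’s point about the same customer carrying different IDs in, for example, the web store and the physical store, here is a minimal sketch of the cross-reference idea. It is a made-up illustration, not any particular MDM product’s implementation, and all names and values (xref, MASTER-7 and so on) are hypothetical:

from collections import defaultdict

# MDM-style cross-reference: (source system, local ID) -> master customer ID
xref = {
    ("web_store", "WEB-42"): "MASTER-7",
    ("pos_store", "POS-1138"): "MASTER-7",   # same person, different local ID
    ("web_store", "WEB-77"): "MASTER-9",
}

# Purchase events arriving from the two streams
events = [
    {"system": "web_store", "local_id": "WEB-42", "amount": 120.0},
    {"system": "pos_store", "local_id": "POS-1138", "amount": 35.5},
    {"system": "web_store", "local_id": "WEB-77", "amount": 10.0},
]

# Consolidate spend per master customer, so analytics sees one customer, not two
spend_by_customer = defaultdict(float)
for e in events:
    master_id = xref.get((e["system"], e["local_id"]), "UNMATCHED")
    spend_by_customer[master_id] += e["amount"]

print(dict(spend_by_customer))
# -> {'MASTER-7': 155.5, 'MASTER-9': 10.0}

Without the cross-reference, the same customer would show up as two unrelated IDs, and the true patterns Dennis mentions would be split across them.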
