Back in 2015 Gartner, within a Magic Quadrant for MDM, described two different ways observed in how you may connect big data and master data management as reported in the post Two Ways of Exploiting Big Data with MDM.
In short, the two ways observed were:
- Capabilities to perform MDM functions directly against copies of big data sources such as social network data copied into a Hadoop environment. Gartner then found that there have been very few successful attempts (from a business value perspective) to implement this use case, mostly as a result of an inability to perform governance on the big datasets in question.
- Capabilities to link traditionally structured master data against those sources. Gartner then found that this use case is also sparse, but more common and more readily able to prove value. This use case is also gaining some traction with other types of unstructured data, such as content, audio and video.
In my eyes the ability to perform governance on big datasets is key. In fact, master data will tend to be more externally generated and maintained, just like big data usually is. This will change our ways of doing information governance as for example discussed in the post MDM and SCM: Inside and outside the corporate walls.
Eventually, we will see use cases of intersections of MDM and big data. The one I am working with right now is about how you can improve sharing of product master data (product information) between trading partners. While this quest may be used for analytical purposes, which is the said aim with big data, this service will fundamentally serve operational purposes, which is the predominant aim with master data management.
This big data, or rather data lake, approach is about how we by linking metadata connects different perceptions of product information that exists in cross company supply chains. While everyone being on the same standard at the same time would be optimal, this is quite utopic. Therefore, we must encourage pushing product information (including rich textual content, audio and video) with the provider’s standard and do the “schema-on-read” stuff when each of the receivers pulls the product information for their purposes.
If you want to learn more about how that goes, you can follow Product Data Lake here.