Big Data Quality, Santa Style

In previous years, the close-to-Christmas posts on this blog have been about Multi-Domain MDM, Santa Style and Data Governance, Santa Style.

So this year it may be time to have a closer look at big data quality, Santa style, meaning how we can imagine Santa Claus joining the rise of big data while observing that exploiting data, big or small, only adds real value if you believe in data quality. Ho ho ho.

At the Santa Claus organization they have figured out that there is a close connection between excellence in working with big data and excellence in multi-domain Master Data Management (MDM) and data governance.

Here are some of the findings in the big data paper that the Chief Data Elf just signed off:

  • The feasibility of the new algorithms for naughty or nice marking using social media listening combined with our historical records is heavily dependent on unique, accurate and timely boys and girls master data. The party data governance elf gathering will be accountable for any nasty and noisy issues.
  • Implementation of the automated present buying service based on fuzzy matching between our supplier self-service based multi-lingual product catalogue and the wish list data lake must be done in a phased schedule (a small matching sketch follows this list). The product data governance elf committee is responsible for avoiding any false positives (wrong present incidents) and decreasing the number of false negatives (someone not getting what could be purchased within the budget).
  • Last year we had a 12.25 % overspend on reindeers due to incorrect and missing chimney positions. This year the reliance on crowdsourced positions will be better balanced with utilizing open government property data where possible. The location data governance elves will consult with the elves living on the roof at each head of state in order to make them release more and better quality of such data (the Gangnam Project).
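As a purely illustrative aside, the fuzzy matching idea behind the automated present buying service could look something like the Python sketch below. The catalogue items, wishes and the similarity threshold are made up for the example; a wish scoring below the threshold would go to manual elf review:

```python
# A minimal sketch of fuzzy matching wish list entries against the product
# catalogue. Data and threshold are hypothetical, not Santa's actual algorithm.
from difflib import SequenceMatcher

catalogue = ["Wooden rocking horse", "Red toy fire truck", "LEGO pirate ship"]
wish_list = ["woden rockinghorse", "fire truck (red)"]

def best_match(wish, products, threshold=0.6):
    """Return the catalogue item most similar to a wish, or None for manual review."""
    scored = [(SequenceMatcher(None, wish.lower(), p.lower()).ratio(), p)
              for p in products]
    score, product = max(scored)
    return product if score >= threshold else None  # below threshold -> possible false negative

for wish in wish_list:
    print(wish, "->", best_match(wish, catalogue))
```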

Two Ways of Exploiting Big Data with MDM

MDM Wordle
This is not the quadrant, just some vendor names

The Gartner 2015 Magic Quadrant for Master Data Management of Customer Data Solutions is out. One way of getting the report without being a Gartner customer is through this link on the Informatica site.

Successful providers of Master Data Management (MDM) solutions will sooner or later need to offer ways of connecting MDM with big data.

In the Customer MDM quadrant Gartner, without mentioning if this relates to customer MDM only or multi-Domain MDM in general, mentions two ways of connecting MDM with big data:

  • Capabilities to perform MDM functions directly against copies of big data sources such as social network data copied into a Hadoop environment. Gartner have found that there have been very few successful attempts (from a business value perspective) to implement this use case, mostly as a result of an inability to perform governance on the big datasets in question.
  • Capabilities to link traditionally structured master data against those sources. Gartner have found that this use case is also sparse, but more common and more readily able to prove value. This use case is also gaining some traction with other types of unstructured data, such as content, audio and video (a small linking sketch follows below).
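To illustrate the second use case, here is a tiny Python sketch that links structured party master data to records copied from an external big data source via a normalised match key. The field names and the normalisation rule are assumptions for the example only, not a description of any vendor's capability:

```python
# A minimal sketch of linking structured master data to externally sourced records.
master_data = [
    {"party_id": 1001, "name": "Mary Poppins", "email": "Mary.Poppins@Example.com"},
]
social_records = [
    {"handle": "@marypoppins", "email": "mary.poppins@example.com ", "followers": 5400},
]

def norm_email(value):
    """Crude match key: lower-cased, trimmed e-mail address."""
    return value.strip().lower()

# Index the external records by match key and look up each master data party.
index = {norm_email(r["email"]): r for r in social_records}
for party in master_data:
    match = index.get(norm_email(party["email"]))
    if match:
        print(party["party_id"], "linked to", match["handle"])
```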

My take is that these ways apply to the other MDM domains (supplier, product, location, asset …) as well – just as I think Gartner sooner or later will need to make only one MDM quadrant, as pondered in the post called The second part of the Multi-Domain MDM Magic Quadrant is out.

Also, I think the ability to perform governance on big datasets is key. In fact, in my eyes master data will tend to become more externally generated and maintained, just like big data usually is. This will change our ways of doing information governance, as discussed in my previous post on this blog, which was by the way inspired by the Gartner product MDM person. That post is called MDM and SCM: Inside and outside the corporate walls.

MDM and SCM: Inside and outside the corporate walls

In my journey through the Master Data Management (MDM) landscape, I am currently working from a Supply Chain Management (SCM) perspective. SCM is very exciting as it connects the buy-side and the sell-side of a company. In that connection we will be able to understand some basic features of multi-domain MDM, as touched upon in a recent post about the MDM ancestors Customer Data Integration (CDI) and Product Information Management (PIM). The post is called CDI, PIM, MDM and Beyond.

MDM and SCM 1.0: Inside the corporate walls

Traditional Supply Chain Management deals with what goes on from when a product is received from a supplier, or vendor if you like, until it ends up at the customer.

In the distribution and retail world, the product usually stays physically the same, but from a data management perspective we struggle with having both buying views and selling views of the data.

In the manufacturing world, we see the products we are going to sell transform from raw materials via semi-finished products to finished goods. One challenge here is that when companies grow through acquisitions, a given real-world product might be seen as a raw material in one plant but a finished good in another plant.

Regardless of the position of our company in the ecosystem, we also have to deal with the buy side for products such as machinery, spare parts, supplies and other goods that stay within the company.

MDM and SCM 2.0: Outside the corporate walls

SCM 2.0 is often used to describe handling the extended supply chain that is a reality for many businesses today due to business process outsourcing and other ways of collaboration within ecosystems of manufacturers, distributors, retailers, end users and service providers.

From a master data management perspective, the ways of handling supplier/vendor master data and customer master data here melt into handling business-partner master data or simply party master data.

For product master data there are huge opportunities in sharing most of these master data within the ecosystems. Usually you will do that in the cloud.

In such environments, we have to rethink our approach to data / information governance. This challenge was, with an outset in cloud computing, examined by Andrew White of Gartner (the analyst firm) in a blog post called “Thoughts on The Gathering Storm: Information Governance in the Cloud”.

CDI, PIM, MDM and Beyond

The TLAs (Three Letter Acronyms) in the title of this blog post stand for:

  • Customer Data Integration
  • Product Information Management
  • Master Data Management

CDI and PIM are commonly seen as predecessors to MDM. For example, the MDM Institute was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Today Multi-Domain MDM is about managing customer, or rather party, master data together with product master data and other master data domains, as visualized in the post A Master Data Mind Map. Some of the most frequent other master domains are location master data and asset master data, where the latter was explored in the post Where is the Asset? A less frequent master data domain is The Calendar MDM Domain.

You may argue that PIM (Product Information Management) is not the same as Product MDM. This question was examined in the post PIM, Product MDM and Multi-Domain MDM. In my eyes the benefits of keeping PIM as part of Multi-Domain MDM are bigger than the benefits of separating PIM and MDM. It is about expanding MDM across the sell-side and the buy-side of the business, eventually enabling wide use of customer self-service and supplier self-service.

The external self-service theme will in my eyes be at the centre of where MDM is going in the future. In going down that path there will be consequences for how we see data governance, as discussed in the post Data Governance in the Self-Service Age. Another aspect of how MDM is going to be seen from the outside in is the increased use of third party reference data and the link between big data and MDM, as touched upon in the post Adding 180 Degrees to MDM.

Besides Multi-Domain MDM and the links between MDM and big data, a much mentioned future trend in MDM is doing MDM in the cloud. The latter is in my eyes a natural consequence of the external self-service themes and the increased use of third party reference data, which, together with the general benefits of the SaaS (Software as a Service) and DaaS (Data as a Service) concepts, will make MDM morph into something like MDaaS (Master Data as a Service) – an at least nearly ten-year-old idea by the way, as seen in this BeyeNetwork article by Dan E Linstedt.


To-Be Business Rules and MDM

An important part of implementing Master Data Management (MDM) is to capture the business rules that exist within the implementing organization and build those rules into the solution. In addition, and maybe even more important, is the quest of crafting new business rules that help make master data more valuable to the implementing organization.

Examples of such new business rules that may come along with MDM implementations are:

  • In order to open a business account you must supply a valid Legal Entity Identifier (like Company Registration Number, VAT number or whatever applies to the business and geography in question)
  • A delivery address must be verified against an address directory (valid for the geography in question)
  • In order to bring a product into business there is a minimum requirement for completeness of product information.

Creating new business rules to be part of the to-be master data regime highlights the interdependency of people, process and technology. New technology can often be the driver for taking on board such new business rules. Building on the above examples, such possibilities may be:

  • The ability to support real time pick and check of external identifiers
  • The ability to support real time auto completion and check of postal addresses
  • The ability to support complex completeness checks of a range of data elements (a minimal completeness-check sketch follows this list)
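As an illustration of the last possibility, a configurable completeness check could be as simple as the Python sketch below. The required attributes and the 80 % threshold are hypothetical; in practice the rule set would be owned by the data governance organization:

```python
# A minimal sketch of a configurable completeness rule for product master data.
REQUIRED_ATTRIBUTES = ["name", "description", "weight_kg", "country_of_origin", "image_url"]

def completeness(product, required=REQUIRED_ATTRIBUTES):
    """Share of required attributes that are filled in."""
    filled = sum(1 for attr in required if product.get(attr) not in (None, ""))
    return filled / len(required)

product = {"name": "Toy drum", "description": "A small drum", "weight_kg": 0.4}
score = completeness(product)
print(f"Completeness: {score:.0%}")
if score < 0.8:  # hypothetical minimum requirement
    print("Rule violated: product cannot be brought into business yet")
```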


Data Governance, Santa Style

Multi-Domain MDM, Santa Style, is the title of a post on this blog made a couple of years ago. That post was about what a multi-domain Master Data Management solution could look like in an organization doing business the way we think Santa Claus does.

Many organizations around the world that have recently embraced Master Data Management (MDM) have added Data Governance as an imperative parallel or integrated initiative. I guess Santa could have followed that path too.

Below are some thoughts about data governance considerations at the Santa Claus place based on a concept from The Data Governance Institute mentioned in the post Data Governance: Day 2:

Proactive Rules

What does it mean to be naughty or nice? I guess that must be a key question faced by Santa and his team every day. Santa is probably in no better position here than you are in many real-world organizations: key principles that everyone refers to every day are challenging to document, and it turns really hard when you try to put them into a common, shared business glossary.

Ongoing Services

Is Santa able to implement data governance based on the roles that the elves and the reindeers have had in daily operations around Christmas or does he have to ask Rudolph to head up a Data Governance Office or even become Chief Data Officer?

Reactive Issue Resolution

What should Santa do when the logistic elves insist on chimney positions in UTM and the reindeers can only use WGS84 coordinates? If Santa’s data governance programme does not solve this one, you better watch out when Santa Claus is coming to town.
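The technical side of that particular issue is the easy part: converting between the two coordinate reference systems can be done with, for example, the pyproj library in Python. The UTM zone and the sample chimney position below are illustrative assumptions; the governance question of which system is the master remains Santa's to answer:

```python
# A minimal sketch of bridging the elves' UTM positions and the reindeers'
# WGS84 coordinates with pyproj.
from pyproj import Transformer

# EPSG:32633 = WGS84 / UTM zone 33N (assumed zone), EPSG:4326 = plain WGS84 lat/lon
to_wgs84 = Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

easting, northing = 500000.0, 6094800.0   # a made-up chimney position in UTM
lon, lat = to_wgs84.transform(easting, northing)
print(f"Chimney at latitude {lat:.5f}, longitude {lon:.5f}")
```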


The Countryside Data Quality Journey Through 2015

I guess this is the time for blog posts about big things that are going to happen in 2015. But you see, we could also take a route away from the motorways and highways and see how the traditional way of life is still unfolding in the data quality landscape.

While the innovators and early adopters are fighting with big data quality, the late majority are still trying to get their heads around how to manage small data. And that is a good thing, because you cannot utilize big data without solving small data quality problems, not least around master data, as told in the post How important is big data quality?

Solving data quality problems is not just about fixing data. It is very much also about fixing the structures around data, as explained in a post, featuring the pope, called When Bad Data Quality isn’t Bad Data.

A common roadblock on the way to solving data quality issues is that what is everybody’s problem tends to be no one’s problem. Implementing a data governance programme is evolving as the answer to that conundrum. As with many things in life, data governance is about thinking big and starting small, as told in the post Business Glossary to Full-Blown Metadata Management or Vice Versa.

Data governance revolves a lot around people’s roles, and there are also some specific roles within data governance. Data owners have been known for a long time, data stewards have been around for some time, and now we also see Chief Data Officers emerge, as examined in the post The Good, the Bad, and the Ugly Data Governance Role.

As experienced recently, somewhere in the countryside, while discussing how to get going with a big and shiny data governance programme, there is indeed still a lot to do with trivial data quality issues, such as fields being too short to capture the real world, as reported in the post Everyday Year 2000 Problems.



Be Prepared

Working with data governance and data quality can be a very backward-looking quest. It often revolves around how to avoid a repeat of a recent data disaster, or how to catch up with the organizational issues, the process orchestration and the new technology implementations needed to support current business objectives with current data types in a better way.

This may be hard enough. But you must also be prepared for the future.

The growth of available data to support your business is a challenge today. Your competitors take advantage of new data sources and better exploitation of known data sources while you are sleeping. New competitors emerge with business ideas based on new ways of using data.

The approach to inclusion of new data sources, data entities, data attributes and digital assets must be a part of your data governance framework and data quality capability. If you are not prepared for this, your current data quality will not only be challenged by decay of current data elements but also by insufficiently governed new data elements, or you will lack business agility because you cannot include new data sources and elements in a safe way.

Some essentials in being prepared for inclusion of new kinds of data are:

  • A living business glossary that facilitates a shared understanding of new data elements within your organization, including how they relate to or replace current data elements.
  • Configurable data quality measurement facilities, data profiling functionality and data matching tools, so that on-boarding a new data element doesn’t require a new data quality project (a small measurement sketch follows this list).
  • Self-service and automation being the norm for data capture and data consumption. Self-service must be governed both internally in your organization and externally as explained in the post Data Governance in the Self-Service Age.
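To illustrate the second essential, configurable data quality measurement can start very small, for example as a Python/pandas sketch like the one below, where the columns to check and the thresholds are hypothetical configuration entries rather than anyone's actual rule book:

```python
# A minimal sketch of configurable completeness and uniqueness measurement:
# adding a new data element only means adding a new configuration entry.
import pandas as pd

checks = {"email": {"completeness": 0.95, "uniqueness": 0.99},
          "postal_code": {"completeness": 0.90}}

df = pd.DataFrame({"email": ["a@x.com", "b@x.com", None, "a@x.com"],
                   "postal_code": ["1000", "2100", "2100", None]})

for column, thresholds in checks.items():
    completeness = df[column].notna().mean()          # share of filled values
    uniqueness = df[column].nunique() / len(df)       # share of distinct values
    print(f"{column}: completeness={completeness:.2f}, uniqueness={uniqueness:.2f}")
    for metric, threshold in thresholds.items():
        value = completeness if metric == "completeness" else uniqueness
        if value < threshold:
            print(f"  WARNING: {metric} below threshold {threshold}")
```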


Data Governance: Day 2

Much of the talking and writing about data governance these days is about how to start a data governance programme. This includes the roadmap, funding, getting stakeholders interested and that kind of stuff. Focussing on how to get a data governance programme off the ground is natural, as this is the struggle right now in many organizations.

But hopefully when, and not if, the data governance programme has left the ground and is a reality, what does the daily life look like then? I think this drawing can be a good illustration:

Daily Data Governance

The drawing is taken from the Data Governance Institute Framework provided by Gwen Thomas.

As a fan of agile approaches within most disciplines including data governance, it is worth remarking that the daily life should not be seen as an end result of a long implementation. It should rather be seen as the above concept being upgraded over time in more and more mature versions, probably starting with a very basic version addressing the main pain points within your organization.

When starting a data governance programme there are typically a lot of existing business rules to be documented in a consistent way. That is one thing. Another thing is to establish the process that deals with the data aspects of changing business rules and taking on new business rules, as touched upon in the post Two Kinds of Business Rules within Data Governance.

The ongoing services and the issue resolution parts rely very much on some kind of organizational structure. This could include one of my favourites, collaboration fora between data stewards, maybe a data governance office and usually a data governance council of some name. And perhaps a Chief Data Officer (CDO), as mentioned in the post The Good, the Bad, and the Ugly Data Governance Role.


The Matrix

The data governance discipline, the Master Data Management (MDM) discipline and the data quality discipline are closely related and happen to be my fields of work, as told in the post Data Governance, Data Quality and MDM.

Every IT enabled discipline has an element of understanding people, orchestrating business processes and using technology. The mix may vary between disciplines. This is also true for the three above-mentioned disciplines.

But how important are people, process and technology within these three disciplines? Are the disciplines very different in that perspective? I think so.

When assigning a value from 1 (less important) to 5 (very important) for Data Governance (DG), Master Data Management (MDM) and Data Quality (DQ) I came to this result:

The Matrix

A few words about the reasoning for the highs and lows:

Data governance is in my experience a lot about understanding people and less about using technology as told in the post Data Governance Tools: The New Snake Oil?

I often see arguments that data quality is all about people too. But:

  • I think you are really talking about data governance when putting the people argument forward in the quest for achieving adequate data quality.
  • I see little room for having the personal opinions of different people dictate what adequate data quality is. This should really be as objective as possible.

Now I am ready for your relentless criticism.
