Data Governance, Data Quality and MDM

The data governance, data quality and Master Data Management (MDM) disciplines are closely related and happen to be my fields of work.

Data quality improvement is important within both data governance and MDM. Furthermore, you seldom see an MDM implementation without a (master) data governance work stream today.

Over time it has often been suggested that data quality should rightfully be named information quality, as told in the post New Blog Name. In addition, data governance could be referred to as information governance, as suggested in the MIKE2.0 Open Methodology here.

Within MDM we have the term Product Information Management (PIM), which is partly, but maybe not fully, the same as Product MDM, as examined by Monica McDonnell of Informatica in the post PIM is Not Product MDM – Product MDM is not PIM.

Product is one of several domains within MDM; customer (or rather party), location and asset are other domains that go into multi-domain MDM, as reported in the post Multi-Entity MDM vs Multidomain MDM.

While replacing the term data with the term information in data quality, data governance and, for that matter, (multi-domain) master data management has had limited success outside academic circles, I see it as very suitable for being part of a term covering these three disciplines as a whole.

So what should these three disciplines be called as a whole? Have you noticed any good terms or smart hypes out there? Or are they just three out of many disciplines within data or information management?


American Exceptionalism in Data Management

The term American exceptionalism was born in the political realm but certainly also applies to other areas, including data management.

As a lot of software, and today cloud services, is made in the USA, the rest of the world struggles with data standards that apply only, or to a high degree, to the United States.

Some of the common ones are:

Fahrenheit

In the United States Fahrenheit is the unit of temperature. The rest of the world (with a few exceptions) uses Celsius. Fortunately many applications have the ability to switch between the two, but it certainly happens to me once in a while that I uninstall a new exciting app because it only shows temperature in Fahrenheit, and to me 30 degrees is very hot weather.
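The conversion between the two scales is simple arithmetic; a minimal Python sketch:

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# 30 degrees Celsius -- very hot weather -- is 86 degrees Fahrenheit
print(celsius_to_fahrenheit(30))   # → 86.0
print(fahrenheit_to_celsius(86))   # → 30.0
```

So an app showing "30 degrees" means something very different depending on which side of the Atlantic it was built.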

Month-Day-Year

The Month-Day-Year date format is another American exceptionalism in data management. When dates are kept in databases there is no problem, as databases internally use a counter for a date. But as soon as a date slips into a text format and is used in an international sense, no one can tell if 10/9/2014 is the 10th of September, as it is seen outside the United States, or the 9th of October, as it is seen inside the United States. For example, it took LinkedIn years before the service handled the date format according to its international spread, and there are still mix-ups.
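The ambiguity is easy to demonstrate in Python: the same text parses to two different dates depending on which format you assume. Exchanging dates in the ISO 8601 year-month-day format avoids the problem:

```python
from datetime import datetime

text = "10/9/2014"

# Outside the United States: day/month/year
european = datetime.strptime(text, "%d/%m/%Y")
print(european.strftime("%Y-%m-%d"))  # → 2014-09-10

# Inside the United States: month/day/year
american = datetime.strptime(text, "%m/%d/%Y")
print(american.strftime("%Y-%m-%d"))  # → 2014-10-09

# Same text, two different dates -- a month apart
assert european != american
```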

State

Having a state as part of a postal address is mandatory in the United States and only shared with a few other countries such as Australia and Canada, though the Canadians call the similar concept a province. The use of a mandatory state field with only US states present is especially funny when registering online for a webinar about an international data quality solution.


Data Quality Dimensions and Real World Alignment

Real world alignment is often seen as a measure of data quality competing with the popular approach of seeing data quality as fitness for the purpose of use.

When we try to narrow down what constitutes quality of data we may use data quality dimensions. So, what do data quality dimensions look like in the light of real world alignment? Here are a few thoughts:

  • Uniqueness is probably the data quality dimension that most closely relates to real world alignment, as the opposite of uniqueness is duplication, which in the data quality world means that two or more different data records describe the same real world entity.
  • Accuracy is best measured as the degree to which data describes something in the real world.
  • Credibility was recently proposed as an important data quality dimension by Malcolm Chisholm on Information Management in the article called Data Credibility: A New Dimension of Data Quality? Here credibility means that data is free of any malicious manipulation performed to fulfill an evil purpose of use.
Some data quality dimensions
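Uniqueness checks in practice often rely on fuzzy matching rather than exact comparison, because duplicate records rarely agree character for character. A minimal sketch using the standard library's difflib; the threshold and the sample names are illustrative assumptions, and real MDM matching engines use far more elaborate techniques:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity score between two party names, 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicate_pairs(records, threshold=0.85):
    """Flag record pairs that likely describe the same real world entity."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((records[i], records[j]))
    return pairs

# Illustrative customer records: the last one is a misspelled duplicate
customers = ["Acme Corporation", "Globex Inc", "Acme Corporatin"]
for a, b in find_duplicate_pairs(customers):
    print(f"Possible duplicate: {a!r} / {b!r}")
```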


Winning by Sharing Data

When I changed my laptop a few months ago, it was the easiest migration to a new computer ever.

Basically I just had to connect to all the cloud services I had been using before, and for many services the path was to connect to Google+, Twitter and Facebook and then connect to many other services via these connections.

This was a personal win.

Most of the teams I am working with are sharing their data with me in the cloud. Unlike in the bad old days, I do not have to call and ask for progress on this and that. I can check the status myself and even get notifications on my phablet when a colleague completes a task.

This is a shared win.

Within my profession, being data quality improvement and Master Data Management (MDM), sharing data is going to be a winning path too, as told in the post Sharing is the Future of MDM.

There are several ways of sharing master data, like using commercial third party data, digging into open government data, having your own data locker and relying on social collaboration. These options are examined in the post Ways of Sharing Master Data.


Omni-purpose Data Quality

A recent post on this blog was called Omni-purpose MDM. It discusses to what degree MDM solutions should cover all business cases where Master Data Management plays a part.

Master Data Management (MDM) is very much about data quality. A recurring question in the data quality realm is whether data quality should be seen as the degree to which data is fit for the purpose of use, or whether the degree of real world alignment is a better measurement.

The other day Jim Harris published a blog post called Data Quality has a Rotating Frame of Reference. In a comment Jim takes up the example of having a valid address in your database records and how measuring address validity may make no sense for measuring how data quality supports a certain business objective.

My experience is that if you look at one business objective at a time, measuring data quality against the purpose of use is sound, of course. However, if you have several different business objectives using the same data, you will usually discover that aligning with the real world fulfills all the needs. This is explained further within the concept of Data Quality 3.0.

Using the example of a valid address, measurements, and actual data quality prevention, typically work with degrees of validity, notably:

  • The validity at different levels, such as area, entrance and specific unit, as examined in the post A Universal Challenge.
  • The validity of related data elements, as an address may be valid while the addressee is not, as examined in the post Beyond Address Validation.
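Graded validity can be modeled as a hierarchy of levels rather than a yes/no flag. A simplified sketch, where the level names and the record structure are illustrative assumptions, not a standard:

```python
# Illustrative address validity levels, from coarse to fine
LEVELS = ["none", "area", "entrance", "unit", "addressee"]

def address_validity(record):
    """Return the deepest verified level of the address hierarchy.

    `record` maps level names to booleans; a level only counts if every
    coarser level before it is verified too.
    """
    achieved = "none"
    for level in LEVELS[1:]:
        if record.get(level):
            achieved = level
        else:
            break
    return achieved

# Area and entrance verified, but not the specific unit: the addressee
# flag does not count because a coarser level is unverified
record = {"area": True, "entrance": True, "unit": False, "addressee": True}
print(address_validity(record))  # → entrance
```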

Data quality needs for a specific business objective also change over time. While a valid address may be irrelevant for invoicing, if either the mail carrier gets it there anyway or we invoice electronically, having a valid address and addressee suddenly becomes fit for the purpose of use if the invoice is not paid and we have to chase the debt.


Data is the new petroleum

”Data is the new oil” is a well-known saying today, used to emphasize the fact that data and your ability to exploit data can make you rich.

The rise of big data has put some more fire to this burning issue indeed with the variant saying “Big data is the new oil”.

Now, as oil is many things, data is many things too. Just as few of us actually use crude oil, also called petroleum, few of us use raw data directly to get rich. We use information distilled from raw data for specific purposes. One example is examined in the post Mashing Up Big Reference Data and Internal Master Data.

This brings me to the point that we have the question of the quality of oil just as we have the question of the quality of data, as explained nicely by Ken O’Connor in the post Data is the new oil – what grade is yours?


How Mature are Big Data Maturity Models?

The rise of big data creates a lot of well-known side effects. One of them is maturity models.

Here’s a Big Data Maturity Model from 2012. The Data Warehousing Institute has introduced their TDWI Big Data Maturity Model and Assessment Tool. And yesterday over at John Radcliffe’s blog there is an introduction of another Big Data Maturity Model.

Big Data Maturity Model Radcliffe
Click on image to go to John’s blog for more maturity.

The concept of a maturity model has been well established since the Capability Maturity Model (CMM) for software development was born, and probably we will also see a big data immaturity model one day.

As organizations climb the steps of the big data maturity models we will learn more about what is up there, and indeed we already know something, because the use of big data started long before the use of the term big data.

What do you think: How mature are big data maturity models?


Happy New Year and Merry Christmas

A week ago I had a quick vote here on the blog about when next Christmas will be.

The results are as seen to the right (or above on a mobile device). Most readers think it will be on the 25th of December 2013, either written in the straightforward date format as 25/12/2013 or in the awkward date format used in the United States, thus being 12/25/2013. Some people, probably from Scandinavia, think it is today, the 24/12/2013. For people living in countries mostly observing the Eastern Orthodox Church, Christmas will be on the 7th of January, 07/01/2014 in the straightforward date format used there, using the secular Gregorian calendar. This is because the Eastern Church still sticks to the old Julian calendar, which is 13 days behind the Gregorian calendar.
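The Julian-to-Gregorian shift is just a fixed offset in our era; a small sketch, assuming the 13-day difference that holds from 1900 to 2100:

```python
from datetime import date, timedelta

# In the 20th and 21st centuries the Julian calendar runs 13 days
# behind the Gregorian calendar
JULIAN_OFFSET = timedelta(days=13)

# 25 December in Julian reckoning falls on this Gregorian date
julian_christmas = date(2013, 12, 25)
gregorian_equivalent = julian_christmas + JULIAN_OFFSET
print(gregorian_equivalent.isoformat())  # → 2014-01-07
```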

So, depending on what you celebrate and in which order:

  • Happy Holidays
  • Merry Christmas and Happy New Year
  • Happy New Year and Merry Christmas


Data Quality, Real World Alignment and Visualization by Maps

Babbling about data quality, real world alignment and maps is a regular topic on this blog, and this Saturday is no exception.

This week I stumbled on a discussion in the “Data, Data, Data” community on Google Plus. There was a map:

(Hex cartogram of the world’s internet population, 2011)

The map visualizes what the world would look like if every internet user had an equal amount of space to live on. This gives the land masses on the earth a different shape than in reality, based on:

  • Population density
  • Internet penetration

As internet penetration is the main purpose of the map, the penetration percentage for the different countries is highlighted by color in order to be fit for the purpose of use, thus showing the highest penetration in Canada, Northern Europe, Qatar, South Korea and New Zealand.

Some countries seem to have disappeared from the planet, as mentioned in the comments on Google Plus: Singapore, Taiwan (officially the Republic of China) and North Korea (officially the Democratic People’s Republic of Korea). The latter has probably gone because of no data or no users. Well, probably both reasons.

On a side note it is a bit peculiar that countries on the map are labeled by the three-character ISO code and not the two-character code that more closely resembles country domains on the internet.
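The difference is easy to see with a small mapping; a sketch with a hand-picked subset of ISO 3166-1 codes for countries mentioned above, where the country-code top-level domains happen to match the alpha-2 codes:

```python
# Hand-picked subset of ISO 3166-1 alpha-3 to alpha-2 code mappings
ALPHA3_TO_ALPHA2 = {
    "CAN": "CA",  # Canada      -> .ca
    "DNK": "DK",  # Denmark     -> .dk
    "KOR": "KR",  # South Korea -> .kr
    "NZL": "NZ",  # New Zealand -> .nz
    "QAT": "QA",  # Qatar       -> .qa
}

for alpha3, alpha2 in ALPHA3_TO_ALPHA2.items():
    print(f"{alpha3} -> {alpha2} (internet domain .{alpha2.lower()})")
```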
