Omni-purpose Data Quality

A recent post on this blog was called Omni-purpose MDM. It discusses to what degree MDM solutions should cover all the business cases where Master Data Management plays a part.

Master Data Management (MDM) is very much about data quality. A recurring question in the data quality realm is whether data quality should be seen as the degree to which data are fit for the purpose of use, or whether the degree of real world alignment is a better measurement.

The other day Jim Harris published a blog post called Data Quality has a Rotating Frame of Reference. In a comment Jim takes up the example of having a valid address in your database records and how measuring address validity may make no sense for measuring how data quality supports a certain business objective.

My experience is that if you look at one business objective at a time, measuring data quality against the purpose of use is of course sound. However, if you have several different business objectives using the same data, you will usually discover that aligning with the real world fulfills all the needs. This is explained further within the concept of Data Quality 3.0.

Using the example of a valid address, measurements, and actual data quality prevention, typically work with degrees of validity, notably:

  • The validity at different levels, such as area, entrance and specific unit, as examined in the post A Universal Challenge.
  • The validity of related data elements, as when an address is valid but the addressee is not, as examined in the post Beyond Address Validation.
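These degrees of validity can be sketched in a few lines of code. The snippet below is purely illustrative: the tiny reference directory, the level names and the function are made up for the example and do not come from any actual address validation tool.

```python
# Illustrative only: a hypothetical reference directory keyed by
# (postal code, street + number), with the known units per entrance.
REFERENCE_DIRECTORY = {
    ("2100", "Main Street 1"): {"units": {"1st left", "1st right"}},
}

def address_validity(postal_code, street, unit=None):
    """Return the deepest level at which the address matches reference data."""
    entry = REFERENCE_DIRECTORY.get((postal_code, street))
    if entry is None:
        return "invalid"           # not even the area/entrance is known
    if unit is None:
        return "entrance"          # valid down to the building entrance
    if unit in entry["units"]:
        return "unit"              # valid down to the specific unit
    return "entrance_only"         # entrance is valid, the unit is not

print(address_validity("2100", "Main Street 1", "1st left"))   # unit
print(address_validity("2100", "Main Street 1", "basement"))   # entrance_only
print(address_validity("9999", "Nowhere Road 0"))              # invalid
```

The point of the sketch is that "is the address valid?" has no single yes/no answer: the answer depends on which level of validity a given business objective actually needs.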

Data quality needs for a specific business objective also change over time. While a valid address may be irrelevant for invoicing, either because the mail carrier gets it there anyway or because we invoice electronically, having a valid address and addressee suddenly becomes fit for the purpose of use when the invoice is not paid and we have to chase the debt.


How can you have any pudding….

The social media sphere these days has a lot of good stuff around Data Quality and Big Data including this piece from Jim Harris called Big Data is Just Another Brick in the Wall.

In it Jim ponders how working with Big Data must be built on a lot of other disciplines, including Data Quality, and the title of the blog post is nicely composed from the title of the fantastic Pink Floyd song Another Brick in the Wall.

In this song there is an unpleasant voice of an angry stupid old teacher yelling:

“If you don’t eat yer meat, you can’t have any pudding. How can you have any pudding if you don’t eat yer meat?”

I’m afraid I also have to raise an equally unpleasant voice and say:

“If you don’t eat yer data quality, you can’t have any big data. How can you have any big data if you don’t eat yer data quality?”

And by the way: How can you work with big data if you don’t join the LinkedIn group called Big Data Quality?


Beware of False Positives in Data Matching

In a recent blog post by Kristen Gregerson of Satori Software you may learn A Terrible Tale where the identities of two different real world individuals were merged into one golden record, with the most horrible result you may imagine associated with a recent special day related to the other kind of matching going around.

Join the Data Matching Group on LinkedIn

As reported by Jim Harris some years ago in the post The Very True Fear of False Positives, the bad things happening from false positives in data matching are indeed a hindrance to doing data matching.

If we do data matching we should be aware that false positives will happen, we should know the probability of them happening, and we should know how to avoid the resulting heartache.

Indeed using a data matching tool is better than relying on simple database indexes, and indeed there are differences in how good various data matching tools are at doing the job, not least at doing it under different circumstances, as told in the post What is a best-in-class match engine?
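To illustrate where false positives come from, here is a very small sketch of threshold-based matching using Python's standard library difflib. Real match engines are far more sophisticated; the names and the threshold below are purely illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(a, b, threshold=0.85):
    """Declare a match when similarity crosses the threshold."""
    return similarity(a, b) >= threshold

# A true positive: the same real world person, spelled differently.
print(is_match("John Smith", "Jon Smith"))                    # True

# A potential false positive: two different real world persons whose
# names are similar enough to cross the same threshold.
print(is_match("Kristen Gregerson", "Kristin Gregersen"))     # also True

# A true negative: clearly different persons stay below the threshold.
print(is_match("John Smith", "Alice Jones"))                  # False
```

Raising the threshold reduces false positives but creates false negatives instead, which is exactly the trade-off a good match engine, and a good probability estimate, has to manage.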

Curious about how data matching tools work (differently)? There is an eLearning course available co-authored by yours truly. The course is called Data Parsing, Matching and De-duplication.


Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that “single source of truth” is a nice term, but not what data quality and master data management are really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those people, including me, who think emphasis on real world alignment could help in getting better data and information quality, as opposed to focusing on fitness for multiple different purposes of use, there is acknowledgement that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity are often in disagreement.

When working with improvement of data quality in party master data, which is the master data domain with the most frequent and common issues, you encounter the same issues over and over again, like:

  • Many organizations have a considerable overlap of real world entities who are a customer and a supplier at the same time. Expanding to other party roles, this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private matters are a mixed case, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in non-profits, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are a lot of good fit-for-the-purpose-of-use reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately, doing social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible, and not always desirable, single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit for purpose of use master data.
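As a purely illustrative sketch of that model, assume each internal, fit-for-purpose record keeps a “golden relation” to a real world aligned external reference entity, rather than being merged into one flat golden record. All names below are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class ReferenceEntity:
    """Real world aligned entity from an external reference source."""
    ref_id: str
    legal_name: str

@dataclass
class InternalRecord:
    """Application-specific, fit-for-purpose master data record."""
    app: str
    local_name: str
    ref: ReferenceEntity          # the golden relation

acme = ReferenceEntity("REF-001", "ACME Corporation Ltd")

# Each application keeps its own deliberately different record,
# linked to the same real world aligned reference entity.
records = [
    InternalRecord("CRM", "ACME Corp (key account)", acme),
    InternalRecord("ERP", "ACME Corporation Ltd", acme),
]

# A 360° view is obtained by grouping on the golden relation.
view = [r.app for r in records if r.ref.ref_id == "REF-001"]
print(view)   # ['CRM', 'ERP']
```

The hierarchy, not a forced merge, is what delivers the single customer view while leaving each application's fit-for-purpose data intact.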

Right now I’m having an exciting time doing just that as described in the post Doing MDM in the Cloud.


MDM Summit Europe 2012 Preview

I am looking forward to being at the Master Data Management Summit Europe 2012 next week in London. The conference runs in parallel with the Data Governance Conference Europe 2012.


As I am living within a short walking distance of the venue, I won’t have as much time for thinking as Jill Dyché had when she recently was at a conference within driving distance, as reported in her blog post After Gartner MDM, in which Jill considers MDM and takes the road less traveled. In London Jill will be delivering a keynote called: Data Governance, What Your CEO Needs to know.

On the Data Governance tracks there will be a panel discussion called Data Governance in a Regulatory Environment with some good folks: Nicola Askham, Dylan Jones, Ken O’Connor and Gwen Thomas.

Nicola is currently writing an excellent blog post series on the Six Characteristics Of A Successful Data Governance Practitioner. Dylan is the founder of DataQualityPro. Ken was the star of the OCDQblog radio show today, discussing Solvency II and Data Quality.

Gwen, being the founder of The Data Governance Institute, is chairing the Data Governance Conference while Aaron Zornes, the founder of The MDM Institute, is chairing the MDM Summit.

Master Data, Social MDM and Reference Data Management

The MDM Institute lately had an “MDM Alert” with Master Data Management & Data Governance Strategic Planning Assumptions for 2012-13 with the subtitle: Pervasive & Pandemic MDM is in Your Future.

Some of the predictions are about reference data and Social MDM.

Social master data management has been a favorite subject of mine for the last couple of years, and I hope to catch up with fellow MDM practitioners and learn how far this has come outside my circles.

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are those data defined and initially maintained outside a single enterprise. Examples from the customer master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend as I see it is that enterprises seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above. In the customer master data realm such big reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living at an address in a given country.

There is often no single source of truth for such data.
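When such sources disagree, a common simple approach is a survivorship rule based on source precedence: take each attribute from the most trusted source that has it. A purely illustrative Python sketch, with made-up source names and values:

```python
# Illustrative assumption: a national register is trusted over a
# postal provider for address attributes.
SOURCE_PRECEDENCE = ["national_address_register", "postal_provider"]

sources = {
    "national_address_register": {"street": "Main Street 1", "postal_code": "2100"},
    "postal_provider": {"street": "Main St. 1", "postal_code": "2100"},
}

def consolidate(sources, precedence):
    """Merge attribute values, letting more trusted sources win."""
    merged = {}
    for name in reversed(precedence):         # least trusted first,
        merged.update(sources.get(name, {}))  # most trusted overwrites
    return merged

print(consolidate(sources, SOURCE_PRECEDENCE))
# {'street': 'Main Street 1', 'postal_code': '2100'}
```

A fixed precedence list is of course only one survivorship strategy; in practice the rule may vary per attribute, per country and per how fresh each source is.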

As I’m working with an international launch of a product called instant Data Quality (iDQ™), I look forward to exploring how MDM analysts and practitioners see this field developing.


Bat-and-ball Data Quality

Lately Jim Harris of the OCDQblog has written two excellent blog posts, or may I say home runs, discussing data quality with inspiration from baseball.

In the post Quality Starts and Data Quality Jim talks about how you may have a tough loss in business despite stellar data quality and a cheap win in business despite horrible data quality, but that in the long run, by starting off with good data quality, your organization has a better chance to succeed.

In the follow-up post, called Pitching Perfect Data Quality, Jim ponders that business success is achievable without perfect data quality, but that data quality has a role to play.

Now, while baseball is a very popular sport in the United States but largely unknown in the rest of the world, I think we all understand the metaphors.

Also, we have different but similar sports, with other rules, statistics and terms attached, around the world. The common name for these sports is bat-and-ball games.

In Britain, where I live now, cricket is huge and can be used to attract awareness of data issues. As late as yesterday the Ordnance Survey, a government body that has registries with addresses, coordinates and maps, published a blog post called Anyone for cricket? British blogger Peter Thomas also wrote, among others, a post on cricket and data quality called Wager.

Before coming to Britain I lived in Denmark, where we don’t know baseball and don’t know cricket, but sometimes at family picnics, perhaps after a Carlsberg and a snaps or two, play a similar game called rundbold, with kid- and grandpa-friendly rules and scoring, usually using a tennis ball.

Data quality, not least data quality in relation to party master data, which is the most prominent domain within the discipline, is also a same same but different game around the world, as told in the post Partnerships for the Cloud.

Understanding the rules, statistics and terms of baseball, cricket, rundbold and all the other bat-and-ball games of the world is a daunting task, even though we all know how to hit a ball with a bat.


Yin and Yang Data Quality

The old Chinese concept of yin and yang, or simply yīnyáng, is used to describe how polar opposites or seemingly contrary forces are interconnected and interdependent in the natural world. The concept is probably best known materialized as sweet and sour sauce.

Lately we had a debate in the data quality community on social media about whether data quality is a journey or a destination, nicely summarized by Jim Harris in the post Quo Vadimus. I guess the prevailing sentiment is that it is kind of both a journey and a destination.

We also have the good old question about whether data are of high quality if they are “fit for the purpose of use” or “aligned with the real world”. Sometimes these benchmarks go in opposite directions, and we would like to fulfill both goals at the same time.

The Data Quality discipline is tormented by belonging to both the business side and the technology side of practice. These sides are often regarded as contrary, but in my experience we get the best sauce by having both sides represented.

And oh yes, do we actually have to call it by one of two diametrically different terms, Data Quality or Information Quality? Bon appétit.
