Marco Polo and Data Provenance

10th August 2011

Besides being a data geek I am also interested in pre-modern history. So it’s always nice when I’m able to combine data management and history.

A recurring subject among historians is the suspicion that the explorer Marco Polo never actually went to China.

As an article in The Telegraph puts it: “It is more likely that the Venetian merchant adventurer picked up second-hand stories of China, Japan and the Mongol Empire from Persian merchants whom he met on the shores of the Black Sea – thousands of miles short of the Orient”.

When dealing with data and ramping up data quality, a frequent challenge is that some data wasn’t captured by the data consumer, or even by the organization using the data. Some of the data stored in company databases is second-hand data, and in some cases the overwhelming part of the data is captured outside the organization.

As with “Description of the World”, the book about Marco Polo’s (alleged) travels, this doesn’t mean that you can’t trust anything. But maybe some data is mixed up a bit, and maybe some obvious data is missing.

I have touched on this subject earlier in the post Outside Your Jurisdiction and identified second-hand data as one of the Top 5 Reasons for Downstream Cleansing.


Proactive Data Governance at Work

28th July 2011

“Data governance is 80 % about people and processes and 20 % (if not less) about technology” is a common statement in the data management realm.

This blog post is about the 20 % (or less) technology part of data governance.

The term proactive data governance is often used to describe whether a given technology platform is able to support data governance well.

So, what is proactive data governance technology?

Obviously it must be the opposite of reactive data governance technology, which is about discovering completeness issues, as in data profiling, and fixing uniqueness issues, as in data matching.

Proactive data governance technology must be implemented in data entry and other data capture functionality. The purpose of the technology is to assist people responsible for data capture in getting the data quality right from the start.

If we look at master data management (MDM) platforms we have two possible ways of getting data into the master data hub:

  • Data entry directly in the master data hub
  • Data integration by data feed from other systems such as CRM, SCM and ERP solutions and from external partners

In the first case the proactive data governance technology is a part of the MDM platform, often implemented as workflows with assistance, checks, controls and permission management. We see this most often in product information management (PIM) and in business-to-business (B2B) customer master data management. Here the insertion of a master data entity like a product, a supplier or a B2B customer involves many different employees, each responsible for a set of attributes.

The second case is most often seen in customer data integration (CDI) involving business-to-consumer (B2C) records, but it certainly also applies to enriching product master data, supplier master data and B2B customer master data. Here the proactive data governance technology is implemented in the data import functionality or even in the systems of entry. This is best done as Service Oriented Architecture (SOA) components that are hooked into the master data hub as well.
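To make the idea a bit more tangible, here is a minimal Python sketch of a capture-time check that both a data entry screen and an import feed could call. Everything in it is an assumption made for illustration: the attribute names, the in-memory list standing in for the master data hub, and the naive exact-match duplicate rule. A real platform would use proper data matching and its own data model.

```python
from dataclasses import dataclass, field

# Hypothetical required attributes; a real hub would define its own model.
REQUIRED_ATTRIBUTES = ("name", "country", "postal_code")


@dataclass
class CaptureResult:
    accepted: bool
    issues: list = field(default_factory=list)


def normalise(value):
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(value.lower().split())


def match_key(record):
    """Naive match key (an assumption): normalised name plus postal code."""
    return (
        normalise(record.get("name", "")),
        normalise(record.get("postal_code", "")),
    )


def check_at_capture(record, hub):
    """Run completeness and uniqueness checks before a record enters the hub."""
    issues = []

    # Completeness: every required attribute must carry a non-blank value.
    for attribute in REQUIRED_ATTRIBUTES:
        if not record.get(attribute, "").strip():
            issues.append(f"missing {attribute}")

    # Uniqueness: only compare when both parts of the key are present.
    key = match_key(record)
    if all(key) and any(match_key(existing) == key for existing in hub):
        issues.append("possible duplicate of existing record")

    return CaptureResult(accepted=not issues, issues=issues)


def capture(record, hub):
    """Shared entry point for data entry and data import alike."""
    result = check_at_capture(record, hub)
    if result.accepted:
        hub.append(record)
    return result


hub = []
first = capture({"name": "Acme Ltd", "country": "DK", "postal_code": "2100"}, hub)
second = capture({"name": "  ACME  ltd", "country": "DK", "postal_code": "2100"}, hub)
third = capture({"name": "Beta", "country": "", "postal_code": "8000"}, hub)
```

In this sketch only the first record reaches the hub. The second is flagged as a possible duplicate and the third as incomplete, so the person (or feed) capturing the data gets the feedback at the start instead of in a downstream cleansing job.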

It is a matter of taste whether we call such technology proactive data governance support or upstream data quality prevention. From what I have seen so far, it does work.


Howcatchem

25th June 2011

I’m sad to learn that Peter Falk has died. Peter Falk is best known as Lieutenant Columbo in the American television series Columbo, which has been shown all over the world, including in my country when I was a teenager.

I think Columbo would have been a great data quality specialist too. Underestimated by the smart guys but focused on the important details while exercising the art of “howcatchem”. Quite unlike his “whodunit” colleagues, who are on horseback chasing the villains down a crowded city street or doing the same in a car while smashing most other cars on the way.


Book Review: Berson and Dubov on MDM

10th March 2011

A few days ago Julian Schwarzenbach over at the Data and Process Advantage Blog published a review of the book “Master Data Management and Data Governance” by Alex Berson and Larry Dubov. Link to Julian’s review here.

And hey, that’s the book I have been reading too during the last months. So why not write my own review?

I agree very much with Julian’s positive review of the book. It is a very comprehensive book, and a thick and heavy one, as I have learned from bringing it with me when travelling, which is where I usually read offline stuff. But master data management and related data governance is a big and heavy discipline with a lot of details that have to be dealt with.

I have probably annoyed fellow travellers in trains and airplanes while reading the book with exclamations such as “Yes”, “Precisely”, “That’s what I have always said”, “Good point” and so on. That is because I agree very much with many of the issues described and the solutions discussed in the book.

For the mandatory bit of criticism that must be included in every book review, I will bring out my pet peeve about United States and English language centricity. Well, it’s actually not that bad, as the book in many places does indicate that angles and pains exist other than those prominent in the United States and with the English language.

Oh, and I can bear that my surname in the references is spelled “Sorensen” instead of “Sørensen” and that a related date is formatted as “11/22/2009”, which to me is the 11th day of the 22nd month of the year 2009.
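For the data geeks, the date issue can be shown in a few lines of Python. This is only a small sketch using the standard library; the second sample date is one I made up to show the more dangerous case where both readings are valid.

```python
from datetime import datetime

raw = "11/22/2009"

# Month-first reading, as used in the United States.
us_reading = datetime.strptime(raw, "%m/%d/%Y").date()

# Day-first reading, as used in Denmark and many other countries.
try:
    day_first_reading = datetime.strptime(raw, "%d/%m/%Y").date()
except ValueError:
    day_first_reading = None  # there is no 22nd month

# A made-up date where both readings parse, silently giving different days.
ambiguous = "03/04/2009"
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()

# An unambiguous exchange format such as ISO 8601 avoids the guessing.
iso_text = us_reading.isoformat()
```

The first date at least fails loudly under the day-first reading. The second one does not, which is why an explicit format or an ISO 8601 date is the safer choice when data crosses borders.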


Pick Any Two

22nd February 2011

The project triangle expresses the dilemma that you probably want your project to be good, fast and cheap, but in practice you are only able to prioritize two of these three desirable options. In short:

Good, fast, cheap – pick any two

The “pick any two among three” theme can be applied to a lot of other activities: you state three desirable terms, of which only two can be combined in real life.

So what could be the “pick any two among three” theme for data quality?

Of course the good, fast, cheap dilemma also goes for data quality projects. But as data quality management isn’t just a project but an ongoing program, what else?

I have one suggestion:

Fit for purpose, real world alignment, fix it as we go – pick any two

The term “fit for purpose” has become more or less synonymous with “high quality data” and is thus chosen here to express the “good” angle of data quality.

Some data, especially the data we call master data, is used for multiple purposes within an organization. Therefore some kind of real world alignment is often used as a fast track to improving data quality, where you don’t spend time analyzing how data may fit multiple purposes at the same time in your organization. Real world alignment may also fulfill future requirements regardless of the current purposes of use.

Managing data that is both fit for multiple purposes and aligned with the real world is not something you do cheaply by fixing it as you go. You may pick any two options in these combinations:

  • Make some data fit for purpose by fixing it as the pains show up.
  • Align data with the real world, typically by exploiting external reference data as the prices go down.
  • Lay out a thorough plan for having data that is fit for multiple purposes and aligned with the real world.


Miracle Food for Thought

17th February 2011

We all know the headlines in the media about food and drink and your health. One day something is healthy, the next day it will kill you. You are struck with horror when you learn that even a single drop of alcohol will harm your body until you are relieved by the wise words saying that a glass (or two) of red wine a day keeps the doctor away.

These misleading, exaggerated and contradictory headlines are now documented in a report called Miracle Food, Myth and the Media.

It’s the same with data quality, isn’t it?

Sometimes some data are fit for purpose. At another time at another place the very same data are rubbish.

As an excerpt from the Miracle Food report says:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoon of salt.

Bon appétit.


Where is the Business?

12th February 2011

In technology enabled disciplines we often like to divide an organization into two distinct parts being IT (Information Technology) and “the business”.

I am aware that we do that to emphasize that our solutions have to be business centric as opposed to technology centric. We mustn’t fall into the trap of discussing technology too early, and we certainly shouldn’t select certain technology brands as the first step of our solutions.

A problem, however, is where to find “the business” in an organization. The top management surely represents all of the business (including the IT part of the business). But in order to find the so-called subject matter experts we look down the levels in the organization, where people don’t belong to “the business” but to sales, marketing, customer service, purchasing, production, human resources, finance and so on.

Some technology enabled disciplines belong to a certain department. But disciplines such as (enterprise wide) data quality and master data management are supposed to support most departments. The business. So where do we find the business? And who are we by the way?

Call them?

Assuming it doesn’t matter who we are: Let’s go find “the business”. I guess it doesn’t help calling reception and asking them to put us through to “the business”. Actually the manned reception probably doesn’t exist today. And it would be surprising to get a machine asking:

  • Do you want to speak with IT? Press 1.
  • Do you want to speak with “the business”? Press 2.

If we are in my home country, Denmark, we also have a linguistic issue. If I ask Google to translate “the business” from English to Danish I get the word “forretningen”. If I ask Google to translate “forretningen” from Danish back to English I get the word “shop”. So calling “forretningen” will probably get me to the shop floor. Not a bad place, a true gemba, but maybe not the only one.

Everyone belongs to “the business”

In data quality and master data management there is a question used all over to exemplify a common challenge within these disciplines.

The question is: What is a customer?

The challenge is that people from different departments will have different definitions. Marketing defines a customer one way, sales tends to do it a bit differently, finance sees it in yet another way and production has its own viewpoint. And the stereotypical IT guy defines a customer as a row in the customer table.

So now we are asking for Alexander the Great from “the business” to come and cut the Gordian Knot.

That is probably not going to happen.

More likely someone from any business unit will be able to negotiate a proper conceptual solution covering all requirements from the different business units. And from what I see around me it may often be someone whose human resource master data record is related to the IT part of the business. Or was. The main point is having a holistic view of the business where everyone belongs.


Things Change

16th January 2011

Yesterday I posted a small piece called So I’m not a Capricorn? about how astrology may (also) be completely wrong because something has changed.

On the serious side: Don’t expect that because you get it Right the First Time, everything will be just fine from that day forward. Things change.

The best known example in data quality prevention is probably that when you enter the address belonging to a customer, it is of course important that you get it right. But as people (and companies) relocate, you must also have procedures in place for tracking those movements by establishing an Ongoing Data Maintenance program in order to ensure the timeliness of your data.

The other thing, so to speak, is that having things right (the first time) is always seen in the context of what was right at that time. Maybe you always asked your customers for a physical postal address, but because your way of doing business has changed, you have actually become much more interested in having the eMail address. And, because What’s in an eMail Address, you would actually like to have had all of them. So your completeness went from being just fine to being just awful by following the same procedure as last year.
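A small Python sketch shows how the very same records can score very differently once the definition of “complete” changes. The customer records and attribute names are made up for the illustration, and completeness is assumed here to mean the share of records with a non-blank value in every required attribute.

```python
def completeness(records, required):
    """Share of records with a non-blank value for every required attribute."""
    if not records:
        return 1.0
    complete = sum(
        1
        for record in records
        if all(str(record.get(attribute) or "").strip() for attribute in required)
    )
    return complete / len(records)


# Made-up customer records, captured by the same procedure as always.
customers = [
    {"name": "A", "postal_address": "Street 1", "email": ""},
    {"name": "B", "postal_address": "Street 2", "email": "b@example.com"},
    {"name": "C", "postal_address": "Street 3"},
    {"name": "D", "postal_address": "Street 4", "email": None},
]

# Last year's requirement: name and postal address.
last_year = completeness(customers, ["name", "postal_address"])

# This year's requirement also includes the eMail address.
this_year = completeness(customers, ["name", "postal_address", "email"])
```

With these records `last_year` is 1.0 and `this_year` is 0.25. Nothing in the data changed, only the purpose it is measured against.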

Predicting accuracy is hard. Expect to deal with Unpredictable Inaccuracy.       


So I’m not a Capricorn?

15th January 2011

Yesterday was my birthday. Being born on the 14th of January makes me a Capricorn according to astrology.

Only there is a slight problem. As told in an article in The Huffington Post, an astronomer has kindly remarked that the assignment of signs to the calendar was made thousands of years ago. In the meantime the earth’s axis has shifted, so we should have completely new signs (and personalities?) today.

I guess astrology qualifies as a data and information quality trainwreck by forgetting one of the most common pitfalls in data quality: Things change.  


Technology and Maturity

4th January 2011

A recurring subject for me and many others is talking and writing about people, processes and technology, including which one is most important, in what sequence they must be addressed and, which is my main concern, how they must be aligned.

As we practically always refer to the three elements in the same order (people, processes and technology), there is certainly an implicit sequence.

If we look at maturity models related to data quality we will recognize that order too.

At the low maturity levels people are the most important aspect and the subject that needs the first and the most attention, and people are the main enablers for starting to move up the levels.

Then at the middle levels processes are the main concern, as business process reengineering enables going up the levels.

At the top levels we see implemented technology as a main component in the description of being there.    

An example of the growing role of technology is (not surprisingly of course) in the data governance maturity model from the data quality tool vendor DataFlux.

One thing is sure though: You can’t move your organization from the low level to the high level by buying a lot of technology.

It is an evolutionary journey where the technology part comes naturally step by step by taking over more and more of the either trivial or extremely complex work done by people and where technology becomes an increasingly integrated and automated part of the business processes.
