My Secret

Yesterday I attended a webinar on DataQualityPro with ECCMA ISO 8000 project leader Peter Benson.

Peter had a lot of good sayings, and fortunately Jim Harris, who was live tweeting during the webinar, has documented a sample of the best quotes here.

My favorite:

“Quality data does NOT guarantee quality information, but quality information is impossible without quality data.”

I have personally conducted an experiment that supports that hypothesis. It goes like this:

First, I found a data file on my computer. There was lots of data in there, numbers and letters. And sure, what is interesting is the information I can derive from it for different purposes.

Then I deleted the data file and tried to see how much information was left behind.

Guess what? Not a bit.

I first published that experiment as a comment on one of Jim’s blog posts: Data Quality and the Cupertino Effect.

As the comments on that blog post document, the subject of data (quality) versus information (quality) is ever recurring and almost always sparks a fierce discussion among data/information management professionals.

So, I’ll just tell you this secret: My work in achieving quality information is done by fixing data quality.

And guess what? I have disabled comments on this blog post.

New Blog Name?

As reported by Mark Goloboy here, “Data Quality” is becoming a dirty word and “Information Quality” is in vogue.

Maybe I will soon have to change the name of my blog?

One may also expect other related terms to change, like:

  • Data Governance becomes Information Governance
  • Master Data Management becomes Master Information Management
  • Data Matching becomes Information Matching
  • Data Warehouse becomes Information Warehouse
  • Database becomes Informationbase
  • Information Technology becomes Data Technology

But changing the name of a blog is a serious thing you shouldn’t do too often. I think I will wait and see if the renaming stops at simply swapping data for information. Some guesses for further renaming:

Information Fitness replaces Data Quality, as data quality is often defined as “fit for the intended purpose of use”, and replacing data with information makes that trail even clearer, as opposed to the other trail, which is real world alignment.

Information Political Correctness replaces Data Governance, as Data Governance is a lot about policies and the Data Governance practice is a lot about maneuvering in the corporate political landscape.

Master Information Technology (MIT) replaces Master Data Management (MDM)

The Many Worlds of Data Quality

This morning I had some fun reading the Wikipedia articles explaining Data Quality.

I tried to compare the texts available in several languages, including English, French, German and Japanese.

I am afraid that the quality of the texts, and some of the differences in how the subject is presented in the different languages, show the immaturity of the data quality discipline and, not least, the lack of global embrace that is also seen in the literature, published articles and the technology available.

Three observations from the Wikipedia articles:

The French piece is in parts a translation of the English text. However, the translation became very difficult in the History section, as the English text there has the well-known narrow United States scope.

The German text is completely different from the English text. Also, the title is Information Quality. The references are largely from German authors.

The Japanese text seems to be a Google Translate version of the (former) English text. This is strange, as much of the quality inspiration originally came from Japan.

Bon Appetit

If I enjoy a restaurant meal, it is basically unimportant to me which raw ingredients were used, where they came from, and which tools the chef used while preparing the meal. My concerns are whether the taste meets my expectations, the plate looks delicious, the waiter seems nice and so on.

This is comparable to information quality. The raw data quality and the tools available for exposing the data as tasty information in a given context are basically not important to the information consumer.

But in our daily work, you and I may be the information chef. In that position we have to be very much concerned about the raw data quality and the tools available for the equivalents of rinsing, slicing, mixing and boiling food.

Let’s look at some analogies.

Best before

Fresh raw ingredients are similar to up-to-date raw data. Raw data also has a best before date, depending on the nature of the data. Raw data older than that date may be spiced up, but it will eventually make bad-tasting information.
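
To make the best before analogy a bit more concrete, here is a minimal Python sketch, assuming each attribute of a record is stamped with the date it was last verified. The attribute names and the periods in `SHELF_LIFE` are hypothetical illustrations, not recommendations; the real best before dates depend on the nature of the data.

```python
from datetime import date, timedelta

# Hypothetical "best before" periods per attribute (illustration only).
SHELF_LIFE = {
    "address": timedelta(days=365),
    "phone": timedelta(days=730),
    "job_title": timedelta(days=180),
}


def stale_attributes(last_verified: dict[str, date], today: date) -> list[str]:
    """Return the attributes whose last verification is past their best before date."""
    stale = []
    for attribute, verified_on in last_verified.items():
        shelf_life = SHELF_LIFE.get(attribute)
        if shelf_life is None:
            continue  # no best before date defined for this attribute
        if today - verified_on > shelf_life:
            stale.append(attribute)
    return stale


record = {
    "address": date(2009, 3, 1),
    "phone": date(2009, 3, 1),
    "job_title": date(2009, 3, 1),
}
print(stale_attributes(record, today=date(2010, 1, 15)))  # → ['job_title']
```

Flagging stale values, rather than silently using them, is what lets the information chef decide whether to refresh the ingredient or leave it out of the dish.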

One-stop-shopping

Buying all your raw ingredients and tools for preparing food, or taking the shortcut with ready-made cookie-cutter stuff, from a huge supermarket is fast and easy (never mind that the basket usually also ends up filled with a lot of other products that were not on the shopping list).

A good chef always selects the raw ingredients from the best specialized suppliers and uses what he considers the most professional tools in the preparation process.

Making information from raw data has the same options.

Compliance

Governments around the world have long implemented regulations and inspections regarding food, mainly focused on receiving, handling and storing raw ingredients.

The same is now happening with data. Regulations and inspections will naturally be directed at data as it is originated, stored and handled.

Diversity

Have you ever tried to prepare your favorite national meal in a foreign country?

Often this is not straightforward. Some raw ingredients are simply not available, and some tools may not even be among the kitchen equipment.

When making information from raw data under varying international conditions you often face the same kind of challenges.

Mu

The term “Mu” has several meanings, including a lost continent. In this post I will use the meaning of “mu” as the answer to a question that can’t be answered with a simple “yes”, “no” or even “unknown”, as explained on Wikipedia here.

When working with data quality you often encounter situations where the answer to a simple question must be “mu”.

Let’s say you are looking for duplicates in a customer file and have these two rows (Name, Address, City):

Margaret Smith, 1 Main Street, Anytown
Margaret & John Smith, 1 Main Street, Anytown

Is this a duplicate situation?

In a given context, like preparing a direct mailing, the answer could be “yes”. But in most other contexts the answer is “mu”. Here the question should rather be something like: How do you handle hierarchy management with these two rows? And the answer could be something like the process presented in my recent post here.

Similar considerations apply to this example (Name, Address, City):

One Truth Consultants att: John Smith, 3 Main Street, Anytown
One Truth Consultants Ltd, 3 Main Street, Anytown

And this (Contact, Company, Address, City):

John Smith, One Truth Consultants, 3 Main Street, Anytown
John Smith, One Truth Services, 3 Main Street, Anytown

The latter example is explained in more detail in this post.
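
One way to keep the “mu” answer from being squeezed into a yes or no is to let the matching step return three values. The Python sketch below assumes a hypothetical, deliberately naive rule set: rows at different addresses are “no”, identical names are “yes”, and overlapping but different names at the same address are a hierarchy case that is “yes” only in a direct mail context and “mu” otherwise. The context names are made up for illustration, and real matching would use fuzzy comparison rather than exact tokens.

```python
import re
from enum import Enum


class MatchAnswer(Enum):
    YES = "yes"
    NO = "no"
    MU = "mu"  # the question "is this a duplicate?" does not fit the data


def name_tokens(name: str) -> set[str]:
    """Lowercase word tokens, dropping punctuation such as '&' and ':'."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def is_duplicate(row_a: tuple[str, str, str], row_b: tuple[str, str, str],
                 context: str) -> MatchAnswer:
    """Compare two (Name, Address, City) rows under a hypothetical rule set."""
    name_a, *location_a = row_a
    name_b, *location_b = row_b
    if [p.lower() for p in location_a] != [p.lower() for p in location_b]:
        return MatchAnswer.NO
    tokens_a, tokens_b = name_tokens(name_a), name_tokens(name_b)
    if tokens_a == tokens_b:
        return MatchAnswer.YES
    if tokens_a & tokens_b:
        # Same address, overlapping names: a household or company hierarchy.
        # A direct mailing only wants one letter; otherwise the answer is "mu".
        return MatchAnswer.YES if context == "direct_mail" else MatchAnswer.MU
    return MatchAnswer.NO


margaret = ("Margaret Smith", "1 Main Street", "Anytown")
household = ("Margaret & John Smith", "1 Main Street", "Anytown")
print(is_duplicate(margaret, household, context="direct_mail").value)      # yes
print(is_duplicate(margaret, household, context="customer_master").value)  # mu
```

Returning “mu” as its own value keeps these pairs visible, so they can be routed to hierarchy management instead of being merged or discarded.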

Alignment of business and IT

You may become a Data Quality professional coming from either the business side or the technology side of practice. But more important, in my eyes, is whether you have made serious attempts at, and succeeded in, understanding the side you didn’t start from.

Many blog posts about the data quality conundrum discuss the role of the business side versus the role of the technology side, giving these sides various weights in different contexts. It should not surprise a Data Quality professional that there is no simple, absolutely true or absolutely false answer to such a question. Fortunately, I find that most discussions, when taken all the way, end up with the “peace on earth” sentiment:

  • Of course it is the business requirements, striving for business value, that govern any initiative using technology to improve business performance
  • Of course the emergence (or discovery) of new technology may change the way you arrange business processes in order to gain competitive business performance

From that point of view, I am looking forward to continued discussions on all the important issues around data and information quality improvement and prevention, including, but not limited to:

  • What is the business value of better information quality
  • How to gather business requirements related to information quality in order to make data fit for purpose(s)
  • Who is needed to accomplish the data quality improvement tasks – probably people from business, IT and all those mixed ones (credit: Jim Harris of OCDQblog)
  • When will data quality technology be mature enough to cope with issues in ways not seen before
  • Which methodologies and techniques are best for different sorts of data quality challenges
  • Where on earth are the answers to all these questions