Data Quality Tools: The Cygnets in Information Quality

Since I joined the social media community around data and information quality, I have noticed quite a lot of mobbing aimed at data quality tools. The sentiment seems to be that data quality tools are no good and will play only a very small role, if any, in solving the data and information quality conundrum.

I like to think of data quality tools as being like the cygnet (the young swan) in the fairy tale “The Ugly Duckling” by Hans Christian Andersen: an immature, clumsy flapper in the barnyard. And sure, until now tools have generally not been ready to fly, and have mostly been confined to the downstream corner of the landscape.

Since last September I have been involved in making a new data quality tool. The tool is based on the principles described in the post Data Quality from the Cloud.

We have now seen the first test flights in the real world and I am absolutely thrilled by the testimonials. Examples:

  • “It (the tool) is lean”. I like that, since lean is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful.
  • “It is gold”. I like to read that as a calculated, positive business case.
  • “It is the best thing happened in my period of employment”. I think happy people are essential to data quality.

Paraphrasing Andersen: I never dreamed there could be so much happiness, when I was working with ugly ducklings.


The Ugly Duckling

The title of the fairy tale “The Ugly Duckling” by Hans Christian Andersen was originally supposed to be the more positive “The Young Swan” (or “The Cygnet”), but as Andersen did not want to spoil the element of surprise in the protagonist’s transformation, he discarded it for “The Ugly Duckling”.

In a blog post called “Why Isn’t Our Data Quality Worse?”, posted today (or last night local Iowa time), Jim Harris examines the psychological term “negativity bias”, which explains how bad evokes a stronger reaction than good in the human mind.

Surely, data quality improvement evangelism is most often based on the strong force of badness. Always describing how bad data is everywhere. Bashing executives who don’t get it. Only at the end, as a nice positive surprise, do we tell how our product or consultancy will transform the ugly duckling into a beautiful swan.

My latest blog post with a truly positive angle called “What a Lovely Day” is almost 2 months old. So I promise myself the next post will have the title “The Young Swan” (or “The Cygnet”) and will be extremely positive about data quality improvement.


What a Lovely Day

As promised earlier today, here is the first post in an endless series of positive posts about success in data quality improvement.

This beautiful morning I finished yet another of these nice recurring jobs I do from time to time: deduplicating batches of files ready for direct marketing, making sure that only one, the whole one and nothing but one unique message reaches a given individual decision maker, be that in the online or offline mailbox.

Most jobs are pretty similar and I have a fantastic tool that automates most of the work. I am left with the pleasure of learning about the nature of the data and configuring the standardisation and matching process accordingly in a user-friendly interface. After the automated process I enjoy looking for any false positives and checking for false negatives. Sometimes I’m so lucky that I have the chance to repeat the process with a slightly different configuration so we reach the best result possible.
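To give a feel for what the standardisation and matching steps involve, here is a minimal Python sketch. It is not the tool I use. The field names (name, company, email), the 0.85 threshold and the function names are all assumptions made up for illustration, and the fuzzy comparison simply relies on Python’s standard difflib module.

```python
from difflib import SequenceMatcher


def standardise(record):
    """Lower-case every field, trim it and collapse repeated whitespace."""
    return {key: " ".join(str(value).lower().split())
            for key, value in record.items()}


def similarity(a, b):
    """Return a 0..1 similarity score between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(records, threshold=0.85):
    """Return (survivors, review).

    survivors: one standardised record per presumed individual.
    review: (kept, dropped) pairs, to be checked by hand for false positives.
    The threshold is a hypothetical setting; tune it per data set.
    """
    survivors = []
    review = []
    for record in records:
        candidate = standardise(record)
        match = None
        for kept in survivors:
            # Rule 1 (assumed): identical non-empty e-mail means same person.
            if candidate["email"] and candidate["email"] == kept["email"]:
                match = kept
                break
            # Rule 2 (assumed): same company and a very similar name.
            if (candidate["company"] == kept["company"]
                    and similarity(candidate["name"], kept["name"]) >= threshold):
                match = kept
                break
        if match is None:
            survivors.append(candidate)
        else:
            review.append((match, candidate))
    return survivors, review


if __name__ == "__main__":
    mailing_list = [
        {"name": "John Smith", "company": "Acme Ltd", "email": "john.smith@acme.example"},
        {"name": " john  smith ", "company": "ACME Ltd", "email": "JOHN.SMITH@acme.example"},
        {"name": "Jon Smith", "company": "Acme Ltd", "email": ""},
        {"name": "Mary Jones", "company": "Acme Ltd", "email": "mary.jones@acme.example"},
    ]
    survivors, review = deduplicate(mailing_list)
    for kept, dropped in review:
        print("check:", kept["name"], "<-", dropped["name"])
```

Lowering the threshold catches more duplicates but gives more false positives to review; raising it does the opposite and leaves more false negatives behind. That is the trade-off I get to play with when repeating the process with a slightly different configuration.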

It’s a great feeling that this work reduces my clients’ mailing costs, makes them look smarter and more professional, and enables the correct measurement of response rates that is so essential in planning future, even better direct marketing activities.

But that’s not all. I’m also delighted to be able to have a continuing chat about how we over time may prevent data quality issues upstream at the point of data entry, so we don’t have to do these recurring downstream cleansing activities any more. It’s always fascinating going through all the different applications that many organisations are running, some of them so old that I never dreamed they still existed. Most times we are able to build a solution that will work in the given landscape, and anyway, soon the credit crunch will be totally gone and here we go.

I’ll be back again with more success from the data quality improvement frontier very soon.


Why do you watch it?

Statler and Waldorf are a pair of Muppet characters. They are two ornery, disagreeable old men. Despite constantly complaining about the show and how terrible some acts were, they would always be back the following week in the best seats in the house. At the end of one episode, they looked at the camera and asked: “Why do you watch it?”.

This is a bit like blogging about data quality, isn’t it? Always describing how bad data is everywhere. Bashing executives who don’t get it. Telling about all the hard obstacles ahead. Explaining you don’t have to boil the ocean but might get success by settling for warming up a nice little drop of water.

Despite really wanting to tell a lot of success stories, being the funny Fozzie Bear on the stage, well, I am afraid I have also been spending most of my time on the balcony with Statler and Waldorf.

So, from this day forward: More success stories.

This is the start of a series of 1.3 blog posts…. No, just kidding.


Qualities in Data Architecture

Data architecture describes the structure of data used by a business and its applications by mapping the data artifacts to data qualities, applications, locations etc.

2000 years ago the Roman writer, architect and engineer Marcus Vitruvius Pollio wrote that a structure must exhibit the three qualities of firmitas, utilitas, venustas; that is, it must be strong or durable, useful, and beautiful.

I have worked with data quality for many years and have always been a bit disappointed about the lack of (at)traction around data quality issues. Perhaps the lack of attraction is because we focus so much on strength, durability and usefulness and too little on beauty – or at least attractiveness.

But how do the three qualities apply to data quality?

  • Firmitas, strength and durability, is connected to technology and how we strive to make our data reflect real-world objects as closely as possible in terms of uniqueness, completeness, timeliness, validity, accuracy and consistency.
  • Utilitas, usefulness, is connected to how we use data as information in business processes. Often “fit for purpose” is stated as a goal for data quality improvement – which makes it hard when multiple purposes exist in an organization.
  • Venustas – beauty or attractiveness – is connected to the mindset of people. Often we blame poor data quality on the people putting data into the data stores and direct initiatives that way using a whip called data governance. But we will probably get more attraction from people if we make quality data more attractive, or at least show it that way.
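Some of the firmitas dimensions can be put into numbers quite easily. Below is a minimal Python sketch of three of them (completeness, uniqueness and validity) for a single field. The function names, the example field and the deliberately simple e-mail pattern are assumptions for illustration only, not a standard definition of these dimensions.

```python
import re

# A deliberately loose, hypothetical e-mail pattern; real validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def filled_values(records, field):
    """Return the non-empty values of a field across all records."""
    return [r[field] for r in records if r.get(field) not in (None, "")]


def completeness(records, field):
    """Share of records where the field is filled in."""
    if not records:
        return 0.0
    return len(filled_values(records, field)) / len(records)


def uniqueness(records, field):
    """Share of filled-in values that are distinct."""
    values = filled_values(records, field)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(records, field, pattern=EMAIL_PATTERN):
    """Share of filled-in values that match the expected pattern."""
    values = filled_values(records, field)
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


if __name__ == "__main__":
    contacts = [
        {"email": "a@x.example"},
        {"email": "a@x.example"},
        {"email": ""},
        {"email": "not-an-email"},
    ]
    for measure in (completeness, uniqueness, validity):
        print(measure.__name__, round(measure(contacts, "email"), 2))
```

Timeliness, accuracy and consistency are harder to score this way, since they need a reference point outside the data set itself.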

Compared to buildings, data quality is often the sewers beneath the old cathedrals and new opera houses – which also may explain the lack of attraction.

If you consider yourself a data quality professional – being a tool maker, expert, whatever – you have got to get up from the sewers and make and show some attractive data in the halls of the fine buildings. You know how hard it is to make quality data – but do tell about the success stories.
