Movable Types

A big boost in knowledge sharing in human history came around the year 1450 (in the Gregorian calendar), when Johannes Gutenberg of Germany invented the use of movable types in printing. However, movable types were actually invented 400 years earlier in China, using porcelain, and 200 years earlier in Korea, where metal was also used. But the East Asian inventions did not spread very well due to the script systems used, where you need thousands of different type pieces, each representing a syllable or word, as opposed to using an alphabet.

Anyway, the invention of movable types in printing is regarded as perhaps the most important invention since someone invented the wheel (for the first time).

Data quality flaws also got a big boost from the sudden increase in printed work made possible by this invention. I remember my grandmother was a text reviewer at a local newspaper. She always complained about journalists with poor spelling skills, and she was very upset when people's names were spelled wrong in articles. I guess her reference file for correctly spelled names was in her head, as she knew every known person in the town (being my town of birth: Randers).

The use of computers (including the internet) has brought the next big boost in knowledge sharing and in data quality flaws, along with the introduction of the term data quality. Before that, poor data quality was called sloppiness, I guess. The problem, however, stays the same: Putting the right characters in the right order. The first time.
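A reference file of correctly spelled names, like the one in my grandmother's head, can be sketched in a few lines of code. This is only a minimal illustration: the name list, the `check_name` function and the 0.8 similarity cutoff are assumptions made up for the example, and the matching relies on Python's standard `difflib` module rather than on any particular data quality tool.

```python
from difflib import get_close_matches

# Hypothetical reference file of correctly spelled names (illustration only).
REFERENCE_NAMES = ["Jensen", "Nielsen", "Hansen"]


def check_name(name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Compare a name as typed against the reference list.

    Returns a (status, suggestion) tuple:
      ("ok", name)          the name is spelled exactly as in the reference
      ("suspect", match)    a similar reference name exists, likely a typo
      ("unknown", None)     nothing similar enough in the reference
    The cutoff of 0.8 is an assumed threshold, not a recommended value.
    """
    if name in reference:
        return ("ok", name)
    suggestions = get_close_matches(name, reference, n=1, cutoff=cutoff)
    if suggestions:
        return ("suspect", suggestions[0])
    return ("unknown", None)


if __name__ == "__main__":
    for typed in ["Jensen", "Jensn", "Gutenberg"]:
        print(typed, check_name(typed))
```

The "unknown" outcome is the interesting one: it is where a human reviewer who knows the town still beats the reference file.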


The Toyota Way

Yesterday I visited a Toyota branch office.

While waiting in the unmanned reception (a result of removing waste, known as muda in Japanese, I guess) I had the chance to study the five posters hanging there with the main principles in The Toyota Way:

  • Challenge: Form a long-term vision and meet challenges with courage and creativity.
  • Kaizen (continuous improvement): Improve business operations continuously, always driving for innovation and evolution.
  • Genchi Genbutsu (go and see): Go to the source to find the facts to make correct decisions, build consensus and achieve goals at best speed.
  • Respect: Respect others. Make every effort to understand each other, take responsibility and do your best to build mutual trust.
  • Teamwork: Stimulate personal and professional growth, share the opportunities of development and maximize individual and team performance.

What a great way to prepare for a meeting about data quality improvement.


Is a Small Difference a Big Deal?

The title of this blog post is stolen from/was inspired by a post on the Nation of Why Not blog. The Nation of Why Not is the branded name of Royal Caribbean. Royal Caribbean operates, among a lot of other vessels, the world’s two largest cruise ships: ‘Oasis of the Seas’ and ‘Allure of the Seas’. The younger ship, ‘Allure of the Seas’, has just left the shipyard in Turku, Finland and passed under the Great Belt Bridge in grey Danish waters on the way to the blue Caribbean Sea.

The Oasis and Allure are sister ships supposed to have exactly the same dimensions. But according to the official measurements by DNV, Allure is 50 millimeters longer than Oasis. This has led to some teasing between the crews, and now it has been suggested that NASA should make a new measurement (from up above, I guess).

This is a good old classic data quality issue. Is it acceptable to assume that two similar things have the same attributes? Or do you need to measure each thing separately? And is any difference you find a difference in the real world or a difference in measurement?
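The last question can be sketched as a simple tolerance check. This is a minimal illustration under stated assumptions: only the 50 millimeter difference comes from the story, while the `classify_difference` function, the idea of expressing both lengths as offsets from a common design length, and the uncertainty figures are made up for the example.

```python
def classify_difference(a_mm, b_mm, uncertainty_mm):
    """Decide whether two measurements differ by more than measurement error.

    Each measurement is assumed to be off by at most +/- uncertainty_mm,
    so two readings of the very same length can differ by up to twice
    that amount in the worst case.
    """
    difference = abs(a_mm - b_mm)
    if difference <= 2 * uncertainty_mm:
        return "within measurement error"
    return "likely a real difference"


if __name__ == "__main__":
    # Lengths as offsets in millimeters from the shared design length.
    oasis_mm, allure_mm = 0, 50
    # Hypothetical uncertainties: a coarse and a precise measuring method.
    print(classify_difference(oasis_mm, allure_mm, uncertainty_mm=100))
    print(classify_difference(oasis_mm, allure_mm, uncertainty_mm=10))
```

With the coarse method the 50 millimeters could just be noise; with the precise one it looks like a real-world difference. Which case applies depends on how the ships were actually measured.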

Now, with the ships I think they are a bit different anyway, as I see that the new ship Allure, unlike Oasis, also has a Samba Grill, Rita’s Cantina and a Starbucks café inside.


The Magic Numbers

An often raised question and a subject for a lot of blog posts in the data quality realm is whether data quality challenges should be solved by people or technology.

As with all things data quality, I don’t think there is a single right answer to that.

Now, in this blog post I will not say what I think the answer(s) to the question might be, but simply describe what I have seen chosen as the solution in practice, which has been both people-centric solutions and technology-centric solutions.

If I compare the situations where people-centric solutions have been chosen with the situations where technology-centric solutions have been chosen, the first differentiator seems to be numbers:

  • If you have only a small number of customers and a single channel of entry, the better solution for optimal data quality and uniqueness seems to be a people-centric solution.
  • If you have millions of customers and multiple channels of entry, the only practical solution for optimal data quality and uniqueness seems to be a technology-centric solution.
  • If you have only a small number of products and a single channel of entry, the only sensible solution for optimal data quality and uniqueness seems to be a people-centric solution.
  • If you have thousands of products coming from multiple channels, the most reliable solution for optimal data quality and uniqueness seems to be a technology-centric solution.

So, based on common sense, the answer to the people or technology question is that it magically depends on the numbers.


Top 5 Reasons for Downstream Cleansing

I guess every data and information quality professional agrees that, when fighting bad data, upstream prevention is better than downstream cleansing.

Nevertheless, most work in fighting bad data quality is done as downstream cleansing. Not least, the deployment of data quality tools happens downstream, where tools outperform manual work in heavy-duty data profiling and data matching, as explained in the post Data Quality Tools Revealed.

In my experience the top 5 reasons for doing downstream cleansing are:

1) Upstream prevention wasn’t done

This is an obvious one. By the time you decide to do something about bad data quality the right way (finding the root causes, improving business processes, affecting people’s attitudes, building a data quality firewall and all that jazz), you still have to do something about the bad data already in the databases.

2) New purposes show up

Data quality is said to be about data being fit for purpose and meeting the business requirements. But new purposes will show up and new requirements have to be met in an ever-changing business environment. Therefore you will have to deal with Unpredictable Inaccuracy.

3) Dealing with externally born data

Upstream isn’t necessarily in your company, as data in many cases is entered Outside Your Jurisdiction.

4) A merger/acquisition strikes

When data from two organizations with different requirements and different levels of data governance maturity is to be merged, something has to be done. Some of the challenges are explained in the post Merging Customer Master Data.

5) Migration happens

Moving data from an old system to a new system is a good chance to do something about poor data quality and start all over the right way. Oftentimes you can’t even migrate some data without improving the data quality. You only have to figure out when to cleanse in data migration.


Outside Your Jurisdiction

About half a year ago I wrote a blog post called Who is Responsible for Data Quality aimed at issues with having your data coming from another corporation and going to another corporation.

My point was that many views on data governance, data ownership, the importance of upstream prevention and fitness for purpose of use in a business context are based on an assumption that the data in a given company is entered by that company, maintained by that company and consumed by that company. But in today’s business world this is not true in many cases.

Actually, a majority of the data quality issues I have been around since then have had exactly these ingredients:

  • When data was born it was under an outside data governance jurisdiction
  • The initial data owners, stewards and custodians were in another company
  • Upstream wasn’t in the company where the current requirements are formulated

At the point of data transfer between the two jurisdictional areas the data is already digitized, and often it is a high volume of data supposed to be processed in a short time frame, so the willingness and practical possibilities for implementing manual intervention are very limited.

This means that one case for technology-centric solutions is when data is born outside your jurisdiction. Also, you tend to deal with concrete data quality rather than fluffy information quality in this scenario. That’s a pity, as I like information quality very much, but OK, data quality technology is quite interesting too.


My Secret

Yesterday I followed a webinar on DataQualityPro with ECCMA ISO 8000 project leader Peter Benson.

Peter had a lot of good sayings, and fortunately Jim Harris, through his live tweeting, has documented a sample of good quotes here.

My favorite:

“Quality data does NOT guarantee quality information, but quality information is impossible without quality data.”

I have personally conducted an experiment that supports that hypothesis. It goes like this:

First, I found a data file on my computer. Lots of data in there, in the form of numbers and letters. And sure, what is interesting is the information I can derive from it for different purposes.

Then I deleted the data file and tried to see how much information was left behind.

Guess what? Not a bit.

I first published that experiment as a comment to one of Jim’s blog posts: Data Quality and the Cupertino Effect.

As documented in the comments on that blog post, the subject of data (quality) versus information (quality) is ever recurring and almost always guarantees a fierce discussion among data/information management professionals.

So, I’ll just tell you this secret: My work in achieving quality information is done by fixing data quality.

And guess what? I have disabled comments on this blog post.


The Ugly Duckling

The title of the fairy tale “The Ugly Duckling” by Hans Christian Andersen was originally supposed to be the more positive “The Young Swan” (or “The Cygnet”), but as Andersen did not want to spoil the element of surprise in the protagonist’s transformation, he discarded it for “The Ugly Duckling”.

In a blog post called “Why Isn’t Our Data Quality Worse?” posted today (or last night local Iowa time), Jim Harris examines the psychological term “negativity bias”, which explains how bad evokes a stronger reaction than good in the human mind.

Surely, data quality improvement evangelism is most often based on the strong force of badness. Always describing how bad data is everywhere. Bashing executives who don’t get it. Only at the end, as a nice positive surprise, do we tell how our product/consultancy will transform the ugly duckling into a beautiful swan.

My latest blog post with a truly positive angle called “What a Lovely Day” is almost 2 months old. So I promise myself the next post will have the title “The Young Swan” (or “The Cygnet”) and will be extremely positive about data quality improvement.


Data Quality Is Like Parenting

Thinking about it: Data Quality has a lot of similarities with parenting.

Some equivalences that come to my mind are:

  • Parenting must be done by everyone who has children; you are not supposed to have an education in education before becoming a parent. The same goes for data. You are not supposed to be a data quality expert before working with data; some common sense will take you a long way.
  • Some parenting experts never had their own children. I have seen the same with data quality experts too.
  • Many people are more knowledgeable about how other people should raise children than about raising their own children. Same with data quality.
  • While we may have some noise within the family when parenting, we keep that internal and keep up appearances to the outside. I think everyone has seen the same with data quality.
  • There may be different styles in parenting, going from “because I said so” to talking about it. The same is true around data quality improvement efforts.
  • We do see more and more regulation around parenting; in my country, for example, it is now forbidden to slap your kids. I think it should be forbidden to slap your naughty data too.
