Donkey Business

When I started focusing on data quality technology 15 years ago, I had great expectations about the spread of data quality tools, including the humble one I was building myself.

Even if you tell me that tools haven’t spread because people are more important than technology, I think most people in the data and information quality realm agree that the data and information quality cause hasn’t spread as much as it deserves.

Fortunately, interest in solving data quality issues seems to be gaining traction these days. I have noticed two main drivers. Comparing with the traditional means of getting a donkey to move forward, one encouragement is the carrot and the other is the stick:

  • The carrot is business intelligence
  • The stick is compliance

On the business intelligence side, a lot has been said and written about how business intelligence doesn’t deliver unless it is built on a solid, valid data foundation. As a result, I have noticed that I am increasingly involved in data quality improvement initiatives aimed at providing a foundation for sound business decisions. One of my favorite data quality bloggers, Jim Harris, has dangled that carrot many times on his blog: Obsessive Compulsive Data Quality.

Another favorite data quality blogger, Ken O’Connor, has written about the stick, compliance, on his blog, where you will find many good points drawn from his extensive involvement with regulatory requirements.

These are interesting times, with plenty of demand for solving data quality issues. As we all know, the stereotypical donkey is not easily driven forward, and we must be careful not to make the burden too heavy.


Top 5 Reasons for Downstream Cleansing

I guess every data and information quality professional agrees that, when fighting bad data, upstream prevention is better than downstream cleansing.

Nevertheless, most work in fighting bad data quality is done as downstream cleansing, and not least the deployment of data quality tools happens downstream, where tools outperform manual work in heavy-duty data profiling and data matching, as explained in the post Data Quality Tools Revealed.
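To illustrate the kind of heavy-duty work where tools beat manual effort, here is a minimal data matching sketch in Python. The records, field names and threshold are made-up assumptions for the example; real tools use far more sophisticated algorithms:

```python
from difflib import SequenceMatcher

# Illustrative customer records; ids, names and cities are assumptions.
records = [
    {"id": 1, "name": "Acme Corporation", "city": "Copenhagen"},
    {"id": 2, "name": "ACME Corp", "city": "Copenhagen"},
    {"id": 3, "name": "Beta Industries", "city": "Aarhus"},
]

def similarity(a: str, b: str) -> float:
    """Crude string similarity on normalized values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

THRESHOLD = 0.7  # assumed cut-off; a real tool would tune this per field
for i, left in enumerate(records):
    for right in records[i + 1:]:
        score = similarity(left["name"], right["name"])
        if score >= THRESHOLD and left["city"] == right["city"]:
            print(f"Possible duplicate: {left['id']} ~ {right['id']} (score {score:.2f})")
```

Even this toy version flags records 1 and 2 as a possible duplicate; doing the same by hand across millions of rows is where manual work gives up.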

In my experience the top 5 reasons for doing downstream cleansing are:

1) Upstream prevention wasn’t done

This is an obvious one. By the time you decide to do something about bad data quality the right way – finding the root causes, improving business processes, affecting people’s attitudes, building a data quality firewall and all that jazz – you still have to do something about the bad data already in the databases.

2) New purposes show up

Data quality is said to be about data being fit for purpose and meeting business requirements. But new purposes will show up and new requirements will have to be met in an ever-changing business environment. Therefore you will have to deal with Unpredictable Inaccuracy.

3) Dealing with external born data

Upstream isn’t necessarily in your company, as data in many cases is entered Outside Your Jurisdiction.

4) A merger/acquisition strikes

When data from two organizations with different requirements and different levels of data governance maturity is to be merged, something has to be done. Some of the challenges are explained in the post Merging Customer Master Data.

5) Migration happens

Moving data from an old system to a new system is a good chance to do something about poor data quality and start over the right way – oftentimes you can’t even migrate some data without improving its quality. You only have to figure out when to cleanse in data migration.


Outside Your Jurisdiction

About half a year ago I wrote a blog post called Who is Responsible for Data Quality, aimed at the issues that arise when your data comes from another corporation and goes to another corporation.

My point was that many views on data governance, data ownership, the importance of upstream prevention and fitness for the purpose of use in a business context are based on the assumption that the data in a given company is entered by that company, maintained by that company and consumed by that company. In today’s business world this is not true in many cases.

Actually, a majority of the data quality issues I have been around since then have had exactly these ingredients:

  • When data was born it was under an outside data governance jurisdiction
  • The initial data owners, stewards and custodians were in another company
  • Upstream wasn’t in the company where the current requirements are formulated

At the point of data transfer between the two jurisdictions the data is already digitized, and it is often a high volume of data that must be processed in a short time frame, so the willingness and the practical possibility of manual intervention are very limited.

This means that one case for looking to technology-centric solutions is when data is born outside your jurisdiction. In this scenario you also tend to deal with concrete data quality rather than fluffy information quality. That’s a pity, as I like information quality very much – but OK, data quality technology is quite interesting too.
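As a minimal sketch of what such a technology-centric checkpoint might look like – the field names, rules and sample records below are assumptions made up for the example:

```python
import re

# Illustrative rules for a batch of externally born records.
# Field names, rule logic and country codes are assumptions.
RULES = {
    "customer_id": lambda v: bool(v and str(v).strip()),
    "country": lambda v: v in {"DK", "SE", "NO", "DE", "GB"},
    "email": lambda v: re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", v or "") is not None,
}

def validate_batch(records):
    """Split an incoming batch into accepted records and rejects with reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = [field for field, rule in RULES.items()
                    if not rule(record.get(field))]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected

batch = [
    {"customer_id": "42", "country": "DK", "email": "info@example.com"},
    {"customer_id": "", "country": "XX", "email": "not-an-email"},
]
accepted, rejected = validate_batch(batch)
print(f"{len(accepted)} accepted, {len(rejected)} rejected")
for record, failures in rejected:
    print("Rejected because:", ", ".join(failures))
```

The point of a checkpoint like this is that it runs unattended on high volumes at the moment of transfer, exactly where manual intervention isn’t practical.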


Data Quality Is Like Parenting

Thinking about it: Data Quality has a lot of similarities with parenting.

Some equivalences that come to my mind are:

  • Parenting must be done by everyone who has children; you are not supposed to have a degree in education before becoming a parent. The same goes for data: you are not supposed to be a data quality expert before working with data; some common sense will bring you a long way.
  • Some parenting experts never had children of their own. I have seen the same with data quality experts too.
  • Many people are more knowledgeable about how other people should raise their children than about raising their own. Same same with data quality.
  • While we may have some noise within the family when parenting, we keep that internal and keep up appearances to the outside. I think everyone has seen the same with data quality.
  • There may be different parenting styles, ranging from “because I said so” to talking things through. The same is true of data quality improvement efforts.
  • We see more and more regulation around parenting; in my country it is now forbidden to slap your kids. I think it should be forbidden to slap your naughty data too.


Going Upstream in the Circle

One of the big trends in data quality improvement is the move from downstream cleansing to upstream prevention. So let’s talk about Amazon. No, not the online (book)store, but the river. Besides, I am a bit tired of almost every mention of innovative IT being about that eShop.

A map of the Amazon River drainage basin reveals what may turn out to be a huge challenge in going upstream and solving data quality issues at the source: there may be a lot of sources. Okay, the Amazon is the world’s largest river (it carries more water to the sea than any other river), so it may be a picture of the data streams in a very large organization. But even more modest organizations have many sources of data, just as more modest rivers also have several sources.

By the way: the Amazon River also shares a source with the Orinoco River through the natural Casiquiare Canal, just as many organizations also share sources of data.

Some sources are not so easy to reach, the most distant source of the Amazon being a glacial stream on a snowcapped 5,597 m (18,363 ft) peak called Nevado Mismi in the Peruvian Andes.

Now, as I have promised that this blog should be about positivity and success in data quality improvement, I will not dwell on the amount of work involved in going upstream and preventing dirty data at every source.

I say: go to the clouds. The clouds are the sources of the water in the river. And I do think cloud services will make improving data quality a lot easier, as explained in a recent post called Data Quality from the Cloud.

Finally, the clouds over the Amazon’s sources are made from water evaporated from the Amazon and many other waters as part of the water cycle. In the same way, data has a cycle: data is derived as information, and new data is created as a result of the actions taken from using that information.

I think data quality work in the future will embrace the full data cycle: Downstream cleansing, upstream prevention and linking in the cloud.


New Blog Name?

As reported by Mark Goloboy here, “Data Quality” is becoming a dirty word. “Information Quality” is in vogue.

Maybe I will soon have to change the name of my blog?

One may also expect other related terms to be changed, like:

  • Data Governance becomes Information Governance
  • Master Data Management becomes Master Information Management
  • Data Matching becomes Information Matching
  • Data Warehouse becomes Information Warehouse
  • Database becomes Informationbase
  • Information Technology becomes Data Technology

But changing the name of a blog is a serious thing you shouldn’t do too often. I think I will wait and see if the renaming stops at simply replacing “data” with “information”. Some guesses for further renaming:

Information Fitness replaces Data Quality, as data quality is often defined as “fit for the intended purpose of use”, and replacing data with information makes that trail even clearer – as opposed to the other trail, real-world alignment.

Information Political Correctness replaces Data Governance, as data governance is a lot about policies, and data governance practice is a lot about maneuvering in the corporate political landscape.

Master Information Technology (MIT) replaces Master Data Management (MDM)


The Next Level

A quote about data quality from Thomas Redman says:

“It is a waste of effort to improve the quality (accuracy) of data no one ever uses.”

I learned the quote from Jim Harris, who most recently mentioned it in his post DQ-Tip: “There is no point in monitoring data quality…”

In a comment, Phil Simon said: “I love that. I’m jealous that I didn’t think of something so smart.”

I’m guessing Phil was being a bit ironic. If so, I can see why. The statement seems pretty obvious, and at first glance you can’t imagine anyone taking the opposite stance: let’s cleanse some data no one ever uses.

I also think it was meant to be obvious in Redman’s book Data Driven.

Well, taking it to the next level, I can think of the following elaboration (sketched in code after the list):

  1. If you find some data that no one ever uses, you should not only avoid improving the quality of that data; you should actually delete the data and make sure that no one spends time and resources entering or importing the same data in the future.
  2. That is, unless the reason no one ever uses the data is that its quality is poor. Then you must compare the benefits of improving the data against the costs of doing so. If the costs are bigger, proceed with point 1. If the benefits are bigger, go to point 3.
  3. It is not a waste of effort to improve the quality of some data no one ever uses.
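Here is a playful, minimal sketch of that decision logic; the function and its parameters are made up for illustration and are not taken from Redman or anyone else:

```python
def handle_dataset(is_used: bool, unused_due_to_poor_quality: bool,
                   improvement_cost: float, improvement_benefit: float) -> str:
    """Decide what to do with a dataset, following the three points above."""
    if is_used:
        return "keep it and maintain its quality"
    if not unused_due_to_poor_quality:
        # Point 1: nobody needs it, so delete it and stop collecting it.
        return "delete it and stop entering/importing it"
    if improvement_cost > improvement_benefit:
        # Point 2: costs outweigh benefits, so proceed with point 1.
        return "delete it and stop entering/importing it"
    # Point 3: improving data no one uses (yet) is not a waste after all.
    return "improve its quality"

# Example: unused because it is dirty, and improving it would pay off.
print(handle_dataset(False, True, improvement_cost=10_000, improvement_benefit=50_000))
```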


Sticky Data Quality Flaws

Fighting data quality flaws is most successfully done at data entry. Once incorrect information has entered a system, it most often seems nearly impossible to eliminate the falsehood.

A hilarious example is told in an article from telegraph.co.uk. A local council sent a letter to a woman’s pet pig (named Blossom Grant), offering the animal the chance to register to vote in last week’s UK election. This was only the culmination of a long stream of letters – including tons of direct marketing – addressed to the pigsty. According to the article, the pigsty was wrongly registered as a residence some years ago after a renovation. Since then the pig’s owner (named Pauline Grant) has tried over and over again to get the error corrected – but with no success.


Royal Exceptions

I am not a royalist, but anyway: today, 16th April 2010, is the 70th birthday of Queen Margrethe II of Denmark. Congratulations, Your Majesty.

Having a queen (or king) and a royal family is a good example of the fact that there are always exceptions. Here is a data quality matter: I would say that every person in our country has a first (given) name and a last (family) name. But the royal family has no last name – only first names, like those of Her Majesty: Margrethe Alexandrine Þórhildur Ingrid. (By the way: the third name is actually Icelandic; I guess that explains the ash cloud sent as a greeting from there.)

There are always exceptions. We may define data quality validation rules from here to doomsday – there will always be exceptions. We may write down business rules from now to eternity – and tomorrow we will encounter the first exception. Data quality (and democracy) is never perfect – but it’s worth striving for.
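As a minimal sketch of how a validation rule can make room for such exceptions – the field names, categories and records below are made up for illustration:

```python
# Sketch: an "everyone has a last name" rule with an explicit exception
# list. Field names and sample data are assumptions for the example.
KNOWN_EXCEPTIONS = {"royal family"}  # parties exempt from the rule

def validate_name(record: dict) -> list:
    """Return a list of rule violations for one person record."""
    violations = []
    if not record.get("given_names"):
        violations.append("missing given name")
    if not record.get("family_name") and record.get("category") not in KNOWN_EXCEPTIONS:
        violations.append("missing family name")
    return violations

queen = {"given_names": ["Margrethe", "Alexandrine", "Þórhildur", "Ingrid"],
         "category": "royal family"}
commoner = {"given_names": ["Henrik"]}

print(validate_name(queen))     # [] – the exception list lets Her Majesty pass
print(validate_name(commoner))  # ['missing family name']
```

Of course, tomorrow a new exception will turn up that isn’t on the list yet – which is exactly the point.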

Breaking through an open door

This is perhaps a road I have been down before, for example recently in the post The Myth about a Myth.

But it is a pet peeve of mine.

Why are some people always reminding us that this and that must be seen in a business context?

Of course everything we do in our professional lives within data quality, master data management, business intelligence and so on must be seen in a business context. Again, I have never seen anyone take the opposite stance.

I am aware that playing the “business context” card is a friendly reminder when, say, some people become too excited about a tool. But remember, every tool was originally made by people to solve a business challenge, and if the tool continues to exist it has probably done so several times.

It may be that tools are overexposed in our discussions of business issues because some people are doing their job:

  • Vendors are naturally pushing their tools – it’s a business issue
  • Analysts talk about tools and vendors – it’s a business issue
  • Conference organizers invite vendors to buy sponsorships and exhibit tools – it’s a business issue

But I don’t think you are breaking through anything when you remind people about the business context. Everyone knows that already. Take it to the next level.