The Next Level

A quote about data quality from Thomas Redman says:

“It is a waste of effort to improve the quality (accuracy) of data no one ever uses.”

I learned the quote from Jim Harris, who most recently mentioned it in his post: DQ-Tip: “There is no point in monitoring data quality…”

In a comment Phil Simon said: I love that. I’m jealous that I didn’t think of something so smart.

I’m guessing Phil was being a bit ironic. If so, I can see why. The statement seems pretty obvious, and at first glance you can’t imagine anyone taking the opposite stance: Let’s cleanse some data no one ever uses.

I also think it was meant to be obvious in Redman’s book, Data Driven.

Well, taking it to the next level I can think of the following elaboration:

  1. If you find some data that no one ever uses, you should not only avoid improving the quality of that data; you should actually delete the data and make sure that no one spends time and resources entering or importing the same data in the future.
  2. That is, unless the reason that no one ever uses the data is that the quality of the data is poor. Then you must compare the benefits of improving the data against the costs of doing so. If costs are bigger, proceed with point 1. If benefits are bigger, go to point 3.
  3. It is not a waste of effort to improve the quality of some data no one ever uses.
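The three points can be read as a small decision procedure. Here is a minimal sketch in Python, under some assumptions of mine: the names `DataSet`, `Action` and `next_level` are hypothetical, the benefit and cost figures are whatever estimates you have at hand, a tie between cost and benefit is treated like "costs are bigger", and data that is already in use is simply kept, since the three points only deal with unused data.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep as is"
    DELETE = "delete and stop collecting"
    IMPROVE = "improve the quality"


@dataclass
class DataSet:
    # All fields are illustrative; none of them come from Redman's book.
    name: str
    is_used: bool
    unused_because_of_poor_quality: bool = False
    improvement_benefit: float = 0.0  # estimated benefit of fixing the data
    improvement_cost: float = 0.0     # estimated cost of fixing the data


def next_level(ds: DataSet) -> Action:
    """Apply points 1 to 3 to a single data set."""
    if ds.is_used:
        # Assumption: used data is outside the scope of the three points.
        return Action.KEEP
    if not ds.unused_because_of_poor_quality:
        # Point 1: nobody uses it, and poor quality is not the reason.
        return Action.DELETE
    # Point 2: unused because of poor quality, so weigh benefit against cost.
    if ds.improvement_benefit > ds.improvement_cost:
        # Point 3: improving is not a waste of effort here.
        return Action.IMPROVE
    # Costs are bigger (or equal, by assumption): back to point 1.
    return Action.DELETE


if __name__ == "__main__":
    fax = DataSet("fax numbers", is_used=False)
    emails = DataSet(
        "customer emails",
        is_used=False,
        unused_because_of_poor_quality=True,
        improvement_benefit=50_000,
        improvement_cost=8_000,
    )
    for ds in (fax, emails):
        print(f"{ds.name}: {next_level(ds).value}")
```

The hard part in practice is of course not the branching but finding out why the data is unused and putting honest numbers on benefit and cost.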


11 thoughts on “The Next Level”

  1. Jim Harris 22nd May 2010 / 21:12

    Excellent post Henrik,

    That Redman quote comes from his chapter about the often hidden costs of poor data and information, and specifically from a section discussing organizational confusion, where even basic questions about data can often not be answered, such as:

    – What data do you have?

    – Where are they?

    – Which are the most important?

    – How do you use them?

    – Where do they come from?

    – What are they worth?

    As Redman summarizes the section:

    “Too much data are just plain wrong, too hard to find, poorly defined, inconsistent with other data, and at risk of being lost or stolen. Organizations do not know what data they have, redundancy is out of control, and too much data are never used for anything.”

    Therefore, I agree (and I think Redman would too) with your three step plan for taking it to the next level.

    Best Regards,

    Jim

  2. philsimon 22nd May 2010 / 23:22

    Henrik

    Good post. Allow me to take a little exception to this line, though:

    It is not a waste of effort to improve the quality of some data no one ever uses.

    Isn’t this a little like a tree falling in the forest? Does it make a noise if no one hears it?

    Just being a little existential on a Saturday, mate.

    Phil

  3. Steve Sarsfield 23rd May 2010 / 04:56

    If I had a dime every time someone told me that “there is no ROI in this DQ project”, I’d definitely have a few dimes. Encouraging project managers to look for the value they’re providing to the corporation is definitely worth saying.

  4. Graham Rhind 23rd May 2010 / 06:29

    I have to stand up and state loudly and clearly that I don’t agree with Mr Redman in this case (albeit that the quote is usually taken out of context).

    That data is not used NOW does not mean that it will not be used EVER; and we need to distinguish between data that decays (such as address information) and data that does not (such as date of birth). It is never a waste of effort to improve the quality (accuracy) of data that no one uses now but which will be used, especially when the information rooted in that data is required in the short term. Even better, collect data properly in the first instance and you won’t need to worry about wasting effort at a later stage. This concern need not arise, except for data that decays.

  5. Henrik Liliendahl Sørensen 23rd May 2010 / 06:48

    Thanks Jim, Phil, Steve and Graham.

    Once upon a time one organization never used some data and no one gave a damn.

    Also, in a competing organization no one ever used some similar data because the quality of that data was poor. After improving the data, the data became useful information, and a lot of customers gave a dime.

    The first organization fell like a tree in the forest, and hardly anyone heard.

  6. philsimon 23rd May 2010 / 11:02

    I actually wasn’t being ironic. I’m just a big fan of simple maxims that convey something much deeper.

  7. Henrik Liliendahl Sørensen 23rd May 2010 / 11:51

    I love those punchlines too, Phil. I’ve just never been asked to improve some data no one uses. It would be nice though. No risk of complaints I guess.

  8. Thorsten 23rd May 2010 / 22:47

    Henrik,
    I really like the alternatives you present when coming across areas of bad data quality
    1. If the data is not needed, delete it.
    2. If the data is needed, improve the data quality.

    I’m a big fan of making things as simple as possible. So when I come across some data that is hardly ever entered, I usually suggest deleting the data (okay, maybe archive it off just to be sure) and then CHANGE THE APPLICATION SO THE DATA CAN NEVER BE ENTERED AGAIN. (Sorry for shouting ;-))

    However, I hardly ever manage to get fields removed from an application. After all, maybe that data is needed in the future .. I’m sorry, Graham, I don’t really agree with you. I think much more can be gained by making sure the user enters the correct information on the fields that really matter than to confuse him with a number of fields that may be needed later on.

    Great discussion!
    Thorsten

  9. Graham Rhind 24th May 2010 / 06:34

    @Thorsten – I don’t think we really disagree – I think we’re referring to different things. I’m talking about data which HAS been gathered (as per the Redman quote) rather than data which IS being gathered; and also to data which WILL be used rather than data which MAY be used. (Yes, I know I have the advantage here as the quote’s in my mother language 🙂 )

    In the specific case you outline (limited resources, problems with confused data collection operatives, in a corporate environment) I’d probably go the same way as you would. But in a philosophical sense, where resources are not limited and so no choice has to be made about which data is to be collected (correctly), I’d go for collecting more (good) data in the first place so as not to regret lacking required data later on.

  10. Henrik Liliendahl Sørensen 24th May 2010 / 13:55

    Thanks Thorsten and Graham for lighting up the discussion.

    On LinkedIn Dean Groves left a comment:

    “Great discussion topic! I heard a story just the other day from someone who worked on a server consolidation project. There was little documentation on owners or users of the apps they hosted. The last resort was to shut down servers to see who complained. Only one server had no complaints, and it was eliminated. The team worked with users to consolidate the rest. Months later, IT got a call from the CFO: He’d lost his server. He used this box only once a year. You can imagine the trouble that ensued. My point: Make sure all data have an owner of record. Consider also an expiration date which, if passed without renewal by the owner, will allow IT to take the data out of active management.”

  11. philsimon 24th May 2010 / 13:57

    Brilliant! There should be a three month rule in place. If you don’t notice it for three months, then how can you legitimately claim that you need the app? It’s kind of like playing four rounds of golf without your three iron and only noticing it after a month. How much do you really need it then?
