Data Quality 3.0

Paraphrasing Tim Berners-Lee:

“People may ask what Data Quality 3.0 is. I think what is looking misty on Web 2.0 and Data Quality 2.0 will eventually melt into a semantic Web integrated across a huge space of data where you’ll have access to an unbelievable data resource.”

Another way of putting it would be a micro-manifesto like:

“While we value that data are of high quality if they are fit for the intended use, we value more that data correctly represent the real-world construct to which they refer, in order to be fit for current and future multiple purposes.”

My thesis is that, as more and more purposes are included, there is a breakeven point beyond which it will be less cumbersome to reflect the real-world object than to try to align all known purposes.

You may divide the data held by an enterprise into 3 pots:

  • Global data that is not unique to operations in your enterprise but shared with other enterprises in the same industry (e.g. product reference data) and eventually the whole world (e.g. business partner data and location data). Here “shared data in the cloud” will make your “single version of the truth” easier and closer to the real world.
  • Bilateral data concerning business partner transactions and related master data. If you, for example, buy a spare part, then also “share the describing data”, making your “single version of the truth” easier and more accurate.
  • Private data that is unique to operations in your enterprise. This may be a “single version of the truth” that you find superior to what others have found, data supporting internal business rules that make your company more competitive and data referring to internal events.

Data Management in the near future will, in my eyes, be closely related to the emerging Web 3.0:

  • Business Intelligence will embrace internal (private) data and external (public) data in the cloud
  • Data Warehouses will link internal (private) data and external (public) data in the cloud
  • Master Data Management will align internal (private) data with external (public) data in the cloud
  • Data Quality Tools will profile internal (private) data and match internal (private) data with external (public) data in the cloud
  • Data Governance may be a lot about balancing the use of internal (private) data and external (public) data
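
As a minimal sketch of the profiling and matching idea in the list above, the Python below profiles a handful of internal business partner records for completeness and then matches them against an external reference list. Everything in it is an assumption made for illustration: the `Party` record, the function names, and the exact-match rule on normalized name plus country code are hypothetical, and a real data quality tool would use far more tolerant matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Hypothetical business partner record (not taken from any real tool)."""
    name: str
    country: str  # assumed to hold an ISO 3166-1 alpha-2 code such as "DK"


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(name.lower().split())


def profile_completeness(records: list[Party]) -> dict[str, float]:
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    if total == 0:
        return {"name": 0.0, "country": 0.0}
    return {
        "name": sum(1 for r in records if r.name.strip()) / total,
        "country": sum(1 for r in records if r.country.strip()) / total,
    }


def match_to_reference(
    internal: list[Party], reference: list[Party]
) -> tuple[list[tuple[Party, Party]], list[Party]]:
    """Match internal records to external ones on normalized name and country."""
    index = {(normalize(p.name), p.country.upper()): p for p in reference}
    matched: list[tuple[Party, Party]] = []
    unmatched: list[Party] = []
    for record in internal:
        hit = index.get((normalize(record.name), record.country.upper()))
        if hit is not None:
            matched.append((record, hit))
        else:
            unmatched.append(record)
    return matched, unmatched


# Internal (private) data: one record lacks a country code.
internal = [Party("Acme  Corp", "dk"), Party("Unknown Ltd", "")]
# External (public) reference data, standing in for a shared cloud source.
reference = [Party("ACME Corp", "DK")]

print(profile_completeness(internal))
matched, unmatched = match_to_reference(internal, reference)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

The record without a country code stays unmatched, which is the kind of gap that profiling is meant to surface before matching against external data is attempted.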



7 Responses to Data Quality 3.0

  1. kenoconnordataconsultant says:

    Henrik,

    I agree with your vision – and look forward to seeing it realised, soon.

    To be honest, I’m tired of the “fit for purpose” argument, an argument that excuses poor quality data by saying it is “fit for the unique purpose for which it was originally designed”.

    I like the concept of data that is “fit for current and future multiple purposes”.

    The current “bespoke” model is unsustainable. It is similar to requiring a bricklayer to bake his own bricks before he can build a wall.

    Roll on Data Quality 3.0

    Rgds Ken

  2. Henrik Liliendahl Sørensen says:

    Thanks Ken, glad we are aligned.

  3. A useful breakdown of the types of data.

  4. Carl White says:

    Excuse my naivety but precisely how CAN data be fit for a future purpose when we do not know the future purpose?

    I ask innocently and came to this blog from Google, so there may be context I’m not appreciating.

    Naturally, we must define data – I can imagine how we can use a currency amount for multiple future purposes. If we’re talking about a data set at a given level of granularity which a future purpose requires at a lower level of granularity, aren’t we stuck?

    Sorry for what might seem to be a stupid question and thanks for all the effort you people take to write on subjects I’m interested in!

    Carl

    • Henrik Liliendahl Sørensen says:

      Thanks for joining Carl.

      What I often see is that one organization won’t go for a certain level of data quality, in dimensions such as uniqueness or granularity, that another similar organization would. This is naturally due to the business cases that the current data management efforts are based on. But sooner or later the same kinds of organizations will need the same uniqueness, granularity and so on, because the business challenges are the same and it’s the same real world these organizations are operating in.

      Therefore looking at the real world will often be a good way to fit those future purposes we know will arise but don’t have on the radar yet.

      As an example, say your business is currently mostly domestic with a few foreign business partners. You therefore don’t require any accuracy in storing the country of your foreign business partners and the applicable address format. But if your business grows internationally, which is the way to grow today in many cases, you will regret that later.

  5. Dave Poole says:

    I’d divide data into the following pots
    1. International standards compliant (global)
    2. Industry standard compliant (tends to be global but strays to multi-lateral)
    3. National standard compliant (sub-global, multi-lateral)
    4. Partner agreed (bi-lateral)
    5. Proprietary

    • Henrik Liliendahl Sørensen says:

      Dave, thanks a lot for commenting.

      Very good breakdown and actually very, very close to what I am working with these days.
