Data Quality 3.0

Paraphrasing Tim Berners-Lee:

“People may ask what Data Quality 3.0 is. I think what is looking misty on Web 2.0 and Data Quality 2.0 will eventually melt into a semantic Web integrated across a huge space of data where you’ll have access to an unbelievable data resource.”

Another way of putting it would be as a micro-manifesto:

“While we value that data are of high quality if they are fit for the intended purpose of use, we value more that data correctly represent the real-world construct to which they refer, in order to be fit for current and future multiple purposes”.

My thesis is that, as more and more purposes are included, there is a break-even point beyond which it becomes less cumbersome to reflect the real-world object than to try to align all known purposes.

You may divide the data held by an enterprise into three pots (sketched in code after the list):

  • Global data that is not unique to operations in your enterprise but shared with other enterprises in the same industry (e.g. product reference data) and possibly even the whole world (e.g. business partner data and location data). Here “shared data in the cloud” will make your “single version of the truth” easier to achieve and closer to the real world.
  • Bilateral data concerning business partner transactions and related master data. If, for example, you buy a spare part, then also “share the describing data”, making your “single version of the truth” easier to achieve and more accurate.
  • Private data that is unique to operations in your enterprise. This may be a “single version of the truth” that you find superior to what others have found, data supporting internal business rules that make your company more competitive, and data referring to internal events.
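
As a minimal sketch of the three pots (the class names and example records here are hypothetical, not taken from any particular system):

```python
from dataclasses import dataclass
from enum import Enum


class Pot(Enum):
    """The three pots of enterprise data described above."""
    GLOBAL = "global"        # shared with your industry or the whole world
    BILATERAL = "bilateral"  # shared with specific business partners
    PRIVATE = "private"      # unique to operations in your enterprise


@dataclass
class DataAsset:
    name: str
    pot: Pot
    shared_with: list[str]  # external parties the data is shared with


# Hypothetical examples of how assets might fall into the pots:
assets = [
    DataAsset("product reference data", Pot.GLOBAL, ["whole industry"]),
    DataAsset("spare part descriptions", Pot.BILATERAL, ["the supplier"]),
    DataAsset("internal business rules", Pot.PRIVATE, []),
]

for asset in assets:
    print(f"{asset.name}: {asset.pot.value}")
```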

Data Management in the near future will, in my eyes, be closely related to the emerging Web 3.0:

  • Business Intelligence – and Data Science – will embrace internal (private) data and external (public) data in the cloud
  • Data Warehouses – and Data Lakes – will link internal (private) data and external (public) data in the cloud
  • Master Data Management will align internal (private) data with external (public) data in the cloud
  • Data Quality Tools will profile internal (private) data and match internal (private) data with external (public) data in the cloud (a rough matching sketch follows this list)
  • Data Governance may be a lot about balancing the use of internal (private) data and external (public) data – and internal and external business rules
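
As a rough illustration of the matching bullet above (the scoring function and the 0.6 threshold are assumptions for this sketch, not any particular tool's method), matching an internal party name against shared external reference data might look like this:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude string similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_to_reference(internal_name: str, reference_names: list[str],
                       threshold: float = 0.6) -> str | None:
    """Return the best-matching external reference name, if above threshold."""
    best = max(reference_names, key=lambda ref: similarity(internal_name, ref))
    return best if similarity(internal_name, best) >= threshold else None


# Hypothetical internal record matched against shared reference data:
reference = ["Acme Corporation", "Globex Inc", "Initech Ltd"]
print(match_to_reference("ACME Corp.", reference))  # -> Acme Corporation
```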

Learn about some Data Quality 3.0 services here:

  • The iDQ™ (instant Data Quality) service for sharing big reference data for the benefit of customer and other party master data.
  • The Product Data Lake for sharing public and bilateral data within business ecosystems for the benefit of product master data.

Great Belt Bridge


10 thoughts on “Data Quality 3.0”

  1. kenoconnordataconsultant 16th August 2010 / 16:11

    Henrik,

    I agree with your vision – and look forward to seeing it realised, soon.

    To be honest, I’m tired of the “fit for purpose” argument, an argument that excuses poor quality data by saying it is “fit for the unique purpose for which it was originally designed”.

    I like the concept of data that is “fit for current and future multiple purposes”.

    The current “bespoke” model is unsustainable. It is similar to requiring a bricklayer to bake his own bricks before he can build a wall.

    Roll on Data Quality 3.0

    Rgds Ken

  2. Henrik Liliendahl Sørensen 16th August 2010 / 20:14

    Thanks Ken, glad we are aligned.

  3. Carl White 22nd November 2011 / 10:57

    Excuse my naivety but precisely how CAN data be fit for a future purpose when we do not know the future purpose?

    I ask innocently and came to this blog from Google, so there may be context I’m not appreciating.

    Naturally, we must define data – I can imagine how we can use a currency amount for multiple future purposes. But if we’re talking about a data set at a given level of granularity, and a future purpose requires a lower level of granularity, aren’t we stuck?

    Sorry for what might seem to be a stupid question and thanks for all the effort you people take to write on subjects I’m interested in!

    Carl

    • Henrik Liliendahl Sørensen 23rd November 2011 / 00:19

      Thanks for joining Carl.

      What I often see is that one organization won’t go for a certain level of data quality in dimensions such as uniqueness or granularity that another similar organization will. This is naturally due to the business cases that the current data management efforts are based on. But sooner or later the same kinds of organizations will need the same uniqueness, granularity and so on, because the business challenges are the same and it’s the same real world these organizations are operating in.

      Therefore, looking at the real world will often be a good way to fit those future purposes that we know will arise but don’t have on the radar yet.

      One example: say your business is currently mostly domestic with a few foreign business partners, so you don’t require any accuracy in storing the country of your foreign business partners or the applicable address format. But if your business grows internationally, which in many cases is the way to grow today, you will regret that later.
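
      A minimal sketch of getting that right up front (the tiny lookup table and the function are mine for illustration; a real system would hold the full ISO 3166 country list as shared reference data):

      ```python
      # A few ISO 3166-1 alpha-2 codes for illustration only; a real system
      # would hold the complete standard list as shared reference data.
      ISO_COUNTRIES = {"DK": "Denmark", "DE": "Germany", "GB": "United Kingdom"}


      def normalize_country(value: str) -> str:
          """Map free-text country input to an ISO 3166-1 alpha-2 code."""
          value = value.strip()
          if value.upper() in ISO_COUNTRIES:
              return value.upper()
          by_name = {name.lower(): code for code, name in ISO_COUNTRIES.items()}
          if value.lower() in by_name:
              return by_name[value.lower()]
          raise ValueError(f"Unknown country: {value!r}")


      print(normalize_country("Denmark"))  # -> DK
      print(normalize_country("gb"))       # -> GB
      ```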

  4. Dave Poole 26th January 2012 / 11:23

    I’d divide data into the following pots (roughly sketched in code after the list):
    1. International standards compliant (global)
    2. Industry standard compliant (tends to be global but strays to multi-lateral)
    3. National standard compliant (sub-global, multi-lateral)
    4. Partner agreed (bi-lateral)
    5. Proprietary
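
    As a rough sketch of this breakdown (the names are mine, mirroring the pots above):

    ```python
    from enum import IntEnum


    class StandardScope(IntEnum):
        """The five pots, ordered from widest to narrowest agreement."""
        INTERNATIONAL = 1  # international standards compliant (global)
        INDUSTRY = 2       # industry standard compliant (mostly global)
        NATIONAL = 3       # national standard compliant (multi-lateral)
        PARTNER = 4        # partner agreed (bi-lateral)
        PROPRIETARY = 5    # unique to one enterprise


    # Lower values mean wider agreement on the governing standard:
    print(StandardScope.INDUSTRY < StandardScope.PROPRIETARY)  # -> True
    ```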

    • Henrik Liliendahl Sørensen 26th January 2012 / 11:34

      Dave, thanks a lot for commenting.

      Very good breakdown and actually very, very close to what I am working with these days.

  5. Mark Humphries 12th March 2013 / 08:53

    I think that lineage is going to be the big challenge with data in the near future.
    As the thresholds for accessing, processing and publishing data come down, I expect an explosion of transformed data.
    The challenge will be to understand what is relevant and what is not. How can you trust what you see? Where did it come from and how was it transformed?
    The food industry has an equivalent problem at the moment. How do you know that your beef is beef?

    • Henrik Liliendahl Sørensen 12th March 2013 / 09:30

      Thanks for commenting, Mark. Indeed, data provenance/lineage will become more and more important as we share more and more data.
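
      A minimal sketch of what carrying such lineage along could look like (the record layout is my own assumption; real implementations might build on a standard such as W3C PROV):

      ```python
      from dataclasses import dataclass, field
      from datetime import datetime, timezone


      @dataclass
      class LineageStep:
          """One transformation in a data set's history."""
          source: str           # where the data came from
          transformation: str   # what was done to it
          performed_at: datetime


      @dataclass
      class DataSet:
          name: str
          lineage: list[LineageStep] = field(default_factory=list)

          def derive(self, new_name: str, transformation: str) -> "DataSet":
              """Create a derived data set, carrying the full lineage forward."""
              step = LineageStep(self.name, transformation,
                                 datetime.now(timezone.utc))
              return DataSet(new_name, self.lineage + [step])


      raw = DataSet("supplier feed")
      clean = raw.derive("cleansed feed", "deduplicated and standardized")
      for step in clean.lineage:
          print(f"{step.source}: {step.transformation} ({step.performed_at:%Y-%m-%d})")
      ```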

  6. Richard Northwood 17th June 2013 / 19:28

    Hi Henrik,

    Interesting concept, letting cloud providers or other 3rd parties enrich your data. Some of the data quality providers are moving into this space already. I recently spoke to Experian reps about their acquisition of the x88 platform. You can see, with their existing work with data, that they are perhaps positioning themselves to do something similar. But I think they are a while off yet. Good insight.

    Rich
