Data Warehouse vs Data Lake, Take 2

The differences between a data warehouse and a data lake have been discussed a lot, for example here and here.

To summarize, the main point in my eyes is this: in a data warehouse, the purpose and structure are determined before data is uploaded, while in a data lake the purpose and structure of the data need not be determined until the data is downloaded. As a result, a data warehouse is characterized by rigidity and a data lake by agility.
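
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names, the in-memory lists standing in for storage, and the function names are hypothetical and do not describe any particular product.

```python
import json

# Hypothetical warehouse schema, agreed on before any data is uploaded.
WAREHOUSE_SCHEMA = {"sku": str, "weight_kg": float}


def warehouse_load(table, record):
    """Structure decided up front: reject records that do not match the schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for name, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
    table.append(record)


def lake_upload(lake, raw):
    """Data lake: store the payload as it arrives, without interpreting it."""
    lake.append(raw)


def lake_download(lake, fields):
    """Structure decided at download: the reader picks the fields it cares about."""
    rows = []
    for raw in lake:
        doc = json.loads(raw)
        rows.append({name: doc.get(name) for name in fields})
    return rows


warehouse = []
lake = []

warehouse_load(warehouse, {"sku": "A-1", "weight_kg": 1.5})
lake_upload(lake, '{"sku": "A-1", "weight_kg": 1.5}')
lake_upload(lake, '{"sku": "B-2", "colour": "red"}')  # different shape, still accepted

# The same differently shaped record is refused by the warehouse.
try:
    warehouse_load(warehouse, {"sku": "B-2", "colour": "red"})
    rejected = False
except ValueError:
    rejected = True

# A reader interested in colours decides on that structure only now.
colours = lake_download(lake, ["sku", "colour"])
```

The warehouse refuses the second record because it does not fit the agreed structure, while the lake accepts it and leaves the interpretation to whoever downloads it. That is the rigidity and the agility in miniature.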

Agility is a good thing, but of course you have to put some control on top of it, as reported in the post Putting Context into Data Lakes.

Furthermore, there are some great opportunities in extending the data lake concept beyond the traditional use of a data warehouse. You should think beyond using a data lake within a given organization and envision how you can share a data lake within your business ecosystem. Moreover, you should consider using the data lake not only for analytical purposes but also for operational purposes.

The venture I am working on right now has this second take on a data lake. The Product Data Lake exists in the context of sharing product information between trading partners in an agile and process-driven way. The providers of product information, typically manufacturers and upstream distributors, upload product information according to the data management maturity level of their organization. For now, this information may very well be stored according to traditional data warehouse principles. The receivers of product information, typically downstream distributors and retailers, download product information according to the data management maturity level of their organization. For now, this information may very well end up in a data store organized by traditional data warehouse principles.

The other approaches I have seen for sharing product information between trading partners are built on a data-warehouse-like solution placed between the trading partners, with a high degree of consensus around purpose and structure. In my eyes, such solutions are only successful when narrowly restricted to a given industry, probably within a given geography and for a given span of time.

By utilizing the data lake concept in the exchange zone between trading partners, you can share information at your own pace of maturing in data management and take advantage of data sharing where it fits in your roadmap to digitalization. The business ecosystems in which you participate are great sources of data for both analytical and operational purposes, and we cannot wait until everyone agrees on the same purpose and structure. It only takes two to start the tango.


5 thoughts on “Data Warehouse vs Data Lake, Take 2”

  1. Henrik Liliendahl 18th September 2016 / 16:20

    On Twitter, Tom Breur reacts: “Without a logical (=virtual) DWH on top of your data lake we go back to the hell 90’s DWH tried to overcome”.

    I agree, this is the challenge. What we have put in place on the Product Data Lake is this:

    > A very generic data model for product data. The main entities are parties, profiles, products, attributes, digital assets and related products. This model is by the way more or less the same model used in successful PIM solutions. The Product Data Lake adds links between parties (partnerships), links between products, links between attributes and links between digital assets supported by transformation rules.

    > A tagging system for product attributes.
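
As a rough illustration of the two points above, the sketch below models a few of the named entities and shows how a link between attributes with a transformation rule could translate a provider's attribute into a receiver's. It is not the actual Product Data Lake model: all class names, field names and the `download` function are hypothetical, and profiles and links between products and digital assets are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Product:
    owner: str                                            # party that uploaded the product
    product_id: str
    attributes: dict = field(default_factory=dict)        # attribute name -> value
    attribute_tags: dict = field(default_factory=dict)    # attribute name -> set of tags
    digital_assets: list = field(default_factory=list)
    related_products: list = field(default_factory=list)


@dataclass
class Partnership:
    """Link between two parties."""
    provider: str
    receiver: str


@dataclass
class AttributeLink:
    """Link between a provider attribute and a receiver attribute."""
    provider_attribute: str
    receiver_attribute: str
    transform: Callable[[Any], Any] = lambda value: value  # hypothetical transformation rule


def download(product, partnership, links):
    """Return the product's attributes in the receiver's own terms."""
    if product.owner != partnership.provider:
        raise PermissionError("no partnership covers this product")
    received = {}
    for link in links:
        if link.provider_attribute in product.attributes:
            value = product.attributes[link.provider_attribute]
            received[link.receiver_attribute] = link.transform(value)
    return received


product = Product(
    owner="Maker Ltd",
    product_id="P-100",
    attributes={"Net weight (g)": 1500},
    attribute_tags={"Net weight (g)": {"logistics"}},
)
partnership = Partnership(provider="Maker Ltd", receiver="Retailer Inc")
links = [AttributeLink("Net weight (g)", "weight_kg", lambda grams: grams / 1000)]

received = download(product, partnership, links)
```

In this sketch the provider keeps its own attribute name and unit, and the receiver gets `weight_kg` without either side having to agree on a shared structure first.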

  2. Tom Breur 20th September 2016 / 18:11

    Hi Henrik, it appears to me that what you are describing is an environment with a healthy modicum of data governance, where users and data providers collaborate to jointly discover and improve the usability of data in its current form. Then from there they collaborate to incrementally improve their raw materials (i.e. data and meta- or master data). As long as everyone is aware of the status (Gold, Silver, Bronze) of all data elements, I can see that working really well.

  3. Henrik Liliendahl 20th September 2016 / 18:35

    Hi Tom. Yes, this is the intention. And we have indeed received suggestions from interested subscribers to the Product Data Lake about having a way to rate the data quality. Also, we are working on a data governance framework to be used in cross-company environments. It is early days now, with the launch of version 1 of the service in a few days. I’m looking forward to implementing these kinds of data quality and data governance capabilities around the service in collaboration with subscribers and experts.

  4. jaluya 14th October 2016 / 14:21

    Reblogged this on jofdt and commented:
    An interesting read!
