Four years ago, a post on this blog was called The Scary Data Lake. The post was about the fear that the then-new data lake concept would lead to data swamps with horrific data quality, data dumps no one would ever use, data cesspools full of badly governed data, and data sumps that would never become part of the business processes.
For sure, there have been mistakes with data lakes. But it seems that the data lake concept has matured and that the understanding of what a data lake is good for is growing. The data lake concept has even grown beyond the analytics world and into more operational use cases, as told in the post Welcome to Another Data Lake for Data Sharing.
One of the things we have learned is to apply well-known data management principles to data lakes too. This encompasses metadata management, data lineage capabilities and data governance, as reported in the post Three Must Haves for your Data Lake.
Will blockchain play a role in “data lineage”?
Good question, Al. In my mind, it makes a lot of sense in theory. In practice, there is a heavy computing toll. In my current data lake venture, we have started without blockchain.