The concept of a data lake has until now mainly been used to describe how data gathered by a given organization can be organized in order to provide for analytical purposes. The data lake concept is closely tied to the term big data, which means that a data lake caters for handling huge volumes of data, with high velocity and where data comes in heaps of varieties. A data lake is much more agile than data marts and data warehouses, where you have to determine the purpose of the use of data beforehand.
In my eyes the idea about that every organization should gather all the data of interest behind its own firewall does not make sense. Some data will for sure be only for eyes of people behind the corporate walls. Here you should indeed put your private data into your own data lake. But a lot of data are common known data that everyone should not spend time on collecting and eventually cleansing. Here you should collaborate within your industry or other sphere around data lakes for public data.
Perhaps most importantly you should share data lakes with other members of your business ecosystem. You probably already do share data within business ecosystems with your trading partners. Until now such sharing has resembled the concepts of data marts and data warehouses being very purpose specific exchanges of data build on common beforehand understood standards for data.
Right now I am working on a cloud service called the Product Data Lake. Here we host a data lake for sharing product data in the business ecosystems of manufacturers, distributors, merchants and large end users of product information. You can join our journey by following us here on LinkedIn at the Product Data Lake LinkedIn Company Page.