The business data lake concept is a new attempt to get rid of all the Excel spreadsheets that business people maintain because of limitations in today’s enterprise data warehouses and in the business intelligence solutions sitting on top of that extracted, transformed and loaded data.
In the business data lake you load raw data, including data from unstructured sources. The single view and the related governance are restricted to master and reference data.
It’s not that you are going to load all the data in the world into your business data lake. You will link internal and external data where and when needed.
Thomas Redman coined a famous metaphor in the data quality realm about a polluted lake, where the best way to deal with the pollution is to prevent polluted water from streaming into the lake in the first place. I guess the rise of big data challenges that view, as discussed some years ago in the post Extreme Data Quality.
In the business data lake we will have polluted data. With that in mind, I think it’s a good thing that master and reference data have a special place in the lake.
What do you think? Do you like the lake – the old and/or the new one?