Three Must Haves for your Data Lake

Whether the data lake concept is a good idea or not is intensely debated in the data management community on social media.

The fear, backed by actual observations, is that a data lake will become a data dump: no one knows what is in there, where it came from, who is going to clean up the mess, or how it should be handled in the future, if there is a future for the data lake concept.

Please folks. We have some concepts from the small data world that we must apply. Here are three of the important ones:

Metadata

In short, metadata is data about data. The great thing about a data lake is that the structure and purposes of the data do not have to be set in stone beforehand. Even so, all data that is delivered to a data lake must at least be described. An example of such an implementation is examined in the post Sharing Metadata.
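To make the "must be described" rule concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation from the Sharing Metadata post: the names `DatasetMetadata`, `DataLake` and `deliver` are hypothetical, and the lake is just an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical minimal description delivered alongside a data set."""
    name: str
    description: str
    # Maps each attribute name to a plain-language meaning.
    attributes: Dict[str, str] = field(default_factory=dict)


class DataLake:
    """Toy in-memory lake (an assumption for illustration) that refuses undescribed data."""

    def __init__(self) -> None:
        self.catalog: Dict[str, DatasetMetadata] = {}
        self.storage: Dict[str, List[dict]] = {}

    def deliver(self, records: List[dict], metadata: Optional[DatasetMetadata]) -> None:
        # The structure of the records is not validated; only a description is required.
        if metadata is None or not metadata.description.strip():
            raise ValueError("delivery rejected: data must be described")
        self.catalog[metadata.name] = metadata
        self.storage[metadata.name] = records
```

The point of the sketch is that the records themselves stay schema-free, while the description is checked at the door.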

Data Lineage

You must also have the means to tag who delivered the data. If your data lake is within a business ecosystem, this should include the legal entity that has provided the data, as described in the post Using a Business Entity Identifier from Day One.
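Tagging the deliverer can be sketched along these lines in Python. This is a sketch under assumptions: `LineageTag`, `tag_delivery`, the `_lineage` key and the identifier value in the usage note are all hypothetical, and a real business entity identifier scheme is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class LineageTag:
    """Hypothetical lineage record attached to every delivered record."""
    provider_entity_id: str  # identifier of the legal entity providing the data
    delivered_by: str        # person or system that made the delivery
    delivered_at: datetime   # delivery time in UTC


def tag_delivery(records: List[dict], provider_entity_id: str, delivered_by: str) -> List[dict]:
    """Return copies of the records, each carrying the same lineage tag."""
    if not provider_entity_id or not delivered_by:
        raise ValueError("lineage requires both a legal entity and a deliverer")
    tag = LineageTag(provider_entity_id, delivered_by, datetime.now(timezone.utc))
    # The original records are left untouched; the tag sits under a reserved key.
    return [{**record, "_lineage": tag} for record in records]
```

A call such as `tag_delivery(rows, "EXAMPLE-ENTITY-001", "supplier-portal")` would then make the origin of every row traceable after it lands in the lake.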

Data Governance

Above all, you must have a framework to govern ownership (who is Responsible, Accountable, Consulted and Informed), policies, standards and the other elements we know from a data governance framework. If the data lake expands across organizations by incorporating second-party and third-party data, we need a cross-company data governance framework, as highlighted for example in Product Data Lake Documentation and Data Governance.


Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
