Data Pool vs Data Lake

Within Product Information Management (PIM) – or Product Master Data Management if you like – there is a concept of a data pool.

Recently Justine Rodian of Stibo Systems published a nice blog post titled Master Data Management Definitions: The Complete A-Z of MDM. In it, Justine explains many terms within Master Data Management (MDM). A data pool is described like this:

“A data pool is a centralized repository of data where trading partners (e.g., retailers, distributors or suppliers) can obtain, maintain and exchange information about products in a standard format. Suppliers can, for instance, upload data to a data pool that cooperating retailers can then receive through their data pool.”

Now, during the last couple of years I have been working on the concept of applying the data lake approach to product information exchange between trading partners. Justine describes a data lake this way:

“A data lake is a place to store your data, usually in its raw form without changing it. The idea of the data lake is to provide a place for the unaltered data in its native format until it’s needed…..” 

Image: Product Data Lake (MacRitchie Reservoir in Singapore)

For a provider of product information, typically a manufacturer, the benefit of interacting via a data lake rather than a data pool is that they do not have to standardize the data before uploading. They avoid shoehorning the data into a specific form, which almost certainly leaves out important information, and they do not depend on consensus between competing manufacturers.

For a receiver of information, typically a merchant such as a retailer or B2B dealer, the benefit of interacting via a data lake rather than a data pool is that they can request the data in the form that makes them most competitive, and thereby sell more and reduce costs in product information sharing. This benefit grows if the merchant uses several data pools.
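To make the contrast concrete, here is a minimal sketch in Python. It is not how Product Data Lake or any real data pool is implemented. The class names, the three-field "standard" and the sample product are all assumptions made up for illustration. The pool only accepts records already shaped to a shared standard, while the lake keeps whatever the provider delivers and lets each receiver pull it through their own field mapping.

```python
from dataclasses import dataclass, field

# Hypothetical standard agreed between trading partners (illustrative only).
POOL_STANDARD_FIELDS = ("gtin", "name", "weight_kg")


@dataclass
class DataPool:
    """Accepts only records already shaped to the shared standard."""
    records: list = field(default_factory=list)

    def upload(self, record: dict) -> None:
        extra = set(record) - set(POOL_STANDARD_FIELDS)
        if extra:
            raise ValueError(f"non-standard fields: {sorted(extra)}")
        self.records.append(dict(record))


@dataclass
class DataLake:
    """Stores records as delivered; shaping happens when a receiver pulls."""
    records: list = field(default_factory=list)

    def upload(self, record: dict) -> None:
        # Kept in the provider's own form, nothing is dropped.
        self.records.append(dict(record))

    def pull(self, mapping: dict) -> list:
        # mapping: receiver's field name -> provider's field name
        return [
            {target: rec.get(source) for target, source in mapping.items()}
            for rec in self.records
        ]


# A made-up product record in the manufacturer's native form.
native = {"gtin": "01234567890128", "name": "Cordless drill",
          "weight_kg": 1.8, "chuck_size_mm": 13}

# Lake: upload as-is, and the retailer asks for the fields they want.
lake = DataLake()
lake.upload(native)
retailer_view = lake.pull(
    {"ean": "gtin", "title": "name", "chuck_mm": "chuck_size_mm"}
)

# Pool: the manufacturer must first cut the record down to the standard,
# so chuck_size_mm never reaches the receiver.
pool = DataPool()
pool.upload({key: native[key] for key in POOL_STANDARD_FIELDS})
```

In this toy version the retailer gets the chuck size from the lake under their own field name, while the pool record has lost it because the assumed standard has no place for it.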

In Product Data Lake we even combine the best of the two approaches by encompassing data pools in our reservoir concept – to stay in the water body lingo. Here data pools are refreshed with modern data management technology and less rigid incoming and outgoing streams as announced in the post Product Data Lake Version 1.3 is Live.
