Putting Context into Data Lakes

The term data lake has become popular along with the rise of big data. A data lake is a new way of storing data that is more agile than what we have been used to in data warehouses. This is mainly based on the principle that you should not have to think through every way of consuming data before storing the data.

This agility is also the main reason for fear around data lakes. A possible lack of control and standardization leads to warnings that a data lake will quickly develop into a data swamp.

In my eyes, we need solutions built on the data lake concept if we want business agility, and we do want that. But I also believe that we need to put the data in data lakes into context.

Fortunately, there are many examples of movements in that direction. A recent article called The Informed Data Lake: Beyond Metadata by Neil Raden has a lot of good arguments for a better, context-driven approach to data lakes.

As reported in the post Multi-Domain MDM 360 and an Intelligent Data Lake, the data management vendor Informatica is on that track too.

In all humility, my vision for data lakes is that a context-driven data lake can serve purposes beyond analytical use within a single company. It can become a driver for business agility within business ecosystems, such as cross-company supply chains, as expressed in the LinkedIn Pulse post called Data Lakes in Business Ecosystems.


Choosing the Best Term to Use in MDM

Right now I am working with an MDM (Master Data Management) service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information.

One of the challenges in putting such a service to the market is choosing the best term for the entities handled by the service.

Below is the current selection, with the chosen term and some recognized alternate terms that are used frequently and found in the various standards that exist for exchanging product data:

[Image: Terms]

Please comment if you think there are other English (or variant of English) terms that deserve to be in here.

Related Products: The Often Overlooked Facet of PIM


As examined in the post Self-service Ready Product Data, there are three main kinds of information that we deal with within Product Information Management (PIM). These are:

  • Product attributes, also sometimes called product properties or product features. These are up to thousands of different data elements that describe a product. Some are very common for most products, like height, length, weight and colour. Some are very specific to the product category. This challenge is actually the reason for being of dedicated PIM solutions.
  • Digital assets are documents like product images, installation guides, line drawings, data sheets and more advanced formats such as videos. You may handle these digital assets in a dedicated Digital Asset Management (DAM) system or use facilities within a PIM solution or another kind of solution for that.
  • Related products are the links between a product and other products, like a product that has several different accessories that go with it or a product being the successor of another, now decommissioned, product. Spare parts for a given product are another kind of product relation. And then we have cross-sell and up-sell relations.
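The three kinds of information above can be pictured as one small data structure. What follows is only a minimal sketch in Python: the class names, relation types and sample product identifiers are assumptions made for illustration and are not taken from any particular PIM solution.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    """Kinds of product relations mentioned in the text (names are illustrative)."""
    ACCESSORY = "accessory"
    SUCCESSOR = "successor"
    SPARE_PART = "spare_part"
    CROSS_SELL = "cross_sell"
    UP_SELL = "up_sell"


@dataclass
class DigitalAsset:
    kind: str  # e.g. "image", "installation guide", "data sheet"
    uri: str   # hypothetical location of the file


@dataclass
class ProductRelation:
    relation_type: RelationType
    target_product_id: str


@dataclass
class Product:
    product_id: str
    attributes: dict[str, str] = field(default_factory=dict)
    assets: list[DigitalAsset] = field(default_factory=list)
    relations: list[ProductRelation] = field(default_factory=list)

    def related(self, relation_type: RelationType) -> list[str]:
        """Return the ids of products linked with the given relation type."""
        return [
            r.target_product_id
            for r in self.relations
            if r.relation_type is relation_type
        ]


# Hypothetical sample product
drill = Product(
    product_id="DRILL-100",
    attributes={"weight": "1.8 kg", "colour": "blue"},
    assets=[DigitalAsset("data sheet", "assets/drill-100-datasheet.pdf")],
    relations=[
        ProductRelation(RelationType.ACCESSORY, "BIT-SET-20"),
        ProductRelation(RelationType.SPARE_PART, "CHUCK-13"),
        ProductRelation(RelationType.ACCESSORY, "CASE-01"),
    ],
)

print(drill.related(RelationType.ACCESSORY))
```

Keeping relations as typed links rather than free-text attributes is one way to make accessories, spare parts and successors queryable in the same manner as attributes and assets.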

While PIM solutions usually have good capabilities for handling related products, it is my experience that many organizations do not utilize them very well.

One challenge is that related products can be sourced in various ways as told in the post Related Parties, Products and Locations. These ways are:

  • From the manufacturer of the product. This source is often good when it comes to product relationship types such as accessory and replacement (succession) as well as spare part relations.
  • From the customer. We know this approach from the online sales trick prompting us with the message “People who bought A also bought B”.
  • From internal considerations. Facilitating up-sell can be done by enhancing product data with that kind of product relation.
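The customer-sourced kind of relation ("People who bought A also bought B") can be derived from order history by counting how often two products occur in the same order. The sketch below is only one simple way of doing that; the function name, the `min_count` threshold and the sample orders are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def also_bought(orders, min_count=2):
    """Derive 'also bought' relations from a list of orders.

    Each order is a list of product ids. Two products are related when
    they appear together in at least `min_count` orders.
    """
    pair_counts = Counter()
    for basket in orders:
        # Count each unordered pair once per order
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1

    relations = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            relations.setdefault(a, set()).add(b)
            relations.setdefault(b, set()).add(a)
    return relations


# Hypothetical order history
orders = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "D"],
]

relations = also_bought(orders)
print(relations)
```

With these sample orders, A and B are bought together twice and so are B and C, while the pairs A/C and A/D only occur once and fall below the threshold.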

Sourcing product relations from the manufacturer through the supply chain is a must for solutions that facilitate the exchange of product data in business ecosystems. In the Product Data Lake we consequently handle the sharing of product attributes, digital assets and related products.


1st Party, 2nd Party and 3rd Party Master Data

Until now, much of the methodology and technology in the Master Data Management (MDM) world has been about how to optimize the use of what can be called first party master data. This is master data already collected within your organization, and the approaches to MDM and the MDM solutions offered have revolved around federating internal silos and obtaining a single source of truth within the corporate walls.

Besides that, third party data has been around for many years, as described in the post Third-Party Data and MDM. The use of third party data in MDM has mainly been about enriching customer and supplier master data from business directories and, to some degree, utilizing standardized pools of product data in various solutions.

Using third party data for customer and supplier master data seems to be a very good idea, as exemplified in the post Using a Business Entity Identifier from Day One. This is because customer and supplier master data look pretty much the same to every organization. With product master data this is not the case, and that is why third party sources for product master data may not be fully effective.

Second party data is data you get directly from the external source. With customer and supplier master data we see that approach in self-registration services. My recommendation is to combine self-registration and third party data in customer and supplier on-boarding processes. With product master data, I think that leaning mostly on second party connections in business ecosystems is the best way forward. There is more on that in a discussion on the LinkedIn MDM – Master Data Management Group.


Take an Ultra Short Survey on Product Data Exchange

How do you exchange product data with your trading partners today? At the Product Data Lake we would like to know some more about that. We do expect that many still send emails with spreadsheets and digital assets. But please tell us how it is with you. Take the survey by clicking here.


Also please comment on this blog post on your plans or if you work with Product Information Management (PIM) as a service provider and have experiences to share.


Adding Business Ecosystems to Omnichannel

Omnichannel has become a buzzword in marketing and beyond. The jury is still out on what omnichannel really is, but most will agree that it is a refinement and/or extension of earlier buzzwords such as multichannel and cross channel. You may learn more in this article.

In omnichannel you will try really, really hard to have a single customer view across all channels, and you will try really, really, really hard to present your product information in a uniform and consistent way across all channels.

One challenge here is that your business is not an island. You are part of a business ecosystem, or several of them, as examined in the post Data Management for Business Ecosystems.

“Your customer” may look at “your product” in the sphere of another member of your business ecosystem. It may be at one of your trading partners or at one of your competitors.

So, what can you do about this when it comes to data management?

In the hard case, your competitors, it is about knowing more about your customer. Knowing about your customer’s relationships. Knowing about your customer’s relations with products and their categories. Knowing about your customer’s locational belonging. All in all, this is a case for multi-domain MDM as seen in the post Multi-Domain MDM and Data Quality Dimensions.

[Image: Expand digitalization across business ecosystems from single purposes to cover an omnichannel view]

Besides your own product information you must register what you know about that product information as it is stored and handled by other members in your business ecosystem – trading partners and competitors.

With product information, you must be able to exchange it with your trading partners. You cannot expect that everyone is handling the information about the same product in exactly the same way as you. Actually, you should not want that. You want to be better than your competitors in some ways and you want to add value for your trading partners. But you would for sure find value in joining a place of intersection where commonly known characteristics about products are exchanged between trading partners, such as the Product Data Lake.


Self-service Ready Product Data

The increased use of self-service based sales approaches as in ecommerce has put a lot of pressure on cross company supply chains. Besides handling the logistics and controlling pricing, you also have to take care of a huge amount of product data and digital assets describing the goods.

You may divide product information into these five levels:

[Image: Product Information Levels]

Please learn more about the five levels of product information, including how hierarchies, pricing and logistics fit in, by visiting the product information castle.

Level 4 in this model is self-service product data, which consists of:

  • Product attributes, also sometimes called product properties or product features. These are up to thousands of different data elements that describe a product. Some are very common for most products, like height, length, weight and colour. Some are very specific to the product category. This challenge is actually the reason for being of dedicated Product Information Management (PIM) solutions.
  • Basic product relations are the links between a product and other products, like a product that has several different accessories that go with it or a product being the successor of another, now decommissioned, product.
  • Standard digital assets are documents like installation guides, line drawings and data sheets.

These are the product data that help the end customer compare products and make an objective choice when buying a product for a specific purpose of use. These data are also helpful in answering the questions a buyer may have when making a purchase.

Every piece of data belonging to any level of product information may be forwarded through the cross company supply chain from the manufacturer to the end seller. Self-service product data are however the data that most obviously will do so.

In order to support end customer self-service when producing, distributing and selling goods you must establish a process driven service that automates the introduction of new products with extensive product data, the inclusion of new kinds of product data and updates to those data. You must be a digitalized member of your business ecosystem. The modern solution for that is the Product Data Lake.


Multilingual? Mais oui! Natürlich.

Is that piece of data wrong or right? This may very well be a question of which language we are talking about.

In an earlier double post on this blog I had a small quiz about the name of the Pope in the Catholic church. The point was that all possible answers were right, as explained in the post When Bad Data Quality isn’t Bad Data. The thing is that around the world the Pope has local variants of the English name Francis: François in French, Franziskus in German, Francesco in Italian, Francisco in Spanish, Franciszek in Polish, Frans in Danish and Norwegian and so on.

In today’s globalized, or should I say globalised, world, it is important that our data can be represented in different languages and that the systems we use to handle the data are built for that. The user interface may be in a certain flavor/flavour of English only, but the data model must cater for storing and presenting data in multiple languages and even variants of languages, such as English in its many forms. Add to that the capability of handling characters other than Latin, in script systems other than alphabets, as examined in the post called Script Systems.
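A data model that caters for multiple languages and their variants can be sketched as a value stored per language tag, with a fallback from the variant to the base language. The snippet below is only an illustration in Python: the class name, the fallback order and the sample labels are assumptions and not a description of how any specific product works.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizedText:
    """A piece of text stored once per language tag, e.g. "en", "en-GB", "fr"."""
    values: dict[str, str] = field(default_factory=dict)

    def get(self, language_tag: str, default_tag: str = "en") -> str:
        """Look up the exact tag, then the base language, then the default."""
        base_language = language_tag.split("-")[0]
        for tag in (language_tag, base_language, default_tag):
            if tag in self.values:
                return self.values[tag]
        raise KeyError(f"No text stored for {language_tag!r} or {default_tag!r}")


# Hypothetical attribute label in several languages, variants and scripts
colour_label = LocalizedText({
    "en": "Color",
    "en-GB": "Colour",
    "fr": "Couleur",
    "de": "Farbe",
    "ru": "Цвет",
})

print(colour_label.get("en-GB"))  # exact variant
print(colour_label.get("en-AU"))  # falls back to base language "en"
print(colour_label.get("da"))     # falls back to the default tag
```

The fallback chain is a design choice: it lets a variant such as British English override only the values that differ, while everything else is inherited from the base language.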

This challenge is very close to me right now, as we are building a service for sharing product information in business ecosystems. So will the Product Data Lake be multilingual? Mais oui! Natürlich. Jo da.

[Image: PDL Example]

PS: The Product Data Lake will actually help with collecting product information in multiple languages through the supply chains of product manufacturers, distributors, retailers and end users.


Data Management for Business Ecosystems

The business ecosystem is an important concept of the digital age. The father of business ecosystems, James F. Moore, defined business ecosystems as:

“An economic community supported by a foundation of interacting organizations and individuals—the organisms of the business world. The economic community produces goods and services of value to customers, who are themselves members of the ecosystem. The member organisms also include suppliers, lead producers, competitors, and other stakeholders”.

The problem with data management methodologies and tools today, as I see it, is that they emphasize the needs inside the corporate walls of a single company without paying much attention to the fact that every single company is a member of one or several business ecosystems, as examined in the post called MDM and SCM: Inside and outside the corporate walls.

Opening your data management, including your Master Data Management (MDM), up to the outside is scary business, as the ecosystems will often include your competitors as well, as mentioned in the post Toilet Seats and Data Quality.

Nevertheless, if you want your company to survive in the digital age by building up your company’s digitalization effort, you have to extend your data management strategy to encompass the business ecosystems of which you are a member.

And now some promotion:

The Product Data Lake: A tool for business ecosystems

Take A Quick Tour around the Product Data Lake


MDM and SCM: Inside and outside the corporate walls

In my journey through the Master Data Management (MDM) landscape, I am currently working from a Supply Chain Management (SCM) perspective. SCM is very exciting as it connects the buy-side and the sell-side of a company. In that connection we will be able to understand some basic features of multi-domain MDM, as touched on in a recent post about the MDM ancestors called Customer Data Integration (CDI) and Product Information Management (PIM). The post is called CDI, PIM, MDM and Beyond.

MDM and SCM 1.0: Inside the corporate walls

Traditional Supply Chain Management deals with what goes on from when a product is received from a supplier, or vendor if you like, until it ends up with the customer.

In the distribution and retail world, the physical product usually stays the same, but from a data management perspective we struggle with having both buying views and selling views on the data.

In the manufacturing world, we see the products we are going to sell transforming from raw materials over semi-finished products to finished goods. One challenge here arises when companies grow through acquisitions: a given real world product might then be seen as a raw material in one plant but as a finished good in another plant.

Regardless of the position of our company in the ecosystem, we also have to deal with the buy side of products such as machinery, spare parts, supplies and other goods, which stay in the company.

MDM and SCM 2.0: Outside the corporate walls

SCM 2.0 is often used to describe handling the extended supply chain that is a reality for many businesses today due to business process outsourcing and other ways of collaboration within ecosystems of manufacturers, distributors, retailers, end users and service providers.

From a master data management perspective, the ways of handling supplier/vendor master data and customer master data here melt into handling business-partner master data or simply party master data.
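Melting supplier and customer master data into party master data can be pictured as one party record that carries several roles. The following is a minimal sketch under the assumption that both source lists share a common identifier, such as a business entity identifier; the class names, identifiers and company names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class PartyRole(Enum):
    SUPPLIER = "supplier"
    CUSTOMER = "customer"


@dataclass
class Party:
    party_id: str  # assumed shared identifier across the source lists
    name: str
    roles: set[PartyRole] = field(default_factory=set)


def merge_parties(suppliers: dict[str, str], customers: dict[str, str]) -> dict[str, Party]:
    """Merge supplier and customer lists (id -> name) into party records with roles."""
    parties: dict[str, Party] = {}
    for role, records in ((PartyRole.SUPPLIER, suppliers), (PartyRole.CUSTOMER, customers)):
        for party_id, name in records.items():
            # Reuse the party if it is already known from the other list
            party = parties.setdefault(party_id, Party(party_id, name))
            party.roles.add(role)
    return parties


# Hypothetical source data: "ID-1" is both a supplier and a customer
suppliers = {"ID-1": "Acme Tools", "ID-2": "Nordic Steel"}
customers = {"ID-1": "Acme Tools", "ID-3": "City Hardware"}

parties = merge_parties(suppliers, customers)
print(sorted(role.value for role in parties["ID-1"].roles))
```

In practice the hard part is matching records that do not share an identifier; this sketch deliberately leaves that out.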

For product master data there are huge opportunities in sharing most of these master data within the ecosystems. Usually you will do that in the cloud.

In such environments, we have to rethink our approach to data / information governance. This challenge was examined, with cloud computing as the starting point, by Andrew White of Gartner (the analyst firm) in a blog post called “Thoughts on The Gathering Storm: Information Governance in the Cloud”.