Putting Context into Data Lakes

The term data lake has become popular along with the rise of big data. A data lake is a new way of storing data that is more agile than what we have been used to in data warehouses. This is mainly based on the principle that you should not need to have thought through every way of consuming the data before storing it.

This agility is also the main reason for the fear around data lakes. A possible lack of control and standardization leads to warnings that a data lake will quickly turn into a data swamp.

In my eyes, we need solutions built on the data lake concept if we want business agility – and we do want that. But I also believe that we need to put the data in data lakes in context.

Fortunately, there are many examples of movements in that direction. A recent article called The Informed Data Lake: Beyond Metadata by Neil Raden has a lot of good arguments for a better, context-driven approach to data lakes.

As reported in the post Multi-Domain MDM 360 and an Intelligent Data Lake, the data management vendor Informatica is on that track too.

In all humility, my vision is that a context-driven data lake can serve purposes beyond analytical use within a single company. It can become a driver for business agility within business ecosystems, such as cross-company supply chains, as expressed in the LinkedIn Pulse post called Data Lakes in Business Ecosystems.


Reducing the Reverse Supply Chain by Improving the Forward Data Supply Chain

A growing issue in the customer self-service age – first and foremost seen in e-commerce – is the expanding reverse supply chain. A reverse supply chain is the flow of products being returned through the supply chain because the end customer did not want or like the product.

There are several reasons for returned products. Bad product quality is a long-known reason. Bad data quality is a new and important one: the end customer did not have the right data to support the purchase. The main root cause is incomplete data, such as missing specifications, missing images and other digital assets, and missing information about related products.
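The kinds of incompleteness named above can be checked mechanically. Here is a minimal sketch in Python; the `ProductRecord` class, its field names and the example SKU are hypothetical illustrations and are not taken from any particular PIM system or from the Product Data Lake.

```python
from dataclasses import dataclass, field


@dataclass
class ProductRecord:
    """Hypothetical product record; field names are illustrative only."""
    sku: str
    specifications: dict[str, str] = field(default_factory=dict)
    images: list[str] = field(default_factory=list)
    other_assets: list[str] = field(default_factory=list)
    related_skus: list[str] = field(default_factory=list)


def missing_self_service_data(product: ProductRecord) -> list[str]:
    """Return the kinds of data that are absent from the record."""
    checks = {
        "specifications": product.specifications,
        "images": product.images,
        "other digital assets": product.other_assets,
        "related products": product.related_skus,
    }
    # An empty dict or list counts as missing.
    return [name for name, value in checks.items() if not value]


# Example: a record with a specification and an image, but nothing else.
drill = ProductRecord(
    sku="DRILL-001",
    specifications={"voltage": "18 V"},
    images=["drill-front.jpg"],
)
print(missing_self_service_data(drill))
```

A check like this only tells you that something is absent. Whether the data that is present is good enough to support a purchase is a separate question.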

Different kinds of product data were examined in the post Self-Service Ready Product Data. The data that supports customer self-service sales approaches is mainly data that should be provided through the forward supply chain, meaning that it originates at the manufacturer and is then passed on, possibly with value added, by distributors and retailers.

Growing reverse supply chains are a huge problem, both from a business standpoint due to increased costs and from a societal standpoint due to increased environmental impact. To reduce the reverse supply chain, we need better means of putting comprehensive product information through the forward supply chain in a timely manner.

The Product Data Lake is a solution to do so, as the Product Data Lake ensures:

  • Completeness of product information by enabling trading partners to exchange product data in a uniform way
  • Timeliness of product information by connecting trading partners in a process-driven way

Furthermore, the Product Data Lake ensures:

  • Conformity of product information by encompassing various international standards for product information
  • Consistency of product information by allowing upstream and downstream trading partners to interact using their own in-house structure of product information
  • Accuracy of product information by ensuring transparency of product information across the supply chain

Please find more information about the Product Data Lake here.

 


Choosing the Best Term to Use in MDM

Right now I am working with an MDM (Master Data Management) service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information.

One of the challenges in putting such a service to the market is choosing the best term for the entities handled by the service.

Below is the current selection, with the chosen term and some recognized alternate terms that are frequently used and found in the various standards that exist for exchanging product data:

[Image: Terms]

Please comment if you think there are other English (or variant of English) terms that deserve to be in here.

Related Products: The Often Overlooked Facet of PIM


As examined in the post Self-service Ready Product Data, there are three main kinds of information that we deal with in Product Information Management (PIM). These are:

  • Product attributes, also sometimes called product properties or product features. These are up to thousands of different data elements that describe a product. Some are common to most products, like height, length, weight and colour. Some are very specific to the product category. This challenge is actually the reason dedicated PIM solutions exist.
  • Digital assets are documents like product images, installation guides, line drawings and data sheets, as well as more advanced formats such as videos. You may handle these digital assets in a dedicated Digital Asset Management (DAM) system, or use the facilities within a PIM solution or another kind of solution for that.
  • Related products are the links between a product and other products, such as a product that has several accessories that go with it, or a product that is the successor of another, now decommissioned, product. Spare parts for a given product are another kind of product relation. And then we have cross-sell and up-sell relations.

While PIM solutions usually have good capabilities for handling related products, it is my experience that many organizations do not utilize them very well.

One challenge is that related products can be sourced in various ways, as described in the post Related Parties, Products and Locations. These ways are:

  • From the manufacturer of the product. This source is often good when it comes to product relationship types such as accessory and replacement (succession) as well as spare part relations.
  • From the customer. We know this approach from the online sales trick prompting us with the message “People who bought A also bought B”.
  • From internal considerations. Facilitating up-sell can be done by enhancing product data with that kind of product relation.

Sourcing product relations from the manufacturer through the supply chain is a must for solutions that facilitate the exchange of product data in business ecosystems. In the Product Data Lake we consequently handle the sharing of product attributes, digital assets and related products.
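One way to picture related products as data is to store each link together with its relation type and the source it came from. The following Python sketch assumes exactly that; the enum members mirror the relation types and sources mentioned in this post, while the class names, function name and SKUs are hypothetical and do not describe how the Product Data Lake is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelationType(Enum):
    ACCESSORY = "accessory"
    SUCCESSOR_OF = "successor of"
    SPARE_PART = "spare part"
    CROSS_SELL = "cross-sell"
    UP_SELL = "up-sell"


class RelationSource(Enum):
    MANUFACTURER = "manufacturer"
    CUSTOMER = "customer"
    INTERNAL = "internal"


@dataclass(frozen=True)
class ProductRelation:
    """A directed link from one product to another (illustrative only)."""
    from_sku: str
    to_sku: str
    relation_type: RelationType
    source: RelationSource


def relations_for(
    sku: str,
    relations: list[ProductRelation],
    source: Optional[RelationSource] = None,
) -> list[ProductRelation]:
    """Return the relations starting at `sku`, optionally filtered by source."""
    return [
        r for r in relations
        if r.from_sku == sku and (source is None or r.source is source)
    ]


relations = [
    ProductRelation("DRILL-001", "BATTERY-18V", RelationType.ACCESSORY, RelationSource.MANUFACTURER),
    ProductRelation("DRILL-001", "DRILL-000", RelationType.SUCCESSOR_OF, RelationSource.MANUFACTURER),
    ProductRelation("DRILL-001", "DRILL-BITS", RelationType.CROSS_SELL, RelationSource.CUSTOMER),
    ProductRelation("DRILL-001", "DRILL-PRO", RelationType.UP_SELL, RelationSource.INTERNAL),
]

# Only the relations that came through the supply chain from the manufacturer.
for relation in relations_for("DRILL-001", relations, RelationSource.MANUFACTURER):
    print(relation.relation_type.value, "->", relation.to_sku)
```

Keeping the source on every relation makes it possible to exchange only the manufacturer-sourced links with trading partners, while customer-derived and internal relations stay in-house.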


Multi-Domain MDM 360 and an Intelligent Data Lake

This week I had the pleasure of being at the Informatica MDM 360 event in Paris. The “360” label is all over Informatica's communication. There are the MDM 360 events around the world. There is the Product 360 solution – the new wrap of the old Heiler PIM solution, as I understand it. There is the Supplier 360 solution. And there is some Customer 360 stuff, including the Cloud Customer 360 for Salesforce edition.

All these solutions constitute one of the leading Multi-Domain MDM offerings on the market – if not the leading one. We will be wiser on that question when Gartner (the analyst firm) publishes its first Multi-Domain MDM Magic Quadrant later this year, as reported in the post Gravitational Waves in the MDM World.

Until now, Informatica has been very well positioned for Customer MDM, but not among the leaders for Product MDM in Gartner's ranking. Other analysts, such as Information Difference, have Informatica in the top right corner of the (Multi-Domain) MDM landscape as seen here.

MDM and big data is another focus area for Informatica, and Informatica has certainly been one of the first MDM vendors to embrace big data – and not just with wording in marketing. Today we cannot say big data without saying data lake. Informatica calls its offering the Intelligent Data Lake.

For me, it will be interesting to see how Informatica can take full Multi-Domain MDM leadership by combining a good Product MDM solution with an Intelligent Data Lake.


1st Party, 2nd Party and 3rd Party Master Data

Until now, much of the methodology and technology in the Master Data Management (MDM) world has been about how to optimize the use of what can be called first party master data. This is master data already collected within your organization, and the approaches to MDM and the MDM solutions offered have revolved around federating internal silos and obtaining a single source of truth within the corporate walls.

Besides that, third party data has been around for many years, as described in the post Third-Party Data and MDM. Use of third party data in MDM has mainly been about enriching customer and supplier master data from business directories and, to some degree, utilizing standardized pools of product data in various solutions.

Using third party data for customer and supplier master data seems to be a very good idea, as exemplified in the post Using a Business Entity Identifier from Day One. This is because customer and supplier master data looks pretty much the same to every organization. With product master data this is not the case, and that is why third party sources for product master data may not be fully effective.

Second party data is data you get directly from the external source. With customer and supplier master data we see that approach in self-registration services. My recommendation is to combine self-registration and third party data in customer and supplier on-boarding processes. With product master data, I think leaning mostly on second party connections in business ecosystems is the best way forward. There is more on that in a discussion in the LinkedIn MDM – Master Data Management Group.


Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to on-board new entries using an externally defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions, such as:

  • Uniqueness, by checking whether a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory, thus avoiding the spelling mistakes that are usually found all over party master data.
  • Conformity, by inheriting additional data, such as line-of-business codes and descriptions, from a business directory.
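The on-boarding flow behind these three dimensions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `DIRECTORY` dictionary is a stand-in for an external business directory, the identifier `ID-0001` and the company details are made up, and `onboard_party` is a hypothetical function name rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """Hypothetical party master data record."""
    identifier: str
    name: str
    address: str
    line_of_business: str


# Stand-in for an external business directory, keyed by business identifier.
DIRECTORY = {
    "ID-0001": {
        "name": "Example Manufacturing Ltd",
        "address": "1 Sample Street, London",
        "line_of_business": "Manufacturing",
    },
}


def onboard_party(
    identifier: str,
    master_data: dict[str, Party],
    directory: dict[str, dict[str, str]],
) -> tuple[Party, bool]:
    """On-board a party by identifier; return the record and whether it is new."""
    # Uniqueness: an exact identifier lookup replaces fuzzy name matching.
    if identifier in master_data:
        return master_data[identifier], False

    entry = directory.get(identifier)
    if entry is None:
        raise KeyError(f"Identifier {identifier!r} not found in the directory")

    # Accuracy and conformity: name, address and line of business are
    # defaulted from the directory instead of being typed in by hand.
    party = Party(identifier=identifier, **entry)
    master_data[identifier] = party
    return party, True


master_data: dict[str, Party] = {}
first, created_first = onboard_party("ID-0001", master_data, DIRECTORY)
second, created_second = onboard_party("ID-0001", master_data, DIRECTORY)
print(created_first, created_second, len(master_data))
```

The second call finds the existing record instead of creating a duplicate, which is the point of keying the master data on the external identifier.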

Having an external business identifier stored with your party master data helps a lot with maintaining data quality, as pondered in the post Ongoing Data Maintenance.

When selecting an identifier there are different options, such as national IDs, LEI, DUNS Number and others, as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up will consider much later, if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later, with guaranteed increased costs.

As the identifier to use, we have chosen the DUNS Number from Dun & Bradstreet. The reason is that this is currently the only business identifier with worldwide coverage. Also, Dun & Bradstreet offers some additional data that fits our business model. This includes consistent line-of-business information and worldwide company family trees.


Adding Business Ecosystems to Omnichannel

Omnichannel has become a buzzword in marketing and beyond. The jury is still out on what omnichannel really is, but most will agree that it is a refinement and/or extension of earlier buzzwords such as multichannel and cross channel. You may learn more in this article.

In omnichannel you will try really, really hard to have a single customer view across all channels, and you will try really, really, really hard to present your product information in a uniform and consistent way across all channels.

One challenge here is that your business is not an island. You are part of a business ecosystem, or several of them, as examined in the post Data Management for Business Ecosystems.

“Your customer” may look at “your product” in the sphere of another member of your business ecosystem. It may be at one of your trading partners or at one of your competitors.

So, what can you do about this when it comes to data management?

In the hard case, your competitors, it is about knowing more about your customer. Knowing about your customer’s relationships. Knowing about your customer’s relations with products and their categories. Knowing about your customer’s locational belonging. All in all, this is a case for multi-domain MDM, as seen in the post Multi-Domain MDM and Data Quality Dimensions.

[Image: Expand digitalization across business ecosystems from single purposes to cover an omnichannel view]

Besides your own product information you must register what you know about that product information as it is stored and handled by other members in your business ecosystem – trading partners and competitors.

With product information, you must be able to exchange it with your trading partners. You cannot expect everyone to handle the information about the same product in exactly the same way as you do. Actually, you should not want that. You want to be better than your competitors in some ways, and you want to add value for your trading partners. But you would for sure find value in joining a place of intersection where commonly known characteristics of products are exchanged between trading partners – such as the Product Data Lake.


Self-service Ready Product Data

The increased use of self-service based sales approaches, as in e-commerce, has put a lot of pressure on cross-company supply chains. Besides handling the logistics and controlling pricing, you also have to take care of a huge amount of product data and digital assets describing the goods.

You may divide product information into these five levels:

[Image: Product Information Levels]

Please learn more about the five levels of product information, including how hierarchies, pricing and logistics fit in, by visiting the product information castle.

Level 4 in this model is self-service product data, which consists of:

  • Product attributes, also sometimes called product properties or product features. These are up to thousands of different data elements that describe a product. Some are common to most products, like height, length, weight and colour. Some are very specific to the product category. This challenge is actually the reason dedicated Product Information Management (PIM) solutions exist.
  • Basic product relations are the links between a product and other products, such as a product that has several accessories that go with it, or a product that is the successor of another, now decommissioned, product.
  • Standard digital assets are documents like installation guides, line drawings and data sheets.

These are the product data that help the end customer compare products and make an objective choice when buying a product for a specific purpose. These data are also helpful in answering the questions a buyer may have when making a purchase.

Every piece of data belonging to any level of product information may be forwarded through the cross-company supply chain from the manufacturer to the end seller. Self-service product data are, however, the data that will most obviously do so.

In order to support end customer self-service when producing, distributing and selling goods, you must establish a process-driven service that automates the introduction of new products with extensive product data, the inclusion of new kinds of product data and updates to those data. You must be a digitalized member of your business ecosystem. The modern solution for that is the Product Data Lake.


Multilingual? Mais oui! Natürlich.

Is that piece of data wrong or right? This may very well be a question of which language we are talking about.

In an earlier double post on this blog I had a small quiz about the name of the Pope in the Catholic church. The point was that all possible answers were right, as explained in the post When Bad Data Quality isn’t Bad Data. The thing is that around the world the Pope's name has local variants of the English name Francis: François in French, Franziskus in German, Francesco in Italian, Francisco in Spanish, Franciszek in Polish, Frans in Danish and Norwegian, and so on.

In today’s globalized, or should I say globalised, world, it is important that our data can be represented in different languages and that the systems we use to handle the data are built for that. The user interface may be in a certain flavor/flavour of English only, but the data model must cater for storing and presenting data in multiple languages, and even variants of languages, such as English in its many forms. Add to that the capability of handling characters other than Latin, in script systems other than alphabets, as examined in the post called Script Systems.
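A data model that caters for multiple languages and language variants can be as simple as storing one value per language tag and falling back sensibly when a variant is missing. The Python sketch below assumes tags in the common `language-REGION` style (for example `en-GB`); the `LocalizedText` class and its fallback order are my own illustration and not a description of any specific product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LocalizedText:
    """A text value stored once per language tag (illustrative only)."""
    values: dict[str, str] = field(default_factory=dict)

    def get(self, language_tag: str, default_tag: str = "en") -> Optional[str]:
        """Look up the exact tag, then the base language, then the default."""
        base_language = language_tag.split("-")[0]
        for tag in (language_tag, base_language, default_tag):
            if tag in self.values:
                return self.values[tag]
        return None


pope = LocalizedText({
    "en": "Francis",
    "fr": "François",
    "de": "Franziskus",
    "it": "Francesco",
    "es": "Francisco",
    "pl": "Franciszek",
    "da": "Frans",
})

attribute_label = LocalizedText({"en": "colour", "en-US": "color"})

print(pope.get("fr-CA"))             # no fr-CA entry, falls back to fr
print(attribute_label.get("en-US"))  # exact variant match
print(attribute_label.get("en-GB"))  # no en-GB entry, falls back to en
```

Python strings are Unicode, so the same structure holds values in non-Latin scripts without any change.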

This challenge is very close to me right now, as we are building a service for sharing product information in business ecosystems. So will the Product Data Lake be multilingual? Mais oui! Natürlich. Jo da.

[Image: PDL Example]

PS: The Product Data Lake will actually help with collecting product information in multiple languages through the supply chains of product manufacturers, distributors, retailers and end users.
