Are You Still Scared by the Data Lake?

4 years ago, a post on this blog was called The Scary Data Lake. It was about the fear that the then-new data lake concept would lead to data swamps with horrific data quality, data dumps no one would ever use, data cesspools full of badly governed data and data sumps that would never be part of the business processes.

For sure, there have been mistakes made with data lakes. But it seems that the data lake concept has matured and that the understanding of what a data lake is good for is increasing. The data lake concept has even grown out of the analytics world and into more operational cases, as told in the post Welcome to Another Data Lake for Data Sharing.

One of the things we have learned is to apply well-known data management principles to data lakes too. This encompasses metadata management, data lineage capabilities and data governance, as reported in the post Three Must Haves for your Data Lake.

Data Lake at Halloween

How Wholesalers and Dealers of Building Materials can Improve Product Information Efficiency

Building materials is a very diverse product group. As a wholesaler or dealer, you will have to manage many different attributes and digital assets depending on which product classification we are talking about.

Getting these data from a diverse crowd of suppliers is a hard job. You may have a spreadsheet for each product group in which you require data from your suppliers, but this means a lot of follow-up and work putting the data into your system. You may have a supplier portal, but suppliers are probably reluctant to use it, because they cannot deal with hundreds of different supplier portals from you and all the other wholesalers and dealers, possibly across many countries. In the same way, you are not happy if you must fetch data from hundreds of different customer portals provided by manufacturers and other brand owners.

This also means that even if you can handle the logistics, you must limit your regular assortment of products and therefore often deal with special ad hoc products when they are needed to form a complete range of products asked for by your customers for a given building project. Handling of “specials” is a huge burden and the data gathering must usually be repeated if the product turns up again.

At Product Data Lake we have developed a solution to these challenges. It is a cloud service where your suppliers can provide product information in their way and you can pull the information in the way that fits your taxonomy, structure and format.

Learn about and follow the solution on our Product Data Pull LinkedIn page.

If you are interested, please ask for more information here:

 

Linked Product Data Quality

Some years ago the theme of Linked Data Quality was examined here on the blog.

As stated in that post, a lot of product data is already out there waiting to be found, categorized, matched and linked.

Doing this is at the core of the Product Data Lake venture I am involved with. What we aim to do is link product information stored using different taxonomies at trading partners, preferably by referencing international and industry standards such as eCl@ss, ETIM, UNSPSC, Harmonized System, GPC and more.

Our approach is not to reinvent the wheel, but to collaborate with partners in the industry. These include:

  • Experts within a type of product such as building materials and its sub-sectors, machinery, chemicals, automotive, furniture and home-ware, electronics, work clothes, fashion, books and other printed materials, food and beverage, pharmaceuticals and medical devices. You may also be a specialist in certain standards for product data. As an ambassador you will link the taxonomies in use at two trading partners or within a larger business ecosystem.
  • Product data cleansing specialists who have proven track records in optimizing product master data and product information. As an ambassador you will prepare the product data portfolio at a trading partner and extend the service to other trading partners or within a larger business ecosystem.
  • System integrators who can integrate product data syndication flows into Product Information Management (PIM) and other solutions at trading partners and consult on the surrounding data quality and data governance issues. As an ambassador, you will enable the digital flow of product information between two trading partners or within a larger business ecosystem.
  • Tool vendors who can offer in-house Product Information Management (PIM) / Master Data Management (MDM) solutions or similar solutions in the ERP and Supply Chain Management (SCM) sphere. As an ambassador you will be able to provide, supplement or replace customer data portals at manufacturers and supplier data portals at merchants, and thus offer truly automated and interactive product data syndication functionality.
  • Technology providers with data governance solutions, data quality management solutions and Artificial Intelligence (AI) / machine learning capacities for classifying and linking product information to support the activities made by ambassadors and subscribers.
  • Reservoirs: Product Data Lake is a unique opportunity for service providers with product data portfolios (data pools and data portals) to utilize modern data management technology and offer a comprehensive way of collecting and distributing product data within the business processes used by subscribers.

See more on the Product Data Link site, on the Product Data Link showcase page on LinkedIn or get in contact right away:

 

Become a Product Data Lake ambassador!


Welcome to Another Data Lake for Data Sharing

A couple of weeks ago Microsoft, Adobe and SAP announced their Open Data Initiative. While this, as far as we know, is only a statement for now, it has of course attracted some interest, because three giants in the IT industry have agreed on something – mostly interpreted as agreeing to oppose Salesforce.com.

Forming a business ecosystem among players in the market is not new. However, what we usually see is that a group of companies agrees on a standard and then each of them puts a product or service that adheres to that standard on the market. The standard then caters for the interoperability between the products and services.

In this case it seems to be something different. The product or service is operated by Microsoft on their Azure platform. There will be some form of a common data model. But it is a data lake, meaning that we should expect that data can be provided in any structure and format and consumed in any structure and format.

In all humbleness, this concept is the same as the one that is behind Product Data Lake.

The Open Data Initiative from Microsoft, Adobe and SAP focuses on customer data and seems to be about enterprise-wide customer data. While it technically could also support ecosystem-wide customer data, privacy concerns and compliance issues will restrict that scope in many cases.

At Product Data Lake, we do the same for product data. Only here, the scope is business ecosystem wide as the big pain with product data is the flow between trading partners as examined here.

Open Data Initiative SAP Adobe Microsoft

Digitalization has Put Data in the Forefront

20 years ago, when I started working as a contractor and entrepreneur in the data management space, data was not at the top of the agenda at many enterprises. Fortunately, that has changed.

An example is given by Schneider Electric CEO Jean-Pascal Tricoire in his recent blog post on how digitization and data can enable companies to be more sustainable. You can read it on the Schneider Electric Blog in the post 3 Myths About Sustainability and Business.

Manufacturers in the building material sector naturally emphasize sustainability. In his post Jean-Pascal Tricoire says: “The digital revolution helps answering several of the major sustainability challenges, dispelling some of the lingering myths regarding sustainability and business growth”.

One of three myths dispelled is: Sustainability data is still too costly and time-consuming to manage.

From my work with Master Data Management (MDM) and Product Information Management (PIM) at manufacturers and merchants in the building material sector I know that managing the basic product data, trading data and customer self-service ready product data is hard enough. Taking on sustainability data will only make that harder. So, we need to be smarter in our product data management. Smart and sustainable homes and smart sustainable cities need smart product data management.

In his post Jean-Pascal Tricoire mentions that Schneider Electric has worked with other enterprises in their ecosystem in order to be smarter about product data related to sustainability. In my eyes the business ecosystem theme is key in the product data smartness quest as pondered in the post about How Manufacturers of Building Materials Can Improve Product Information Efficiency.


 

It is time to apply AI to MDM and PIM

The intersection between Artificial Intelligence (AI) and Master Data Management (MDM) – and the associated discipline Product Information Management (PIM) – is an emerging topic.

A use case close to me

In my work setting up a service called Product Data Lake, the inclusion of AI has become an important topic. The aim of this service is to translate between the different taxonomies in use at trading partners, for example when a manufacturer shares its product information with a merchant.

In some cases the manufacturer, as the provider of product information, may use the same standard for product information as the merchant. This may be a deep standard such as eCl@ss or ETIM or a pure product classification standard such as UNSPSC. In this case we can apply deterministic matching of the classifications and the attributes (also called properties or features), as sketched below.
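
To make the deterministic case concrete, here is a minimal sketch in Python, assuming both trading partners key their product information by the same standard identifiers. The ETIM-style class and feature codes and the values are invented for illustration, not taken from any actual catalogue.

```python
# Minimal sketch of deterministic matching when both trading partners
# reference the same standard. All codes and values are hypothetical.

# Product information as provided by the manufacturer, keyed by standard codes.
manufacturer_record = {
    "class_code": "EC000123",      # ETIM-style class identifier (invented)
    "features": {
        "EF000008": "230",         # e.g. rated voltage in V
        "EF000040": "IP54",        # e.g. degree of protection
    },
}

# The feature codes the merchant wants for the same classification.
merchant_wanted_features = ["EF000008", "EF000040", "EF000222"]

def match_deterministically(record, wanted_codes):
    """Match attributes purely on shared standard identifiers."""
    matched, uncovered = {}, []
    for code in wanted_codes:
        if code in record["features"]:
            matched[code] = record["features"][code]
        else:
            uncovered.append(code)  # falls through to human or AI linking
    return matched, uncovered

matched, uncovered = match_deterministically(manufacturer_record,
                                             merchant_wanted_features)
print("Matched:", matched)          # codes found on both sides
print("Needs linking:", uncovered)  # gaps even within a shared standard
```

Everything that cannot be matched on identifiers falls through, which is exactly the situation described below.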

Product Data Syndication

However, most often there are uncovered areas even when two trading partners share the same standard. And then again, the most frequent situation is that the two trading partners are using different standards.

In that case, we will initially use human resources to do the linking. Our data governance framework for that includes upstream (manufacturer) responsibility, downstream (merchant) responsibility and our ambassador concept.

As always, too much human interaction is costly, time-consuming and error-prone. Therefore, we are eagerly training our machines to do this work in a cost-effective way, within a much shorter time frame and with a repeatable and consistent outcome, to the benefit of the participating manufacturers, merchants and other enterprises involved in exchanging products and the related product information.
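
As a simplified illustration of where such machine training can start, here is a sketch using plain string similarity from Python's standard library. The attribute names are invented, and a real service would rely on much richer signals, learning from the links confirmed by the responsible humans and ambassadors.

```python
from difflib import SequenceMatcher

# Hypothetical attribute names from two trading partners using different taxonomies.
manufacturer_attributes = ["Rated Voltage", "Degree of Protection", "Cable Length"]
merchant_attributes = ["voltage_rating", "ip_protection_class", "length_of_cable"]

def normalize(name: str) -> str:
    """Lowercase and strip separators so different naming conventions compare fairly."""
    return name.lower().replace("_", " ").replace("-", " ")

def similarity(a: str, b: str) -> float:
    """String similarity between two normalized attribute names, from 0 to 1."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def suggest_links(source, target, threshold=0.5):
    """Suggest a link per source attribute; low-scoring pairs go to human review."""
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        action = "auto-link" if score >= threshold else "human review"
        print(f"{s!r} -> {best!r} (similarity {score:.2f}): {action}")

suggest_links(manufacturer_attributes, merchant_attributes)
```

In a setup like this, every link confirmed or corrected by a human can double as training data for the machines.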

Learning from others

This week I participated in a workshop around exchanging experiences and proving use cases for AI and MDM. The above-mentioned use case was one of several examined there. And for sure, there is a basis for applying AI with substantial benefits for the enterprises that get this. The workshop was arranged by Camelot Management Consultants within their Global Community for Artificial Intelligence in MDM.

Share or be left out of business

Enterprises are increasingly going to be part of business ecosystems where collaboration between legal entities not belonging to the same company family tree will be the norm.

This trend is driven by digital transformation, as no enterprise can possibly master all the disciplines needed to apply a digital platform to traditional ways of doing business.

Enterprises are basically selfish. This is also true when it comes to Master Data Management (MDM). Most master data initiatives today revolve around aligning internal silos of master data and the surrounding processes to fit the business objectives of the enterprise as a whole. And that is hard enough.

However, in the future that will not be enough. You must also be able to share master data in the business ecosystems to which your enterprise will belong. The enterprises that, in a broad sense, get this first will survive. The laggards are in danger of being left out of business.

This is the reason of being for Master Data Share.

Master Data Share or be OOB

Three Flavors of Data Monetization

The term data monetization is trending in the data management world.

Data monetization is about harvesting direct financial results from having access to data that is stored, maintained, categorized and made accessible in an optimal manner. Traditionally, data management & analytics have contributed indirectly to the financial outcome by aiming at keeping data fit for purpose in the various business processes that produce value for the business. Today the best performers are using data much more directly to create new services and business models.

In my view there are three flavors of data monetization:

  • Selling data: This is something that has been known in the data management world for years. Notable examples are the likes of Dun & Bradstreet, who sell business directory data, as touched upon in the post What is a Business Directory? Another example is postal services around the world selling their address directories. This is the kind of data we know as third party data.
  • Wrapping data around products: If you have a product – or a service – you can add tremendous value to it and make it more sellable by wrapping data, potentially including third party data, around it. These data thus become second party data, as touched upon in the post Infonomics and Second Party Data.
  • Advanced analytics and decision making: You can combine third party data, second party data and first party data (your own data) to perform advanced analytics and fast operational decision making in order to sell more, reduce costs and mitigate risks.

Please learn more about data monetization by downloading a recent webinar hosted by Information Builders, their expert Rado Kotorov and yours truly here.

Data Monetization

Product Data Syndication Freedom

When working with product data syndication in supply chains, the big pain is that the data standards in use and the preferred exchange methods differ between supply chain participants.

As a manufacturer you will have hundreds of re-sellers who probably have data standards different from yours and most likely want to exchange data in a different way than you do.

As a merchant you will have hundreds of suppliers who probably have data standards different from yours and most likely want to exchange data in a different way than you do.

The aim of Product Data Lake is to take that pain away from both the manufacturer side and the merchant side. We offer product data syndication freedom by letting you as a manufacturer push product information using your data standards and your preferred exchange method, and letting you as a merchant pull product information using your data standards and your preferred exchange method.
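
Here is a minimal sketch of that principle, with invented field names and a hypothetical mapping: the lake keeps the record exactly as the manufacturer pushed it, and the merchant pulls it reshaped into its own structure.

```python
# Sketch of the push/pull idea behind product data syndication freedom.
# All identifiers, field names and the mapping are invented for illustration.

lake = {}  # stores each record exactly as the manufacturer provided it

def push(manufacturer_id, product_id, record):
    """Manufacturer pushes product information in its own structure."""
    lake[(manufacturer_id, product_id)] = record

def pull(manufacturer_id, product_id, field_mapping):
    """Merchant pulls the same information reshaped to its own structure."""
    record = lake[(manufacturer_id, product_id)]
    return {merchant_field: record[supplier_field]
            for merchant_field, supplier_field in field_mapping.items()
            if supplier_field in record}

# The manufacturer pushes using its own (here German) field names ...
push("acme", "SKU-42", {"Produktname": "Hammer", "Gewicht_kg": "0.6"})

# ... and the merchant pulls through a mapping agreed once per relation.
mapping = {"product_name": "Produktname", "weight_kg": "Gewicht_kg"}
print(pull("acme", "SKU-42", mapping))
# {'product_name': 'Hammer', 'weight_kg': '0.6'}
```

Neither side has to adopt the other's standards; the mapping absorbs the difference.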

Product Data Syndication

If you want to know more, get in contact here:

Avoid Duplicates by Avoiding Peer-to-Peer Integrations

When working in Master Data Management (MDM) programs, duplicates are among the main pain points always on the list. As explained in the post Golden Records in Multi-Domain MDM, this may be duplicates in party master data (customer, supplier and other roles) as well as duplicates in product master data, assets, locations and more.

Most of the data quality technology available to solve these problems revolves around identifying duplicates. This is a very intriguing discipline where I have spent some of my best years. However, it is only a remedy for the symptoms of the problem and not a means to eliminate the root cause, as touched upon in the post The Good, Better and Best Way of Avoiding Duplicates.

The root causes are plentiful and, as with all challenges, they involve technology, processes and people.

Having an IT landscape with multiple applications where master data are created, updated and consumed is a basic problem, and remedying it is the main reason of being for Master Data Management (MDM) solutions. The challenge is to implement MDM technology in such a way that the MDM solution does not just become another silo of master data but instead a solution for sharing master data within the enterprise – and ultimately in the digital ecosystem around the enterprise.

The main enemy from a technology perspective is, in my experience, peer-to-peer system integration solutions. If you have chosen application X to support one business objective and application Y to support another business objective, and you learn that an integration solution between X and Y is available, this is very bad news. Short-term cost and timing considerations will make that option obvious, but in the long run it will cost you dearly if the master data involved are handled in other applications as well, because then you will have blind spots all over the place through which duplicates will enter.

The only sustainable solution is to build a master data hub through which master data are integrated and thus shared with all applications inside the enterprise and around the enterprise. This hub must encompass a shared master data model and related metadata.
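
The arithmetic behind that argument can be spelled out with a minimal sketch, assuming an IT landscape of N applications that all handle the same master data: full peer-to-peer integration can require up to N × (N − 1) / 2 connections, each a potential blind spot, while a hub requires only N.

```python
# Back-of-the-envelope comparison of integration counts, assuming N
# applications that all create, update or consume the same master data.

def peer_to_peer_links(n: int) -> int:
    """Every application integrated directly with every other application."""
    return n * (n - 1) // 2

def hub_links(n: int) -> int:
    """Every application integrated exactly once, with the master data hub."""
    return n

for n in (3, 5, 10, 20):
    print(f"{n} applications: {peer_to_peer_links(n)} peer-to-peer links "
          f"vs {hub_links(n)} hub links")
# 20 applications: 190 peer-to-peer links vs 20 hub links
```

Each avoided connection is one less blind spot through which duplicates can enter.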