It is time to apply AI to MDM and PIM

The intersection between Artificial Intelligence (AI) and Master Data Management (MDM) – and the associated discipline Product Information Management (PIM) – is an emerging topic.

A use case close to me

In my work setting up a service called Product Data Lake, the inclusion of AI has become an important topic. The aim of this service is to translate between the different taxonomies in use at trading partners, for example when a manufacturer shares product information with a merchant.

In some cases the manufacturer, the provider of product information, may use the same standard for product information as the merchant. This may be a deep standard such as eCl@ss or ETIM, or a pure product classification standard such as UNSPSC. In this case we can apply deterministic matching of the classifications and the attributes (also called properties or features).
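As a minimal sketch of what deterministic matching could look like when both partners use the same standard: attributes are linked by exact equality of classification code and attribute code, and anything the provider does not supply is reported separately. The class name, function name and codes below are hypothetical placeholders, not real ETIM, eCl@ss or UNSPSC identifiers, and this is not the actual Product Data Lake implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    class_code: str    # classification code in the shared standard
    feature_code: str  # attribute (property/feature) code within that class
    value: str


def link_deterministically(provided, requested):
    """Split the receiver's requested (class, feature) pairs into linked values and gaps."""
    by_key = {(a.class_code, a.feature_code): a.value for a in provided}
    linked = {key: by_key[key] for key in requested if key in by_key}
    missing = [key for key in requested if key not in by_key]
    return linked, missing


# Placeholder codes for illustration only.
manufacturer = [
    Attribute("CLASS-001", "FEAT-LENGTH", "120 mm"),
    Attribute("CLASS-001", "FEAT-COLOUR", "black"),
]
merchant_needs = [
    ("CLASS-001", "FEAT-LENGTH"),
    ("CLASS-001", "FEAT-WEIGHT"),
]

linked, missing = link_deterministically(manufacturer, merchant_needs)
print(linked)
print(missing)
```

Because the keys are compared for exact equality, the outcome is repeatable; no scoring or threshold is involved.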


However, most often there are uncovered areas even when two trading partners share the same standard. And then again, the most frequent situation is that the two trading partners are using different standards.

In that case we initially will use human resources to do the linking. Our data governance framework for that includes upstream (manufacturer) responsibility, downstream (merchant) responsibility and our ambassador concept.

As always, applying too much human interaction is costly, time consuming and error prone. Therefore, we are very eagerly training our machines to be able to do this work in a cost-effective way, within a much shorter time frame and with a repeatable and consistent outcome to the benefit of the participating manufacturers, merchants and other enterprises involved in exchanging products and the related product information.

Learning from others

This week I participated in a workshop on exchanging experiences and proving use cases for AI and MDM. The above-mentioned use case was one of several examined there. And for sure, there is a basis for applying AI with substantial benefits for the enterprises that get this. The workshop was arranged by Camelot Management Consultants within their Global Community for Artificial Intelligence in MDM.

New Routes for Products. New Routes for Product Information

One of the news items this week was that Maersk is, for the first time, taking a large container ship from East Asia to Europe using a Northern Route through Arctic waters, as told in this Financial Times article.


The purpose of this trip is to explore the possibility of avoiding the longer Southern Route, including shoehorning the sea traffic through the narrow Suez Canal. A similar opportunity exists around North America as an alternative to going through the Panama Canal.

Just as we find new routes for moving products, we may also explore new routes for moving information about products. Until now the possibilities, besides cumbersome exchange of spreadsheets, have been to shoehorn product information from the manufacturer into a consensus-based data portal or data pool from where the merchant can fetch the information in exactly the same shape as his competitors do.

At Product Data Lake we have explored shorter, more agile and diverse new routes for that. We call it Product Data Syndication Freedom.

Three Flavors of Data Monetization

The term data monetization is trending in the data management world.

Data monetization is about harvesting direct financial results from having access to data that is stored, maintained, categorized and made accessible in an optimal manner. Traditionally, data management & analytics has contributed indirectly to financial outcomes by aiming at keeping data fit for purpose in the various business processes that produce value for the business. Today the best performers are using data much more directly to create new services and business models.

In my view there are three flavors of data monetization:

  • Selling data: This is something that has been known to the data management world for years. Notable examples are the likes of Dun & Bradstreet, who sell business directory data as touched on in the post What is a Business Directory? Another example is postal services around the world selling their address directories. This is the kind of data we know as third party data.
  • Wrapping data around products: If you have a product – or a service – you can add tremendous value to these products and services and make them more sellable by wrapping data, potentially including third party data, around those products and services. These data will thus become second party data as touched on in the post Infonomics and Second Party Data.
  • Advanced analytics and decision making: You can combine third party data, second party data and first party data (your own data) to perform advanced analytics and make fast operational decisions in order to sell more, reduce costs and mitigate risks.

Please learn more about data monetization by downloading a recent webinar hosted by Information Builders, their expert Rado Kotorov and yours truly here.


The Cases for Data Matching in Multi-Domain MDM

Data matching has always been a substantial part of the capabilities in data quality technology and has become a common capability in Master Data Management (MDM) solutions.

We use the term data matching when talking about linking entities where we cannot just use exact keys in databases.

The most prominent example is matching names and addresses related to parties, where these attributes can be spelled differently and formatted using different standards but still refer to the same real-world entity. The most common scenarios are deduplication, where we clean up databases for duplicate customer, vendor and other party role records, and reference matching, where we identify and enrich party data records with external directories.
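A minimal sketch of the deduplication scenario, using only Python's standard library. The normalization rules, the 0.85 threshold, the field names and the sample records are illustrative assumptions; commercial matching tools use far richer techniques (parsing, phonetics, reference data), so treat this as a toy illustration rather than how any particular product works.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return pairs of record ids whose name and address both look alike."""
    pairs = []
    for (id_a, a), (id_b, b) in combinations(records.items(), 2):
        name_score = similarity(a["name"], b["name"])
        address_score = similarity(a["address"], b["address"])
        if min(name_score, address_score) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Fictitious party records.
parties = {
    1: {"name": "Acme Trading Limited", "address": "12 Main Street, Springfield"},
    2: {"name": "ACME Trading Ltd.", "address": "12 Main Street Springfield"},
    3: {"name": "Borealis Shipping", "address": "7 Harbour Road, Portsmouth"},
}
print(candidate_duplicates(parties))
```

Comparing every pair is quadratic in the number of records, so a real solution would first group records into blocks (for example by postal code) before scoring.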

A way to pre-process party data matching is to match the locations (addresses) with external references, which have become more and more available around the world, so you have a standardized address in order to reduce the fuzziness. In some geographies you can even make use of more extended location data, such as whether the location is a single-family house, a high-rise building, a nursing home or a campus. Geocodes can also be brought into the process.

Handling the location as a separate unique entity can also be used in many industries such as utility, telco, finance, transit and more.

For product data, achieving uniqueness is usually a lesser pain point, as told in the post Multi-Domain MDM and Data Quality Dimensions. But for sure, requirements for matching products arise from time to time.

In the old days this was quite difficult as you often only had a product description that had to be parsed into discrete elements as examined in the post Matching Light Bulbs.

With the rise of Product Information Management (PIM) we now often do have the product attributes in a granular form. However, using traditional matching technology made for party master data will not do the trick, as this is a different and more complex scenario. My thinking is that graph technology will help, as touched on in the post Three Ways of Finding a Product.

Where a Major Tool is Not So Cool

During my engagements in selecting and working with the major data management tools on the market, I have from time to time experienced that they often lack support for specialized data management needs in minor markets.

Two such areas I have been involved with as a Denmark based consultant are:

  • Address verification
  • Data masking

Address verification:

The authorities in Denmark offer free-of-charge access to very up-to-date, granular and accurate address data that, besides the envelope form of an address, also comes with a data management friendly key (usually referred to as KVHX) on the unit level for each residential and business address within the country. Besides the existence of the address, you also have access to what activity takes place at the address, for example whether it is a single-family house, a nursing home or a campus, and other useful information for verification, matching and other data management activities.

If you want to verify addresses with the major international data management tools I have come across, many of these goodies are gone, as for example:

  • Address reference data are refreshed only once per quarter
  • The key and the access to more information is not available
  • A price tag for data has been introduced

Data Masking:

In Denmark (and other Scandinavian countries) we have a national identification number (known as personnummer) used much more intensively than the national IDs known from most other countries as told in the post Citizen ID within seconds.

The data masking capabilities in major data management solutions come with pre-built functions for national IDs, but only covering major markets such as the United States Social Security Number, the United Kingdom NINO and the kind of national ID in use in a few other large western countries.

So, GDPR compliance is just a little bit harder here even when using a major tool.
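Where a tool has no pre-built function, a custom masking rule has to be written. The sketch below assumes the Danish number is written as six birth-date digits, an optional hyphen and four serial digits; the function name, the masking character and the sample text are hypothetical, and the number shown is fictitious.

```python
import re

# Assumed layout: six birth-date digits, an optional hyphen, four serial digits.
DANISH_ID = re.compile(r"\b(\d{6})-?(\d{4})\b")


def mask_danish_ids(text, keep_birth_date=False):
    """Replace anything that looks like a Danish personal ID with a masked form."""
    def replace(match):
        head = match.group(1) if keep_birth_date else "XXXXXX"
        return f"{head}-XXXX"
    return DANISH_ID.sub(replace, text)


sample = "Customer 010203-1234 called about order 987654."
print(mask_danish_ids(sample))
print(mask_danish_ids(sample, keep_birth_date=True))
```

A pattern this simple also masks any other ten-digit number, so a production rule would add context checks (column name, date validity) before deciding that a value really is a national ID.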

Figure: Data masking of national IDs, from IBM Data Masking documentation

Building an MDM Solution Using Best-in-Class Modules

Sometimes keeping it simple is the shortcut to getting it all wrong. While I am a believer in mastering all master data domains under the same vision and strategy, there are still best-in-class options when it comes to orchestrating processes and applying technology in the right chunks.

Customer Data Integration (CDI)

A recent post on this blog was called What Happened to CDI? This post examines the two overlapping disciplines Master Data Management (MDM) and Customer Data Integration (CDI). In a comment Jeff Jones argues that MDM vendors have forgotten about proper CDI workflows. Jeff says: “It seems the industry wants to go from Source to Match/Merge, instead of Source to Match/Identify and finally to Merge.” Please find and jump into the discussion here.

This question was also touched on some years ago in the post The Place for Data Matching in and around MDM.

Product Information Management (PIM)

The product domain within Multi-Domain MDM also holds some risks of forgetting the proper ways of handling product information. In this domain we must also avoid being blinded by the promise of a single source of master data with surrounding processes and applied technology.

There are many end-to-ends to cover properly as exemplified in the post A Different End-to-End Solution for Product Information Management (PIM).

 


Classification of PIM Solutions

A core capability in a Product Information Management (PIM) solution is the ability to work with product classification, meaning having a way to group products for multiple purposes like how to present products in meaningful groups to potential customers and how to make sure all relevant product attributes are present for a similar group of products. This is a daunting task, usually much more demanding than the technical implementation of the PIM solution itself.

Ironically, we are also having trouble with grouping solutions for handling product data into meaningful groups. One challenge is the overlap with surrounding disciplines as discussed in the post How MDM, PIM and DAM Stick Together. This post deals with classifying solutions as Master Data Management (MDM), Product Information Management (PIM) and/or Digital Asset Management (DAM).

Then there is the selection of Three Letter Acronyms starting with P and ending with M:

  • PCM: Product Content (or Catalog) Management
  • PDM: Product Data Management
  • PIM: Product Information Management
  • PLM: Product Lifecycle Management

A recent post from the declared PIM vendor Venzee examines PCM vs. PIM: Which One Does Your Ecommerce Business Need? (The blog does not exist anymore).

In here Venzee states: “You will occasionally see PCM solutions presented as if they were actually PIM platforms. Don’t get fooled. Yes, there are similarities and terminology overlaps, but PCM is not PIM. Think of PCM as PIM’s little cousin — it’s a place to house and enrich your data, but that’s about it. Ecommerce vendors that really want to manage, optimize and distribute their data need a good PIM platform”

In my current venture called Product Data Lake, a challenge is explaining what kind of solution it is. I usually call it PIM-2-PIM, as it is a solution that can make two different PIM solutions at two different trading partners interact. But it might as well be PIM-2-MDM or PLM-2-PIM or DAM-2-PCM or any other available combination. Anyway, I have put our solution on The Disruptive MDM/PIM List here.

PS: If you have a solution covering Master Data and Product Information, you can register it on The Disruptive MDM/PIM List here.

What Happened to CDI?

CDI is a Three Letter Acronym which in the data management world stands for Customer Data Integration.

Today CDI is usually wrapped into Master Data Management (MDM) as examined in the post CDI, PIM, MDM and Beyond. As mentioned in this post, a well-known analyst, Aaron Zornes, runs a business called the MDM Institute, which was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Many Master Data Management (MDM) vendors today emphasize being multidomain, meaning their solutions can manage customer, supplier, employee and other party master data as well as product, asset, location and other core business entity types.

However, some vendors still focus on customer master data and the topic of integrating customer data by excelling in the special pain points here, not least identity resolution and sustainable merge/purge of duplicates. One example is Uniserv Smart Customer MDM.

In my recent little venture called The Disruptive Master Data Management Solution List, the aim is to cover all kinds of MDM solutions: Small or big. New (start-up) or old. Multidomain MDM, Customer Data Integration (CDI), Product Information Management (PIM) or even Digital Asset Management (DAM). As a potential buyer, you can browse all these solutions and select your choice of one-stop-shopping candidates or combine best-of-breed solution candidates that match your requirements in your industry and geography.

First thing that must happen is that vendors register their solutions on the site here.


The Good, the Better and the Best Kinds of Data Quality Technology

If I look at my journey in data quality, I think you can say that I started by working with the good way of implementing data quality tools, then turned to some better ways and, until now at least, am working with the best way of implementing data quality technology.

It is not, though, that the good old kind of tools are obsolete. They are just relieved of some of the repetitive hard work of cleaning up dirty data.

The good (old) kind of tools are data cleansing and data matching tools. These tools are good at finding errors in postal addresses, duplicate party records and other nasty stuff in master data. The bad thing about finding the flaws a long time after the bad master data has entered the databases is that it is often very hard to do the corrections after transactions have been related to these master data and that, if you do not fix the root cause, you will have to do this periodically. However, there are still reasons to use these tools, as reported in the post Top 5 Reasons for Downstream Cleansing.

The better way is real time validation and correction at data entry where possible. Here a single data element or a range of data elements are checked when entered. For example the address may be checked against reference data, phone number may be checked for adequate format for the country in question or product master data is checked for the right format and against a value list. The hard thing with this is to do it at all entry points. A possible approach to do it is discussed in the post Service Oriented MDM.
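The checks named above can be sketched as a small validation function called at the point of entry. Everything specific here is an illustrative assumption: the field names, the single phone rule (eight digits for Denmark) and the colour value list are made up for the example, and a real solution would draw such rules from maintained reference data.

```python
import re

# Hypothetical rules for illustration only; real formats need proper reference data.
PHONE_PATTERNS = {
    "DK": re.compile(r"^\d{8}$"),  # assumed: eight digits, no area code
}
ALLOWED_COLOURS = {"black", "white", "red"}  # hypothetical value list


def validate_entry(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    pattern = PHONE_PATTERNS.get(record.get("country", ""))
    phone = re.sub(r"[\s-]", "", record.get("phone", ""))
    if pattern is None:
        problems.append("no phone rule for country")
    elif not pattern.match(phone):
        problems.append("phone number has the wrong format for the country")
    if record.get("colour", "").lower() not in ALLOWED_COLOURS:
        problems.append("colour is not in the value list")
    return problems


print(validate_entry({"country": "DK", "phone": "12 34 56 78", "colour": "Black"}))
print(validate_entry({"country": "DK", "phone": "1234", "colour": "purple"}))
```

Exposing one such function as a shared service is one way to apply the same rules at every entry point instead of re-implementing them per application.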

The best tools emphasize assisting data capture and thus preventing data quality issues, while also making the data capture process more effective by connecting as opposed to collecting. Two such tools I have worked with are:

  • IDQ™, which is a tool for mashing up internal party master data and 3rd party big reference data sources as explained further in the post instant Single Customer View.
  • Product Data Lake, a cloud service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information. This service is described in detail here.


What is in a business directory?

When working with Party Master Data Management one approach to ensure accuracy, completeness and other data quality dimensions is to onboard new business-to-business (B2B) entities and enrich such current entities via a business directory.

While this could seem to be a straightforward mechanism, unfortunately it is usually not that easy peasy.

Let us take an example featuring the most widely used business directory around the world: The Dun & Bradstreet Worldbase. And let us take my latest registered company: Product Data Lake.

PDL at DnB

On this screen showing the basic data elements, there are a few obstacles:

  • The address is not formatted well
  • The country code system is not a widely used one
  • The industry sector code system shown is one among others

Address Formatting

In our address D&B has put the word “sal”, which is Danish for floor. This is not incorrect, but addresses in Denmark are usually not written with that word, as the number following a house number in the addressing standard is the floor.

Country Codes

D&B has its own 3-digit country code. You may convert to the more widely used ISO 2-character country code. I do, however, remember a lot of fun from my data matching days when dealing with the United Kingdom, where D&B uses 4 different codes for England, Wales, Scotland and Northern Ireland, as well as mapping back and forth with the United States and Puerto Rico. It had to be done very despacito.
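The reason the United Kingdom case is fun can be sketched with a lookup table: converting to ISO is a simple many-to-one lookup, but the reverse direction yields several candidates. The keys below are symbolic placeholders standing in for D&B's own codes, which are not reproduced here; only the ISO 2-character codes are real.

```python
from collections import defaultdict

# Placeholder keys stand in for D&B's own country codes (assumption for illustration).
DNB_TO_ISO = {
    "DNB_ENGLAND": "GB",
    "DNB_WALES": "GB",
    "DNB_SCOTLAND": "GB",
    "DNB_NORTHERN_IRELAND": "GB",
    "DNB_UNITED_STATES": "US",
    "DNB_PUERTO_RICO": "PR",
}

# Build the reverse direction: one ISO code may map to several source codes.
ISO_TO_DNB = defaultdict(list)
for dnb_code, iso_code in DNB_TO_ISO.items():
    ISO_TO_DNB[iso_code].append(dnb_code)

print(DNB_TO_ISO["DNB_SCOTLAND"])
print(ISO_TO_DNB["GB"])
```

Going back from "GB" therefore needs extra information, such as the postal code or region, to pick the right one of the four candidates.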

Industry Sector Codes

The screen shows a SIC code: 7374 = Computer Processing and Data Preparation and Processing Services

This must have been converted from the NACE code by which the company has been registered:  63.11:(00) = Data processing, hosting and related activities.

The two codes do by the way correspond to the NAICS Code 518210 = Data processing, hosting and related activities.
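The correspondence between the three code systems can be held in a small crosswalk table. The sketch below contains only the one row quoted above; the table layout and the function name are assumptions, and a real crosswalk would be loaded from published reference data and would often be many-to-many.

```python
# Codes as quoted in the text; a single row for illustration.
CROSSWALK = [
    {"SIC": "7374", "NACE": "63.11", "NAICS": "518210"},
]


def convert(code, source, target):
    """Return every target-scheme code linked to the given source-scheme code."""
    return sorted({row[target] for row in CROSSWALK if row[source] == code})


print(convert("63.11", "NACE", "SIC"))
print(convert("7374", "SIC", "NAICS"))
```

Returning a list rather than a single value leaves room for source codes that correspond to more than one target code.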

The challenges in embracing the many standards for reference data were examined in the post The World of Reference Data.