Classification of PIM Solutions

A core capability of a Product Information Management (PIM) solution is the ability to work with product classification, meaning having a way to group products for multiple purposes, such as presenting products in meaningful groups to potential customers and making sure that all relevant product attributes are present for a similar group of products. This is a daunting task, usually much more demanding than the technical implementation of the PIM solution itself.

Ironically, we also have trouble grouping solutions for handling product data into meaningful groups. One challenge is the overlap with surrounding disciplines, as discussed in the post How MDM, PIM and DAM Stick Together, which deals with classifying solutions as Master Data Management (MDM), Product Information Management (PIM) and/or Digital Asset Management (DAM).

Then there is the selection of Three Letter Acronyms starting with P and ending with M:

  • PCM: Product Content (or Catalog) Management
  • PDM: Product Data Management
  • PIM: Product Information Management
  • PLM: Product Lifecycle Management

A recent post from the declared PIM vendor Venzee examined PCM vs. PIM: Which One Does Your Ecommerce Business Need? (the blog no longer exists).

In that post, Venzee stated: “You will occasionally see PCM solutions presented as if they were actually PIM platforms. Don’t get fooled. Yes, there are similarities and terminology overlaps, but PCM is not PIM. Think of PCM as PIM’s little cousin — it’s a place to house and enrich your data, but that’s about it. Ecommerce vendors that really want to manage, optimize and distribute their data need a good PIM platform.”

In my current venture, Product Data Lake, a challenge is explaining what kind of solution it is. I usually call it PIM-2-PIM, as it is a solution that makes two different PIM solutions at two different trading partners interact. But it might as well be PIM-2-MDM, PLM-2-PIM, DAM-2-PCM or any other available combination. Anyway, I have put our solution on The Disruptive MDM/PIM List here.

PS: If you have a solution covering Master Data and Product Information, you can register it on The Disruptive MDM/PIM List here.

What Happened to CDI?

CDI is a Three Letter Acronym which in the data management world stands for Customer Data Integration.

Today CDI is usually wrapped into Master Data Management (MDM), as examined in the post CDI, PIM, MDM and Beyond. As mentioned in that post, a well-known analyst, Aaron Zornes, runs a business called the MDM Institute, which was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Many Master Data Management (MDM) vendors today emphasize being multidomain, meaning their solutions can manage customer, supplier, employee and other party master data as well as product, asset, location and other core business entity types.

However, some vendors still focus on customer master data and the topic of integrating customer data by excelling in the special pain points here, not least identity resolution and sustainable merge/purge of duplicates. One example is Uniserv Smart Customer MDM.

In my recent little venture, The Disruptive Master Data Management Solution List, the aim is to cover all kinds of MDM solutions: small or big, new (start-up) or old, multidomain MDM, Customer Data Integration (CDI), Product Information Management (PIM) or even Digital Asset Management (DAM). As a potential buyer, you can browse all these solutions and select your choice of one-stop-shopping candidates or combine best-of-breed solution candidates that match your requirements in your industry and geography.

The first thing that must happen is that vendors register their solutions on the site here.


The Good, the Better and the Best Kinds of Data Quality Technology

If I look at my journey in data quality, I think you can say that I started by working with the good way of implementing data quality tools, then turned to some better ways and, until now at least, am working with the best way of implementing data quality technology.

That is not to say that the good old kind of tools are obsolete. They are just relieved from some of the repeated hard work of cleaning up dirty data.

The good (old) kind of tools are data cleansing and data matching tools. These tools are good at finding errors in postal addresses, duplicate party records and other nasty stuff in master data. The bad thing about finding the flaws long after the bad master data has entered the databases is that it is often very hard to make corrections once transactions have been related to these master data, and that, if you do not fix the root cause, you will have to repeat the exercise periodically. However, there are still reasons to use these tools, as reported in the post Top 5 Reasons for Downstream Cleansing.

The better way is real-time validation and correction at data entry where possible. Here a single data element or a range of data elements is checked when entered. For example, an address may be checked against reference data, a phone number may be checked for an adequate format for the country in question, or product master data may be checked for the right format and against a value list. The hard thing is to do this at all entry points. A possible approach is discussed in the post Service Oriented MDM.
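To make that a bit more concrete, here is a minimal sketch of what such entry-point validation could look like. The function name, the phone patterns and the value list are simplified assumptions made up for illustration; a real implementation would rely on proper reference data for each country and each product attribute.

```python
import re

# Simplified per-country phone number patterns (illustrative only,
# not complete national numbering plans).
PHONE_PATTERNS = {
    "DK": re.compile(r"^\+45 ?\d{8}$"),     # Denmark: +45 followed by 8 digits
    "GB": re.compile(r"^\+44 ?\d{9,10}$"),  # United Kingdom (very rough)
}

# A value list for a product attribute, e.g. allowed colour codes.
ALLOWED_COLOURS = {"RED", "GREEN", "BLUE", "BLACK"}

def validate_on_entry(record: dict) -> list[str]:
    """Return a list of issues found while the record is being entered."""
    issues = []
    country = record.get("country", "")
    phone = record.get("phone", "")
    pattern = PHONE_PATTERNS.get(country)
    if pattern and phone and not pattern.match(phone):
        issues.append(f"Phone '{phone}' does not match the expected format for {country}")
    colour = record.get("colour")
    if colour and colour.upper() not in ALLOWED_COLOURS:
        issues.append(f"Colour '{colour}' is not in the value list")
    return issues

# Caught at entry time instead of months later in a downstream cleansing job:
print(validate_on_entry({"country": "DK", "phone": "12 34 56 78", "colour": "Pink"}))
```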

The best tools emphasize assisting data capture, thus preventing data quality issues while also making the data capture process more effective by connecting rather than collecting. Two such tools I have worked with are:

  • IDQ™, a tool for mashing up internal party master data and 3rd party big reference data sources, as explained further in the post instant Single Customer View.
  • Product Data Lake, a cloud service for sharing product data in the business ecosystems of manufacturers, distributors, merchants and end users of product information. This service is described in detail here.


What is in a business directory?

When working with Party Master Data Management, one approach to ensuring accuracy, completeness and other data quality dimensions is to onboard new business-to-business (B2B) entities and enrich existing ones via a business directory.

While this may seem like a straightforward mechanism, unfortunately it usually is not that easy peasy.

Let us take an example featuring the most widely used business directory around the world: The Dun & Bradstreet Worldbase. And let us take my latest registered company: Product Data Lake.

(Screenshot: the Product Data Lake record in the Dun & Bradstreet Worldbase)

On this screen showing the basic data elements, there are a few obstacles:

  • The address is not formatted well
  • The country code system is not a widely used one
  • The industry sector code system shown is one among others

Address Formatting

In our address, D&B has put the word “sal”, which is Danish for floor. This is not incorrect, but addresses in Denmark are usually not written with that word, as the number following the house number in the addressing standard is the floor.
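As a small illustration, a formatting step could drop that word when producing the address line. The street name and the regular expression below are assumptions made up for the example; real Danish address handling should of course follow the official addressing reference data.

```python
import re

def normalise_danish_floor(address_line: str) -> str:
    """Drop the word 'sal' (floor), since the number following the house
    number already denotes the floor in the Danish addressing standard.
    Illustrative only."""
    return re.sub(r"\.\s*sal\b", ".", address_line, flags=re.IGNORECASE)

# "Eksempelvej 10, 2. sal tv" becomes "Eksempelvej 10, 2. tv"
print(normalise_danish_floor("Eksempelvej 10, 2. sal tv"))
```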

Country Codes

D&B has its own 3-digit country code system. You may convert to the more widely used ISO 2-character country codes. I do, however, remember a lot of fun from my data matching days when dealing with the United Kingdom, where D&B uses 4 different codes for England, Wales, Scotland and Northern Ireland, as well as when mapping back and forth between the United States and Puerto Rico. It had to be done very despacito.
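A sketch of such a conversion could look like the snippet below. The 3-digit codes are placeholders, not actual D&B codes; the point is that several source codes collapse into a single ISO code, so mapping back from ISO is one-to-many and needs manual resolution.

```python
# Hypothetical mapping from a proprietary 3-digit country code to ISO 3166-1
# alpha-2. The numeric codes are placeholders for illustration only.
PROPRIETARY_TO_ISO = {
    "101": "GB",  # England (placeholder code)
    "102": "GB",  # Wales (placeholder code)
    "103": "GB",  # Scotland (placeholder code)
    "104": "GB",  # Northern Ireland (placeholder code)
    "201": "US",  # United States (placeholder code)
    "202": "PR",  # Puerto Rico (placeholder code)
    "301": "DK",  # Denmark (placeholder code)
}

def to_iso(proprietary_code: str) -> str | None:
    return PROPRIETARY_TO_ISO.get(proprietary_code)

def from_iso(iso_code: str) -> list[str]:
    """Mapping back is one-to-many, so all candidate codes are returned."""
    return [code for code, iso in PROPRIETARY_TO_ISO.items() if iso == iso_code]

print(to_iso("103"))   # GB
print(from_iso("GB"))  # ['101', '102', '103', '104'] - to be resolved very despacito
```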

Industry Sector Codes

The screen shows a SIC code: 7374 = Computer Processing and Data Preparation and Processing Services.

This must have been converted from the NACE code under which the company is registered: 63.11:(00) = Data processing, hosting and related activities.

The two codes, by the way, correspond to the NAICS code 518210 = Data processing, hosting and related activities.
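A tiny cross-reference using only the three codes mentioned above could look like the sketch below. Real concordance tables are much larger, and the mappings between these standards are rarely this clean and one-to-one.

```python
# Cross-reference between industry code systems, limited to the codes above.
INDUSTRY_XREF = [
    {
        "nace": "63.11",    # Data processing, hosting and related activities
        "sic": "7374",      # Computer Processing and Data Preparation and Processing Services
        "naics": "518210",  # Data processing, hosting and related activities
    },
]

def convert(code: str, from_system: str, to_system: str) -> str | None:
    """Look up a code in one system and return the corresponding code in another."""
    for row in INDUSTRY_XREF:
        if row.get(from_system) == code:
            return row.get(to_system)
    return None

print(convert("63.11", "nace", "sic"))   # 7374
print(convert("7374", "sic", "naics"))   # 518210
```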

The challenges in embracing the many standards for reference data was examined in the post The World of Reference Data.

What Will You Complicate in the Year of the Rooster?

Today is the first day of the new year: the year of the rooster according to the lunar calendar observed in East Asia. One of the characteristics of the year of the rooster is that, in this year, people will tend to complicate things.

People usually like to keep things simple. The KISS principle – Keep It Simple, Stupid – has many fans. But not me. Not that I do not like to keep things simple. I do. But only as simple as it should be, as Einstein probably said. Sometimes KISS is the shortcut to getting it all wrong.

When working with data quality, I have come across the three examples below of striking the right balance between a bit complicated and too simple:

Deduplication

One of the most frequent data quality issues around is duplicates in party master data: customer, supplier, patient, citizen, member and many other roles of legal entities and natural persons, where the same real-world entity is described more than once with different values in our databases.

In solving this challenge, we can use methods such as match codes and edit distance to detect duplicates. However, these methods, often called deterministic, are far too simple to really automate the remedy. We can also use advanced probabilistic methods. These methods are better, but have the downside that the matching done is hard to explain, repeat and reuse in other contexts.

My best experience is to use something in between these approaches. Not too simple and not too overcomplicated.
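As a rough illustration of that middle ground, the sketch below combines a deterministic match code for candidate selection with an edit-distance-like similarity for confirmation. The field names, the match code recipe and the threshold are assumptions made up for the example, not recommended values.

```python
from difflib import SequenceMatcher

def match_code(name: str, postal_code: str) -> str:
    """A crude match code: first characters of the normalised name plus the
    postal code. Deterministic and fast, but too simple on its own."""
    normalised = "".join(ch for ch in name.upper() if ch.isalnum())
    return normalised[:5] + "|" + postal_code

def similarity(a: str, b: str) -> float:
    """Edit-distance-like similarity between 0 and 1."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def is_possible_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    # Step 1: cheap, deterministic candidate selection by match code.
    if match_code(rec_a["name"], rec_a["postal_code"]) != match_code(rec_b["name"], rec_b["postal_code"]):
        return False
    # Step 2: confirmation by string similarity on name and street.
    name_sim = similarity(rec_a["name"], rec_b["name"])
    street_sim = similarity(rec_a["street"], rec_b["street"])
    return (name_sim + street_sim) / 2 >= threshold

a = {"name": "Rainbow Trading", "street": "Main Street 1", "postal_code": "2100"}
b = {"name": "Rainbow Traiding Ltd", "street": "Main Str. 1", "postal_code": "2100"}
print(is_possible_duplicate(a, b))  # True - a candidate for manual or assisted merge
```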

Address verification

You can make a good algorithm to verify postal and visit addresses in a database for addresses coming from one country. However, if you try the same algorithm on addresses from another country, it often fails miserably.

Making an algorithm for addresses from all over the world would be very complicated. I have not yet seen one that works.

My best experience is to accept the complication of having almost as many algorithms as there are countries on this planet.
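One way to accept that complication is to dispatch to a verifier per country, as in the sketch below. The two verifiers only check the shape of the postal code and are purely illustrative; real verification would check against national address reference data.

```python
def verify_dk(address: dict) -> list[str]:
    """Danish addresses: postal codes have 4 digits."""
    postal_code = address.get("postal_code", "")
    if not (postal_code.isdigit() and len(postal_code) == 4):
        return ["Danish postal codes have 4 digits"]
    return []

def verify_gb(address: dict) -> list[str]:
    """UK addresses: postcodes have an outward and an inward part."""
    if len(address.get("postal_code", "").split()) != 2:
        return ["UK postcodes consist of an outward and an inward code"]
    return []

# One verifier per country rather than one algorithm for the whole world.
VERIFIERS = {"DK": verify_dk, "GB": verify_gb}

def verify_address(address: dict) -> list[str]:
    verifier = VERIFIERS.get(address.get("country", ""))
    if verifier is None:
        return [f"No verifier implemented for country {address.get('country')}"]
    return verifier(address)

print(verify_address({"country": "DK", "postal_code": "21000"}))
```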

Product classification

Product classification controls a lot of the data quality dimensions related to product master data. The most prominent example is completeness of product information: whether you have complete product information depends on the classification of the product. Some attributes will be mandatory for one product but make no sense at all for another product with a different classification.

If your product classification is too simple, your completeness measurement will not be realistic. An overly granular or otherwise complicated classification system is very hard to maintain and will probably seem like overkill for many purposes of product master data management.

My best experience is that you have to maintain several classification systems and keep links between them, both inside your organization and between your trading partners.
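A minimal sketch of classification-driven completeness measurement could look like the snippet below. The classifications and their mandatory attributes are made up for the example; the point is that completeness is always measured relative to the product's class.

```python
# Mandatory attributes per product classification (illustrative classes).
MANDATORY_ATTRIBUTES = {
    "power_tools": {"voltage", "weight", "warranty_years"},
    "apparel": {"size", "colour", "material"},
}

def completeness(product: dict) -> float:
    """Share of mandatory attributes present, given the product's classification."""
    required = MANDATORY_ATTRIBUTES.get(product.get("classification"), set())
    if not required:
        return 1.0  # no requirements known for this classification
    present = {attr for attr in required if product.get(attr) not in (None, "")}
    return len(present) / len(required)

drill = {"classification": "power_tools", "voltage": "230 V", "weight": "1.8 kg"}
print(completeness(drill))  # 2 of 3 mandatory attributes present
```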

Happy Lunar New Year

The Gartner Magic Quadrant for MDM 2016

The Gartner Magic Quadrant for Master Data Management Solutions 2016 is …… not out.

Though it can be hard for a person not coming from the United States to read those silly American dates, according to this screenshot from today it should have been out on the 19th of November 2016.

(Screenshot: Gartner MDM 2016)

I guess no blue hyperlink means it has not been aired yet, and I do not recall having seen any vendor bragging on social media yet either.

The plan that Gartner will retire the old two quadrants for Customer MDM and Product MDM was revealed by Andrew White of Gartner earlier this year in the post Update on our Magic Quadrant’s for Master Data Management 2016.

Well, MDM implementations are often delayed, so why not the Multidomain MDM quadrant too?

In the meantime, we can take a quiz. Please comment with your guess on who will be the leaders, visionaries, challengers and niche players. The closest guess will receive a Product Data Lake t-shirt in your company’s license level size (see here for options).

Social PIM, Take 2

My first blog post on Social PIM (Social Product Information Management) was over 4 years ago.

Since then, Product Data Lake has been launched. Product Data Lake resembles a social network, as you connect with your trading partners from the real world in order to collaborate on getting complete and accurate product information from the manufacturer to the point of sale.

I would love to see you, my blog readers, become involved. The options are:

Interenterprise Data Sharing and the 2016 Data Quality Magic Quadrant

The 2016 Magic Quadrant for Data Quality Tools by Gartner is out. One way to get a free read is to download the report from Informatica, which is the most top-right vendor in the tool vendor positioning.

Apart from the vendor positioning, the report as always contains valuable opinions and observations about the market and how these tools are used to achieve business objectives.

Interenterprise data sharing is the last scenario mentioned, besides BI and analytics (analytical scenarios), MDM (operational scenarios), information governance programs, ongoing operations and data migrations.

Another observation is that 90% of the reference customers surveyed for this Magic Quadrant consider party data a priority while the percentage of respondents prioritizing the product data domain was 47%.

My take on this difference is that it relates to interenterprise data sharing. Parties are per definition external to you, and if your count of business partners (and B2C customers) exceeds some thousands (that’s the 90%), you need some kind of tool to cope with data quality for the master data involved. If your product data are internal to you, you can manage data quality without profiling, parsing, matching and other core capabilities of a data quality tool. If your product data are part of a cross-company supply chain and your count of products exceeds some thousands (that’s the 47%), you probably have issues with product data quality.

In my eyes, the capabilities of a data quality tool will also have to be balanced differently for product data as examined in the post Multi-Domain MDM and Data Quality Dimensions.

Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to onboard new entries using an externally defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions, such as the ones below (see the sketch after this list):

  • Uniqueness, by checking if a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory and thus avoiding the spelling mistakes that usually are all over party master data.
  • Conformity, by inheriting additional data such as line-of-business codes and descriptions from a business directory.
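As a minimal sketch of such identifier-based onboarding, consider the snippet below. The identifier, the directory content and the function names are hypothetical stand-ins; a real implementation would call the directory provider’s API instead of the stub used here.

```python
# Internal party master data, keyed by the external business entity identifier.
existing_parties: dict[str, dict] = {}

def lookup_in_directory(identifier: str) -> dict | None:
    """Stub for a business directory lookup (hypothetical data)."""
    directory = {
        "123456789": {
            "name": "Product Data Lake ApS",
            "address": "Example Street 1, 2100 Copenhagen",
            "line_of_business": "63.11 Data processing, hosting and related activities",
        }
    }
    return directory.get(identifier)

def onboard_party(identifier: str) -> dict:
    # Uniqueness: reject if the identifier is already known internally.
    if identifier in existing_parties:
        raise ValueError(f"Party with identifier {identifier} already exists")
    # Accuracy and conformity: default name, address and line of business
    # from the directory instead of relying on free-text entry.
    entry = lookup_in_directory(identifier)
    if entry is None:
        raise ValueError(f"Identifier {identifier} was not found in the directory")
    party = {"identifier": identifier, **entry}
    existing_parties[identifier] = party
    return party

print(onboard_party("123456789"))
```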

Having an external business identifier stored with your party master data helps a lot with maintaining data quality as pondered in the post Ongoing Data Maintenance.

When selecting an identifier, there are different options, such as national IDs, LEI, the DUNS Number and others, as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up would consider much later, if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later at guaranteed increased costs.

For the identifier, we have chosen the DUNS Number from Dun & Bradstreet. The reason is that this is currently the only business entity identifier with worldwide coverage. Also, Dun & Bradstreet offers some additional data that fits our business model, including consistent line-of-business information and worldwide company family trees.
