Product Data Quality

The data quality tool industry has always had a hard time offering capabilities for solving the data quality issues that relate to product data.

Customer data quality issues have always been the challenges addressed, as examined in the post The Future of Data Quality Tools, where the current positioning from the analyst firm Information Difference was discussed. Leaders such as Experian Data Quality, Informatica and Trillium (now part of Syncsort) always promote their data quality tools with use cases around customer data.

Some years back, Oracle did have a go at product data quality with their Silver Creek Systems acquisition, as mentioned by Andrew White of Gartner in this post. The approach from Silver Creek to product data quality can be seen in this MIT Information Quality Industry Symposium presentation from the year before. However, today Oracle is not even present in the industry report mentioned above.

While data quality as a discipline, with its methodology and surrounding data governance, may be very similar between customer data and product data, the capabilities needed for tools supporting data cleansing, data quality improvement and prevention of data quality issues are somewhat different.

Data profiling is different, as it must be very tightly connected to product classification. Deduplication is useful, but far less so than with customer data. Data enrichment must rely much more on second party data than on third party data, the latter being most useful for customer and other party master data.

Regular readers of this blog will know that my suggestion for data quality tool vendors is to join Product Data Lake.

The Future of Data Quality Tools

When looking at the data quality tool market, it is interesting to observe that the available tools do pretty much the same and that all of them are pretty good at what they do today.

A visualization of this is the vendor landscape in the latest Information Difference Data Quality Landscape:

Data Quality Landscape 2017

As you see, leaders such as Experian Data Quality, Informatica, Trillium and others are assembling at the right edge. But that is due to market strength. Otherwise the bunch is positioned pretty much equally.

In my eyes, this report also gives some main clues about where the industry is going.

One aspect is that: “Some data quality products are stand-alone, while others link to separate master data or data governance tools with varying degrees of smoothness.”

Examples among the leaders are Informatica, with data quality, MDM, PIM and other data management tools under the same brand, and Trillium with their partnership with the top data governance vendor Collibra. We will see more of that.

Another aspect is that: “Although name and address is the most common area addressed in data quality, product data is another broad domain requiring different approaches.”

I agree with Andy Hayler of Information Difference that product data needs a different treatment, as discussed in the post Data Quality for the Product Domain vs the Party Domain.

We Need More Product Data Lake Ambassadors

Product Data Lake is the new solution to sharing product information between trading partners. While we see many viable in-house solutions to Product Information Management (PIM), there is a need for a solution to exchange product information within cross company supply chains between manufacturers, distributors and retailers.

Completeness of product information is a huge issue for self-service sales approaches as seen in ecommerce. 81% of e-shoppers will leave a webshop that lacks product information. The root cause of missing product information is often an ineffective cross company data supply chain, where exchange of product data is based on sending spreadsheets back and forth via email or on biased solutions such as PIM Supplier Portals.

However, due to the volume of product data, the velocity required to get data through and the variety of product data needed today, these solutions are in no way adequate, nor will they work for everyone. Not having a working environment for cross company product data exchange is hindering true digital transformation at many organizations within trade.

As a Product Information Management professional or as a vendor company in this space, you can help manufacturers, distributors and retailers in being successful with product information completeness by becoming a Product Data Lake ambassador.

The Product Data Lake encompasses some of the most pressing issues in world-wide sharing of product data:

The first forward-looking professionals and vendors in the Product Information Management realm have already joined. I would love to see you, too, as our next ambassador.

Interenterprise Data Sharing and the 2016 Data Quality Magic Quadrant

The 2016 Magic Quadrant for Data Quality Tools by Gartner is out. One way to have a free read is downloading the report from Informatica, which is the vendor positioned furthest to the top right.

Apart from the vendor positioning, the report, as always, contains valuable opinions and observations about the market and how these tools are used to achieve business objectives.

Interenterprise data sharing is the last scenario mentioned, after BI and analytics (analytical scenarios), MDM (operational scenarios), information governance programs, ongoing operations and data migrations.

Another observation is that 90% of the reference customers surveyed for this Magic Quadrant consider party data a priority while the percentage of respondents prioritizing the product data domain was 47%.

My take on this difference is that it relates to interenterprise data sharing. Parties are by definition external to you, and if your count of business partners (and B2C customers) exceeds some thousands (that’s the 90%), you need some kind of tool to cope with data quality for the master data involved. If your product data are internal to you, you can manage data quality without profiling, parsing, matching and other core capabilities of a data quality tool. If your product data are part of a cross company supply chain, and your count of products exceeds some thousands (that’s the 47%), you probably have issues with product data quality.

In my eyes, the capabilities of a data quality tool will also have to be balanced differently for product data as examined in the post Multi-Domain MDM and Data Quality Dimensions.

Sign Up is Open

Over the last year and a half, many of the posts on this blog have been about Product Data Lake, a cloud service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information.

From my work as a data quality and Master Data Management (MDM) consultant, I have seen the need for a service to solve data quality issues when it comes to product master data. My observation has been that the root cause of these issues is found in the way that trading partners exchange product information and digital assets.

It is the aim of Product Data Lake to ensure:

  • Completeness of product information by enabling trading partners to exchange product data in a uniform way
  • Timeliness of product information by connecting trading partners in a process driven way
  • Conformity of product information by encompassing various international standards for product information
  • Consistency of product information by allowing upstream trading partners and downstream trading partners to each work with their own in-house structure of product information
  • Accuracy of product information by ensuring transparency of product information across the supply chain.
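
As a rough illustration of the first aim, completeness can be measured as the share of required attributes that actually carry a value. The sketch below is only that, a sketch: the `completeness` function and the attribute names are hypothetical and are not taken from Product Data Lake itself.

```python
def completeness(record: dict, required: list[str]) -> float:
    """Return the share of required attributes that have a non-empty value."""
    if not required:
        return 1.0
    filled = 0
    for name in required:
        value = record.get(name)
        # Treat missing keys, None and blank strings as "not filled".
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(required)


# Hypothetical attribute names, chosen only for illustration.
REQUIRED = ["gtin", "description", "height", "weight"]
product = {"gtin": "4006381333931", "description": "Pen", "height": "", "weight": None}
score = completeness(product, REQUIRED)  # two of the four required attributes are filled
```

A score like this only tells you that something is missing, not why. In my experience the why is usually found in the exchange between trading partners.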

You can learn more about how Product Data Lake works on the documentation site.

Sign Up is open on www.productdatalake.com

Connecting Product Information

In our current work with the Product Data Lake cloud service, we are introducing a new way to connect product information that is stored at two different trading partners.

When doing that we deal with three kinds of product attributes:

  • Product identification attributes
  • Product classification attributes
  • Product features

Product identification attributes

The most commonly used notion for a product identification attribute today is GTIN (Global Trade Item Number). This numbering system has developed from the UPC (Universal Product Code), which is most popular in North America, and the EAN (International Article Number, formerly European Article Number).

Besides this generally used system, there are heaps of industry-specific and geography-specific product identification systems.

In principle, every product in a given product data store should have a unique value in a product identification attribute.

When identifying products in practice, attributes such as a model number at a given manufacturer and a product description are used too.
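
For GTIN values, a first sanity check on an identification attribute is the check digit that GS1 defines as the last digit of every GTIN. The sketch below shows that standard modulo 10 calculation in Python. The function names are my own, and the accepted lengths (8, 12, 13 and 14 digits) are an assumption about which GTIN variants you want to allow.

```python
def gtin_check_digit(payload: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Check the length and the trailing check digit of a GTIN candidate."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])
```

A valid check digit only says that the number is well formed. It does not say that the number is unique in your product data store or that it belongs to the product it is attached to.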

Product classification attributes

A product classification attribute says something about what kind of product we are talking about. Thus, a range of products in a given product data store will have the same value in a product classification attribute.

As with product identification, there is no single commonly used standard. Some popular cross-industry classification standards are UNSPSC (United Nations Standard Products and Services Code®) and eCl@ss, but many other standards exist too, as told in the post The World of Reference Data.

Besides the variety of standards, a further complexity is that these standards are published in versions over time. Even if two trading partners use the same standard, they may not use the same version, and they may have used various versions depending on when the product was on-boarded.

Product features

A product feature says something about a specific characteristic of a given product. Examples are general characteristics such as height, weight and colour, and specific characteristics within a given product classification, such as voltage for a power tool.

Again, there are competing standards for how to define, name and identify a given feature.

The Product Data Lake tagging approach

In the Product Data Lake we use a tagging system to typify product attributes. This tagging system helps with:

  • Linking products stored at two trading partners
  • Linking attributes used at two trading partners

A product identification attribute can be tagged starting with = followed by the system and optionally the variant of the system used. Examples will be ‘=GTIN’ for a Global Trade Item Number and ‘=GTIN-EAN13’ for a 13-digit EAN number. An industry- and geography-specific tag could be ‘=DKVVS’ for a Danish plumbing catalogue number (VVS nummer). ‘=MODEL’ is the tag of a model number and ‘=DESCRIPTION’ is the tag of the product description.

A product classification tag starts with a #. ‘#UNSPSC’ is for a United Nations Standard Products and Services Code, where ‘#UNSPSC-19’ indicates a given main version.

A product feature is tagged with the feature id, an @ and the feature (sometimes called property) standard. ‘EF123456@ETIM’ will be a specific feature in ETIM (an international standard for technical products). ‘ABC123@ECLASS’ is a reference to a property in eCl@ss.
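
To make the three tag forms concrete, here is a minimal Python sketch of how such tags could be parsed. It is not the Product Data Lake implementation. The names `AttributeTag` and `parse_tag` are hypothetical, and treating the hyphen as the separator between system and variant or version is an assumption read off the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeTag:
    kind: str                        # "identification", "classification" or "feature"
    system: str                      # e.g. "GTIN", "UNSPSC" or "ETIM"
    qualifier: Optional[str] = None  # variant, version or feature id


def parse_tag(tag: str) -> AttributeTag:
    """Parse a tag such as '=GTIN-EAN13', '#UNSPSC-19' or 'EF123456@ETIM'."""
    tag = tag.strip()
    if tag.startswith("=") or tag.startswith("#"):
        kind = "identification" if tag[0] == "=" else "classification"
        # Assumption: a hyphen separates the system from its variant or version.
        system, _, qualifier = tag[1:].partition("-")
        if not system:
            raise ValueError(f"Tag has no system: {tag!r}")
        return AttributeTag(kind, system, qualifier or None)
    feature_id, separator, standard = tag.partition("@")
    if separator and feature_id and standard:
        return AttributeTag("feature", standard, feature_id)
    raise ValueError(f"Unrecognised tag: {tag!r}")
```

With tags parsed like this, two attributes held by different trading partners can be treated as candidates for linking when their parsed tags are equal, for example when both sides carry an attribute tagged ‘EF123456@ETIM’.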

Launching too early or too late

Today, the 28th August 2016, is one month away from the official launch of the Product Data Lake.

When to launch is an essential question for every start-up. Launching too early with an immature product is one common pitfall and launching too late with a complex product that does not fit the market is another common pitfall for a start-up.

At Product Data Lake we hope we have struck the right balance. You can see what we have chosen to put up in the cloud in this document.

Right now both the technical team at Larion in Ho Chi Minh City and the commercial team in Copenhagen are working hard to get the last details in place for the launch that will happen as told on LinkedIn in the post Meet The Product Data Lake.

One thing we have in place is the company’s vehicle fleet. As you can see, this is, in our view, both environmentally and economically responsible.

Bicycles
