Interenterprise Data Sharing and the 2016 Data Quality Magic Quadrant

The 2016 Magic Quadrant for Data Quality Tools by Gartner is out. One way to get a free read is to download the report from Informatica, the vendor positioned furthest to the top right in the tool vendor positioning.

Apart from the vendor positioning, the report as always contains valuable opinions and observations about the market and how these tools are used to achieve business objectives.

Interenterprise data sharing is the last scenario mentioned, alongside BI and analytics (analytical scenarios), MDM (operational scenarios), information governance programs, ongoing operations and data migrations.

Another observation is that 90% of the reference customers surveyed for this Magic Quadrant consider party data a priority, while only 47% of respondents prioritize the product data domain.

My take on this difference is that it relates to interenterprise data sharing. Parties are by definition external to you, and if your count of business partners (and B2C customers) exceeds some thousands (that’s the 90%), you need some kind of tool to cope with data quality for the master data involved. If your product data are internal to you, you can manage data quality without profiling, parsing, matching and the other core capabilities of a data quality tool. If your product data are part of a cross-company supply chain, and your count of products exceeds some thousands (that’s the 47%), you probably have issues with product data quality.

In my eyes, the capabilities of a data quality tool will also have to be balanced differently for product data as examined in the post Multi-Domain MDM and Data Quality Dimensions.

Sign Up is Open

Over the last year and a half, many of the posts on this blog have been about Product Data Lake, a cloud service for sharing product data in the business ecosystems of manufacturers, distributors, retailers and end users of product information.

From my work as a data quality and Master Data Management (MDM) consultant, I have seen the need for a service to solve data quality issues when it comes to product master data. My observation has been that the root cause of these issues is found in the way trading partners exchange product information and digital assets.

It is the aim of Product Data Lake to ensure:

  • Completeness of product information by enabling trading partners to exchange product data in a uniform way
  • Timeliness of product information by connecting trading partners in a process driven way
  • Conformity of product information by encompassing various international standards for product information
  • Consistency of product information by allowing upstream trading partners and downstream trading partners to interact with their own in-house structure of product information
  • Accuracy of product information by ensuring transparency of product information across the supply chain.

You can learn more about how Product Data Lake works on the documentation site.


Sign Up is open on


Connecting Product Information

In our current work with the Product Data Lake cloud service, we are introducing a new way to connect product information that is stored at two different trading partners.

When doing that we deal with three kinds of product attributes:

  • Product identification attributes
  • Product classification attributes
  • Product features

Product identification attributes

The most commonly used notation for product identification today is the GTIN (Global Trade Item Number). This numbering system developed from the UPC (Universal Product Code), most popular in North America, and the EAN (International Article Number, formerly European Article Number).

Besides this generally used system, there are heaps of industry-specific and geography-specific product identification systems.

In principle, every product in a given product data store should have a unique value in a product identification attribute.

In practice, attributes such as a manufacturer’s model number and a product description are also used to identify products.

Product classification attributes

A product classification attribute says something about what kind of product we are talking about. Thus, a range of products in a given product data store will have the same value in a product classification attribute.

As with product identification, there is no commonly used standard. Some popular cross-industry classification standards are UNSPSC (United Nations Standard Products and Services Code®) and eCl@ss, but many other standards exist too, as told in the post The World of Reference Data.

Besides the variety of standards, a further complexity is that these standards are published in versions over time. Even if two trading partners use the same standard, they may not use the same version, and they may have used various versions depending on when each product was onboarded.
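To make the version issue concrete, here is a minimal sketch of a version-aware classification lookup. The crosswalk entries and the function are hypothetical illustrations of the idea, not real UNSPSC data or any actual Product Data Lake API:

```python
# Hypothetical crosswalk: (standard, from_version, to_version, code) -> code.
# All codes below are made up for illustration only.
CROSSWALK = {
    ("UNSPSC", "17", "19", "11223344"): "11223355",  # code renumbered between versions
}

def translate_class(standard, from_version, to_version, code):
    """Translate a classification code between two versions of a standard,
    keeping the original code when no remapping is recorded."""
    if from_version == to_version:
        return code
    return CROSSWALK.get((standard, from_version, to_version, code), code)
```

A receiver on version 19 of a standard could run every incoming version 17 code through such a lookup before matching it against its own product data store.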

Product features

A product feature says something about a specific characteristic of a given product. Examples are general characteristics such as height, weight and colour, and classification-specific characteristics such as voltage for a power tool.

Again, there are competing standards for how to define, name and identify a given feature.

The Product Data Lake tagging approach

In the Product Data Lake we use a tagging system to typify product attributes. This tagging system helps with:

  • Linking products stored at two trading partners
  • Linking attributes used at two trading partners

A product identification attribute is tagged starting with = followed by the system and optionally the variant of the system used. Examples are ‘=GTIN’ for a Global Trade Item Number and ‘=GTIN-EAN13’ for a 13-digit EAN number. An industry- or geography-specific tag could be ‘=DKVVS’ for a Danish plumbing catalogue number (VVS nummer). ‘=MODEL’ is the tag for a model number and ‘=DESCRIPTION’ is the tag for the product description.

A product classification tag starts with a #. ‘#UNSPSC’ is for a United Nations Standard Products and Services Code, where ‘#UNSPSC-19’ indicates a given main version.

A product feature is tagged with the feature id, an @ and the feature (sometimes called property) standard. ‘EF123456@ETIM’ is a specific feature in ETIM (an international standard for technical products). ‘ABC123@ECLASS’ is a reference to a property in eCl@ss.
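The tag syntax above lends itself to straightforward parsing. As a minimal sketch (the function name and the returned dictionary shape are my own illustration, not the actual Product Data Lake API), a tag can be classified by its prefix:

```python
import re

def parse_pdl_tag(tag):
    """Classify a Product Data Lake attribute tag into identification,
    classification or feature, per the conventions described above."""
    if tag.startswith("="):
        # Identification: '=SYSTEM' or '=SYSTEM-VARIANT', e.g. '=GTIN-EAN13'
        system, _, variant = tag[1:].partition("-")
        return {"kind": "identification", "system": system, "variant": variant or None}
    if tag.startswith("#"):
        # Classification: '#STANDARD' or '#STANDARD-VERSION', e.g. '#UNSPSC-19'
        standard, _, version = tag[1:].partition("-")
        return {"kind": "classification", "standard": standard, "version": version or None}
    # Feature: 'FEATUREID@STANDARD', e.g. 'EF123456@ETIM'
    m = re.fullmatch(r"(?P<feature>[^@]+)@(?P<standard>.+)", tag)
    if m:
        return {"kind": "feature", "feature_id": m["feature"], "standard": m["standard"]}
    raise ValueError(f"Unrecognized tag: {tag!r}")
```

For example, `parse_pdl_tag("=GTIN-EAN13")` yields an identification tag with system `GTIN` and variant `EAN13`, while `parse_pdl_tag("EF123456@ETIM")` yields a feature tag in the ETIM standard.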


Launching too early or too late

Today, the 28th of August 2016, we are one month away from the official launch of the Product Data Lake.

When to launch is an essential question for every start-up. Launching too early with an immature product is one common pitfall and launching too late with a complex product that does not fit the market is another common pitfall for a start-up.

At Product Data Lake we hope we have struck the right balance. You can see what we have chosen to put up in the cloud in this document.

Right now, both the technical team at Larion in Ho Chi Minh City and the commercial team in Copenhagen are working hard to get the last details in place for the launch, which will happen as told on LinkedIn in the post Meet The Product Data Lake.

One thing we do have in place is the company’s vehicle fleet. As you can see, this is in line with us being both environmentally and economically responsible.



A Quick Tour around the Product Data Lake

The Product Data Lake is a cloud service for sharing product data in the eco-systems of manufacturers, distributors, retailers and end users of product information.

As an upstream provider of product data, being a manufacturer or upstream distributor, you have these requirements:

  • When you introduce new products to the market, you want to make the related product data and digital assets available to your downstream partners in a uniform way
  • When you win a new downstream partner you want the means to immediately and professionally provide product data and digital assets for the agreed range
  • When you add new products to an existing agreement with a downstream partner, you want to be able to provide product data and digital assets instantly and effortlessly
  • When you update your product data and related digital assets, you want a fast and seamless way of pushing it to your downstream partners
  • When you introduce a new product data attribute or digital asset type, you want a fast and seamless way of pushing it to your downstream partners.

The Product Data Lake facilitates these requirements by letting you push your product data into the lake in your in-house structure, which may or may not be fully or partly compliant with an international standard.


As an upstream provider, you may want to push product data and digital assets from several different internal sources.

The product data lake tackles this requirement by letting you operate several upload profiles.


As a downstream receiver of product data, being a downstream distributor, retailer or end user, you have these requirements:

  • When you engage with a new upstream partner, you want the means to quickly and seamlessly link and transform product data and digital assets for the agreed range from the upstream partner
  • When you add new products to an existing agreement with an upstream partner, you want to be able to link and transform product data and digital assets in a fast and seamless way
  • When your upstream partners update their product data and related digital assets, you want to be able to receive the updated product data and digital assets instantly and effortlessly
  • When you introduce a new product data attribute or digital asset type, you want a fast and seamless way of pulling it from your upstream partners
  • If you have a backlog of product data and digital asset collection with your upstream partners, you want a fast and cost effective approach to backfill the gap.

The Product Data Lake facilitates these requirements by letting you pull product data from the lake into your in-house structure, which may or may not be fully or partly compliant with an international standard.


In the Product Data Lake, you can take the role of being an upstream provider and a downstream receiver at the same time by being a midstream subscriber to the Product Data Lake. Thus, Product Data Lake covers the whole supply chain from manufacturing to retail and even the requirements of B2B (Business-to-Business) end users.


The Product Data Lake uses the data lake concept from big data by letting the transformation and linking of data between many structures be done when data are consumed for the first time. The goal is that the workload in this system resembles an iceberg, where 10% of the ice is above water and 90% is below. In the Product Data Lake, manually setting up the links and transformation rules should be 10% of the work, while the remaining 90% is automated in the exchange zones between trading partners.
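The iceberg idea can be sketched in a few lines. The class and method names below are illustrative assumptions of mine, not the actual Product Data Lake implementation:

```python
class ExchangeZone:
    """Toy model of the iceberg workload: the first time a source/target
    attribute pair is consumed, a mapping rule is set up manually; every
    later exchange for that pair reuses the rule automatically."""

    def __init__(self):
        self.rules = {}           # (source_tag, target_tag) -> transform function
        self.manual_setups = 0    # the "10% above water"

    def consume(self, source_tag, target_tag, value, ask_human):
        key = (source_tag, target_tag)
        if key not in self.rules:
            # First consumption: a person defines the transformation rule.
            self.rules[key] = ask_human(source_tag, target_tag)
            self.manual_setups += 1
        # Every subsequent exchange runs the stored rule automatically.
        return self.rules[key](value)
```

With this shape, a thousand products flowing between two partners trigger the manual step only once per attribute pair, while the remaining exchanges are automated.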




Did You Mean Potato or Potahto?

As told in the post Where the Streets have Two Names, one aspect of address validation is that, in some parts of the world, a given postal address can be presented in more than one language.

I experienced that today when using Google Maps for directions to a Master Data Management (MDM) conference in Helsinki, Finland. When typing in the address I got this message:


The case is that the two addresses proposed by Google Maps are exactly the same address, just spelled in Swedish and Finnish, the two official languages used in this region.

I think Google Maps is an example of a splendid worldwide service. But even the best worldwide services sometimes don’t match locally tailored services. In my experience this is the case when it comes to address management solutions such as address validation and assistance, whether they come as an integrated part of a Master Data Management (MDM) solution, a stand-alone data quality tool or a general service such as Google Maps.

It is Magic Quadrant Week

Earlier this week this blog featured the Magic Quadrant for Customer MDM and the Magic Quadrant for Product MDM. Today it is time to have a look at the just published Magic Quadrant for Data Quality Tools.

Last year I wondered if we would finally see data quality tools focus on other pain points than duplicates in party data and postal address precision, as discussed in the post The Multi-Domain Data Quality Tool Magic Quadrant 2014 is out.

Well, apparently there still isn’t a market for that as the Gartner report states: “Party data (that is, data about existing customers, prospective customers, citizens or patients) remains the top priority for most organizations: Almost nine in 10 (89%) of the reference customers surveyed for this Magic Quadrant consider it a priority, up from 86% in the previous year’s survey.”

From my own experience working predominantly with product master data during the last couple of years, there are issues and big pain points with product data. They are just different from the main pain points with party master data, as examined in the post Multi-Domain MDM and Data Quality Dimensions.

I sincerely believe that there are opportunities in providing services to solve the specific data quality challenges for product master data, which, according to Gartner, “is one of the most important information assets an organization has; second-only, perhaps, to customer master data”. In all humbleness, my own venture is called the Product Data Lake.

Anyway, as ever, Informatica is our friend when it comes to free copies of a data management quadrant. Get a free copy of the 2015 Magic Quadrant for Data Quality Tools here.

Data Quality: The Union of First Time Right and Data Cleansing

The other day Joy Medved aka @ParaDataGeek made this tweet:

Indeed, upstream prevention of bad data entering our databases is surely better than downstream data cleansing. Also, real-time enrichment is better than enriching long after data has been put to work.

That said, there are situations where data cleansing has to be done. These reasons were examined in the post Top 5 Reasons for Downstream Cleansing. But I can’t think of many situations where a downstream cleansing and/or enrichment operation will be of much worth if it isn’t followed up by an approach to getting it first time right in the future.

If we go a level deeper into data quality challenges, there will be some different data quality dimensions with different importance to various data domains as explored in the post Multi-Domain MDM and Data Quality Dimensions.

With customer master data we most often have issues with uniqueness and location precision. While I have spent many happy years with data cleansing, data enrichment and data matching tools, I have during the last couple of years been focusing on a tool for getting it first time right.

Product master data are often marred by issues with completeness and (location) conformity. The situation here is that tools and platforms for mastering product data focus on what goes on inside a given organization and not so much on what goes on between trading partners. Standardization seems to be the only hope, but that path is too long to wait for and may in some ways contradict the end purpose, as discussed in the post Image Coming Soon.

So in order to have a first time right solution for product master data sharing, I have embarked on a journey with a service called the Product Data Lake. If you want to join, you are most welcome.

PS: The product data lake also has the capability of catching up with the sins of the past.


Integration Matters

A recent report from KDR Recruitment takes a snapshot of the current state of the world of data in order to uncover some of the most pressing issues facing the Information Management industry and get a sense of what changes may be on the horizon.

One of the clearest findings was around what drives the selection of information software. The report states: “New software must integrate easily into existing infrastructure and systems. This is far and away the most important consideration for users, who also want that same flexibility to extend to customisation options and reporting functionalities.”

The graphic looks like this:


In my experience, ease of integration is indeed a very important feature when selecting (and selling) a data management tool. Ideally it should not be so, as you can end up with a nicely integrated tool that does not solve the business issue. But without integration, a new data management tool will live in yet another silo, probably solving only part of the business issue.

The report from KDR Recruitment also covers where you use data to improve performance, the barriers to implementing an information management strategy and other data management topics. You can read the full report, called Not waving but drowning – The State of Data 2015, here.

PS: Kudos to KDR Recruitment for actually engaging in the sector where they work and doing so on social media. Very much in contrast to recruiters who just spam LinkedIn groups with their job openings.


The Data Quality Market Just Passed 1 Billion USD

The Data Quality Landscape – Q1 2015 from Information Difference is out. A bit ironically, the report states that the data quality market for the calendar year 2014 was worth a fraction over $1 billion. As the $ sign could mean a lot of different currencies like CAD, AUD or FJD, this statement is ambiguous, but I guess Andy Hayler means USD.

While there is still a market for standalone data quality tools, an increasing part of data quality work is actually done with tools that are a Master Data Management (MDM) tool, a Data Governance tool, an Extract, Transform and Load (ETL) tool, a Customer Relationship Management (CRM) tool or another kind of tool or software suite.

This topic was recently touched on this blog in the post called Informatica without Data Quality?, in which the reasons why the new owners of Informatica did not mention data quality as a future goodie in the Informatica toolbox were examined.

In a follow-up mail, an Informatica officer explained: “As you know Data Quality has become an integral part of multidomain MDM and of the MDM fueled Product Catalog App. We still serve pure DQ (Data Quality) use cases, but we see a lot growth in DQ as part of MDM initiatives”.

You can read the full DQ Landscape 2015 here.
