A Quick Tour around the Product Data Lake

The Product Data Lake is a cloud service for sharing product data in the ecosystems of manufacturers, distributors, retailers and end users of product information.

As an upstream provider of product data, whether a manufacturer or an upstream distributor, you have these requirements:

  • When you introduce new products to the market, you want to make the related product data and digital assets available to your downstream partners in a uniform way
  • When you win a new downstream partner you want the means to immediately and professionally provide product data and digital assets for the agreed range
  • When you add new products to an existing agreement with a downstream partner, you want to be able to provide product data and digital assets instantly and effortlessly
  • When you update your product data and related digital assets, you want a fast and seamless way of pushing it to your downstream partners
  • When you introduce a new product data attribute or digital asset type, you want a fast and seamless way of pushing it to your downstream partners.

The Product Data Lake meets these requirements by letting you push your product data into the lake in your own in-house structure, which may be fully, partly or not at all compliant with an international standard.
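
To make this concrete, here is a minimal sketch in Python of what such a push could look like. The endpoint, token and attribute names are purely illustrative assumptions, not the actual Product Data Lake API; the point is that the payload follows the provider's own in-house structure.

```python
import requests

# Illustrative only: the endpoint, token and attribute names below are
# assumptions, not the actual Product Data Lake API.
API_URL = "https://lake.example.com/api/products"
API_TOKEN = "your-api-token"

product = {
    "sku": "PUMP-4711",                   # the provider's own identifier
    "item_name_dk": "Cirkulationspumpe",  # in-house attribute name, not a standard
    "net_weight_kg": 2.4,
    "datasheet_url": "https://example.com/assets/pump-4711.pdf",
}

response = requests.post(
    API_URL,
    json=product,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()
print("Pushed product", product["sku"])
```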


As an upstream provider, you may want to push product data and digital assets from several different internal sources.

The Product Data Lake tackles this requirement by letting you operate several upload profiles.
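
As a rough sketch of what this could look like, several upload profiles might simply be different configurations pointing at different internal sources. The profile names, sources and key attributes below are hypothetical.

```python
# Hypothetical sketch: one upload profile per internal source. Names,
# formats and key attributes are illustrative only.
upload_profiles = {
    "erp_export": {
        "source": "nightly CSV extract from the ERP system",
        "format": "csv",
        "delimiter": ";",
        "key_attribute": "item_number",
    },
    "pim_feed": {
        "source": "PIM system API",
        "format": "json",
        "key_attribute": "sku",
    },
    "dam_assets": {
        "source": "digital asset management system",
        "format": "json",
        "key_attribute": "asset_id",
    },
}

def profile_for(source_name: str) -> dict:
    """Pick the upload profile matching a given internal source."""
    return upload_profiles[source_name]
```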


As a downstream receiver of product data, whether a downstream distributor, a retailer or an end user, you have these requirements:

  • When you engage with a new upstream partner, you want the means to quickly and seamlessly link and transform product data and digital assets for the agreed range from the upstream partner
  • When you add new products to an existing agreement with an upstream partner, you want to be able to link and transform product data and digital assets in a fast and seamless way
  • When your upstream partners update their product data and related digital assets, you want to be able to receive the updated product data and digital assets instantly and effortlessly
  • When you introduce a new product data attribute or digital asset type, you want a fast and seamless way of pulling it from your upstream partners
  • If you have a backlog of product data and digital assets to collect from your upstream partners, you want a fast and cost-effective approach to backfill the gap.

The Product Data Lake meets these requirements by letting you pull product data from the lake into your own in-house structure, which may be fully, partly or not at all compliant with an international standard.
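
A minimal sketch of the receiving side could look like this. Again, the endpoint and attribute names are assumptions for illustration, and the mapping function represents the receiver's own in-house structure.

```python
import requests

# Illustrative only: endpoint and attribute names are assumptions,
# not the actual Product Data Lake API.
API_URL = "https://lake.example.com/api/partners/{partner_id}/products"

def pull_products(partner_id: str, token: str) -> list:
    """Pull the partner's product records as they were pushed upstream."""
    response = requests.get(
        API_URL.format(partner_id=partner_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()

def to_inhouse_structure(upstream_record: dict) -> dict:
    """Map the provider's attribute names to the receiver's own names."""
    return {
        "article_no": upstream_record.get("sku"),
        "description": upstream_record.get("item_name_dk"),
        "weight_kg": upstream_record.get("net_weight_kg"),
    }
```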


In the Product Data Lake, you can take the roles of upstream provider and downstream receiver at the same time by being a midstream subscriber. Thus, the Product Data Lake covers the whole supply chain from manufacturing to retail and even meets the requirements of B2B (Business-to-Business) end users.


The Product Data Lake uses the data lake concept known from big data by letting the transformation and linking of data between many structures be done when the data are consumed for the first time. The goal is that the workload in this system resembles an iceberg, where 10 % of the ice is above water and 90 % is below. In the Product Data Lake, manually setting up the links and transformation rules should be 10 % of the work, while the remaining 90 % is automated in the exchange zones between trading partners.
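
A sketch of such linking and transformation rules, set up manually once per trading partner and then applied automatically on every subsequent exchange, could look like this. The attribute names and the unit conversion are hypothetical.

```python
# Hypothetical linking and transformation rules between one upstream and one
# downstream structure; set up once, applied automatically afterwards.
transformation_rules = [
    {"source": "net_weight_kg", "target": "WeightLb",
     "convert": lambda kg: round(kg * 2.20462, 2)},
    {"source": "item_name_dk", "target": "ProductNameDK", "convert": None},
]

def apply_rules(upstream_record: dict, rules: list) -> dict:
    """Transform an upstream record into the downstream structure."""
    downstream_record = {}
    for rule in rules:
        value = upstream_record.get(rule["source"])
        if value is not None and rule["convert"] is not None:
            value = rule["convert"](value)
        downstream_record[rule["target"]] = value
    return downstream_record
```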




Did You Mean Potato or Potahto?

As told in the post Where the Streets have Two Names, one aspect of address validation is the fact that, in some parts of the world, a given postal address can be presented in more than one language.

I experienced that today when using Google Maps for directions to a Master Data Management (MDM) conference in Helsinki, Finland. When typing in the address I got this message:

[Screenshot: Google Maps proposing two versions of the same Helsinki address]

The case is that the two addresses proposed by Google Maps are exactly the same address, just spelled in Swedish and Finnish, the two official languages used in this region.

I think Google Maps is an example of a splendid worldwide service. But even the best worldwide services sometimes don't match locally tailored services. In my experience this is the case when it comes to address management solutions such as address validation and assistance, whether they come as an integrated part of a Master Data Management (MDM) solution, a stand-alone data quality tool or a general service like Google Maps.

It is Magic Quadrant Week

Earlier this week this blog featured the Magic Quadrant for Customer MDM and the Magic Quadrant for Product MDM. Today it is time to have a look at the just published Magic Quadrant for Data Quality Tools.

Last year I wondered if we would finally see data quality tools focus on pain points other than duplicates in party data and postal address precision, as discussed in the post The Multi-Domain Data Quality Tool Magic Quadrant 2014 is out.

Well, apparently there still isn’t a market for that as the Gartner report states: “Party data (that is, data about existing customers, prospective customers, citizens or patients) remains the top priority for most organizations: Almost nine in 10 (89%) of the reference customers surveyed for this Magic Quadrant consider it a priority, up from 86% in the previous year’s survey.”

From my own experience of working predominantly with product master data during the last couple of years, there are issues and big pain points with product data. They are just different from the main pain points with party master data, as examined in the post Multi-Domain MDM and Data Quality Dimensions.

I sincerely believe that there are opportunities in providing services to solve the specific data quality challenges for product master data, which, according to Gartner, “is one of the most important information assets an organization has; second-only, perhaps, to customer master data”. In all humbleness, my own venture is called the Product Data Lake.

Anyway, as ever, Informatica is our friend when it comes to free copies of a data management quadrant. Get a free copy of the 2015 Magic Quadrant for Data Quality Tools here.

Data Quality: The Union of First Time Right and Data Cleansing

The other day Joy Medved aka @ParaDataGeek made this tweet:

https://twitter.com/ParaDataGeek

Indeed, upstream prevention of bad data entering our databases is surely better than downstream data cleansing. Also, real-time enrichment is better than enriching long after the data has been put to work.

That said, there are situations where data cleansing has to be done. These reasons were examined in the post Top 5 Reasons for Downstream Cleansing. But I can't think of many situations where a downstream cleansing and/or enrichment operation will be of much worth if it isn't followed up by an approach to getting it first time right in the future.

If we go a level deeper into data quality challenges, there will be some different data quality dimensions with different importance to various data domains as explored in the post Multi-Domain MDM and Data Quality Dimensions.

With customer master data we most often have issues with uniqueness and location precision. While I have spent many happy years with data cleansing, data enrichment and data matching tools, I have during the last couple of years been focusing on a tool for getting that first time right.

Product master data are often marred by issues with completeness and (location) conformity. The situation here is that tools and platforms for mastering product data are focused on what goes on inside a given organization and not so much on what goes on between trading partners. Standardization seems to be the only hope, but that path is too long to wait for and may in some ways contradict the end purpose, as discussed in the post Image Coming Soon.
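
As a small illustration of the completeness pain point, a check along these lines is typical; the category-specific required attributes below are hypothetical.

```python
# Hypothetical completeness check: required attributes differ per product
# category, which is part of what makes product data quality hard.
required_attributes = {
    "power_tools": ["sku", "description", "voltage", "weight_kg", "image_url"],
    "fasteners": ["sku", "description", "material", "diameter_mm"],
}

def completeness(record: dict, category: str) -> float:
    """Share of required attributes that are actually filled in."""
    required = required_attributes.get(category, [])
    if not required:
        return 1.0
    filled = sum(1 for attr in required if record.get(attr) not in (None, ""))
    return filled / len(required)
```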

So in order to have a first time right solution for product master data sharing, I have embarked on a journey with a service called the Product Data Lake. If you want to join, you are most welcome.

PS: The Product Data Lake also has the capability of catching up with the sins of the past.


Integration Matters

A recent report from KDR Recruitment takes a snapshot of the current state of the world of data in order to uncover some of the most pressing issues facing the Information Management industry and get a sense of what changes may be on the horizon.

One of the clearest findings was around what drives the selection of information software. The report states: “New software must integrate easily into existing infrastructure and systems. This is far and away the most important consideration for users, who also want that same flexibility to extend to customisation options and reporting functionalities.”

The graphic looks like this:

[Chart from the KDR report: ease of integration as the top consideration when selecting information software]

The ease of integration is in my experience indeed a very important feature when selecting (and selling) a data management tool. Ideally it should not be so, because you can end up not solving the business issue in a nicely integrated way. But without integration, a new data management tool will live in yet another silo, probably only solving some part of the business issue.

The report from KDR Recruitment also covers where data is used to improve performance, the barriers to implementing an information management strategy and other data management topics. You can read the full report, called Not waving but drowning – The State of Data 2015, here.

PS: Kudos to KDR Recruitment for actually engaging in the sector where they work and doing so on social media. Very much in contrast to recruiters who just spam LinkedIn groups with their job openings.


The Data Quality Market Just Passed 1 Billion USD

The Data Quality Landscape – Q1 2015 from Information Difference is out. A bit ironically, the report states that the data quality market for the calendar year 2014 was worth a fraction over $1 billion. As the $ sign could mean a lot of different currencies, like CAD, AUD or FJD, this statement is very ambiguous, but I guess Andy Hayler means USD.

While there still is a market for standalone data quality tools, an increasing part of data quality tooling is actually delivered as part of a Master Data Management (MDM) tool, a Data Governance tool, an Extract, Transform and Load (ETL) tool, a Customer Relationship Management (CRM) tool or another kind of tool or software suite.

This topic was recently touched upon on this blog in the post called Informatica without Data Quality?, which examined the reasons why the new owners of Informatica did not mention data quality as a future goodie in the Informatica toolbox.

In a follow-up mail an Informatica officer explained: “As you know Data Quality has become an integral part of multidomain MDM and of the MDM fueled Product Catalog App. We still serve pure DQ (Data Quality) use cases, but we see a lot growth in DQ as part of MDM initiatives”.

You can read the full DQ Landscape 2015 here.


IDQ vs iDQ™

The previous post on this blog was called Informatica without Data Quality? That post digs into the messaging around the recent takeover of Informatica and the future of the data quality components in the Informatica toolbox.

In the comments, Julien Peltier and Richard Branch discuss the cloud emphasis in the messaging from the new Informatica owners and especially the future of Master Data Management (MDM) in the cloud.

My best experience with MDM in the cloud is with a service called iDQ™ – a service that, by the way, shares its TLA (Three Letter Acronym) with Informatica Data Quality. The former stands for instant Data Quality. This is a service that revolves around turning your MDM inside-out, as most recently touched upon on this blog in the post The Pros and Cons of MDM 3.0.

iDQ™ specifically deals with customer (or rather party) master data, how to get this kind of master data right the first time and how to avoid duplicates as explored in the post The Good, Better and Best Way of Avoiding Duplicates.
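
The “first time right” idea can be illustrated with a simple search-before-create step at data entry. The matching approach, names and threshold below are deliberately naive and purely illustrative; real party matching uses far more sophisticated name and address matching.

```python
from difflib import SequenceMatcher

# Deliberately naive illustration of search-before-create at data entry.
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def possible_duplicates(new_name: str, existing_parties: list, threshold: float = 0.85) -> list:
    """Return existing parties that look suspiciously similar to the new entry."""
    return [p for p in existing_parties if similarity(new_name, p["name"]) >= threshold]

# Present candidates to the user before the record is created,
# instead of cleansing duplicates downstream later.
candidates = possible_duplicates(
    "Aarhus Pumpeservice A/S",
    [{"name": "Århus Pumpeservice AS"}, {"name": "Copenhagen Valves ApS"}],
)
```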


Informatica without Data Quality?

This week it was announced that Informatica, a large data management tool provider, will be taken over by a London-based private equity firm and a Canadian pension investment management organization.

The first analyst reactions, with a line-up of the potential benefits and drawbacks, can be found here on searchCIO in an article called Informatica going private could be a good thing for CIOs.

Most quotes in this article are from Ted Friedman, the Gartner analyst who writes the data quality tool magic quadrant, and Friedman notes that the new owners don't mention data quality as one of the goodies in the Informatica toolbox (as opposed to data security, an area Informatica is not well known for).

So, maybe the new owners just don't know yet what they bought, or they have a clear vision for the data management market where data quality is simply a natural part of cloud integration, master data management, data integration for next-generation analytics, and data security. The alternative routes could be decommissioning or a split-off, both familiar routes for this kind of takeover.

Splitting off the data quality components should not be too hard, as some of these components came to Informatica through the acquisitions of Similarity Systems from Ireland and Identity Systems, which once was SSA with roots in Australia. I was actually a bit surprised, when watching an Informatica presentation in London last autumn, that the data quality part was the good old SSA Name3 service.


No plan of operations extends with any certainty beyond the first contact with the full load of data

There is a famous saying from the military world stating that: “No plan survives contact with the enemy.” At least one blogger has used the paraphrasing saying: “No plan survives contact with the data.” A good read by the way.

[Portrait: Helmuth von Moltke the Elder]

Like most famous sayings, this phrase is a simplification of the original version. The full military observation made by Helmuth von Moltke the Elder is: “No plan of operations extends with any certainty beyond the first contact with the main hostile force.”

Translating the extended military lesson into data management makes a lot of sense too. You may plan data management activities using selected examples, and you may test them using nice little samples. Like skirmishes before the real battle in warfare. But when your data management solution goes live on the full load of data for the first time, there is most often news for you.

From my data matching days I remember this clearly as explained in the post Seeing is Believing.

The mitigation is to test with a full load of data before going live. In data management we actually have a realistic way of overcoming the observation made by Field Marshal Helmuth Karl Bernhard Graf von Moltke and revisiting our plan of operations before the second, and serious, contact with the full load of data.
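
One simple way to put this into practice is to profile the full load of data, not just a handy sample, before go-live. A minimal sketch, with illustrative attribute names only:

```python
from collections import Counter

# Minimal profiling sketch: run it on the full load of data before go-live,
# not just on a nice little sample.
def profile(records: list, attribute: str) -> dict:
    values = [r.get(attribute) for r in records]
    return {
        "rows": len(values),
        "missing": sum(1 for v in values if v in (None, "")),
        "distinct": len(set(v for v in values if v is not None)),
        "most_common": Counter(v for v in values if v is not None).most_common(5),
    }
```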


Be Prepared

Working with data governance and data quality can be a very backward-looking quest. It often revolves around how to avoid repeating a recent data disaster or how to catch up with the organizational issues, process orchestration and new technology implementations needed to better support current business objectives with current data types.

This may be hard enough. But you must also be prepared for the future.

The growth of data available to support your business is a challenge today. Your competitors take advantage of new data sources and better exploitation of known data sources while you are sleeping. New competitors emerge with business ideas based on new ways of using data.

The approach to including new data sources, data entities, data attributes and digital assets must be part of your data governance framework and data quality capability. If you are not prepared for this, your current data quality will not only be challenged by the decay of current data elements but also by insufficiently governed new data elements, or by a lack of business agility because you cannot include new data sources and elements in a safe way.

Some essentials in being prepared for inclusion of new kinds of data are:

  • A living business glossary that facilitates a shared understanding of new data elements within your organization, including how they relate to or replace current data elements.
  • Configurable data quality measurement facilities, data profiling functionality and data matching tools, so onboarding every new data element doesn't require a new data quality project (see the sketch after this list).
  • Self-service and automation being the norm for data capture and data consumption. Self-service must be governed both internally in your organization and externally as explained in the post Data Governance in the Self-Service Age.
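
As a sketch of what such configuration-driven data quality measurement could look like, so that a new attribute only means a new configuration entry rather than a new project, the rules and attribute names below are hypothetical.

```python
# Hypothetical configuration-driven quality rules: onboarding a new data
# element means adding an entry here, not writing new code.
quality_rules = {
    "energy_label": {"required": True, "allowed_values": list("ABCDEFG")},
    "weight_kg": {"required": True, "min": 0.0},
}

def check_record(record: dict, rules: dict) -> list:
    """Return a list of data quality issues found in a single record."""
    issues = []
    for attribute, rule in rules.items():
        value = record.get(attribute)
        if value in (None, ""):
            if rule.get("required"):
                issues.append(f"{attribute}: missing")
            continue
        if "allowed_values" in rule and value not in rule["allowed_values"]:
            issues.append(f"{attribute}: value {value!r} not allowed")
        if "min" in rule and value < rule["min"]:
            issues.append(f"{attribute}: below minimum {rule['min']}")
    return issues
```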
