Data Matching and Real-World Alignment

Data matching is a subdiscipline of data quality management. It is about establishing a link between data elements and entities that do not have the same value but refer to the same real-world construct.

The most common scenario for data matching is deduplication of customer data records held across an enterprise. In this case we often see a gap between what we technically try to do and the desired business outcome from deduplication. In my experience, this misalignment has something to do with real-world alignment.

What we technically do is basically to find a similarity between data records that typically have been pre-processed with some form of standardization. This is often not enough.
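As a minimal sketch of that technical step: the example below standardizes two values and then scores their similarity with `difflib.SequenceMatcher` from the Python standard library. The function names, the tiny abbreviation table and the 0.85 threshold are assumptions made for illustration and are not taken from any particular data quality tool.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real tools use far larger, locale-specific ones.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "corp": "corporation"}


def standardize(value: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace and expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", value.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two standardized strings."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def is_candidate_match(record_a: dict, record_b: dict, threshold: float = 0.85) -> bool:
    """Flag a candidate duplicate when both name and address are similar enough."""
    return (
        similarity(record_a["name"], record_b["name"]) >= threshold
        and similarity(record_a["address"], record_b["address"]) >= threshold
    )


first = {"name": "John Smith", "address": "12 Main St."}
second = {"name": "Jon Smith", "address": "12 Main Street"}
print(is_candidate_match(first, second))
```

A score like this only says that the strings are alike. It cannot say whether John and Jon are one person or two people living at the same address, which is the real-world gap discussed here.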

Location Intelligence

Deduplication and other forms of data matching with customer master data revolve around names and addresses.

Standardization and verification of addresses is a very common element in data quality / data matching tools. Often such a tool will use a service either from its own brand or from a third party. Unfortunately, a single service is often not enough. This is because:

  • Most services are biased towards a certain geography. They may for example be quite good for addresses in the United States but very poor compared to local services for other geographies. This is especially true for geographies with multiple languages in play, as exemplified in the post The Art in Data Matching.
  • There is much more to an address than the postal format. In deduplication it is for example useful to know if the address is a single-family house or a high-rise building, a nursing home, a campus or other building with lots of units.
  • Timeliness of address reference data is underestimated. I recently heard from a leader in the Gartner Magic Quadrant for Data Quality Tools that a quarterly refresh is fine. It is not, as told in the post Location Data Quality for MDM.

Identity Resolution

The overlaps and similarities between data matching and identity resolution were discussed in the post Deduplication vs Identity Resolution.

In summary, the capability to tell if two data records represent the same real-world entity will eventually involve identity resolution. As this is very poorly supported by the data quality tools around, a lot of manual work will be involved if the business processes that rely on the data matching cannot tolerate too many, or in some cases any, false positives or false negatives.

Hierarchy Management

Even telling that a true positive match is true in all circumstances is hard. The predominant examples of this challenge are:

  • Is a match between what seems to be an individual person and what seems to be the household where the person lives a true match?
  • Is a match between what seems to be a person in a private role and what seems to be the same person in a business role a true match? This is especially tricky with sole proprietors working from home, such as farmers, dentists, freelance consultants and more.
  • Is a match between two sister companies on the same address a true match? Or two departments within the same company?

We often realize that the answers to these questions differ depending on the business processes where the result of the data matching will be used.

The solution is not simple. If we want automated and broadly usable results, the data matching functionality must be quite sophisticated in order to take advantage of what is available in the real world. The data model where we hold the result of the data matching must be quite complex if we want to reflect the real world.
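One way to picture such a data model is to store each match as a typed link instead of a plain yes/no flag, and to let each business process decide which link types count as a duplicate. The sketch below is only an illustration under that assumption: the relation categories mirror the questions above, while the process names and the policy table are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """How two matched records relate in the real world (illustrative categories)."""
    SAME_PARTY = "same party"
    PERSON_IN_HOUSEHOLD = "person in household"
    PRIVATE_AND_BUSINESS_ROLE = "same person in private and business role"
    SISTER_COMPANIES = "sister companies at the same address"


@dataclass(frozen=True)
class MatchLink:
    """A stored match result: two record ids plus the kind of relation found."""
    record_a: str
    record_b: str
    relation: Relation


# Hypothetical policy: the relations each business process treats as a true match.
MERGE_POLICY = {
    "direct_mail": {Relation.SAME_PARTY, Relation.PERSON_IN_HOUSEHOLD},
    "credit_risk": {
        Relation.SAME_PARTY,
        Relation.PRIVATE_AND_BUSINESS_ROLE,
        Relation.SISTER_COMPANIES,
    },
    "invoicing": {Relation.SAME_PARTY},
}


def is_duplicate_for(process: str, link: MatchLink) -> bool:
    """Decide whether a stored link counts as a true match for one business process."""
    return link.relation in MERGE_POLICY[process]


link = MatchLink("C-1", "C-2", Relation.PERSON_IN_HOUSEHOLD)
print(is_duplicate_for("direct_mail", link), is_duplicate_for("invoicing", link))
```

Keeping the relation on the link means the matching runs once, while the question "is this a true match?" is answered per process.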

The Long Tail of Product Data Synchronization

When discussing Product Data Lake with peers and interested parties, some alternatives are often brought up. These are EDI and GDSN.

So, what is the difference between those services and Product Data Lake?

Electronic Data Interchange (EDI) is the concept of businesses electronically communicating information that was traditionally communicated on paper, such as purchase orders and invoices. EDI also has a product catalog functionality encompassing:

  • Seller name and contact information
  • Terms of sale information, including discounts available
  • Item identification and description
  • Item physical details including type of packaging
  • Item pricing information including quantity and unit of measure

The Global Data Synchronization Network (GDSN) is an internet-based, interconnected network of interoperable data pools and a global registry known as the GS1 Global Registry, that enables companies around the globe to exchange standardised and synchronised supply chain data with their trading partners using a standardised Global Product Classification (GPC).

This service focuses on retail, healthcare, food-service and transport / logistics. In some geographies GS1 is also targeting DIY (do-it-yourself) building materials and tools for consumers.

Product Data Lake is a cloud service for sharing product information (product data syndication) in the business ecosystems of manufacturers, distributors / wholesalers, merchants, marketplaces and large end users of product information.

Our vision is that Product Data Lake will be the process-driven key service for exchanging any sort of product information within business ecosystems all over the world, with the aim of optimally assisting self-service purchase (both B2C and B2B) of every kind of product.

In that way, Product Data Lake is the long tail of product data synchronization, supplementing EDI and GDSN for a long range of product groups, product attributes, digital assets, product relationships and product classification systems.

Find out more in the Product Data Lake Overview.

How Wholesalers and Dealers of Building Materials can Improve Product Information Efficiency

Building materials are a very diverse product group. As a wholesaler or dealer, you will have to manage many different attributes and digital assets depending on which product classification we are talking about.

Getting these data from a diverse crowd of suppliers is a hard job. You may have a spreadsheet for each product group where you require data from your suppliers, but this means a lot of follow-up and work in putting the data into your system. You may have a supplier portal, but suppliers are probably reluctant to use it, because they cannot deal with hundreds of different supplier portals from you and all the other wholesalers and dealers, possibly across many countries. In the same way, you would not be happy if you had to fetch data from hundreds of different customer portals provided by manufacturers and other brand owners.

This also means that even if you can handle the logistics, you must limit your regular assortment of products and therefore often deal with special ad hoc products when they are needed to form a complete range of products asked for by your customers for a given building project. Handling of “specials” is a huge burden and the data gathering must usually be repeated if the product turns up again.

At Product Data Lake we have developed a solution to these challenges. It is a cloud service where your suppliers can provide product information in their way and you can pull the information in the way that fits your taxonomy, structure and format.

Learn about and follow the solution on our Product Data Pull LinkedIn page.

Wrapping Data Around Tangible Products

There are three kinds of data monetization: selling data, wrapping data around products, and utilizing advanced analytics that lead to fast operational decision-making. These options were examined in the post Three Flavors of Data Monetization.

If we look at the middle option, wrapping data around products, and narrow it down to wrapping data around tangible products, there are some ways to execute that for supply chain delegates, not least if the participating business entities embrace the business ecosystem through which goods are moved:

  • Manufacturers need to streamline the handling of product information internally. This includes disciplines such as PLM (Product Lifecycle Management) and PIM (Product Information Management). On top of that, manufacturers need to be effective in the way the product information is forwarded to direct customers, distributors/wholesalers and merchants, as exemplified in the post How Manufacturers of Building Materials Can Improve Product Information Efficiency.
  • Merchants need to utilize the best way of getting data into in-house PIM (Product Information Management) solutions or other kinds of solutions where data flows in from trading partners. Many merchants have a huge variety in product information needs, as told in the post Work Clothes versus Fashion: A Product Information Perspective. On top of that, a merchant will have supplying manufacturers and distributors with varying formats and capabilities to offer product information, as discussed in the post PIM Supplier Portals: Are They Good or Bad?.
  • Shippers may extend their offerings from moving the goods between manufacturers and merchants (or directly to end users) to also moving the information about the goods as suggested in the post New Routes for Products. New Routes for Product Information.

The end goal is that the buyer personas in self-service scenarios will be able to make a fact-based and fully informed decision, as pondered in the post Where to Buy a Magic Wand?

New Routes for Products. New Routes for Product Information

One of the news stories this week was that Maersk is, for the first time, taking a large container ship from East Asia to Europe using a Northern Route through the Arctic waters, as told in this Financial Times article.

The purpose of this trip is to explore the possibility of avoiding the longer Southern Route, including shoehorning the sea traffic through the narrow Suez Canal. A similar opportunity exists around North America as an alternative to going through the Panama Canal.

Just as we find new routes for moving products, we may also explore new routes for moving information about products. Until now the possibilities, besides cumbersome exchange of spreadsheets, have been to shoehorn product information from the manufacturer into a consensus-based data portal or data pool from which the merchant can fetch the information in exactly the same shape as its competitors do.

At Product Data Lake we have explored shorter, more agile and diverse new routes for that. We call it Product Data Syndication Freedom.

Product Data Syndication Freedom

When working with product data syndication in supply chains, the big pain is that the data standards in use and the preferred exchange methods differ between supply chain participants.

As a manufacturer you will have hundreds of resellers who probably have data standards different from yours and most likely want to exchange data in a different way than you do.

As a merchant you will have hundreds of suppliers who probably have data standards different from yours and most likely want to exchange data in a different way than you do.

The aim of Product Data Lake is to take that pain away from both the manufacturer side and the merchant side. We offer product data syndication freedom by letting you as manufacturer push product information using your data standards and your preferred exchange method and letting you as a merchant pull product information using your data standards and your preferred exchange method.
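The push and pull idea can be illustrated with a small sketch. It is not the Product Data Lake API: the in-memory `lake` dictionary, the `push` and `pull` functions, the attribute names and the mapping format are all assumptions made for this example. The provider stores a record once in its own attribute names, and each receiver applies its own mapping when it reads.

```python
from typing import Any, Callable

# Shared store (hypothetical): each pushed record is kept in the provider's own standard.
lake: dict[str, dict[str, Any]] = {}

# Hypothetical mapping format: receiver attribute -> (provider attribute, converter).
Mapping = dict[str, tuple[str, Callable[[Any], Any]]]


def push(product_id: str, record: dict[str, Any]) -> None:
    """Provider side: store the product record once, as-is."""
    lake[product_id] = dict(record)


def pull(product_id: str, mapping: Mapping) -> dict[str, Any]:
    """Receiver side: translate into the receiver's own attribute names and units on read."""
    source = lake[product_id]
    return {
        target: convert(source[origin])
        for target, (origin, convert) in mapping.items()
        if origin in source
    }


# A manufacturer pushes in millimetres with its own attribute names.
push("4711", {"width_mm": 600, "description": "Plasterboard"})

# Two merchants pull the same record, each in its own shape.
merchant_a: Mapping = {"WidthCm": ("width_mm", lambda mm: mm / 10), "Title": ("description", str)}
merchant_b: Mapping = {"width_m": ("width_mm", lambda mm: mm / 1000)}

print(pull("4711", merchant_a))
print(pull("4711", merchant_b))
```

In this sketch the manufacturer pushes only once, and neither merchant has to accept the shape the other one uses.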

Product Information on Demand

Video on demand has become a popular way to watch television series, films and other entertainment, and Netflix is probably the best-known brand for delivering that.

The great thing about watching video on demand is that you do not have to enjoy the service at the exact same time as everyone else, as was the case back in the days when watching TV or going to the movies were the options available.

At Product Data Lake we will bring that convenience to business ecosystems, as the situation today with broadcasting product information in supply chains very much resembles the situation we had before video on demand came around in the TV/Movie world.

As a provider of product information (being a manufacturer or upstream distributor), you will push your product information into Product Data Lake when you have the information available. Moreover, you will only do that once for each product and piece of information. No more coming to each theatre near your audience and extensive reruns of old stuff.

As a receiver of product information (being a downstream distributor, reseller or large end user), you will pull product information when you need it. That will be when you take a new product into your range or do a special product sale, as well as when you start to deal with a new piece of information. No more having to be home at a certain time when your supplier does the show, or waiting ages for a rerun when you missed it.

Learn more about how Product Data Lake makes your life in Product Information Management (PIM) easier by following us here on LinkedIn.

6 Decades of the LEGO® Brick and the 2nd Decade of MDM

28th January 2018 marks the 60th anniversary of the iconic LEGO® brick.

As I was raised close to the LEGO headquarters in Billund, Denmark, I remember having a considerable amount of LEGO® bricks to play with as a child back in the 1960s, the first decade of the current LEGO® brick design. At that time a brick was a brick, and you had to combine a few sizes and colours of bricks to resemble a usable thing from the real world. Since then the range of shapes and colours of the pieces from the LEGO factory has grown considerably.

Master Data Management (MDM) went into its 2nd decade some years ago, as reported in the post Happy 10 Years Birthday MDM Solutions. MDM has some basic building blocks, as proposed by former Gartner analyst John Radcliffe back in the 2000s and touched on in the post The Need for a MDM Vision.

These blocks indeed look like the original LEGO® bricks.

Through the 2nd decade of MDM and in coming decades we will probably see a lot of specialised blocks in many shapes describing and covering the people, process and technology parts of MDM. Let us hope that they will all stick well together as the LEGO® bricks have done for the past 60 years.

PS: Some of the sticking together is described in the post How MDM, PIM and DAM Stick Together.

Sell more. Reduce costs.

Business outcome is the end goal of any data management activity, be that data governance, data quality management, Master Data Management (MDM) or Product Information Management (PIM).

Business outcome comes from selling more and reducing costs.

At Product Data Lake we have a simple scheme for achieving business outcome through selling more goods and reducing costs of sharing product information between trading partners in business ecosystems.

Using Pull or Push to Get to the Next Level in Product Information Management

The importance of having a viable Product Information Management (PIM) solution has become well understood by companies that participate in supply chains.

The next step towards excellence in PIM is to handle product information in close collaboration with your trading partners. Product Data Lake is the solution for that. Here upstream providers of product information (manufacturers and upstream distributors) and downstream receivers of product information (downstream distributors and retailers) connect their choice of in-house PIM solution or other product master data solution such as PLM (Product Lifecycle Management) or ERP.

Read more about that in the post What a PIM-2-PIM Solution Looks Like.

The principle behind Product Data Lake is inspired by how a data lake differs from a traditional data warehouse. In a data lake the linking and transformation takes place late, when the data is consumed by the receiver.

Product Data Lake resembles a social network, as you connect with your trading partners from the real world in order to collaborate on getting complete and accurate product data from the manufacturer to the point of sale:

  • As a downstream receiver, you can be on the winning side by utilizing our Product Data Pull service.
  • As an upstream provider, you can be on the winning side by utilizing our Product Data Push service.