RDM: A Small but Important Extension to MDM

Reference Data Management (RDM) is a small but important extension to Master Data Management (MDM). Together with a much larger extension, being big data and data lakes, mastering reference data is increasingly becoming part of the offerings from MDM solution vendors, as told in the post Extended MDM Platforms.


Reference Data

Reference data are the smaller lists of values that give context to master data and ensure that we use the same (or linkable) codes for describing master data entities. Examples are country codes, currency codes and units of measure.

Reference data tend to be externally defined and maintained, typically by international standardization bodies or industry organizations, but reference data can also be internally defined to meet your specific business model.

3 RDM Solutions from MDM Vendors

Informatica has recently released the first version of a new RDM solution: MDM – Reference 360. This is, by the way, the first true Software as a Service (SaaS) solution from Informatica in the MDM space. The solution emphasizes building a hierarchy of reference data lists, the ability to make crosswalks between the lists, workflow (approval) around updates and audit trails.
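As a simple illustration, a crosswalk between two reference data lists can be thought of as a controlled mapping from one code set to another. The Python sketch below uses a hypothetical excerpt of the ISO 3166-1 country code lists to show the idea:

```python
# A minimal sketch of a reference data crosswalk: translating codes from one
# reference list (ISO 3166-1 alpha-2) to another (alpha-3).
# The dictionary is a small illustrative excerpt, not a complete list.
ALPHA2_TO_ALPHA3 = {
    "DK": "DNK",
    "DE": "DEU",
    "US": "USA",
    "GB": "GBR",
}

def crosswalk(code: str, mapping: dict) -> str:
    """Translate a code via the crosswalk; fail loudly on unknown codes."""
    try:
        return mapping[code]
    except KeyError:
        raise ValueError(f"No crosswalk entry for code {code!r}")

print(crosswalk("DK", ALPHA2_TO_ALPHA3))  # DNK
```

In a real RDM solution the crosswalks would of course be governed data themselves, with workflow and audit trails around changes.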

Reltio has embraced RDM as an integral part of their Reltio Cloud solution, where the “RDM capabilities improves data governance and operational excellence with an easy to use application that creates, manages and provisions reference data for better reporting and analytics”.

Semarchy has a solution called Semarchy xDM. The x indicates that this solution encompasses all kinds of enterprise grade data, thus both master data and reference data, while “xDM extends the agile development concept to its implementation paradigm”.

Data Modelling and Data Quality

There are intersections between data modelling and data quality. In examining those we can use a data quality mind map published recently on this blog:


Data Modelling and Data Quality Dimensions:

Some data quality dimensions are closely related to data modelling and a given data model can impact these data quality dimensions. This is the case for:

  • Data integrity, as the relationship rules in a traditional entity-relationship based data model foster the integrity of the data controlled in databases. The weak sides are that these rules are sometimes too rigid to describe actual real-world entities and that integrity across several databases is not covered. To discover the latter, we may use data profiling methods.
  • Data validity, as field definitions and relationship rules control that only data considered valid can enter the database.
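The validity point can be illustrated with a few lines of Python, where a hypothetical field definition and reference list control what can enter the database:

```python
# A minimal sketch of data validity enforcement: a field definition
# (amount must be a non-negative number) plus a reference list
# (currency must come from a controlled set). The rules and the
# reference list are illustrative examples.
VALID_CURRENCIES = {"EUR", "USD", "DKK"}

def is_valid_order(row: dict) -> bool:
    """Check a row against simple field and reference data rules."""
    return (
        isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0
        and row.get("currency") in VALID_CURRENCIES
    )

print(is_valid_order({"amount": 99.5, "currency": "EUR"}))  # True
print(is_valid_order({"amount": 99.5, "currency": "XXX"}))  # False
```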

Some other data quality dimensions must be addressed with extended data models and/or alternative methodologies. This is the case for:

  • Data completeness:
    • A common scenario is that a data model born in the United States will set the state field within an address as mandatory and probably accept only a value from a reference list of 50 states. This will not work in the rest of the world. So, in order not to get bad data, or no data at all, you will need to either extend the model or loosen it and control completeness in another way.
    • With data about products the big pain is that different groups of products require different data elements. This can be solved with a very granular data model – with possible performance issues, or a very customized data model – with scalability and other issues as a result.
  • Data uniqueness: A common scenario here is that names and addresses can be spelled in many ways even though they reflect the same real-world entity. We can use identity resolution (and data matching) to detect this and then model how we link data records reflecting real-world duplicates together in a looser or tighter way.
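As a very simplified illustration of identity resolution, the following Python sketch uses a plain string similarity ratio with an illustrative threshold; real data matching engines are of course far more sophisticated, with parsing, standardization and probabilistic scoring:

```python
from difflib import SequenceMatcher

# A minimal sketch of identity resolution: two differently spelled
# name-and-address records that may reflect the same real-world entity.
# The 0.85 threshold is an illustrative assumption, not a recommendation.
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_probable_duplicate(rec1: str, rec2: str, threshold: float = 0.85) -> bool:
    return similarity(rec1, rec2) >= threshold

a = "Jon Smith, 10 Main Street, Springfield"
b = "John Smith, 10 Main St., Springfield"
print(is_probable_duplicate(a, b))
```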

Emerging technologies:

Some of the emerging technologies in the data storing realm are presenting new ways of solving the challenges we have with data quality and traditional entity-relationship based data models.

Graph databases and document databases allow for describing and operating data models better aligned with the real world. This topic was examined in the post Encompassing Relational, Document and Graph the Best Way.

In the Product Data Lake venture I am working with right now, we are also aiming to solve the data integrity, data validity and data completeness issues with product data (or product information if you like) using these emerging technologies. This includes solving issues with geographical diversity and varying completeness requirements through a granular data model that is scalable, not only within a given company but also across a whole business ecosystem encompassing many enterprises belonging to the same (data) supply chain.

Connecting Silos

The building next to my home office was originally two cement silos standing in an industrial harbor area among other silos. These two silos are now transformed into a connected office building as this area has been developed into a modern residence and commercial quarter.

Master Data Management (MDM) is on a similar route.

The first quest for MDM has been to be a core discipline in transforming siloed data stores within a given company into a shared view of the core entities that must be described in the same way across different departmental views. Going from the departmental stage to the enterprise wide stage is examined in the post Three Stages of MDM Maturity.

But as told in this post, it does not stop there. The next transformation is to provide a shared view with trading partners in the business ecosystem(s) where your company operates, because the shared data in your organization is still a silo when digital transformation puts pressure on each company to become a data integrated part of a business ecosystem.

A concept for doing that is described on the blog page called Master Data Share.

Connected silos in Copenhagen North Harbor – and connecting data silos enterprise wide and then business ecosystem wide

Artificial Intelligence (AI) and Multienterprise MDM

The previous post on this blog was called Machine Learning, Artificial Intelligence and Data Quality. It examined how Artificial Intelligence (AI) is impacted by data quality and how data quality can impact AI.

Master Data Management (MDM) will play a crucial role in sustaining the needed data quality for AI and with the rise of digital transformation encompassing business ecosystems we will also see an increasing need for ecosystem wide MDM – also called multienterprise MDM.

Right now, I am working with a service called Product Data Lake where we strive to utilize AI including using Machine Learning (ML) to understand and map data standards and exchange formats used within product information exchange between trading partners.

The challenge in this area is that we have many different classification systems in play as told in the post Five Product Classification Standards. Besides the industry and cross sector standards we still have many homegrown standards as well.

Some of these standards (such as eClass and ETIM) also cover standards for the attributes needed for a given product classification, but still, we have plenty of homegrown standards (or no standards) for attribute requirements as well.

Add to that the different preferences for exchange methods, and we get a chaotic landscape where human intervention makes Sisyphus look like a lucky man. Therefore, we have great expectations about introducing machine learning and artificial intelligence in this space.
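As a very simplified illustration of what such mapping involves, the following Python sketch matches a hypothetical attribute name from one standard to the closest name in another using token overlap; a real service would learn mappings with machine learning trained on many examples:

```python
# A hypothetical sketch of mapping attribute names between product data
# standards. The attribute names below are made up for illustration.
def tokens(name: str) -> set:
    """Normalize an attribute name into a set of lowercase word tokens."""
    return set(name.lower().replace("_", " ").replace("-", " ").split())

def best_match(source_attr: str, target_attrs: list) -> str:
    """Pick the target attribute with the largest token overlap (Jaccard)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return max(target_attrs, key=lambda t: jaccard(tokens(source_attr), tokens(t)))

print(best_match("net weight kg", ["gross-weight", "net_weight", "height-cm"]))
# net_weight
```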


Next week, I will elaborate on the multienterprise MDM and artificial intelligence theme at the Master Data Management Summit Europe in London.

Solutions for Handling Product Master Data and Digital Assets

There are three kinds of solutions for handling product master data and related digital assets:

  • Master Data Management (MDM) solutions that are either focussed on product master data or being a multi-domain MDM solution covering the product domain as well as the party domain, the location domain, the asset domain and more.
  • Product Information Management (PIM) solutions.
  • Digital Asset Management (DAM) solutions.

According to Gartner Analyst Simon Walker a short distinction is:

  • MDM of product master data solutions help manage structured product data for enterprise operational and analytical use cases
  • PIM solutions help extend structured product data through the addition of rich product content for sales and marketing use cases
  • DAM solutions help users create and manage digital multimedia files for enterprise, sales and marketing use cases

The figure below shows what kind of data is typically included in an MDM solution, a PIM solution and/or a DAM solution respectively.


This is further elaborated in the post How MDM, PIM and DAM Stick Together.

The solution vendors have varying offerings, going from being best-of-breed in one of the three categories to offering a one-stop-shopping solution for all three disciplines.

If you are to compile a list of suitable and forward-looking solutions for MDM, PIM and/or DAM for your required mix, you can start looking at The Disruptive List of MDM/PIM/DAM solutions.

To use Excel or not to use Excel in Product Information Management?

Excel is used heavily throughout data management and this is true for Product Information Management (PIM) too.

The raison d'être of PIM solutions is often said to be eliminating the use of spreadsheets. However, the PIM solutions around have functionality to co-exist with spreadsheets, because spreadsheets are still a fact of life.

This is close to me as I have been working on a solution to connect PIM solutions (and other solutions for handling product data) between trading partners. This solution is called Product Data Lake.

Our goal is certainly also to eliminate the use of spreadsheets in exchanging product information between trading partners. However, as an intermediate state we must accept that spreadsheets exist, either as the replacement for PIM solutions or because PIM solutions do not (yet) fulfil all purposes around product information.

So, consequently we have added a little co-existence with Excel spreadsheets in today's public online release of Product Data Lake version 1.10.


The challenge is that product information is multi-dimensional, as we for example have products and their attributes typically represented in multiple languages. Also, each product group has its own collection of attributes that are relevant for that group of products.

Spreadsheets are basically two dimensional – rows and columns.

In Product Data Lake version 1.10 we have included a data entry sheet that mirrors spreadsheets. You can upload a two-dimensional spreadsheet into a given product group and language, and you can download that selection into a spreadsheet.
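The cut from multi-dimensional to two-dimensional can be illustrated with a small Python sketch, where a hypothetical product group in one language is rendered as rows and columns:

```python
import csv
import io

# A minimal sketch of flattening one product group / language selection
# into a two-dimensional sheet. The product group ("power drills"),
# SKUs and attributes are hypothetical examples.
products = {
    "SKU-001": {"voltage": "18 V", "weight": "1.3 kg"},
    "SKU-002": {"voltage": "12 V", "weight": "0.9 kg"},
}

def to_sheet(products: dict) -> str:
    """Render the selection as CSV: one row per product, one column per attribute."""
    columns = sorted({attr for attrs in products.values() for attr in attrs})
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["sku"] + columns)
    for sku, attrs in sorted(products.items()):
        writer.writerow([sku] + [attrs.get(c, "") for c in columns])
    return out.getvalue()

print(to_sheet(products))
```

The upload direction is the mirror image: each row becomes a product and each column an attribute within the chosen product group and language.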

This functionality will typically be used by the original supplier of product information – the manufacturer. This simple representation of data will then be part of the data lake organisation of varieties of product information, supplemented by digital assets, product relationships and much more.

1,000 Blog Posts and More to Come

I just realized that this post will be number 1,000 published on this blog. So, let me not say something new but just recap a little bit on what it has all been about in the nearly 10 years of running a blog on some nerdy stuff.

Data quality has been the main theme. When writing about data quality one cannot avoid touching Master Data Management (MDM). In fact, the most applied category used here on this site, with 464 entries and counting, is Master Data.

The second most applied category on this blog is, with 219 entries, Data Architecture.

The most applied data quality activity around is data matching. As this is also where I started my data quality venture, there have been 192 posts about Data Matching.

The newest category relates to Product Information Management (PIM) and is, with 20 posts at the moment, about Product Data Syndication.

Even though data quality is a serious subject, you must not forget to have fun. 66 posts, including a yearly April Fools post, have been categorized as Supposed to be a Joke.

Thanks to all who are reading this blog, and not least to all who from time to time take time to comment, like and share.

B2C vs B2B in Product Information Management

The difference between doing Business-to-Consumer (B2C) or Business-to-Business (B2B) reflects itself in many IT enabled disciplines.

When it comes to Product Information Management (PIM), this is true as well. As PIM has become essential with the rise of eCommerce, some of the differences are inherited from the eCommerce discipline. There is a discussion of this in a post on the Shopify blog by Ross Simmonds. The post is called B2B vs B2C Ecommerce: What’s The Difference?

Some significant observations to take into the PIM realm are that for B2B, compared to B2C:

  • The audience is (on average) narrower
  • The price is (on average) higher
  • The decision process is (on average) more thoughtful

How these circumstances affect the difference for PIM was exemplified here on the blog in the post Work Clothes versus Fashion: A Product Information Perspective.

To sum up the differences, I would say that some of the technology you need, for example PIM solutions, is basically the same, but the data going into these solutions must be more elaborate and stringent for B2B. This means that for B2B, compared to B2C, you (on average) need:

  • More complete and more consistent attributes (specifications, features, properties) for each product and these should be more tailored to each product group.
  • More complete and consistent product relations (accessories, replacements, spare parts) for each product.
  • More complete and consistent digital assets (images, line drawings, certificates) for each product.
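A simple way to operationalize the completeness requirement is to measure, per product group, the share of required attributes that actually have a value. The Python sketch below uses hypothetical product groups and attribute requirements:

```python
# A minimal sketch of attribute completeness per product group.
# The required attributes per group are hypothetical examples.
REQUIRED = {
    "work clothes": ["size", "material", "safety class"],
    "fashion": ["size", "colour"],
}

def completeness(group: str, attrs: dict) -> float:
    """Share of the group's required attributes that actually have a value."""
    required = REQUIRED[group]
    filled = sum(1 for a in required if attrs.get(a))
    return filled / len(required)

# "safety class" is missing, so completeness is 2 out of 3.
print(completeness("work clothes", {"size": "L", "material": "cotton"}))
```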

Achieving that involves deep collaboration in the supply chains of manufacturers, distributors and merchants. The solutions for that were examined in the post The Long Tail of Product Data Synchronization.

The Long Tail of Product Data Synchronization

When discussing with peers and interested parties about Product Data Lake, some of the alternatives are often brought up. These are EDI and GDSN.

So, what is the difference between those services and Product Data Lake?

Electronic Data Interchange (EDI) is the concept of businesses electronically communicating information that was traditionally communicated on paper, such as purchase orders and invoices. EDI also has a product catalog functionality encompassing:

  • Seller name and contact information
  • Terms of sale information, including discounts available
  • Item identification and description
  • Item physical details including type of packaging
  • Item pricing information including quantity and unit of measure
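The catalog information above can be pictured as a simple data structure. The Python sketch below uses illustrative field names; actual EDI catalog messages (such as the X12 832 Price/Sales Catalog transaction) define their own segments and codes:

```python
from dataclasses import dataclass

# A hedged sketch of the EDI catalog content listed above as a record.
# Field names and example values are hypothetical.
@dataclass
class CatalogItem:
    seller_name: str
    seller_contact: str
    terms_of_sale: str       # including available discounts
    item_id: str
    description: str
    packaging: str           # physical details, type of packaging
    unit_price: float
    unit_of_measure: str
    quantity: int

item = CatalogItem(
    seller_name="Acme Tools",
    seller_contact="sales@acme.example",
    terms_of_sale="2% 10 net 30",
    item_id="SKU-001",
    description="Cordless drill 18 V",
    packaging="Carton of 6",
    unit_price=79.0,
    unit_of_measure="EA",
    quantity=6,
)
print(item.item_id)  # SKU-001
```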

The Global Data Synchronization Network (GDSN) is an internet-based, interconnected network of interoperable data pools and a global registry known as the GS1 Global Registry, that enables companies around the globe to exchange standardised and synchronised supply chain data with their trading partners using a standardised Global Product Classification (GPC).

This service focuses on retail, healthcare, food-service and transport / logistics. In some geographies GS1 is also targeting DIY – do-it-yourself building materials and tools for consumers.

Product Data Lake is a cloud service for sharing product information (product data syndication) in the business ecosystems of manufacturers, distributors / wholesalers, merchants, marketplaces and large end users of product information.

Our vision is that Product Data Lake will be the process driven key service for exchanging any sort of product information within business ecosystems all over the world, with the aim of optimally assisting self-service purchase – both B2C and B2B – of every kind of product.

In that way, Product Data Lake is the long tail of product data synchronization, supplementing EDI and GDSN for a wide range of product groups, product attributes, digital assets, product relationships and product classification systems:

Find out more in the Product Data Lake Overview.