The World of Reference Data

Reference Data Management (RDM) is an evolving discipline within data management. As organizations mature in reference data management, we often see a shift from relying on internally defined reference data to relying on externally defined reference data. This follows the good old advice not to reinvent the wheel. Externally defined reference data also usually serves multiple purposes of use better, whereas internally defined reference data tends to cater only for the most important purpose of use within your organization.

Which standard to use then tends to be a matter of where in the world you are. Let’s look at three examples from the location domain, the party domain and the product domain.

Location reference data

If you read articles in English about reference data and ensuring accuracy and other data quality dimensions for location data, you often meet remarks such as “be sure to check validity against US Postal Services” or “make sure to check against the Royal Mail PAF File”. This is all great if all your addresses are in the United States or the United Kingdom. If all your addresses are in another country, there will in many cases be similar services for the given country. If your addresses are spread around the world, you have to look further.

There are some Data-as-a-Service offerings for international addresses out there. When it comes to having your own copy of location reference data, the Universal Postal Union has an offering called the Universal POST*CODE® DataBase. You may also look into open data solutions such as GeoNames.

Party reference data

Within party master data management for Business-to-Business (B2B) activities you want to classify your customers, prospects, suppliers and other business partners according to what they do. For that there are some frequently used coding systems in areas where I have been:

  • Standard Industrial Classification (SIC) codes, the four-digit numerical codes assigned by the U.S. government to business establishments.
  • The North American Industry Classification System (NAICS).
  • NACE (Nomenclature of Economic Activities), the European statistical classification of economic activities.

As important economic activities change over time, these systems change to reflect the real world. As an example, my Danish company registration has changed NACE code three times since 1998 while I have been doing the same thing.

This doesn’t make conversion services between these systems any easier.

Product reference data

There is also a good choice of standardized and standardised classification systems for product data out there. To name a few:

  • The United Nations Standard Products and Services Code® (UNSPSC®), managed by GS1 US™ for the UN Development Programme (UNDP).
  • eCl@ss, which presents itself as: “THE cross-industry product data standard for classification and clear description of products and services that has established itself as the only ISO/IEC compliant industry standard nationally and internationally”. eCl@ss has its main support in Germany (the home of the Mercedes E-Class).

In addition to cross-industry standards, there are heaps of industry-specific international, regional and national standards for product classification.


Data Quality: The Union of First Time Right and Data Cleansing

The other day Joy Medved aka @ParaDataGeek made this tweet:

https://twitter.com/ParaDataGeek

Indeed, preventing bad data from entering our databases upstream is surely better than cleaning data downstream. Likewise, real-time enrichment is better than enriching long after the data has been put to work.

That said, there are situations where data cleaning has to be done. These reasons were examined in the post Top 5 Reasons for Downstream Cleansing. But I can’t think of many situations where a downstream cleaning and/or enrichment operation will be of much worth if it isn’t followed up by an approach to getting it first time right in the future.

If we go a level deeper into data quality challenges, there will be some different data quality dimensions with different importance to various data domains as explored in the post Multi-Domain MDM and Data Quality Dimensions.

With customer master data we most often have issues with uniqueness and location precision. While I have spent many happy years with data cleansing, data enrichment and data matching tools, I have during the last couple of years been focusing on a tool for getting that first time right.

Product master data are often marred by issues with completeness and (location) conformity. The situation here is that tools and platforms for mastering product data are focussed on what goes on inside a given organization and not so much on what goes on between trading partners. Standardization seems to be the only hope. But that path is too long to wait for and may in some way contradict the end purpose, as discussed in the post Image Coming Soon.

So in order to have a first time right solution for product master data sharing, I have embarked on a journey with a service called the Product Data Lake. If you want to join, you are most welcome.

PS: The product data lake also has the capability of catching up with the sins of the past.


Multi-Domain MDM and Data Quality Dimensions

The most frequently mentioned domains within Master Data Management (MDM) are customer, product and location. Data quality is a core discipline when working with MDM. In data quality we talk about different dimensions such as uniqueness, relevance, completeness, timeliness, precision, conformity and consistency.

While these data quality dimensions apply to all domains of MDM, some dimensions apply a bit more to one of the domains or to the intersections of the domains.

Below is a figure with an attempt to illustrate where the dimensions belong the most:

[Figure: Multi-Domain MDM and Data Quality Dimensions]

Uniqueness is the most addressed data quality dimension when it comes to customer master data. Customer master data are often marred by duplicates, meaning two or more database rows describing the same real-world entity. There are several remedies around to cure that pain. These remedies are explored in the post The Good, Better and Best Way of Avoiding Duplicates.
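As a minimal sketch of what spotting such duplicates can look like, the Python below groups customer rows by a crude match key. The field names (`id`, `name`, `postal_code`), the list of legal forms and the key itself are assumptions made for illustration. They are not taken from any particular tool, and real data matching is considerably more forgiving than this exact-key comparison.

```python
import re
from collections import defaultdict

# Hypothetical list of legal forms to ignore when comparing company names.
LEGAL_FORMS = {"inc", "ltd", "llc", "gmbh", "aps"}


def match_key(name: str, postal_code: str) -> str:
    """Build a crude match key from a name and a postal code.

    The name is lower-cased, stripped of punctuation and legal forms;
    the postal code is upper-cased with spaces removed.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [word for word in cleaned.split() if word not in LEGAL_FORMS]
    return "".join(words) + "|" + postal_code.replace(" ", "").upper()


def find_duplicates(rows):
    """Return lists of row ids that share the same match key."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row["name"], row["postal_code"])].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

With this key, "Acme Inc." at "SW1A 1AA" and "ACME, Inc" at "sw1a1aa" end up in the same group, while a misspelled name would slip through. That is where fuzzy matching comes in.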

With product master data, uniqueness is a less frequent issue. However, completeness is often a big pain. One reason is that completeness means different requirements for different categories of products as explained in the post Hierarchical Completeness within Product Information Management.
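To illustrate how completeness can depend on the product category, here is a small Python sketch where required attributes are inherited down a category hierarchy. The categories, attribute names and the `PARENT` and `REQUIRED` tables are made up for the example and are not taken from any real classification.

```python
# Hypothetical category hierarchy: each category points to its parent.
PARENT = {
    "product": None,
    "power_tool": "product",
    "cordless_drill": "power_tool",
}

# Hypothetical attributes required at each level of the hierarchy.
REQUIRED = {
    "product": {"name", "brand"},
    "power_tool": {"voltage"},
    "cordless_drill": {"battery_type"},
}


def required_attributes(category):
    """Collect required attributes for a category and all its ancestors."""
    required = set()
    while category is not None:
        required |= REQUIRED.get(category, set())
        category = PARENT.get(category)
    return required


def missing_attributes(product):
    """Return the required attributes that are absent or empty."""
    required = required_attributes(product["category"])
    return {attr for attr in required if product.get(attr) in (None, "")}
```

A cordless drill with name, brand and voltage filled in would here be reported as missing only `battery_type`, whereas the same attributes would make a plain power tool complete.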

When working with location master data, consistency can be a challenge. Addressing, so to speak, the different postal address formats around the world is certainly not a walkover. Even Google Maps does not have all the right answers as told in the post Sometimes Big Brother is Confused.

In the intersection between the location domain and the customer domain the data quality dimension called precision can be hard to manage as reported in the post A Universal Challenge. What is relevant to know about your customers and what is relevant to tell about your products are essential questions in the intersection of the customer and product master data domains.

Conformity of product data is related to locations. Take unit measurement. In the United States the length of a small thing will be in inches. In most of the rest of the world it will be in centimetres. In the UK you can never know.
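A simple way to keep such lengths conformant is to store one unit and convert on the way in. The Python sketch below assumes centimetres as the stored unit and accepts a few hypothetical spellings of the unit names. The only hard fact in it is that one inch is exactly 2.54 centimetres.

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch

# Hypothetical spellings we choose to accept for each unit.
CENTIMETRE_UNITS = {"cm", "centimetre", "centimetres", "centimeter", "centimeters"}
INCH_UNITS = {"in", "inch", "inches"}


def to_centimetres(value: float, unit: str) -> float:
    """Convert a length to centimetres, rejecting units we do not know."""
    normalised = unit.strip().lower()
    if normalised in CENTIMETRE_UNITS:
        return value
    if normalised in INCH_UNITS:
        return value * CM_PER_INCH
    raise ValueError(f"Unknown length unit: {unit!r}")
```

Raising an error on an unknown unit, rather than guessing, keeps bad values out at the point of entry.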

Timeliness is the everlasting data quality dimension all over.


CDI, PIM, MDM and Beyond

The TLAs (Three Letter Acronyms) in the title of this blog post stand for:

  • Customer Data Integration
  • Product Information Management
  • Master Data Management

CDI and PIM are commonly seen as predecessors to MDM. For example, the MDM Institute was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Today Multi-Domain MDM is about managing customer, or rather party, master data together with product master data and other master data domains as visualized in the post A Master Data Mind Map. Some of the most frequent other master data domains are location master data and asset master data, where the latter was explored in the post Where is the Asset? A less frequent master data domain is The Calendar MDM Domain.

You may argue that PIM (Product Information Management) is not the same as Product MDM. This question was examined in the post PIM, Product MDM and Multi-Domain MDM. In my eyes the benefits of keeping PIM as part of Multi-Domain MDM are bigger than the benefits of separating PIM and MDM. It is about expanding MDM across the sell-side and the buy-side of the business, eventually by enabling wide use of customer self-service and supplier self-service.

The external self-service theme will in my eyes be at the centre of where MDM is going in the future. In going down that path there will be consequences for how we see data governance as discussed in the post Data Governance in the Self-Service Age. Another aspect of how MDM is going to be seen from the outside and in is the increased use of third party reference data and the link between big data and MDM as touched on in the post Adding 180 Degrees to MDM.

Besides Multi-Domain MDM and the links between MDM and big data, a much mentioned future trend in MDM is doing MDM in the cloud. The latter is in my eyes a natural consequence of the external self-service theme and the increased use of third party reference data. Together with the general benefits of the SaaS (Software as a Service) and DaaS (Data as a Service) concepts, these will make MDM morph into something like MDaaS (Master Data as a Service). That is an idea at least nearly ten years old, by the way, as seen in this BeyeNetwork article by Dan E Linstedt.


Related Parties, Products and Locations

Managing relationships between entities is a very important part of Master Data Management (MDM) as told in the post Another Facet of MDM: Master Relationship Management.

There are relationships between entities within the single MDM domains and there are relationships between entities across multiple MDM domains.

Related Parties

Within customer (or rather party) MDM, establishing the relationships between entities heavily increases the value of the data assets. Examples are:

  • In B2B (Business-to-Business) environments knowing about company family trees supports both analytic and operational challenges. That knowledge is often provided by enriching data from third party data providers, but as most things in life there is no silver bullet available, as the real world is quite complex and in no way fully covered by any provider I know about.
  • In B2C (Business-to-Consumer) environments knowing about how individuals are related in households is key to many analytic and operational issues too. Here having high quality location data is a necessity.

Related Products

In today’s multi-channel world there is a rush for getting product entities enriched with a myriad of attributes to support customer self-service and thus as a minimum mimicking the knowledge of the traditional sales person in a brick and mortar store.

But we also need to mimic that salesperson’s knowledge about how products relate. That knowledge can be collected in different ways:

  • From the manufacturer of the product. This source is often good when it comes to product relationship types such as accessory and replacement (succession).
  • From the customer. We know this approach from the online sales trick prompting us with the message “People who bought A also bought B”.
  • From internal considerations. Facilitating up-sell can be done by enhancing product data with that kind of product relations.

Multi-Domain Relations

Here we may have:


PIM, Product MDM and Multi-Domain MDM

Over on the Informatica Perspectives blog Monica McDonnell of Informatica seems to be determined to separate Product Information Management (PIM) and Product Master Data Management (Product MDM) as we now have the second attempt in the post PIM is not Product MDM Part 2.

I can easily see the reason for this quest for Informatica, as Informatica would very much like to position the Heiler acquisition as an Informatica Multi-Domain MDM aware PIM solution as mentioned in the post MDM Aware MDM Solutions.

There will always be pros and cons for having capabilities delivered in smaller best of breed packages as opposed to larger integrated packages. On the MDM market the vendors pitch their offerings according to how they got there. SAP is using Hybris as an eCommerce focused PIM add-on to SAP. On the other hand, Stibo Systems and Riversand have been adding MDM to PIM and now add Multi-Domain to MDM as reported in the post The second part of the Multi-Domain MDM Magic Quadrant is out.

In the PIM / Product MDM realm we have several other considerations on how to address different disciplines with technology support. An important capability within PIM is Digital Asset Management (DAM) as described in the post Digital Assets and Product MDM. DAM can be a separate application or part of PIM / Product MDM. Technology support for Data Governance could also come separately as reported in the post Data governance tools: The new snake oil?

Now, back to PIM versus Product MDM. I’m not sure it is wise to divorce these two. It seems to be a kind of backward-looking exercise. I would like to marry them as part of looking forward in a multi-domain MDM world. To catch up on Monica’s arguments, PIM has been much about the sell-side of things. I think we should be better at integrating the buy-side and the sell-side of Product MDM / PIM as examined in the post An Alternative Multi-Domain MDM Quadrant.


The Calendar MDM Domain

When we talk about multi-domain Master Data Management (MDM) we usually recognize party (customer, supplier, employee) and product as the most predominant domains. The location domain is also widely understood as a separate domain. Further, we can discuss assets as done in the post Where is the Asset.

Then there is the Calendar domain. In many industries calendar may just be seen as configuration data. However, in some industries calendar is a true master data domain.

One example is in postal services as mentioned in the post The Path to Multi-Domain MDM.

Another example is public transit, an industry I have worked with for the last 16 years. Managing calendar data poses many challenges when running a business or authority in public transit. Some tricky points are:

  • Keeping track of day types where different partners see the week and public holidays differently, not least when crossing borders.
  • Changing the day not necessarily at midnight, but at various times.
  • Assigning and monitoring services to a schedule under these circumstances.
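The first two points can be sketched in a few lines of Python. The 04:00 day change, the three day types and the idea of passing each partner's own set of public holidays are assumptions chosen for illustration, not a description of any specific transit system.

```python
from datetime import date, datetime, time, timedelta


def service_day(timestamp: datetime, day_change: time = time(4, 0)) -> date:
    """Return the operating day a timestamp belongs to.

    Anything before the day change (assumed 04:00 here) still counts
    as part of the previous calendar day's service.
    """
    if timestamp.time() < day_change:
        return timestamp.date() - timedelta(days=1)
    return timestamp.date()


def day_type(day: date, public_holidays: set) -> str:
    """Classify a day using one partner's own set of public holidays."""
    if day in public_holidays:
        return "holiday"
    return "weekend" if day.weekday() >= 5 else "weekday"
```

A departure at 01:30 on 26 December is then assigned to the operating day of 25 December. The same date can come out as a holiday for one partner and a plain weekday for another, depending on the holiday set that is passed in.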

In public transit managing calendar data has the same issues as the other more common master data domains, as calendar data may be represented differently in applications across the IT landscape stretching from back-office systems to mobile devices on-board vehicles, as examined in the post Going in the Wrong Direction.


The second part of the Multi-Domain MDM Magic Quadrant is out

Gartner, the analyst firm, recently released their magic quadrant for Master Data Management (MDM) of customer data solutions 2014 as reported in the post Customer MDM Magic Wordles.

Now the quadrant for Master Data Management of product data solutions 2014 is out too, so we can overlay the two quadrants and see how multi-domain Master Data Management (MDM) solutions are doing in terms of who is performing well with product master data and who is performing well with customer master data.

If we focus on leading vendors with differences in quadrant positioning, Informatica is better positioned with customer master data (leader) than with product master data (visionary and niche with two different products). Stibo Systems and Riversand are positioned very well within product master data (leaders) but not positioned at all with customer master data, though both vendors are naming themselves as multi-domain MDM solution providers and surely have such capabilities. I personally worked on the multi-domain roadmap with one of them some years ago.

[Figure: MDM Brands]
This is not the quadrant. Just some vendor names.

 

As every year, the vendors issue press releases about the quadrant.

Upen Varanasi, CEO of Riversand, has commented in this way: “In this new Digital world, accurate product and other master information is a foundation for our customers’ major business initiatives, which requires a comprehensive, highly scalable and flexible MDM solution with full multi-domain capabilities.” See the full press release from Riversand here.

Mikael Lyngsø, CEO of Stibo Systems says: “Every member of the Stibo Systems’ team remains committed to providing our customers and partners with cutting edge solutions that not only meet their most pressing business issues but also create the most lasting and demonstrable value.” The about the company section has this bold statement: “Stibo Systems is the global leader in multidomain Master Data Management (MDM) solutions.” Read the full press release here.

Rob Karel, vice president, Product Strategy and Product Marketing, MDM, Informatica says: “We continue to deliver a multidomain MDM solution that, in combination with Informatica PIM, delivers end-to-end customer, product and supplier information governance and stewardship. Delivering complete, reliable and consistent product information – across every channel – is the key to a great customer experience.” Informatica also kindly provides a free copy of the report. Get the Magic Quadrant for Master Data Management of Product Data Solutions 2014 here.


Customer MDM Magic Wordles

The Gartner Magic Quadrant for Master Data Management of Customer Data 2014 is out. One place to get it for free is by using the Informatica registry style page offered in the Informatica communication here.

So, what is good and what is bad when looking for an MDM vendor if you are focusing on customer data right now?

Some words in the strengths assessment of vendors are:

[Figure: Magic plus]

Some words in the cautions assessment of vendors are:

[Figure: Magic minus]


The Path to Multi-Domain MDM

Multi-Domain Master Data Management (MDM) is about dealing with master data in several different data domains such as customer (or party), product, location, asset or calendar. The typical track today is starting in one domain. There are many, even contradicting, good reasons for that.

Depending on which industry vertical you are in, the main pain points that urge you to start doing MDM belong to one or another of the MDM domains. Customer MDM is the most common starting point, typically seen where you have a large number of customer records in your databases. We see organizations with many products in their databases starting with product MDM. This is for example the case for large retailers and distributors.

It can be other domains as well. One example I recall from an MDM conference is that Royal Mail in the UK started with the calendar domain. Besides the fact that this domain had pain points for that organization, a reason for doing so was to start small before taking on the big chunks.

Even though you start with one domain, you must think about the end state. One thing to consider multi-domain wise is the data governance part, as you will not come out well if you choose different approaches to data governance for each master data domain. Of course, the technology part is there too. Choosing a solution that eventually will take you all the way is appealing to many organizations looking for an MDM platform.

Another approach to multi-domain MDM can be through what at least one MDM tool vendor I know of calls Evolutionary MDM™. But we can call it other things. Agile or lean MDM for example. Using that approach you do not solve everything within one domain before going on to the next one.

It is about eliminating as many pain points as possible in the shortest feasible time-frame.
