1st Party, 2nd Party and 3rd Party Master Data

Until now, much of the methodology and technology in the Master Data Management (MDM) world has been about how to optimize the use of what can be called first-party master data. This is master data already collected within your organization, and the approaches to MDM and the MDM solutions offered have revolved around federating internal silos and obtaining a single source of truth within the corporate walls.

Besides that, third-party data has been around for many years, as described in the post Third-Party Data and MDM. Use of third-party data in MDM has mainly been about enriching customer and supplier master data from business directories and, to some degree, utilizing standardized pools of product data in various solutions.

Using third-party data for customer and supplier master data seems to be a very good idea, as exemplified in the post Using a Business Entity Identifier from Day One. This is because customer and supplier master data look pretty much the same to every organization. With product master data this is not the case, and that is why third-party sources for product master data may not be fully effective.

Second-party data is data you get directly from the external source. With customer and supplier master data we see that approach in self-registration services. My recommendation is to combine self-registration and third-party data in customer and supplier on-boarding processes. With product master data I think leaning mostly on second-party connections in business ecosystems is the best way forward. There is more on that in a discussion in the LinkedIn MDM – Master Data Management Group.
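To make the on-boarding recommendation more concrete, here is a minimal Python sketch of merging a self-registration form (second-party data) with a matching business directory entry (third-party data). The split of fields between the two sources, the field names and the sample records are assumptions for illustration only, not a description of any particular MDM solution.

```python
from typing import Optional

# Hypothetical split of trust: the business directory (third party) is
# preferred for legal name and address; the registrant (second party)
# supplies contact details that a directory would not know.
DIRECTORY_FIELDS = ("name", "address")
SELF_FIELDS = ("email", "phone")


def onboard(self_registered: dict, directory_entry: Optional[dict]) -> dict:
    """Merge a self-registration form with a matching directory entry.

    Each field records where its value came from, so later data
    maintenance knows which source to re-check.
    """
    record, source = {}, {}
    for f in DIRECTORY_FIELDS:
        if directory_entry and directory_entry.get(f):
            record[f], source[f] = directory_entry[f], "directory"
        else:
            # No directory match: fall back to what the registrant typed.
            record[f], source[f] = self_registered.get(f), "self"
    for f in SELF_FIELDS:
        record[f], source[f] = self_registered.get(f), "self"
    record["source"] = source
    return record


form = {"name": "Acme Trading Aps", "address": "Main St 1", "email": "buyer@acme.example"}
directory = {"name": "Acme Trading ApS", "address": "Main Street 1, 2100 Copenhagen"}
print(onboard(form, directory)["name"])  # Acme Trading ApS
```

Recording the source per field is just one possible design choice; it lets a later refresh update directory-owned fields without overwriting what the registrant supplied.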

Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to on-board new entries using an externally defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions, such as:

  • Uniqueness, by checking if a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory, thus avoiding the spelling mistakes that are usually all over party master data.
  • Conformity, by inheriting additional data, such as line-of-business codes and descriptions, from a business directory.
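The three dimensions above can be sketched as a small on-boarding function. This is a minimal Python sketch under stated assumptions: the in-memory `BUSINESS_DIRECTORY`, the identifier `123456789`, the line-of-business code `A100` and the function `onboard_by_identifier` are all hypothetical stand-ins; a real setup would call an actual business directory service.

```python
from typing import Dict

# Hypothetical stand-in for a business directory lookup; the identifier,
# names and line-of-business code below are made up for illustration.
BUSINESS_DIRECTORY: Dict[str, dict] = {
    "123456789": {
        "name": "Example Widgets Ltd",
        "address": "1 High Street, London",
        "lob_code": "A100",
        "lob_description": "Widget manufacturing",
    },
}


class DuplicateParty(Exception):
    """Raised when a party with the same identifier is already on file."""


def onboard_by_identifier(master: Dict[str, dict], business_id: str) -> dict:
    # Uniqueness: the identifier is the key, so an existing party is found
    # by exact lookup instead of fuzzy name/address matching.
    if business_id in master:
        raise DuplicateParty(f"party {business_id} already exists")
    entry = BUSINESS_DIRECTORY.get(business_id)
    if entry is None:
        raise KeyError(f"{business_id} not found in directory")
    party = {
        "business_id": business_id,
        # Accuracy: name and address are defaulted from the directory
        # rather than typed in by hand.
        "name": entry["name"],
        "address": entry["address"],
        # Conformity: line-of-business code and description are inherited.
        "lob_code": entry["lob_code"],
        "lob_description": entry["lob_description"],
    }
    master[business_id] = party
    return party


master_data: Dict[str, dict] = {}
onboard_by_identifier(master_data, "123456789")
```

Keeping `business_id` on each party record also means the record can later be re-checked against the directory by the same key, which supports ongoing data maintenance.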

Having an external business identifier stored with your party master data helps a lot with maintaining data quality, as pondered in the post Ongoing Data Maintenance.

When selecting an identifier, there are different options, such as national IDs, LEI, DUNS Number and others, as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up will consider much later, if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later with guaranteed increased costs.

As the identifier we have chosen the DUNS Number from Dun & Bradstreet. The reason is that it is currently the only business identifier with worldwide coverage. Dun & Bradstreet also offers some additional data that fits our business model. This includes consistent line-of-business information and worldwide company family trees.

Starting up at the age of 56

It is never too late to start up, I have heard. So although I usually brag about having 35+ years of experience in the intersection of business and IT and a huge been-done list in Data Quality and Master Data Management (MDM), which can get me nice consultancy engagements, a certain need in the market has been on my mind for some time.

Before that, when someone asked me what to do in the MDM space, I told them to create something around sharing master data between organizations. Most MDM solutions are sold to a given organization to cover the internal processes there. There are not many solutions out there that cover what is going on between organizations.

But why not do that myself? – with the help of some younger people.

You may have noticed that during the last year I have been writing about something called the Product Data Lake. Until recently this has mostly been a business concept that could be presented on PowerPoint slides, so-called slideware. But now it is becoming real software deployed in the cloud.

Right now a gifted team in Vietnam, where I am also this week, is building the solution. We aim to have it ready for the first trial subscribers in August 2016. We will also be exhibiting the solution in London in late September, where we will be at the Start-up Alley in the combined Customer Contact, eCommerce and Technology for Marketing exhibition.

At home in Denmark, some young people are working on our solution too, as well as on the related launch activities and social media presence. This includes a LinkedIn company page. For continuous stories about our start-up, please follow the Product Data Lake page on LinkedIn.

Did You Mean Potato or Potahto?

As told in the post Where the Streets have Two Names, one aspect of address validation is the fact that in some parts of the world a given postal address can be presented in more than one language.

I experienced that today when using Google Maps for directions to a Master Data Management (MDM) conference in Helsinki, Finland. When typing in the address I got this message:

[Screenshot: Google Maps message for the Helsinki address]

The case is that the two addresses proposed by Google Maps are exactly the same address, just spelled in Swedish and Finnish, the two official languages used in this region.
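A locally tailored service can know that such spellings are equivalent. Below is a minimal Python sketch, assuming a hypothetical alias table that maps Swedish names to one canonical Finnish spelling; the street names are illustrative examples, not the conference address.

```python
# Hypothetical alias table: Swedish street and city names mapped to one
# canonical (here: Finnish) spelling. A real service would load official
# bilingual street registers instead of a hand-written dict.
STREET_ALIASES = {
    "mannerheimvägen": "mannerheimintie",
    "alexandersgatan": "aleksanterinkatu",
}
CITY_ALIASES = {"helsingfors": "helsinki"}


def canonical(street: str, number: str, city: str) -> tuple:
    """Normalize case and whitespace, then map known language variants."""
    s = street.strip().lower()
    c = city.strip().lower()
    return (STREET_ALIASES.get(s, s), number.strip(), CITY_ALIASES.get(c, c))


def same_address(a: tuple, b: tuple) -> bool:
    """True when two (street, number, city) tuples are the same address per the alias table."""
    return canonical(*a) == canonical(*b)


print(same_address(("Mannerheimintie", "13", "Helsinki"),
                   ("Mannerheimvägen", "13", "Helsingfors")))  # True
```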

I think Google Maps is an example of a splendid worldwide service. But even the best worldwide services sometimes don't match locally tailored services. In my experience this is the case when it comes to address management solutions, such as address validation and assistance, whether they come as an integrated part of a Master Data Management (MDM) solution, a stand-alone data quality tool or a general service such as Google Maps.

Using a Data Lake for Reference Data

TechTarget has recently published a definition of the term data lake.

In the explanation it is mentioned that the term data lake is being accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried. The explanation also states that: “While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.”
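To illustrate a schema that is not defined until the data is queried (often called schema-on-read), here is a minimal Python sketch. The raw reference records, the column mapping and the function `read_with_schema` are hypothetical; real data lakes rely on dedicated query engines rather than a loop like this.

```python
import json
from typing import Iterable, Iterator

# Raw records land in the lake as-is in one flat collection: no schema
# is enforced when they are written, so their shapes may differ.
RAW_LINES = [
    '{"country": "DK", "name": "Denmark"}',
    '{"country": "FI", "name": "Finland", "languages": ["fi", "sv"]}',
    '{"iso": "SE", "label": "Sweden"}',
]


def read_with_schema(lines: Iterable[str], fields: dict) -> Iterator[dict]:
    """Apply a schema only at query time.

    `fields` maps each output column to the raw keys that may hold it,
    so differently shaped records can still be queried together.
    Missing values come back as None.
    """
    for line in lines:
        raw = json.loads(line)
        yield {col: next((raw[k] for k in keys if k in raw), None)
               for col, keys in fields.items()}


schema = {"code": ["country", "iso"], "name": ["name", "label"]}
rows = list(read_with_schema(RAW_LINES, schema))
```

A different query could apply a different mapping to the same raw lines, which is the point of deferring the schema.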

A data lake is an approach to overcoming the well-known big data characteristics of volume, velocity and variety, where the last one, variety, is probably the most difficult to overcome with a traditional data warehouse approach.

If we look at traditional ways of using data warehouses, they have revolved around storing internal transaction data linked to internal master data. With the rise of big data there will be a shift towards encompassing more and more external data. One kind of external data is reference data, being data that is typically born outside a given organization and that has many different purposes of use.

Sharing data with the outside must be a part of your big data approach. This goes for including traditional flavours of big data, such as social data and sensor data, as well as what we may call big reference data, being pools of global data and bilateral data, as explained on this blog on the page called Data Quality 3.0. The data lake approach may very well work for big reference data, as it may for other flavours of big data.

The BrightTalk community on Big Data and Data Management has a formidable collection of webinars and videos on big data and data management topics. I am looking forward to contributing there on 25th June 2015 with a webinar about Big Reference Data.

Is big data all about analytics?

My answer to the question in the title of this blog post is NO. In my eyes big data is not just data warehouse 3.0. It is also data quality 3.0.

The concept of the data lake is growing in popularity in the big data world, and so is the number of warnings about your data lake becoming a data swamp, a data marsh or a data cesspool. Doing analytic work on a nice data lake sounds great. Doing it in a huge swamp, a large marsh or a giant cesspool does not sound so nice.

In nature a lake stays fresh by having a good upstream supply of water and a downstream system as well. In much the same way, your data lake should not be a closed system or a dump within your organization.

Sharing data with the outside must be a part of your big data approach. This goes for including traditional flavours of big data, such as social data and sensor data, as well as what we may call big reference data, being pools of global data and bilateral data, as explained on this blog on the page called Data Quality 3.0.

The BrightTalk community on Big Data and Data Management has a formidable collection of webinars and videos on big data and data management topics. I am looking forward to contributing there on 25th June 2015 with a webinar about Big Reference Data.

CDI, PIM, MDM and Beyond

The TLAs (Three Letter Acronyms) in the title of this blog post stand for:

  • Customer Data Integration
  • Product Information Management
  • Master Data Management

CDI and PIM are commonly seen as predecessors to MDM. For example, the MDM Institute was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Today Multi-Domain MDM is about managing customer, or rather party, master data together with product master data and other master data domains, as visualized in the post A Master Data Mind Map. Some of the most frequent other master data domains are location master data and asset master data, where the latter was explored in the post Where is the Asset? A less frequent master data domain is The Calendar MDM Domain.

You may argue that PIM (Product Information Management) is not the same as Product MDM. This question was examined in the post PIM, Product MDM and Multi-Domain MDM. In my eyes the benefits of keeping PIM as part of Multi-Domain MDM are bigger than the benefits of separating PIM and MDM. It is about expanding MDM across the sell-side and the buy-side of the business, eventually by enabling wide use of customer self-service and supplier self-service.

The external self-service theme will in my eyes be at the centre of where MDM is going in the future. Going down that path will have consequences for how we see data governance, as discussed in the post Data Governance in the Self-Service Age. Another aspect of how MDM is going to be seen from the outside in is the increased use of third-party reference data and the link between big data and MDM, as touched on in the post Adding 180 Degrees to MDM.

Besides Multi-Domain MDM and the links between MDM and big data, a much-mentioned future trend in MDM is doing MDM in the cloud. In my eyes the latter is a natural consequence of the external self-service themes and the increased use of third-party reference data, which, together with the general benefits of the SaaS (Software as a Service) and DaaS (Data as a Service) concepts, will make MDM morph into something like MDaaS (Master Data as a Service). By the way, that idea is already nearly ten years old, as seen in this BeyeNetwork article by Dan E Linstedt.
