Master data and reference data are two types of data that are shared enterprise wide and even in the wider business ecosystem where your company operates.
In your organization and business ecosystem, the shared data is mostly held in applications such as ERP and CRM solutions, each of which comes with a data model provided by the solution vendor. These data models are built to facilitate the operations supported by each application, and each model must suit every kind of organization.
A core reason for having a Master Data Management (MDM) solution is to provide a data store where master data is represented in a way that reflects the business model of your organization. This data store serves many purposes, for example acting as a data integration hub and as the place where the results of data quality improvements (e.g. de-duplication) are stored.
Such a data hub can go beyond master data entities and represent reference data and critical application data that is shared across your organization and the wider business ecosystem within a given industry.
Learn more about flexible data models in a data hub context in the Semarchy whitepaper authored by me and titled The Intelligent Data Hub: Taking MDM to the Next Level.
There are intersections between data modelling and data quality. In examining those we can use a data quality mind map published recently on this blog:
Data Modelling and Data Quality Dimensions:
Some data quality dimensions are closely related to data modelling and a given data model can impact these data quality dimensions. This is the case for:
- Data integrity, as the relationship rules in a traditional entity-relationship based data model foster the integrity of the data controlled in databases. The weak sides are that these rules are sometimes too rigid to describe actual real-world entities, and that integrity across several databases is not covered. To discover issues of the latter kind, we may use data profiling methods.
- Data validity, as field definitions and relationship rules ensure that only data that is considered valid can enter the database.
Some other data quality dimensions must be solved with extended data models, alternative methodologies or both. This is the case for:
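As a minimal sketch of how field definitions and relationship rules guard validity and integrity, the example below uses Python's built-in sqlite3 module. The table and column names (country, customer, country_code) are assumptions made for illustration and are not taken from any particular ERP, CRM or MDM product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY);
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        -- validity: the field definition rejects missing or blank names
        name TEXT NOT NULL CHECK (length(trim(name)) > 0),
        -- integrity: the relationship rule requires a known country
        country_code TEXT NOT NULL REFERENCES country(code)
    );
    INSERT INTO country (code) VALUES ('DK'), ('US');
""")

def try_insert(name, country_code):
    """Return True if the database accepts the row, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, country_code) VALUES (?, ?)",
                (name, country_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling try_insert("Acme", "DK") is accepted, while a blank name or an unknown country code is rejected. Note that these rules only cover this one database, which is exactly the weak side mentioned above.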
- Data completeness:
- A common scenario is that a data model born in the United States will set the state field within an address as mandatory and probably accept only a value from a reference list of 50 states. This will not work in the rest of the world. So, in order to avoid getting crap or getting no data at all, you will need to either extend the model or loosen the model and control completeness in another way.
- With data about products the big pain is that different groups of products require different data elements. This can be solved with a very granular data model – with possible performance issues, or a very customized data model – with scalability and other issues as a result.
- Data uniqueness: A common scenario here is that names and addresses can be spelled in many ways even though they reflect the same real-world entity. We can use identity resolution (and data matching) to detect this and then model how we link data records that are real-world duplicates together in a looser or tighter way.
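The uniqueness scenario can be illustrated with a very simple matching sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in for a real matching engine. The normalization rules, the 0.85 threshold and the sample records are assumptions chosen for illustration; dedicated data matching tools use far more refined techniques.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())

def similarity(a, b):
    """Similarity between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link_duplicates(records, threshold=0.85):
    """Return (id, id, score) for record pairs that look like the same entity."""
    links = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            score = similarity(
                left["name"] + " " + left["address"],
                right["name"] + " " + right["address"],
            )
            if score >= threshold:
                links.append((left["id"], right["id"], round(score, 2)))
    return links

# Hypothetical sample records, invented for this example.
records = [
    {"id": 1, "name": "John Smith", "address": "12 Main Street, Springfield"},
    {"id": 2, "name": "Jon Smith", "address": "12 Main St., Springfield"},
    {"id": 3, "name": "Mary Jones", "address": "4 Oak Avenue, Shelbyville"},
]
links = link_duplicates(records)
```

The result is a list of links rather than merged records. Keeping the link as its own piece of data is one way to model a looser relationship between duplicates, where the source records survive and the link can later be confirmed or rejected.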
Some of the emerging technologies in the data storage realm present new ways of solving the challenges we have with data quality and traditional entity-relationship based data models.
Graph databases and document databases allow for describing and operating data models that are better aligned with the real world. This topic was examined in the post Encompassing Relational, Document and Graph the Best Way.
In the Product Data Lake venture I am working with right now we are also aiming to solve the data integrity, data validity and data completeness issues with product data (or product information if you like) using these emerging technologies. This includes solving issues with geographical diversity and varying completeness requirements through a granular data model that is scalable, not only seen within a given company but also across a whole business ecosystem encompassing many enterprises belonging to the same (data) supply chain.
What is data quality anyway? This question has been touched upon many times on this blog.
Data quality can be assessed using a range of data quality dimensions – the ones coloured green in the above mind map. These dimensions relate in different ways to various data domains as examined in the post Multi-Domain MDM and Data Quality Dimensions.
Data quality can be managed using a toolbox of sub disciplines, the ones coloured turquoise in the above mind map. The reasons for data cleansing were discussed in the blog post Top 5 Reasons for Downstream Cleansing. Data profiling was visited in the post Data Quality Tools Revealed along with data matching. The relationship between data matching and identity resolution was recently described in the post Data Matching and Real-World Alignment.
The data quality discipline is closely related to other disciplines (the yellow coloured ones) such as data modelling, Reference Data Management (RDM), Master Data Management (MDM) and metadata management. It is also closely related to, if not a sub discipline of, data governance, as also shown in the post A Data Management Mind Map.
This blog is about Data Quality 3.0, Product Data Syndication Freedom, Multienterprise MDM – and many more data management topics.
These topics, and the many more data management topics I have been around, look like the mind map below:
Master Data Management (MDM) is a lot about data modelling. When you buy an MDM tool, it will have some implications for your data model. Here are three kinds of data models that may come with a tool:
An off-the-shelf model
This kind is particularly popular with customer and other party master data models. Core party data are pretty much the same for every company. We have national identification numbers, names, addresses, phone numbers and that kind of stuff where you do not have to reinvent the wheel.
Also, with such a model you will have access to rich reference data such as address directories (which you may regard as belonging to a separate location domain), business directories (for example the Dun & Bradstreet Worldbase) and, in some countries, citizen directories as well. MDM tools may come with a model shaped for these sources.
Tools that are optimized for data matching, including deduplication of party master data, will often shoehorn your party master data into a data model suited for that.
A buildable model
When it comes to multi-domain MDM we will deal with entities that are not common to everyone.
Here a capability to build your model in the MDM tool is needed. One such tool I have worked with is Semarchy. Here semi-technical people are able to build and deploy incrementally more complex data models, which by default are equipped with the functionality needed for handling a golden copy and for auditing data onboarding and changes.
A dynamic model
Product Information Management (PIM) requires that your end users can build the model on the fly, as product data are so different between product groups.
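A minimal sketch of what such a dynamic model could look like is shown below. Here the required attributes are data held on a product group rather than columns fixed in a schema, so a user can change them without a redeployment. The class names, the inheritance from parent groups and the sample products are assumptions made for illustration and do not describe any specific PIM product.

```python
from dataclasses import dataclass, field

@dataclass
class ProductGroup:
    name: str
    required: set = field(default_factory=set)  # attribute names this group demands
    parent: "ProductGroup | None" = None

    def required_attributes(self):
        """Own requirements plus everything inherited from parent groups."""
        inherited = self.parent.required_attributes() if self.parent else set()
        return inherited | self.required

@dataclass
class Product:
    sku: str
    group: ProductGroup
    attributes: dict = field(default_factory=dict)

    def missing_attributes(self):
        """Required attributes that have no usable value on this product."""
        present = {k for k, v in self.attributes.items() if v not in (None, "")}
        return self.group.required_attributes() - present

# Hypothetical groups and product, invented for this example.
tools = ProductGroup("Tools", {"brand"})
drills = ProductGroup("Power drills", {"voltage", "chuck_size_mm"}, parent=tools)
drill = Product("D-100", drills, {"brand": "Acme", "voltage": "18 V"})
```

Adding a requirement on the fly is then just a data change, for example drills.required.add("weight_kg"), after which missing_attributes() reports the new gap for every product in that group.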
In my current venture called Product Data Lake the model has these main entities:
This model resembles the data model in most PIM solutions (and PIM based MDM solutions), except that we have the party and their two-way partnerships at the top, as Product Data Lake takes care of exchanging data between inhouse PIM solutions at trading partners participating in business ecosystems.
Party and product are the most frequent master data domains around.
Often you meet party in one of the most frequent party roles, customer and supplier (or vendor), or under another term related to the context, for example citizen, patient, member, student, passenger and many more. These are the people and legal entities we are interacting with and with whom we usually exchange money, and information.
Products (or materials) are the things we buy, make and sell: the goods (or services) we exchange.
In my current venture called Product Data Lake, our aim is to serve the exchange of information about products between trading partners who are customers and suppliers in business ecosystems.
For that, we have been building a data model. Below you see our first developed conceptual data model, which has party and product as the core entities.
As this is a service for business ecosystems, another important entity is the partnership between suppliers and customers of products and the information about the products.
The product link entity in this data model handles the identification of products by the pairs of trading partners. In the same way, this data model has link entities for the identification of product attributes at pairs of trading partners (built on the same standards or not), as well as for digital asset types.
If you are offering product information management services, and thus are a potential Product Data Lake ambassador, or you are part of a business ecosystem with trading partners, I will be happy to discuss with you how to add handling of trading partnerships and product information exchange to your current model.
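To make the role of the link entity more concrete, here is a rough sketch of the entities described above as Python dataclasses. It is not the actual Product Data Lake schema. All class names, fields and the lookup helper are assumptions made for illustration, and attribute links and digital asset types are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Party:
    party_id: str
    name: str

@dataclass(frozen=True)
class Partnership:
    supplier: Party
    customer: Party

@dataclass(frozen=True)
class Product:
    owner: Party
    product_id: str  # the identifier the owning party uses for the product

@dataclass(frozen=True)
class ProductLink:
    partnership: Partnership
    supplier_product: Product
    customer_product: Product

    def __post_init__(self):
        # A link only makes sense between products owned by the two partners.
        if self.supplier_product.owner != self.partnership.supplier:
            raise ValueError("supplier product must belong to the supplier")
        if self.customer_product.owner != self.partnership.customer:
            raise ValueError("customer product must belong to the customer")

def customer_product_id(links, partnership, supplier_product_id):
    """Translate a supplier's product identifier into the customer's identifier."""
    for link in links:
        if (link.partnership == partnership
                and link.supplier_product.product_id == supplier_product_id):
            return link.customer_product.product_id
    return None

# Hypothetical trading partners and identifiers, invented for this example.
supplier = Party("P1", "Example Supplier")
customer = Party("P2", "Example Customer")
partnership = Partnership(supplier, customer)
links = [
    ProductLink(partnership, Product(supplier, "SUP-001"), Product(customer, "ITEM-9")),
]
```

The point of the sketch is that neither partner has to adopt the other's identifiers. Each keeps its own product record, and the link per partnership carries the translation.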
Right now there is a good discussion going on in the Multi-Domain MDM Group on LinkedIn. A member asks:
“I’d like to hear back from anyone who has implemented party master data in either a single, unified schema or separate, individual schemas (Vendor, Customer, etc.).
What were the pros and cons of your approach? Would you do it the same way if you had it to do again?”
This is a classic consideration at the heart of multi-domain MDM. As I see it, and as I advise my clients, the way to go is to have a common party (or business partner) structure for identification, names, addresses and contact data. This should be supported by data quality capabilities strongly built on external reference data (third party data). Besides this common structure, there should be specific structures for customer, vendor/supplier and other party roles.
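A small sketch of that advice is shown below, with one common party structure and separate role-specific structures attached to it. The field names (credit_limit, payment_terms_days and so on) are assumptions picked for illustration; a real model would carry far more detail.

```python
from dataclasses import dataclass, field

@dataclass
class CustomerRole:
    credit_limit: float  # hypothetical customer-specific attribute

@dataclass
class SupplierRole:
    payment_terms_days: int  # hypothetical supplier-specific attribute

@dataclass
class Party:
    """Common structure: identification, names, addresses and contact data."""
    party_id: str
    name: str
    addresses: list = field(default_factory=list)
    contacts: dict = field(default_factory=dict)
    roles: dict = field(default_factory=dict)  # role name -> role-specific structure

# One real-world company that both buys from us and sells to us.
acme = Party("P-1001", "Acme Ltd", addresses=["1 Example Road, Springfield"])
acme.roles["customer"] = CustomerRole(credit_limit=50000.0)
acme.roles["supplier"] = SupplierRole(payment_terms_days=30)
```

With separate customer and vendor schemas, the same company would be stored twice and its name and address would have to be kept in sync. Here the shared data lives once, and only the role-specific data differs.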
This subject was also recently examined here on the blog in the post Multi-Side MDM.
What is your opinion and experience with this question? Please have your say either here on the blog or in the LinkedIn Multi-Domain MDM Group.
Every organization needs Master Data Management (MDM). But does every organization need an MDM tool?
In many ways the MDM tools we see on the market resemble common database tools. But there are some things the MDM tools do better than a common database management tool. The post called The Database versus the Hub outlines three such features:
- Controlling hierarchical completeness
- Achieving a Single Business Partner View
- Exploiting Real World Awareness
Controlling hierarchical completeness and achieving a single business partner view are closely related to the two things data quality tools do better than common database systems, as explained in the post Data Quality Tools Revealed. These two features are:
- Data profiling and
- Data matching
Specialized data profiling tools are very good at providing out-of-the-box functionality for statistical summaries and frequency distributions for the unique values and formats found within the fields of your data sources, in order to measure data quality and find critical areas that may harm your business. These capabilities are often better and easier to use than what you find inside an MDM tool. However, in order to measure the improvement in a business context and fix the problems not just as a one-off, you need a solid MDM environment.
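To show what a frequency distribution of values and formats means in practice, here is a tiny profiling sketch using only the Python standard library. The pattern notation (A for a letter, 9 for a digit) and the sample postal codes are assumptions made for illustration. Real profiling tools offer much more than this.

```python
from collections import Counter

def format_pattern(value):
    """Map letters to A and digits to 9, keeping all other characters."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )

def profile(values):
    """Summarize one field: fill rate, distinct values, top values and formats."""
    filled = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "filled": len(filled),
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(3),
        "formats": Counter(format_pattern(v) for v in filled).most_common(),
    }

# Hypothetical postal code column with mixed national formats and gaps.
postal_codes = ["90210", "10001", "2100", "SW1A 1AA", "", None, "10001"]
result = profile(postal_codes)
```

In this sample the format distribution immediately reveals that the field holds more than one national postal code format, and the fill count reveals the gaps. These are the kinds of findings that point to critical areas.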
When it comes to data matching we also still see specialized solutions that are more effective and easier to use than what is typically delivered inside MDM solutions. Besides that, we also see business scenarios where it is better to do the data matching outside the MDM platform as examined in the post The Place for Data Matching in and around MDM.
Looking at the single MDM domains we also see alternatives. Customer Relationship Management (CRM) systems are popular as a choice for managing customer master data. But as explained in the post CRM systems and Customer MDM: CRM systems are said to deliver a Single Customer View but usually they don’t. The way CRM systems are built, used and integrated is a sure path to creating duplicates. Some remedies for that are touched upon in the post The Good, Better and Best Way of Avoiding Duplicates.
With product master data we also have Product Information Management (PIM) solutions. From what I have seen, PIM solutions have one key capability that is essentially different from what a common database solution offers, and from what many MDM solutions built with party master data in mind offer. That is a flexible and super user angled way of building hierarchies and assigning attributes to entities, in this case particularly products. If you offer customer self-service, like in eCommerce, with products that have varying attributes, you need PIM functionality. If you want to do this in a smart way, you need a collaboration environment for supplier self-service as well, as pondered in the post Chinese Whispers and Data Quality.
All in all the necessary components and combinations for a suitable MDM toolbox are plentiful and can be obtained by one-stop-shopping or by putting some best-of-breed solutions together.
Usually data models are made to fit a specific purpose of use. As reported in the post A Place in Time, this often leads to data quality issues when the data is going to be used for purposes different from the one originally intended. Among many examples we have, not least, heaps of customer tables like this one:
Compared to how the real world works this example has some diversity flaws, like:
- state code as a key to a state table will only work with one country (the United States)
- zipcode is a term used only in the United States, as opposed to the more generic “Postal Code”
- fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
- The lengths of the state, zipcode and most other fields are obviously too small almost anywhere
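One way around the first two flaws is to keep the address fields generic and let the completeness rules depend on the country. The sketch below illustrates that idea. The field names and the two per-country rule sets are simplified assumptions for illustration, not an authoritative list of national address requirements.

```python
# Required fields per country code; illustrative and deliberately incomplete.
REQUIRED_BY_COUNTRY = {
    "US": {"address1", "city", "state", "postal_code"},
    "DK": {"address1", "city", "postal_code"},  # Danish addresses carry no state
}
DEFAULT_REQUIRED = {"address1", "city"}  # fallback for countries without a rule set

def missing_fields(address):
    """Return the required fields that are absent or blank, sorted by name."""
    country = address.get("country")
    required = REQUIRED_BY_COUNTRY.get(country, DEFAULT_REQUIRED) | {"country"}
    return sorted(f for f in required if not (address.get(f) or "").strip())

danish = {"country": "DK", "address1": "Nørregade 1", "city": "København",
          "postal_code": "1165"}
american = {"country": "US", "address1": "1 Example Road", "city": "Springfield",
            "postal_code": "62701"}
```

Here missing_fields(danish) reports nothing, while missing_fields(american) reports the missing state. The model itself stays loose, and completeness is controlled by rules that can be extended country by country.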
More seriously we have:
- fname and lname (First name and Last name), and probably also phone, should belong to a party entity of their own, acting as a contact related to the company
- company name should belong to a party entity of its own, acting in the role of customer
- address1, address2, city, state, zipcode should belong to a place entity of their own, probably as the current visiting place related to the company
In my experience, looking at the real world will help a lot when making data models that can survive for years and withstand use cases different from the one in immediate question. I’m not talking about introducing scope creep, but just about thinking a little bit about what the real world looks like when you are modelling something in that world, which usually is the case when working with Master Data Management (MDM).
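The three points above can be sketched as a small transformation from one flat customer row into separate party and place entities. The column names fname, lname, phone, address1, address2, city, state and zipcode come from the example discussed here. The column name company, the entity layout and the sample row are assumptions made for illustration.

```python
def split_customer_row(row):
    """Split one flat customer row into company, contact and place entities."""
    company = {"kind": "organization", "name": row["company"], "roles": ["customer"]}
    contact = {
        "kind": "person",
        "given_name": row.get("fname"),
        "family_name": row.get("lname"),
        "phone": row.get("phone"),
    }
    place = {
        # Keep only the address lines that actually hold a value.
        "address_lines": [
            line for line in (row.get("address1"), row.get("address2")) if line
        ],
        "city": row.get("city"),
        "region": row.get("state"),
        "postal_code": row.get("zipcode"),
    }
    relations = [
        ("contact", "is contact for", "company"),
        ("company", "has current visiting place", "place"),
    ]
    return {"company": company, "contact": contact, "place": place,
            "relations": relations}

# Hypothetical row, invented for this example.
row = {
    "fname": "Jane", "lname": "Doe", "phone": "555-0100", "company": "Example Corp",
    "address1": "1 Example Road", "address2": "", "city": "Springfield",
    "state": "IL", "zipcode": "62701",
}
entities = split_customer_row(row)
```

Once the entities are separate, the same person can be a contact for several companies and the same place can serve several parties, which the flat table cannot express.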
A challenge within many disciplines is to explain easily what the discipline is about, and that is certainly true for Master Data Management (MDM) too, as we often get the question: What is master data?
A good short explanation is:
“The description of the who, what and where in transaction data”.
It could also, with help from Wikipedia, be:
“Information that is key to the operation of a business”.
From Gartner (the analyst firm) we have:
“The consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise”.
The latter one I would not try on friends and relatives though.
Examples are often a good way to go. Visualization is great too. So I have played with a mind map of what master data entities may be: