There are intersections between data modelling and data quality. In examining those we can use a data quality mind map published recently on this blog:
Data Modelling and Data Quality Dimensions:
Some data quality dimensions are closely related to data modelling and a given data model can impact these data quality dimensions. This is the case for:
Data integrity, as the relationship rules in a traditional entity-relationship based data model foster the integrity of the data controlled in databases. The weaknesses are that these rules are sometimes too rigid to describe actual real-world entities, and that integrity across several databases is not covered. To discover issues of the latter kind, we may use data profiling methods.
Data validity, as field definitions and relationship rules ensure that only data that is considered valid can enter the database.
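As a minimal sketch of how relationship rules and field definitions guard integrity and validity, the example below uses Python's built-in sqlite3 module. The table names, column names and rules are illustrative assumptions, not taken from any particular data model.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript(
    """
    CREATE TABLE party (
        party_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL CHECK (length(trim(name)) > 0)   -- validity rule
    );
    CREATE TABLE sales_order (
        order_id INTEGER PRIMARY KEY,
        party_id INTEGER NOT NULL REFERENCES party (party_id),  -- integrity rule
        quantity INTEGER NOT NULL CHECK (quantity > 0)          -- validity rule
    );
    """
)


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the database accepts the row, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("INSERT INTO party VALUES (?, ?)", (1, "Acme Ltd")))       # True
print(try_insert("INSERT INTO sales_order VALUES (?, ?, ?)", (1, 99, 5)))   # False: unknown party
print(try_insert("INSERT INTO sales_order VALUES (?, ?, ?)", (2, 1, 0)))    # False: invalid quantity
```

Note that these rules only hold inside this one database, which is exactly the limitation mentioned above for integrity across several databases.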
Some other data quality dimensions must be solved with either extended data models and/or alternative methodologies. This is the case for:
Data completeness: A common scenario is, for example, that a data model born in the United States will set the state field within an address as mandatory and probably accept only a value from a reference list of 50 states. This will not work in the rest of the world. So, in order to avoid getting crap or getting no data at all, you will need either to extend the model or to loosen the model and control completeness otherwise.
With data about products the big pain is that different groups of products require different data elements. This can be solved with a very granular data model – with possible performance issues, or a very customized data model – with scalability and other issues as a result.
Data uniqueness: A common scenario here is that names and addresses can be spelled in many ways even though they reflect the same real-world entity. We can use identity resolution (and data matching) to detect this and then model how we link data records that are real-world duplicates together in a looser or tighter way.
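To illustrate the looser or tighter linking of possible duplicates, here is a small sketch that uses plain string similarity from Python's standard difflib module as a stand-in for real identity resolution. The normalisation steps, the function names and the two thresholds are assumptions chosen for illustration only; production data matching typically uses far richer techniques.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case, drop dots and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 of the two normalised strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify(a: str, b: str, tight: float = 0.9, loose: float = 0.7) -> str:
    """Map a similarity score to a tight link, a loose link or no link.

    The thresholds are hypothetical and would need tuning on real data.
    """
    score = similarity(a, b)
    if score >= tight:
        return "same entity"
    if score >= loose:
        return "possible duplicate"
    return "different"


print(classify("ACME Corp.", "acme corp"))   # same entity
print(classify("Acme Corp", "Zenith Ltd"))   # different
```

Records in the middle band could be linked loosely and left for review, while records above the upper threshold could be linked tightly or merged.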
Some of the emerging technologies in the data storing realm are presenting new ways of solving the challenges we have with data quality and traditional entity-relationship based data models.
In the Product Data Lake venture I am working with right now we are also aiming to solve the data integrity, data validity and data completeness issues with product data (or product information if you like) using these emerging technologies. This includes solving issues with geographical diversity and varying completeness requirements through a granular data model that is scalable, not only seen within a given company but also across a whole business ecosystem encompassing many enterprises belonging to the same (data) supply chain.
Data quality can be assessed using a range of data quality dimensions – the ones coloured green in the above mind map. These dimensions relate in different ways to various data domains as examined in the post Multi-Domain MDM and Data Quality Dimensions.
The data quality discipline is closely related to other disciplines (the yellow coloured ones) such as data modelling, Reference Data Management (RDM), Master Data Management (MDM), metadata management and data governance, if it is not a sub-discipline of the latter, as also shown in the post A Data Management Mind Map.
Master Data Management (MDM) is a lot about data modelling. When you buy an MDM tool it will have some implications for your data model. Here are three kinds of data models that may come with a tool:
An off-the-shelf model
This kind is particularly popular with customer and other party master data models. Core party data are pretty much the same for every company. We have national identification numbers, names, addresses, phone numbers and that kind of stuff where you do not have to reinvent the wheel.
Also, you will have access to rich reference data that comes with a model, such as address directories (which you may regard as belonging to a separate location domain), business directories (for example the Dun & Bradstreet Worldbase) and in some countries citizen directories as well. MDM tools may come with a model shaped for these sources.
Tools which are optimized for data matching, including deduplication of party master data, will often shoehorn your party master data into a data model feasible for that.
A buildable model
When it comes to multi-domain MDM we will deal with entities that are not common to everyone.
Here a capability to build your model in the MDM tool is needed. One such tool I have worked with is Semarchy. Here semi-technical people are able to build and deploy incrementally more complex data models that by default are equipped with the needed functionality for handling a golden copy and auditing data onboarding and changes.
A dynamic model
Product Information Management (PIM) requires that your end users can build the model on the fly, as product data are so different between product groups.
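A rough sketch of what such a dynamic model can look like: attribute requirements are held as data per product group, so they can be changed at run time without touching a schema. The class names, attribute names and sample values below are made up for illustration and do not describe any specific PIM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProductGroup:
    """A product group whose required attributes end users can change on the fly."""
    name: str
    required_attributes: Set[str] = field(default_factory=set)


@dataclass
class Product:
    """A product whose attributes are free-form key/value pairs."""
    sku: str
    group: ProductGroup
    attributes: Dict[str, str] = field(default_factory=dict)


def missing_attributes(product: Product) -> List[str]:
    """Completeness check: required attributes that are absent or empty."""
    return sorted(
        name
        for name in product.group.required_attributes
        if not product.attributes.get(name)
    )


power_tools = ProductGroup("power tools", {"voltage", "weight"})
drill = Product("D-100", power_tools, {"voltage": "18 V", "weight": "1.4 kg"})
print(missing_attributes(drill))  # []

# An end user extends the model for this group only; no schema change is needed.
power_tools.required_attributes.add("noise_level")
print(missing_attributes(drill))  # ['noise_level']
```

The same structure lets completeness requirements differ per product group, at the price of moving validation from the database schema into application logic.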
This model resembles the data model in most PIM solutions (and PIM based MDM solutions), except that we have the party and their two-way partnerships at the top, as Product Data Lake takes care of exchanging data between inhouse PIM solutions at trading partners participating in business ecosystems.
Party and product are the most frequent master data domains around.
Often you meet party in one of the most frequent party roles, being customer and supplier (or vendor), or by another term related to the context, for example citizen, patient, member, student, passenger and many more. These are the people and legal entities we are interacting with and with whom we usually exchange money – and information.
Products (or materials) are the things we buy, make and sell: the goods (or services) we exchange.
In my current venture called Product Data Lake our aim is to serve the exchange of information about products between trading partners who are customers and suppliers in business ecosystems.
For that, we have been building a data model. Below you see our first developed conceptual data model, which has party and product as the core entities.
As this is a service for business ecosystems, another important entity is the partnership between suppliers and customers of products and the information about the products.
The product link entity in this data model handles the identification of products by the pairs of trading partners. In the same way, this data model has link entities for the identification of product attributes at pairs of trading partners (built on the same standards or not) as well as digital asset types.
If you are offering product information management services, and thus are a potential Product Data Lake ambassador, or you are part of a business ecosystem with trading partners, I will be happy to discuss with you how to add handling of trading partnerships and product information exchange to your current model.
“I’d like to hear back from anyone who has implemented party master data in either a single, unified schema or separate, individual schemas (Vendor, Customer, etc.).
What were the pros and cons of your approach? Would you do it the same way if you had it to do again?”
This is a classic consideration at the heart of multi-domain MDM. As I see it, and as I advise my clients, you should have a common party (or business partner) structure for identification, names, addresses and contact data. This should be supported by data quality capabilities strongly built on external reference data (third party data). Besides this common structure, there should be specific structures for customer, vendor/supplier and other party roles.
This subject was also recently examined here on the blog in the post Multi-Side MDM.
Every organization needs Master Data Management (MDM). But does every organization need an MDM tool?
In many ways the MDM tools we see on the market resemble common database tools. But there are some things the MDM tools do better than a common database management tool. The post called The Database versus the Hub outlines three such features being:
Controlling hierarchical completeness
Achieving a Single Business Partner View
Exploiting Real World Awareness
Controlling hierarchical completeness and achieving a single business partner view are closely related to the two things data quality tools do better than common database systems as explained in the post Data Quality Tools Revealed. These two features are:
Data profiling and
Data matching
Specialized data profiling tools are very good at providing out-of-the-box functionality for statistical summaries and frequency distributions for the unique values and formats found within the fields of your data sources in order to measure data quality and find critical areas that may harm your business. These capabilities are often better and easier to use than what you find inside an MDM tool. However, in order to measure the improvement in a business context and fix the problems not just as a one-off, you need a solid MDM environment.
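To make the idea of frequency distributions for values and formats concrete, here is a bare-bones profiling sketch in plain Python. The function names, the pattern notation (9 for a digit, A for a letter) and the sample postal codes are assumptions for illustration; dedicated profiling tools offer much more than this.

```python
from collections import Counter
from typing import Dict, List, Optional


def format_pattern(value: str) -> str:
    """Reduce a value to its format: digits become 9, letters become A."""
    pattern = []
    for ch in value:
        if ch.isdigit():
            pattern.append("9")
        elif ch.isalpha():
            pattern.append("A")
        else:
            pattern.append(ch)  # keep spaces and punctuation as they are
    return "".join(pattern)


def profile(values: List[Optional[str]]) -> Dict[str, object]:
    """Summarise fill rate, distinct values and format frequencies for one field."""
    filled = [v for v in values if v is not None and v.strip()]
    return {
        "count": len(values),
        "filled": len(filled),
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(3),
        "formats": Counter(format_pattern(v) for v in filled).most_common(),
    }


postal_codes = ["90210", "10001", "SW1A 1AA", "", "90210", None]
result = profile(postal_codes)
print(result["filled"], "of", result["count"], "filled")  # 4 of 6 filled
print(result["formats"])  # [('99999', 3), ('AA9A 9AA', 1)]
```

A rare format in an otherwise uniform field, like the single non-numeric postal code here, is the kind of finding that points to either bad data or a model that is too narrow.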
When it comes to data matching we also still see specialized solutions that are more effective and easier to use than what is typically delivered inside MDM solutions. Besides that, we also see business scenarios where it is better to do the data matching outside the MDM platform as examined in the post The Place for Data Matching in and around MDM.
Looking at the single MDM domains we also see alternatives. Customer Relationship Management (CRM) systems are popular as a choice for managing customer master data. But as explained in the post CRM systems and Customer MDM: CRM systems are said to deliver a Single Customer View but usually they don’t. The way CRM systems are built, used and integrated is a certain track to create duplicates. Some remedies for that are touched on in the post The Good, Better and Best Way of Avoiding Duplicates.
With product master data we also have Product Information Management (PIM) solutions. From what I have seen, PIM solutions have one key capability that is essentially different from what a common database solution, and many MDM solutions built with party master data in mind, have. That is a flexible and super-user oriented way of building hierarchies and assigning attributes to entities – in this case particularly products. If you offer customer self-service, like in eCommerce, with products that have varying attributes you need PIM functionality. If you want to do this smartly, you need a collaboration environment for supplier self-service as well, as pondered in the post Chinese Whispers and Data Quality.
All in all the necessary components and combinations for a suitable MDM toolbox are plentiful and can be obtained by one-stop-shopping or by putting some best-of-breed solutions together.
Usually data models are made to fit a specific purpose of use. As reported in the post A Place in Time this often leads to data quality issues when the data is going to be used for purposes different from the originally intended one. Among many examples we not least have heaps of customer tables like this one:
Compared to how the real world works this example has some diversity flaws, like:
state code as a key to a state table will only work with one country (the United States)
zipcode is a United States term only, as opposed to the more generic “Postal Code”
fname (First name) and lname (Last name) don’t work in cultures where given name and surname have the opposite sequence
The lengths of the state, zipcode and most other fields are obviously too small almost anywhere
More seriously we have:
fname and lname (First name and Last name) and probably also phone should belong to a separate party entity acting as a contact related to the company
company name should belong to a separate party entity acting in the role of customer
address1, address2, city, state, zipcode should belong to a separate place entity, probably as the current visiting place related to the company
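The split suggested in the points above can be sketched as follows. The class names, the role labels and the exact keys of the flat row (in particular "company") are assumptions made for illustration, since only the field names mentioned in the text are known.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Party:
    """A person or an organisation, kept apart from the role it plays."""
    name: str
    kind: str                    # "organisation" or "person"
    role: str                    # e.g. "customer" or "contact"; a fuller model would hold roles separately
    phone: Optional[str] = None


@dataclass
class Place:
    """A location with generic, not country-specific, field names."""
    address_lines: Tuple[str, ...]
    city: str
    region: Optional[str]        # a state, province or county, where one applies
    postal_code: Optional[str]
    country_code: Optional[str]  # absent from the flat table, so unknown here


@dataclass
class CustomerView:
    customer: Party
    contact: Party
    visiting_place: Place


def split_flat_customer(row: Dict[str, str]) -> CustomerView:
    """Turn one flat customer row into two party entities and one place entity."""
    lines = tuple(v for v in (row.get("address1"), row.get("address2")) if v)
    # The two name fields are joined in source order; no name order is inferred.
    person_name = " ".join(v for v in (row.get("fname"), row.get("lname")) if v)
    return CustomerView(
        customer=Party(row["company"], "organisation", "customer"),
        contact=Party(person_name, "person", "contact", row.get("phone") or None),
        visiting_place=Place(
            lines, row.get("city", ""), row.get("state") or None,
            row.get("zipcode") or None, None,
        ),
    )


flat = {"company": "Acme Ltd", "fname": "Ann", "lname": "Lee", "phone": "555-0100",
        "address1": "1 Main St", "address2": "", "city": "Springfield",
        "state": "IL", "zipcode": "62701"}
view = split_flat_customer(flat)
print(view.customer.name, "|", view.contact.name, "|", view.visiting_place.postal_code)
```

With the entities separated like this, the same contact or place can later be related to other parties and roles without duplicating the data.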
In my experience looking at the real world will help a lot when making data models that can survive for years and stand use cases different from the one in immediate question. I’m not talking about introducing scope creep but just thinking a little bit about what the real world looks like when you are modelling something in that world, which usually is the case when working with Master Data Management (MDM).
A challenge within many disciplines is to explain easily what the discipline is about, and that is certainly true for Master Data Management (MDM) too, as we often get the question: What is master data?
A good short explanation is:
“The description of the who, what and where in transaction data”.
One of my pet peeves in data quality for CRM and ERP systems is the often used way of looking at entities, not least party entities, in a flat data model as told in the post A Place in Time.
Party master data, and related location master data, will eventually be modeled in very complex models and surely we see more and more examples of that. For example I remember that a long time ago I worked with the ERP system that later became Microsoft Dynamics AX. Then I had issues with the simplistic and not role aware data model. While I’m currently working on a project using the AX 2012 Address Book it’s good to see that things have certainly developed.
This blog has quite a few posts on hierarchy management in Master Data Management (MDM) and even Hierarchical Data Matching. But I have to admit that even complex relational data models and hierarchical approaches in fact don’t align completely with the real world.
I remember that at this year’s MDM Summit Europe Aaron Zornes suggested that a graph database will be the best choice for reflecting the most basic reference dataset, being The Country List. Oh yes, and for master data too, you would then think, though I doubt that the relational database and hierarchy management will be out of fashion for a while.
So it could be good to know if you have seen or worked with graph databases in master data management beyond representing a static analysis result as a graph database.