MDM Tools Revealed

Every organization needs Master Data Management (MDM). But does every organization need an MDM tool?

In many ways the MDM tools we see on the market resemble common database tools. But there are some things MDM tools do better than a common database management tool. The post called The Database versus the Hub outlines three such features:

  • Controlling hierarchical completeness
  • Achieving a Single Business Partner View
  • Exploiting Real World Awareness

Controlling hierarchical completeness and achieving a single business partner view are closely related to the two things data quality tools do better than common database systems, as explained in the post Data Quality Tools Revealed. These two features are:

  • Data profiling and
  • Data matching

Specialized data profiling tools are very good at providing out-of-the-box statistical summaries and frequency distributions for the unique values and formats found within the fields of your data sources, helping you measure data quality and find critical areas that may harm your business. These capabilities are often better and easier to use than what you find inside an MDM tool. However, in order to measure the improvement in a business context and fix the problems permanently rather than as a one-off, you need a solid MDM environment.
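As a sketch of what such out-of-the-box profiling boils down to, here is a minimal Python example computing fill rate, distinct counts and frequency distributions of values and formats for one field. The sample phone numbers and the 9/A pattern notation are illustrative assumptions, not any specific tool's output:

```python
from collections import Counter

def profile_field(values):
    """Profile one field: fill rate, distinct values, value and format frequencies.

    A minimal sketch of what profiling tools automate; real tools add
    type inference, outlier detection and drill-down to the records.
    """
    non_empty = [v for v in values if v not in (None, "")]

    def pattern(value):
        # Reduce a value to its format: digits -> 9, letters -> A
        return "".join(
            "9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value)
        )

    return {
        "fill_rate": len(non_empty) / len(values) if values else 0.0,
        "distinct": len(set(non_empty)),
        "top_values": Counter(non_empty).most_common(3),
        "top_formats": Counter(pattern(v) for v in non_empty).most_common(3),
    }

# The format frequency distribution quickly exposes the odd phone number
print(profile_field(["555-0100", "555-0101", "5550102", "", "555-0100"]))
```

The format distribution is often the quickest way to spot the records that need attention: one deviating pattern among thousands of conforming ones.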

When it comes to data matching, we also still see specialized solutions that are more effective and easier to use than what is typically delivered inside MDM solutions. Besides that, we also see business scenarios where it is better to do the data matching outside the MDM platform, as examined in the post The Place for Data Matching in and around MDM.
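To illustrate the kind of functionality in question, here is a minimal data matching sketch using edit-distance-style similarity from the Python standard library. The names and the 0.7 threshold are illustrative assumptions; real matching engines add phonetic keys, tokenization and per-element tuning:

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Similarity between two party names, normalized to lower case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicates(names, threshold=0.7):
    """Return candidate duplicate pairs at or above the similarity threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = match_score(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 2)))
    return pairs

candidates = ["Acme Corp", "ACME Corporation", "Beta Ltd"]
print(find_duplicates(candidates))
```

Note that the threshold is where the real work lies: set it too low and you drown in false positives, too high and the duplicates survive.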

Looking at the single MDM domains we also see alternatives. Customer Relationship Management (CRM) systems are popular as a choice for managing customer master data. But as explained in the post CRM systems and Customer MDM: CRM systems are said to deliver a Single Customer View, but usually they don’t. The way CRM systems are built, used and integrated is a sure track to creating duplicates. Some remedies for that are touched upon in the post The Good, Better and Best Way of Avoiding Duplicates.

With product master data we also have Product Information Management (PIM) solutions. From what I have seen, PIM solutions have one key capability that is essentially different from a common database solution and from many MDM solutions that are built with party master data in mind. That is a flexible and super-user-angled way of building hierarchies and assigning attributes to entities – in this case particularly products. If you offer customer self-service, as in eCommerce, with products that have varying attributes, you need PIM functionality. If you want to do this smartly, you need a collaboration environment for supplier self-service as well, as pondered in the post Chinese Whispers and Data Quality.
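That PIM-style capability can be sketched as a category hierarchy where each level adds attributes that the products below inherit. The category names and attributes here are made-up examples, not any product standard:

```python
class Category:
    """A node in a product hierarchy; attributes are inherited downwards.

    A minimal sketch of the flexible hierarchy/attribute model that PIM
    solutions let super users maintain without database schema changes.
    """

    def __init__(self, name, parent=None, attributes=()):
        self.name = name
        self.parent = parent
        self.attributes = list(attributes)

    def all_attributes(self):
        # Attributes defined higher up the hierarchy apply here too
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.attributes

root = Category("Products", attributes=["sku", "name"])
apparel = Category("Apparel", parent=root, attributes=["size", "colour"])
shoes = Category("Shoes", parent=apparel, attributes=["heel_height"])
print(shoes.all_attributes())
```

The point is that adding a new product type with new attributes is a data change, not a schema change – which is exactly what a party-centric MDM data model struggles with.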

All in all, the necessary components and combinations for a suitable MDM toolbox are plentiful and can be obtained by one-stop shopping or by putting some best-of-breed solutions together.

Happy Old New Year in Reference Data Management

Today, the 14th of January in our current calendar, used to be the first day of the new year when the Julian calendar was in use, before different countries at different times shifted to the Gregorian calendar.


Such shifts in what we generally refer to as reference data are a well-known pain in data management, as exemplified in the post called The Country List. Within data warehouse management, we refer to this as Slowly Changing Dimensions.
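A Slowly Changing Dimension Type 2 change can be sketched like this: a minimal Python example where a country rename closes the current row and appends a new versioned one. Real warehouses add surrogate keys and current-row flags; the example uses the actual 2018 rename of Swaziland to Eswatini:

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, effective):
    """Apply a Slowly Changing Dimension Type 2 change.

    Close the current row and append a new versioned row, so that
    queries against older dates still see the old value.
    """
    for row in dimension:
        if row["key"] == key and row["valid_to"] is None:
            row["valid_to"] = effective  # close the current version
    dimension.append({"key": key, **new_attrs,
                      "valid_from": effective, "valid_to": None})

countries = [{"key": "SZ", "name": "Swaziland",
              "valid_from": date(1968, 9, 6), "valid_to": None}]
apply_scd2(countries, "SZ", {"name": "Eswatini"}, date(2018, 4, 19))
print(countries)
```

This is exactly the mechanism a country list needs when the calendar (Julian or Gregorian) or the map changes under your data.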

Master Data Management (MDM) and Reference Data Management (RDM) are two closely related disciplines, and we often use the terms synonymously. Indeed, working with the same real-world entity is sometimes MDM in one context but RDM in another context.

I have worked in industries, such as public transit, where the calendar and related data must be treated as master data. Surely, in many other industries this would be overkill. However, I have seen other entities treated as a simple List of Values (LoV) where they should be handled as master data or at least as more complex reference data. The latest example is plants within a global company, where the highest proposed ambition is a flag for active or inactive, which hardly reflects the complexity of starting or buying a plant and closing or selling the same, and the data management rules that go with the changing states.
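As a sketch of the difference, a plant's lifecycle could be modelled with explicit states and allowed transitions rather than a single active/inactive flag. The states and transitions below are illustrative assumptions, not a standard model:

```python
# Illustrative plant lifecycle; a real model is agreed with the business
ALLOWED_TRANSITIONS = {
    "planned": {"under_construction", "acquired", "cancelled"},
    "under_construction": {"operational"},
    "acquired": {"operational"},
    "operational": {"mothballed", "for_sale", "closing"},
    "mothballed": {"operational", "closing"},
    "for_sale": {"sold", "operational"},
    "closing": {"closed"},
}

def transition(plant, new_state):
    """Move a plant to a new state, enforcing the lifecycle rules."""
    if new_state not in ALLOWED_TRANSITIONS.get(plant["state"], set()):
        raise ValueError(f"cannot go from {plant['state']} to {new_state}")
    plant["state"] = new_state
    return plant

plant = {"id": "DK-01", "state": "planned"}
transition(plant, "under_construction")
transition(plant, "operational")
```

Each state carries different data management rules – a plant for sale needs different stewardship than one under construction – which a boolean flag simply cannot express.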

So happy 14th of January even if this is not New Year to you – but hey, at least it is my birthday.

Big Data Quality, Santa Style

In previous years, the close-to-Christmas posts on this blog have been about Multi-Domain MDM, Santa Style and Data Governance, Santa Style.

So this year it may be time to have a closer look at big data quality, Santa style, meaning how we can imagine Santa Claus joining the rise of big data while observing that exploiting data, big or small, is only going to add real value if you believe in data quality. Ho ho ho.

At the Santa Claus organization they have figured out that there is a close connection between excellence in working with big data and excellence in multi-domain Master Data Management (MDM) and data governance.

Here are some of the findings in the big data paper that the Chief Data Elf just signed off:

  • The feasibility of the new algorithms for naughty or nice marking using social media listening combined with our historical records is heavily dependent on unique, accurate and timely boys and girls master data. The party data governance elf gathering will be accountable for any nasty and noisy issues.
  • Implementation of the automated present buying service based on fuzzy matching between our supplier self-service based multi-lingual product catalogue and the wish list data lake must be done in a phased schedule. The product data governance elf committee is responsible for avoiding any false positives (wrong present incidents) and decreasing the number of false negatives (someone not getting what could be purchased within the budget).
  • Last year we had a 12.25 % overspend on reindeer due to incorrect and missing chimney positions. This year the reliance on crowdsourced positions will be better balanced with utilizing open government property data where possible. The location data governance elves will consult with the elves living on the roof at each head of state in order to make them release more and better quality data (the Gangnam Project).

The World of Reference Data

Reference Data Management (RDM) is an evolving discipline within data management. As organizations mature in the reference data management realm, we often see a shift from relying on internally defined reference data to relying on externally defined reference data. This is based on the good old saying of not reinventing the wheel, and also on the fact that externally defined reference data usually are better at fulfilling multiple purposes of use, where internally defined reference data tend to cater only for the most important purpose of use within your organization.

Then, what standard to use tends to be a matter of where in the world you are. Let’s look at three examples from the location domain, the party domain and the product domain.

Location reference data

If you read articles in English about reference data and ensuring accuracy and other data quality dimensions for location data, you often meet remarks such as “be sure to check validity against US Postal Services” or “make sure to check against the Royal Mail PAF File”. This is all great if all your addresses are in the United States or the United Kingdom. If all your addresses are in another country, there will in many cases be similar services for the given country. If your addresses are spread around the world, you have to look further.

There are some Data-as-a-Service offerings for international addresses out there. When it comes to having your own copy of location reference data, the Universal Postal Union has an offering called the Universal POST*CODE® DataBase. You may also look into open data solutions such as GeoNames.
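A minimal sketch of the first step of such validation is checking postcodes against per-country format rules. The patterns below are simplified assumptions; full validation requires lookup against the actual reference data for the country, such as a postal file or a GeoNames extract:

```python
import re

# Simplified, illustrative postcode formats per ISO country code
POSTCODE_PATTERNS = {
    "US": r"^\d{5}(-\d{4})?$",
    "GB": r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$",
    "DK": r"^\d{4}$",
}

def check_postcode(country, postcode):
    """True/False if the postcode matches the country's format.

    Returns None when no reference data is held for the country,
    which is itself a data quality signal worth tracking.
    """
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return None
    return re.fullmatch(pattern, postcode.strip().upper()) is not None

print(check_postcode("DK", "2100"))
print(check_postcode("GB", "SW1A 1AA"))
```

Notice the three-valued result: a worldwide address population forces you to distinguish "invalid" from "cannot check yet".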

Party reference data

Within party master data management for Business-to-Business (B2B) activities, you want to classify your customers, prospects, suppliers and other business partners according to what they do. For that, there are some frequently used coding systems in areas where I have been:

  • Standard Industrial Classification (SIC) codes, the four-digit numerical codes assigned by the U.S. government to business establishments.
  • The North American Industry Classification System (NAICS).
  • NACE (Nomenclature of Economic Activities), the European statistical classification of economic activities.

As important economic activities change over time, these systems change to reflect the real world. As an example, my Danish company registration has changed NACE code three times since 1998 while I have been doing the same thing.

This doesn’t make conversion services between these systems any easier.
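A sketch of why such conversion is hard: concordance between classification systems is many-to-many and versioned, so a conversion service can at best return candidates. The single mapping entry below is illustrative, not a real concordance table:

```python
# Illustrative concordance entry: one SIC code maps to several NAICS codes
SIC_TO_NAICS = {
    "7372": ["511210", "334614"],  # prepackaged software splits across NAICS
}

def convert(code, mapping):
    """Return candidate target codes for a source classification code.

    More than one candidate means a human or a business rule must
    decide; zero candidates means the concordance table is incomplete
    or out of date.
    """
    return mapping.get(code, [])

print(convert("7372", SIC_TO_NAICS))
print(convert("9999", SIC_TO_NAICS))
```

Add the fact that each system is revised every few years (as with my own NACE code) and you need concordance tables per version pair, not just per system pair.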

Product reference data

There is also a good choice of standardized and standardised classification systems for product data out there. To name a few:

  • The United Nations Standard Products and Services Code® (UNSPSC®), managed by GS1 US™ for the UN Development Programme (UNDP).
  • eCl@ss, who presents themselves as: “THE cross-industry product data standard for classification and clear description of products and services that has established itself as the only ISO/IEC compliant industry standard nationally and internationally”. eCl@ss has its main support in Germany (the home of the Mercedes E-Class).

In addition to cross-industry standards there are heaps of industry specific international, regional and national standards for product classification.


Making a Firmographic Analysis

What demographics are to people, firmographics are to organizations.

I am currently working on starting up a Business-to-Business (B2B) service. In order to assess the market, I had to know something about how many companies out there could possibly be in need of such a service.

The service will work world-wide, but adhering to the sayings about thinking globally/big and starting locally/small, I have started with assessing the Danish market. Also, there is easy and inexpensive access to business directories for Denmark.

My first filter was selecting companies with at least 50 employees.

As the service is suitable for companies within ecosystems of manufacturers, distributors and retailers, I selected the equivalent range of industry codes. In this case it was NACE codes, which resemble SIC codes and other Line-Of-Business classifications used in other geographies.

There were circa 2,500 companies in my selection. However, some belong to the same company family tree. By doing a merge/purge with the largest company in a company family tree as the survivor, the list was down to circa 2,000 companies.
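The merge/purge step can be sketched as follows; the family_id field is an assumed stand-in for the company hierarchy information found in a business directory:

```python
def merge_purge(companies):
    """Keep one survivor per company family tree: the largest by employees.

    A minimal sketch of the survivorship rule described above; real
    merge/purge also consolidates attributes from the purged records.
    """
    survivors = {}
    for c in companies:
        key = c["family_id"]
        if key not in survivors or c["employees"] > survivors[key]["employees"]:
            survivors[key] = c
    return list(survivors.values())

companies = [
    {"name": "Alpha A/S", "family_id": "F1", "employees": 800},
    {"name": "Alpha Retail A/S", "family_id": "F1", "employees": 120},
    {"name": "Beta ApS", "family_id": "F2", "employees": 60},
]
print(merge_purge(companies))
```

Choosing the largest company as the survivor is just one possible rule; depending on the purpose, the group headquarters or the legal contracting entity may be the better pick.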

For this particular service, there are some other possibly competing approaches that are stronger for some kinds of goods than other kinds of goods. For that purpose, I made a bespoke categorization being:

  • Priority A: Building materials, furniture, houseware, machinery and vehicles.
  • Priority B: Electronics, books and clothes.
  • Priority C: Pharmaceuticals, food, beverage and tobacco.

Retailers that span several priorities were placed in priority B. Otherwise, for this high-level analysis, I only used the primary Line-Of-Business.

The result was as shown below:

Firmographic

So, from my firmographic analysis I know the rough size of the target market in one locality. I can assume that other markets look more or less the same, or I can do specific firmographics on other geographies. Also, I can apply the first results of dialogues with entities in the breakdown model and see if the model needs modification.


The Data Matching Institute is Here

Within data management we already have “The MDM Institute”, “The Data Governance Institute” and “The Data Warehouse Institute (TDWI)” and now we also have “The Data Matching Institute”.

The founder of The Matching Institute is Alexandra Duplicado. Aleksandra says: “The reason I founded The Institute of Data Matching is that I am sick and tired of receiving duplicate letters with different spellings of my name and address”. Alex is also pleased that she has now found a nice office within edit distance of her home.

Before founding The Matching of Data Institute, Alexander worked at the Universal Postal Union with responsibility for extra-terrestrial partners. When talking about the future of The Match Institute, Sasha remarks: “It is a matter of not being too false positive. But it is a unique concept”.

One of the first activities for The Data-Matching Institute will be organizing a conference in Brussels. Many tool vendors, such as Statistical Analysis System Inc., Dataflux and SAS Institute, will sponsor the Brüssel conference. “I hope to join many record linkage friends in Bruxelles”, says Alexandre.

The Institute of Matching of Data also plans to offer a yearly report on the capabilities of the tool vendors. Asked about when that is going to happen Aleksander says: “Without being too deterministic a probabilistic release date is the next 1st of April”.


The Multi-Domain Data Quality Tool Magic Quadrant 2014 is out

Gartner, the analyst firm, has a different view of the data quality tool market than of the Master Data Management (MDM) market. The MDM market has two quadrants (customer MDM and product MDM) as reported in the post The Second part of the Multi-Domain MDM Magic Quadrant is out. There is only one quadrant for data quality tools.

Well, actually it is difficult to see a quadrant for product data quality tools. Most data quality tools revolve around the customer (or rather party) domain, with data matching and postal address verification as the main features.

For the party domain it makes sense to have these capabilities deployed outside the MDM solution in some cases, as examined in the post The place for Data Matching in and around MDM. And of course data quality tools are used in heaps of organizations that don’t have an MDM solution.

For the product domain it is hard to see a separate data quality tool if you have a Product Information Management (PIM) / Product MDM solution. Well, maybe if you are an Informatica fan. Here you may end up with a same-branded PIM (Heiler), Product MDM (Siperian) and data quality tool (SSA Name3) environment as a consequence of the matters discussed in the post PIM, Product MDM and Multi-Domain MDM.

What should a data quality tool do in the product domain then? Address verification would be exotic (and ultimately belongs to the location domain). Data matching is a use case, but not usually something that eliminates main pain points with product data.

Some issues that have been touched on this blog are:

Anyway, the first vendor tweets about the data quality tools quadrant 2014 are turning up, and I guess some of the vendors will share the report for free soon.

Magic Quadrant for Data Quality Tools 2014

Update 3rd December: I received 3 emails from Trillium Software today with a link to the report here.


The Place for Data Matching in and around MDM

Data matching has increasingly become a component of Master Data Management (MDM) solutions. This has mostly been the case for MDM of customer data solutions, but it is also a component of MDM of product data solutions, not least when these solutions are emerging into the multi-domain MDM space.

The deployment of data matching was discussed nearly 5 years ago in the post Deploying Data Matching.

While MDM solutions since then have been picking up a larger share of the data matching being done, it is still a fairly small proportion of data matching that is performed within MDM solutions. Even if you have an MDM solution with data matching capabilities, you might still consider where data matching should be done. Some considerations I have come across are:

Acquisition and silo consolidation circumstances

A common use case for data matching is as part of an acquisition or internal consolidation of data silos where two or more populations of party master data, product master data and other important entities are to be merged into a single version of truth (or trust) in terms of uniqueness, consistency and other data quality dimensions.

While the MDM hub must be the end goal for storing that truth (or trust) there may be good reasons for doing the data matching before the actual on-boarding of the master data.

These considerations include:

The point of entry

The MDM solution is, for many good reasons, not always the system of entry. Doing the data matching at the stage where data is put into the MDM hub may be too late. Exposing the data matching capabilities as a Service Oriented Architecture component may be a better way, as pondered in the post Service Oriented Data Quality.

Avoiding data matching

Even being a long-time data matching practitioner, I’m afraid I have to bring up the subject of avoiding data matching, as further explained in the post The Good, The Better and The Best Way of Avoiding Duplicates.


Putting Two Things in One Field

A very common data quality issue is when a field in a data record is populated with more than one piece of information.

Sometimes this is done as a workaround, because we have a piece of information but we don’t have a field with that distinct purpose of use. Then we find a more or less related existing field wherein we can squeeze this additional piece of information.

But we also have some very common cases where this bad habit is required by external business rules or widespread tradition.

Legal Form in Company Names

This example is examined in the post Legal Forms from Hell.

One should think that it is time to change the bad (legally demanded) practice of mixing legal forms with company names and serve the original purpose in another, more data quality friendly way.
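In the meantime, the practical remedy is to parse the legal form out into its own field. Here is a minimal sketch; the list of legal forms is illustrative, and a real solution would use a country-specific reference list:

```python
# Illustrative legal forms; real lists are per country and much longer
LEGAL_FORMS = ["A/S", "ApS", "GmbH", "Ltd", "Inc.", "LLC", "S.A."]

def split_legal_form(raw_name):
    """Return (name, legal_form) with the legal form stripped off the end.

    A minimal sketch: one piece of information per field, so matching
    and classification can work on the clean company name.
    """
    for form in LEGAL_FORMS:
        if raw_name.endswith(" " + form):
            return raw_name[: -len(form) - 1].rstrip(" ,"), form
    return raw_name, None

print(split_legal_form("Example Trading ApS"))
print(split_legal_form("Acme Inc."))
```

With the legal form in its own field, a matching engine no longer has to treat “Acme” and “Acme Inc.” as an edit-distance problem.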

An Address Line

An address line will typically hold a couple of elements as a street (thoroughfare) name, a house number and maybe some kind of unit identification.

By the way, the order of street name and house number is opposite in two approximately equal parts of the world, with the exception of places where numbering within blocks between streets is the standard.
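A minimal sketch of parsing an address line that must cope with both orders; the regular expressions are simplified assumptions, and real address parsing needs country-specific reference data:

```python
import re

def parse_address_line(line):
    """Split an address line into street name, house number and unit.

    Tries 'number first' (e.g. 12 Main Street) and then 'street first'
    (e.g. Hovedgaden 12); a trailing ', ...' part is taken as the unit.
    """
    m = re.fullmatch(r"(?P<number>\d+\w?)\s+(?P<street>.+?)(?:,\s*(?P<unit>.+))?", line)
    if not m:
        m = re.fullmatch(r"(?P<street>.+?)\s+(?P<number>\d+\w?)(?:,\s*(?P<unit>.+))?", line)
    return m.groupdict() if m else {"street": line, "number": None, "unit": None}

print(parse_address_line("12 Main Street, Apt 4"))
print(parse_address_line("Hovedgaden 12"))
```

The fallback of returning the whole line as the street name is deliberate: better to flag an unparsed address than to guess wrongly which part is the number.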

Education in Person Name

You can put professor in front of your name and even MBA – Master of Business Administration!! – after your name in the name field.

In the next few days I will put AFCM (Accidental Field Content Misuser) after my name.


An Alternative Multi-Domain MDM Quadrant

No, this is not an(other) attempt to challenge Gartner, the analyst firm, in making quadrants about vendors in the Master Data Management (MDM) realm.

This is an attempt to highlight some capabilities of Multi-Domain MDM solutions, here focusing on party and product master data and the sell-side and buy-side of MDM, as discussed some years ago in the post Sell-side vs Buy-side Master Data Quality.

A simple quadrant will look like this:

Quadrant

  • The upper right corner is where MDM started, with solutions back then called Customer Data Integration (CDI).
  • The Product Information Management (PIM) side is quite diverse and depends on the industry vertical where it is implemented:
    • Retailers and distributors have their challenges with sometimes high numbers of products that go in and come out as the same but with data reflecting different viewing points.
    • Manufacturers have other issues managing raw materials, semi-finished products, finished products and the products and services used to facilitate the processes.
    • Everyone has supplies.
  • Supplier master data management has more or less also been part of the PIM space, but supplier data looks more like customer master data and should be part of a party master data discipline also embracing other party roles such as employee.

Also, this quadrant is by the way without other important domains such as location (as discussed in the post Bringing the Location to Multi-Domain MDM) and asset (as discussed in the post Where is the Asset?).
