Will Graph Databases become Common in MDM?

One of my pet peeves in data quality for CRM and ERP systems is the commonly used way of looking at entities, not least party entities, in a flat data model, as told in the post A Place in Time.

Party master data, and related location master data, will eventually be modeled in very complex models, and we surely see more and more examples of that. For example, I remember that a long time ago I worked with the ERP system that later became Microsoft Dynamics AX. Back then I had issues with the simplistic data model, which was not role aware. As I’m currently working on a project using the AX 2012 Address Book, it’s good to see that things have certainly developed.

This blog has quite a few posts on hierarchy management in Master Data Management (MDM) and even Hierarchical Data Matching. But I have to admit that even complex relational data models and hierarchical approaches in fact don’t align completely with the real world.

In a comment to the post Five Flavors of Big Data, Mike Ferguson asked about graph data quality. In my eyes, using graph databases in master data management will indeed bring us closer to the real world and thereby deliver better data quality for master data.
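To make the idea a bit more concrete, here is a minimal sketch of party master data held as a property graph. It uses plain Python rather than any particular graph database product, and all node ids, labels and relationship types are made up for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """A tiny in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # source id -> [(relation, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target, **props):
        self.edges[source].append((relation, target, props))

    def neighbours(self, source, relation=None):
        """Return target ids reachable from source, optionally filtered by relation."""
        return [t for (r, t, _) in self.edges[source]
                if relation is None or r == relation]


graph = PropertyGraph()
graph.add_node("p1", label="Person", name="Jane Doe")
graph.add_node("c1", label="Company", name="Example Ltd")
graph.add_node("a1", label="Address", city="London")

# The same party plays several roles; each role is an edge, not a column in a flat table.
graph.add_edge("p1", "EMPLOYED_BY", "c1", since=2010)
graph.add_edge("p1", "CUSTOMER_OF", "c1")
graph.add_edge("p1", "LIVES_AT", "a1")
graph.add_edge("c1", "LOCATED_AT", "a1")

print(graph.neighbours("p1"))              # ['c1', 'c1', 'a1']
print(graph.neighbours("p1", "LIVES_AT"))  # ['a1']
```

The point of the sketch is that a new role or relationship is just one more edge, where a flat customer table would need a new column or a new table.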

I remember that at this year’s MDM Summit Europe Aaron Zornes suggested that a graph database will be the best choice for reflecting the most basic reference dataset, namely The Country List. Oh yes, and then you should think the same goes for master data too, though I doubt that the relational database and hierarchy management will be out of fashion for a while.

So it could be good to know if you have seen or worked with graph databases in master data management beyond representing a static analysis result as a graph database.

[Figure: property graph, from the Wikipedia article on graph databases]


The Postal Address Hierarchy

Using postal addresses is a core element in many data quality improvement and master data management (MDM) activities.

As touched on many times on this blog, postal addresses are formatted very differently around the world. However, they may all be arranged in a sort of hierarchy with up to six general levels:

  • Country
  • Region
  • City or district
  • Thoroughfare (street) or block
  • Building number
  • Unit within building

In addition to that the postal code (postcode or zip code) is part of many address formats. Seen in the hierarchical light the postal code is a tricky concept as it may identify a city, district, thoroughfare, a single building or even a given unit within or section of a building. The latter is true for my company address in the United Kingdom, where we have a very granular postcode system.
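As a rough sketch of how the six levels and the postal code could sit in a data model, consider the Python below. The field names and the sample values are my own assumptions for illustration and are not taken from any address standard.

```python
from dataclasses import dataclass
from typing import Optional

# The six general levels, from most general to most specific.
LEVELS = ("country", "region", "city_or_district", "thoroughfare_or_block",
          "building_number", "unit")


@dataclass
class PostalAddress:
    country: str
    region: Optional[str] = None
    city_or_district: Optional[str] = None
    thoroughfare_or_block: Optional[str] = None
    building_number: Optional[str] = None  # a string, since "21 A" and "21-23" occur
    unit: Optional[str] = None
    # Kept outside the hierarchy on purpose: depending on the country, the postal
    # code may identify anything from a city down to a unit within a building.
    postal_code: Optional[str] = None

    def deepest_level(self) -> str:
        """Name of the most specific hierarchy level that has a value."""
        filled = [level for level in LEVELS if getattr(self, level) is not None]
        return filled[-1]


# Hypothetical sample address; the values are placeholders.
address = PostalAddress(country="GB", city_or_district="London",
                        thoroughfare_or_block="Main Grove", building_number="One",
                        postal_code="XX1 1XX")
print(address.deepest_level())  # building_number
```

Only the country is mandatory here, which reflects that levels such as region and unit are absent from many address formats.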

Country

As discussed in the post The Country List even the top level of a postal address hierarchy isn’t a simple list fit for every purpose. Some issues are:

  • There are different sources with different perceptions of which are the countries on this planet
  • What we regard as countries comes in hierarchies
  • Several coding systems are available

Region

The region is an element in some address formats, like the states in the United States and the provinces in Canada, while other countries do not have the region as a required part of the postal address. Germany, which is divided into quite independent Länder, is one example. The same goes for Swiss cantons.

City or district

I once read that if you used the label city in a web form in Australia, you would get a lot of values like: “I do not live in a city”.

Anyway this level is often (but as mentioned certainly not always) where the postal code is applied. The postal code district may be a single town with surroundings, several villages or a district within a big city.

Thoroughfare (street) or block

Most countries use thoroughfares such as streets, roads, lanes, avenues, mews, boulevards and whatever else they are called around the world. Beware that the same street may have several spellings and even several names.

Japan is a counterexample to the use of thoroughfares, as here it’s the blocks between the thoroughfares that are part of the postal address.

Building number

Usually this element will be an integer. However formats with a letter behind the integer (example: 21 A) or a range of integers (example: 21-23) are most annoying. And then this British classic: One Main Grove. OMG.
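A minimal sketch of coping with these variants could look like the Python below. The function name, the tiny number-word lookup and the returned tuple shape are assumptions of mine, and a real solution would need country-specific rules.

```python
import re

# Hypothetical lookup for spelled-out numbers; a real one would be longer
# and language aware.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# An integer, optionally followed by either "-<integer>" (a range) or a single letter.
PATTERN = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+)|([A-Za-z]))?\s*$")


def parse_building_number(raw):
    """Return (first, last, suffix), or None if the value is not understood."""
    word = NUMBER_WORDS.get(raw.strip().lower())
    if word is not None:
        return (word, word, None)
    match = PATTERN.match(raw)
    if match is None:
        return None
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    suffix = match.group(3).upper() if match.group(3) else None
    return (first, last, suffix)


print(parse_building_number("21"))     # (21, 21, None)
print(parse_building_number("21 A"))   # (21, 21, 'A')
print(parse_building_number("21-23"))  # (21, 23, None)
print(parse_building_number("One"))    # (1, 1, None)
```

Returning a range plus a suffix keeps the integer part comparable while not throwing away the annoying bits.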

Unit within a building

This element may or may not be present in a postal address, depending on whether the building is a single-family house or company site, whether the postal service sees it as such, and whether you may actually indicate where within the building the delivery goes or you go. The ups and downs of this level are examined in the post A Universal Challenge.


On MDM, Data Models and Big Data

As described in the post Small Data with Big Impact, my guess is that we will see Master Data Management solutions as a core element in data architectures that are able to produce sustainable results from dealing with big data.

If we look at party master data, a serious problem with many ERP and CRM systems is that the data model for party master data isn’t good enough for dealing with the many different forms in which the parties we hold data about are represented in big data sources. This makes the linking between traditional systems of record and big data very hard.

Having a Master Data Management (MDM) solution with a comprehensive data model for party master data is essential here.

Some of the capabilities we need are:

Storing multiple occurrences of attributes

People and companies have many phone numbers, they have many eMail addresses and they have many social identities and you will for sure meet these different occurrences in big data sources. Relating these different occurrences to the same real world entity is essential as reported in the post 180 Degree Prospective Customer View isn’t Unusual.

An MDM hub with a corresponding data model is the place to manage that challenge in one place.
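As a sketch of what such a data model could look like, the Python below lets a party hold any number of contact points and builds a reverse index from contact point to party. The class names, the kinds and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContactPoint:
    kind: str   # e.g. "phone", "email", "social"
    value: str


@dataclass
class Party:
    party_id: str
    name: str
    contact_points: set = field(default_factory=set)

    def add(self, kind, value):
        # Light normalization only; real matching needs much more than this.
        self.contact_points.add(ContactPoint(kind, value.strip().lower()))


def build_index(parties):
    """Map every contact point to the ids of the parties holding it."""
    index = {}
    for party in parties:
        for contact_point in party.contact_points:
            index.setdefault(contact_point, set()).add(party.party_id)
    return index


jane = Party("p1", "Jane Doe")
jane.add("email", "jane@example.com")
jane.add("email", "Jane.Doe@example.org ")
jane.add("social", "@janedoe")

index = build_index([jane])
print(index[ContactPoint("email", "jane.doe@example.org")])  # {'p1'}
```

With the index in place, an eMail address or social identity met in a big data source can be related back to the real world entity in one lookup.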

Exploiting rich external reference data

As told in the post Where the Streets have Two Names and emphasized in the comments to the post the real world has plenty of examples of the same thing having many names. And this real world will be reflected in big data sources.

Your MDM solution should embrace external reference data solving these issues.

Handling the time dimension

In the post A Place in Time the flaws of the usual customer table in ERP and CRM systems are examined. One common issue is handling attributes that change. Change of address happens a lot. And this may be complicated by the fact that we may operate several address types at the same time, like visiting addresses, billing addresses and correspondence addresses. These different addresses will also pop up in big data sources. And the same goes for other attributes.

You must get that right in your MDM implementation.
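One way to sketch this is to give each address assignment a type and a validity period, and then ask for the addresses as of a given day. The names below are assumptions for illustration, and the half-open period (valid from, up to but not including valid to) is just one possible design choice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressAssignment:
    party_id: str
    address_type: str                # e.g. "visiting", "billing", "correspondence"
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still current

    def is_valid_on(self, day):
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def addresses_on(assignments, party_id, day):
    """Return {address_type: address} for a party as of a given day."""
    return {a.address_type: a.address
            for a in assignments
            if a.party_id == party_id and a.is_valid_on(day)}


# Hypothetical history for one party.
history = [
    AddressAssignment("p1", "billing", "1 Old Road", date(2010, 1, 1), date(2013, 3, 1)),
    AddressAssignment("p1", "billing", "2 New Street", date(2013, 3, 1)),
    AddressAssignment("p1", "visiting", "3 Office Park", date(2011, 6, 1)),
]

print(addresses_on(history, "p1", date(2012, 1, 1)))
# {'billing': '1 Old Road', 'visiting': '3 Office Park'}
```

Old addresses are never overwritten, so an address found in an older big data source can still be linked to the right party.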

[Figure: the usual but very wrong customer table that won’t work with big data]


Where the Streets have one Name but Two Spellings

Last week’s post called Where The Streets have Two Names caught a lot of comments both on this blog and in LinkedIn groups, such as Data Quality Professionals and The Data Quality Association, with a lot of examples from around the world of how this challenge actually exists more or less everywhere.

Recently I had the pleasure of experiencing a variant of the challenge when driving around in a rented car in the Saint Petersburg area in Russia. Here the streets usually only have one name, but that name may be presented in two different alphabets: the local Cyrillic or the Latin alphabet I’m used to, which was also included in the reference data on the Sat Nav. So while it was nice for me to type destinations in Latin letters, it was also nice to have directions in Cyrillic in order to follow the progress on road signs.

So here standardization (or standardisation) to one preferred language, alphabet or script system isn’t the best solution. Best of breed solutions for handling addresses must be able to handle several right spellings for the same address.
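A minimal sketch of handling several right spellings is to keep them all against one street identifier and resolve any of them to that identifier. The registry class and the identifier "spb-001" below are made up for illustration.

```python
class StreetRegistry:
    """Keeps every accepted spelling of a street under one street id."""

    def __init__(self):
        self._spellings = {}  # street id -> set of accepted spellings
        self._lookup = {}     # normalized spelling -> street id

    @staticmethod
    def _normalize(text):
        # Case-insensitive and whitespace-insensitive; works for Cyrillic too.
        return " ".join(text.casefold().split())

    def register(self, street_id, *spellings):
        self._spellings.setdefault(street_id, set()).update(spellings)
        for spelling in spellings:
            self._lookup[self._normalize(spelling)] = street_id

    def resolve(self, text):
        """Return the street id for a spelling, or None if it is unknown."""
        return self._lookup.get(self._normalize(text))

    def spellings(self, street_id):
        return self._spellings.get(street_id, set())


registry = StreetRegistry()
registry.register("spb-001", "Невский проспект", "Nevsky Prospekt")

print(registry.resolve("nevsky  prospekt"))  # spb-001
print(registry.resolve("НЕВСКИЙ ПРОСПЕКТ"))  # spb-001
```

Nothing is standardized away here: I can type in Latin letters and still get the Cyrillic spelling back for reading the road signs.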

[Figure: Nevsky Prospekt street sign in Saint Petersburg, in Cyrillic with Latin subtitle]


Call me on Phone, Mobile or Skype

When calling people in order to have a long distance conversation there are three main ways today:

  • The landline phone, which has been around since the 19th century and penetrated most homes and businesses in the last century
  • The mobile phone, which came around in the 70’s and spread rapidly in the 90’s
  • Skype, a voice over internet service that grew in the 00’s

Using these services involves an identifier, which may be stored in customer tables and other party master data repositories, with some implications for data management and identity resolution:

The Landline Phone Number

The landline phone number is a very common attribute in databases and is often used as the main identifier of a customer in ERP and CRM solutions.

Using a landline phone number for identity resolution has some challenges, including:

  • As with most attributes, phone numbers may change. Depending on the country in question they may change during relocation, and most phone number systems get an upgrade over the years.
  • In business-to-business (B2B) a company typically has more than one phone number.
  • In business-to-consumer (B2C) the landline phone number merely belongs to a household rather than a single individual. That may be good or not good depending on purpose of use.

The Mobile Phone Number

Mobile phone numbers also pile up in databases. In relation to identity resolution there are issues with mobile phone numbers, namely:

  • They change a lot.
  • It’s not always clear to whom a number actually belongs:
    • A company paid phone may be used for both business and pleasure and may be transferred to another individual
    • In a household a person may be registered for a range of mobile phones used by individual members of the household including children

The Skype ID

I seldom see databases with Skype IDs. In my experience Skype IDs aren’t used a lot in internal master data. They reside in Skype and in social network profiles, for example on LinkedIn.

A final rant

Today I hardly ever use a landline phone, I use my mobile once in a while and I use Skype a lot. Not because it’s convenient, but because the telecom companies have decided to charge for international mobile calls in ways so greedy that it makes Somali sea pirates look like honest businessmen.


Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and the related improvement of data quality and prevention of data quality issues for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis in data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, having a “van” in bulks of names in the Netherlands or having loads of surname-like middle names in Denmark.

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers describes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

The same way there are different administrative practices for individuals, for example:

  • As I understand it, it is forbidden by the constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a quite well-equipped toolbox for preventing data quality issues and exploiting external reference data.


Multi-Channel Data Matching

Most data matching activities going on are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data in these three different channel types:

  • Offline is the good old channel type where we have the mother of all business cases for data matching being avoiding unnecessary costs by sending the same material with the postman twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still some other factors to be considered, like:
    • Duplicate digital messages to the same recipient look like spam (even if the recipient provided the different eMail addresses himself or herself).
    • You can’t measure a true response rate
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

The concept of having a single customer view, or rather single party view, involves matching identities over offline, online and social channels, and the typical elements used for data matching are not entirely the same for those channels, as seen in the figure to the right.

Most data matching procedures are in my experience quite simple, with only a few data elements and no history tracking taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.

When doing multi-channel data matching you can’t avoid going from the popular simple data matching environments to more identity resolution like environments.
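As a very rough sketch of that move, the Python below links records from different channels when they share at least one identifying element, and does so transitively. The element names and sample values are my own assumptions (the figure is not reproduced here), and exact comparison of values is a big simplification compared with real identity resolution.

```python
def link_records(records):
    """Group records that share at least one identifying element, transitively.

    Each record is a (record_id, set of (element, value) pairs) tuple.
    """
    parent = {record_id: record_id for record_id, _ in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (element, value) -> first record id holding it
    for record_id, elements in records:
        for element in elements:
            if element in first_seen:
                parent[find(record_id)] = find(first_seen[element])
            else:
                first_seen[element] = record_id

    groups = {}
    for record_id, _ in records:
        groups.setdefault(find(record_id), set()).add(record_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records from the three channel types.
records = [
    ("offline-1", {("name_address", "jane doe|1 main grove")}),
    ("online-1", {("email", "jane@example.com"),
                  ("name_address", "jane doe|1 main grove")}),
    ("social-1", {("email", "jane@example.com"), ("handle", "@janedoe")}),
    ("online-2", {("email", "john@example.com")}),
]

print(link_records(records))
# [['offline-1', 'online-1', 'social-1'], ['online-2']]
```

Note that the offline and social records share no element directly; they only end up in the same single party view through the online record that holds both a postal address and an eMail address.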

Some pieces of advice for getting it right without too much complication are:

  • Emphasize data capture by getting it right the first time. It helps a lot.
  • Get your data models right. Here reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post instant Single Customer View.


Putting it Right

Data Governance (DG), Reference Data Management (RDM) and Master Data Management (MDM) are closely related disciplines.

Consequently the Data Governance Conference Europe 2013 and the Master Data Management Summit Europe 2013 are co-located, and a hot topic this year is Reference Data Management.

The difficulty of putting the sessions at the conference in one right place can be seen from the fact that the session called Establishing Reference Data Governance in the Large Enterprise is part of an MDM track, but is actually mostly about data governance. The session is labeled Product MDM & Reference Data, but will be about governing reference data for multi-domain MDM, and the data governance program described was in fact based on a party master data challenge involving reference data for industry classification.

In the session Petter Larsen, Head of Data Governance at Norway’s largest financial services group called DNB, and Thomas T. Thykjaer, Lead MDM Consultant at Capgemini, will connect the dots in the landscape of business vocabularies, data models, the data governance toolbox, data domains and reference data architecture.

I certainly look forward to Petter and Thomas putting it right.
