On MDM, Data Models and Big Data

As described in the post Small Data with Big Impact, my guess is that Master Data Management solutions will become a core element of data architectures that can deliver sustainable results from big data.

If we look at party master data, a serious problem with many ERP and CRM systems is that their data model for party master data isn’t good enough. It can’t cope with the many different forms in which the parties we hold data about are represented in big data sources, and that makes linking traditional systems of record to big data very hard.

Having a Master Data Management (MDM) solution with a comprehensive data model for party master data is essential here.

Some of the capabilities we need are:

Storing multiple occurrences of attributes

People and companies have many phone numbers, many email addresses and many social identities, and you will for sure meet these different occurrences in big data sources. Relating them to the same real world entity is essential, as reported in the post 180 Degree Prospective Customer View isn’t Unusual.

An MDM hub with a corresponding data model is the place to manage that challenge in one place.
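As a minimal sketch of what storing multiple occurrences could look like, the Python below keeps sets of phone numbers, email addresses and social identities per party and builds an index from each occurrence back to the party. The names `Party`, `build_index` and `resolve`, and all sample values, are hypothetical and only illustrate the idea. A real hub would add proper matching and standardization.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Party:
    """One real world person or company (hypothetical model)."""
    party_id: str
    name: str
    phones: Set[str] = field(default_factory=set)
    emails: Set[str] = field(default_factory=set)
    social_ids: Set[str] = field(default_factory=set)


def build_index(parties: Iterable[Party]) -> Dict[str, str]:
    """Map every known occurrence (lower-cased) to the owning party id."""
    index: Dict[str, str] = {}
    for party in parties:
        for value in party.phones | party.emails | party.social_ids:
            index[value.lower()] = party.party_id
    return index


def resolve(index: Dict[str, str], value: str) -> Optional[str]:
    """Return the party id for an occurrence seen in a source, if known."""
    return index.get(value.lower())


# Made-up sample data: one party with several occurrences of each attribute.
alice = Party(
    party_id="P1",
    name="Alice Example",
    phones={"+45 5550 0101", "+45 5550 0102"},
    emails={"alice@example.com", "a.example@example.org"},
    social_ids={"@alice_example"},
)
index = build_index([alice])
```

With this in place, `resolve(index, "A.Example@example.org")` and `resolve(index, "@alice_example")` both point at the same party, which is the linking the hub is there to provide.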

Exploiting rich external reference data

As told in the post Where the Streets have Two Names, and emphasized in the comments to that post, the real world has plenty of examples of the same thing having many names. This real world will be reflected in big data sources.

Your MDM solution should embrace external reference data that solves these issues.

Handling the time dimension

In the post A Place in Time the flaws of the usual customer table in ERP and CRM systems are examined. One common issue is handling attributes that change. Change of address happens a lot. This may be complicated by the fact that we may operate several address types at the same time, like visiting addresses, billing addresses and correspondence addresses. These different addresses will also pop up in big data sources. And the same goes for other attributes.

You must get that right in your MDM implementation.
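One way to picture the time dimension is to store each address as a period with a type and a validity range instead of overwriting a single field. The sketch below assumes a hypothetical `AddressPeriod` record and an `address_as_of` helper with made-up sample data. The half-open interval (start included, end excluded) is a design choice of this sketch, not something prescribed by any particular MDM product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class AddressPeriod:
    address_type: str                 # e.g. "visiting", "billing", "correspondence"
    address: str
    valid_from: date                  # first day the address applies
    valid_to: Optional[date] = None   # first day it no longer applies; None = current


def address_as_of(history: Iterable[AddressPeriod],
                  address_type: str,
                  on_date: date) -> Optional[str]:
    """Return the address of the given type that was valid on on_date."""
    for period in history:
        if period.address_type != address_type:
            continue
        started = period.valid_from <= on_date
        not_ended = period.valid_to is None or on_date < period.valid_to
        if started and not_ended:
            return period.address
    return None


# Made-up history: the billing address changed, the visiting address did not.
history = [
    AddressPeriod("billing", "1 Old Street", date(2010, 1, 1), date(2013, 3, 1)),
    AddressPeriod("billing", "2 New Road", date(2013, 3, 1)),
    AddressPeriod("visiting", "5 Shop Lane", date(2011, 6, 1)),
]
```

Asking for the billing address in 2012 and in 2013 after the move gives two different answers, while the visiting address is unaffected.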

Customer Table
The usual but very wrong customer table that won’t work with big data.


Reaching the Cloud with MDM

As reported in the post The MDM Landscape is Slowly Changing, one statement in the Information Difference MDM Landscape 2013 is:

  • “The market is starting to dabble in cloud-based implementations…”

I have spent part of the last few months on a cloud-based Master Data Management implementation, in this case using the iDQ™ MDM Edition.

Well, actually it isn’t a full cloud implementation. There is a frontend taking care of user interaction in the cloud and there is a backend taking care of integration on-premise.

I guess many other MDM implementations embracing cloud technology will look like this one: a hybrid where some services are based in the cloud and some on-premise.

What about your MDM implementation(s)? Are they cloud-based, on-premise or hybrid?

Hohenzollern Castle in Southern Germany


On Washing Rental Cars and Shared Data

Recently a tweet from Doug Laney of Gartner has been retweeted a lot:

Rented Car

Like most analogies, it may or may not fit depending on the perspective. Actually, rental cars are probably some of the most washed cars around, as the rental company washes and cleans the car between every rental.

In the same way as rental cars are usually quite clean, I have found that sharing data is a powerful way to have clean data, as told on the page about Data Quality 3.0. This is also the grounding concept behind the instant Data Quality solution I’m working with, where we have just released our iDQ™ MDM Edition.


Where the Streets have one Name but Two Spellings

Last week’s post called Where The Streets have Two Names drew a lot of comments both on this blog and in LinkedIn groups such as Data Quality Professionals and The Data Quality Association, with a lot of examples from around the world of how this challenge actually exists more or less everywhere.

Recently I had the pleasure of experiencing a variant of the challenge when driving around in a rented car in the Saint Petersburg area in Russia. Here the streets usually have only one name, but it may be presented in two different alphabets: the local Cyrillic or the Latin alphabet I’m used to, which was also included in the reference data on the Sat Nav. So while it was nice for me to type destinations in Latin letters, it was also nice to have directions in Cyrillic in order to follow the progress on road signs.

So here standardization (or standardisation) to one preferred language, alphabet or script system isn’t the best solution. Best of breed solutions for handling addresses must be able to handle several correct spellings for the same address.
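A small sketch of the idea: keep every correct spelling of a street under one identifier, accept any of them as input, and choose the script for output. The structure `STREETS`, the identifier `street-1` and the helper names are assumptions made up for this illustration. The script labels follow the ISO 15924 codes `Cyrl` and `Latn`.

```python
import unicodedata
from typing import Dict


def normalize(text: str) -> str:
    """Unicode-normalize, case-fold and trim a spelling for lookup."""
    return unicodedata.normalize("NFKC", text).casefold().strip()


# Hypothetical reference data: one street, two correct spellings.
STREETS: Dict[str, Dict[str, str]] = {
    "street-1": {"Cyrl": "Невский проспект", "Latn": "Nevsky Prospekt"},
}


def build_lookup(streets: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map every normalized spelling to its street identifier."""
    lookup: Dict[str, str] = {}
    for street_id, spellings in streets.items():
        for spelling in spellings.values():
            lookup[normalize(spelling)] = street_id
    return lookup


def display(streets: Dict[str, Dict[str, str]], street_id: str, script: str) -> str:
    """Return the spelling of a street in the requested script."""
    return streets[street_id][script]


lookup = build_lookup(STREETS)
# Type the destination in Latin letters, show the direction in Cyrillic.
typed = lookup[normalize("nevsky prospekt")]
shown = display(STREETS, typed, "Cyrl")
```

Neither spelling is treated as the standard one here. Both are simply valid names for the same real world street.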

Street sign on Nevsky Prospekt, St Petersburg, in Cyrillic with Latin subtitle


Real World Alignment and Continental Drift

You can find many great analogies for working with data quality and Master Data Management (MDM) in world maps. One example is reported in the post The Greenland Problem in MDM, which is about how different business units have a different view of the same real world entity.

Real world alignment isn’t without challenges, of course. One reason is that the real world changes, as reported by the Daily Mail in an article about how modern countries would be placed on the landmasses as they were 300 million years ago.

World 300 M years ago

The image above may very well show how many master data repositories today reflect the real world. Yep, we may have the country list covered well enough. We may even do quite well if we look at each geographical unit independently. However, the big picture doesn’t fit the world as it is today.


Last Time Right

The ”First Time Right” principle is a good principle for data quality and indeed getting data right the first time is a fundamental concept in the instant Data Quality service I’m working with these days.

However, some blog posts in the data quality realm this week has pointed out that there is a life, and sometime an end of life, after data has hopefully been captured right the first time.

In the post From Cable to Grave by Guy Mucklow on the Postcode Anywhere blog the bad consequences of a case of chasing debt from a customer not among us anymore is examined.

Asset in, Garbage Out: Measuring data degradation is the title of a post by Rob Karel on Informatica Perspectives. Herein Rob goes through all the dangers data may encounter after being entered right the first time.

Some years ago I touched on the subject in the post Ongoing Data Maintenance. As told there, I’m convinced, after having seen it work, that a good approach to also getting it right the last time is to capture data in a way that makes data maintainable.

Some techniques for doing this are:

  • Where possible collect external identifiers
  • Atomize data instead of squeezing several different elements into one attribute
  • Make the data model reflect the real world

And oh, it’s neither the first time nor the last time I will touch on this subject. It needs constant attention.


The World of Measuring

A common data quality issue in data management is the use of different measuring systems. Let’s have a look at some of the issues.

Mile or Kilometer, Pound or Kilogram

There is the imperial system with units such as the mile and the pound. And there is the metric system with units such as the meter and the gram.

According to Wikipedia the metric system, though there are nuances in worldwide use, is used everywhere except, notably, in the United States.

Metric Penetration

Celsius or Fahrenheit

For temperature we have the Celsius scale, used all over, and the Fahrenheit scale, used in the United States.
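A minimal sketch of handling the unit differences above is to store the unit next to every value and convert to one system when data is consolidated. The function name `to_metric` and the unit labels (`mi`, `lb`, `degF` and so on) are assumptions for this illustration. The conversion factors are the exact definitions of the international mile and the avoirdupois pound.

```python
from typing import Tuple

MILE_IN_KM = 1.609344       # international mile, exact by definition
POUND_IN_KG = 0.45359237    # avoirdupois pound, exact by definition


def to_metric(value: float, unit: str) -> Tuple[float, str]:
    """Convert a (value, unit) pair to kilometers, kilograms or Celsius."""
    if unit == "mi":
        return value * MILE_IN_KM, "km"
    if unit == "lb":
        return value * POUND_IN_KG, "kg"
    if unit == "degF":
        return (value - 32) * 5 / 9, "degC"
    if unit in ("km", "kg", "degC"):
        return value, unit          # already in the target system
    raise ValueError(f"unknown unit: {unit!r}")
```

A value without a unit cannot be converted at all, which is why the sketch raises an error instead of guessing.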

Big-endian, Little-endian or Middle-endian

When expressing a date we have the ISO standard, a big-endian format where today is 2013-04-27. Most of the world uses a little-endian format where today is 27-04-2013. The exception is the United States (and all the social networks coming from there), where today is expressed in a middle-endian format as 04-27-2013.
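The three conventions can be sketched with Python’s standard `datetime.strptime`. The dictionary `FORMATS` and the helpers `to_iso` and `possible_readings` are names invented for this example. The point is that a date string can only be read safely when the convention of its source is known.

```python
from datetime import date, datetime
from typing import Dict

# strptime patterns for the three conventions described above.
FORMATS: Dict[str, str] = {
    "big-endian": "%Y-%m-%d",      # 2013-04-27 (ISO 8601)
    "little-endian": "%d-%m-%Y",   # 27-04-2013
    "middle-endian": "%m-%d-%Y",   # 04-27-2013
}


def to_iso(text: str, convention: str) -> str:
    """Parse text under a known convention and return the ISO form."""
    return datetime.strptime(text, FORMATS[convention]).date().isoformat()


def possible_readings(text: str) -> Dict[str, date]:
    """Return every convention under which text is a valid date."""
    readings: Dict[str, date] = {}
    for convention, pattern in FORMATS.items():
        try:
            readings[convention] = datetime.strptime(text, pattern).date()
        except ValueError:
            continue  # not a valid date under this convention
    return readings
```

A string such as 27-04-2013 has only one valid reading, but 04-05-2013 is a valid date both as little-endian and as middle-endian, with two different meanings.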


Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis on data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, a “van” in lots of names in the Netherlands or loads of surname-like middle names in Denmark.

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers describes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

In the same way there are different administrative practices for individuals, for example:

  • As I understand it, all-purpose identification numbers for individuals are forbidden by the constitution Down Under.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a quite well-equipped toolbox for preventing data quality issues and exploiting external reference data.


What’s so special about your party master data?

My last blog post was called Is Managing Master Data a Differentiating Capability? The post is an introduction to a conference session being a case story about managing master data at Philips.

During my years working with data quality and master data management it has always struck me how differently organizations manage the party master data domain, while in fact the issues are almost the same everywhere.

First of all, party master data describe real world entities that are the same to everyone. Everyone is gathering data about the same individuals and the same companies at the same addresses and with the same digital identities. The real world also comes in hierarchies such as households, company families and contacts belonging to companies, which are the same to everyone. We may call that the external hierarchy.

On top of that, everyone has some kind of demand for intended duplicates, as a given individual or company may have several accounts for specific purposes and roles. We may call that the internal hierarchy.
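The two hierarchies can be sketched as two small record types: real world parties linked to a parent (the external hierarchy) and accounts pointing at a party (the internal hierarchy). All class names, field names and sample values below are hypothetical and only illustrate the distinction.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RealWorldParty:
    """External hierarchy: the same to everyone."""
    party_id: str
    name: str
    parent_id: Optional[str] = None   # e.g. the owning company in a company family


@dataclass(frozen=True)
class Account:
    """Internal hierarchy: an intended duplicate for a purpose or role."""
    account_id: str
    party_id: str
    role: str


def accounts_by_party(accounts: Iterable[Account]) -> Dict[str, List[str]]:
    """Group account ids under the real world party they belong to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for account in accounts:
        grouped[account.party_id].append(account.account_id)
    return dict(grouped)


def ultimate_parent(parties: Dict[str, RealWorldParty], party_id: str) -> str:
    """Walk up the external hierarchy to the top of the company family."""
    current = parties[party_id]
    while current.parent_id is not None:
        current = parties[current.parent_id]
    return current.party_id


# Made-up sample data.
parties = {
    "C1": RealWorldParty("C1", "Example Group"),
    "C2": RealWorldParty("C2", "Example Nordic", parent_id="C1"),
}
accounts = [
    Account("A1", "C2", "customer"),
    Account("A2", "C2", "supplier"),
]
```

Here the two accounts are intended duplicates of one company, and that company still rolls up to the same group for everyone who holds data about it.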

A party master data solution will optimally reflect the internal hierarchy while most of the business processes around are supported by CRM-systems, ERP-systems and special solutions for each industry.

Reflecting the external hierarchy will be the same for everyone, and there is no need for anyone to reinvent the wheel here. There are already plenty of data models, data services and data sources out there.

Right now I’m working on a service called instant Data Quality that is capable of embracing and mashing up external reference data sources for addresses, properties, companies and individuals from all over the world.

The iDQ™ service already fits in at several places as told in the post instant Data Quality and Business Value. I bet it fits your party master data too.
