Adding 180 Degrees to MDM

Master Data Management (MDM) has traditionally been about getting better at utilizing and sharing internal records of our customers, suppliers, products, assets and other core business entities.

My latest work around master data management revolves around bringing in external data sources in order to make onboarding processes more efficient and provide more accurate, complete and timely master data.

So, it was good to see this approach gaining traction at the MDM Summit Europe 2013.

The old stuff

Andy Walker of BP presented how BP has built the management of party master data around aligning with the D&B WorldBase for business-to-business (B2B) customer and vendor master data.

Knowing which actual legal entities you are doing business with, and which external hierarchies they belong to, is crucial for BP both in daily operations and in reporting and analysis utilizing party master data.

Using business directories isn’t new at all; it has been around for ages, and from what I have seen, it works when you do it properly and consistently.

The new stuff

Big data was a hot topic at the conference. As reported in a post from the first day, embracing big data may lead to Double Trouble with Social MDM and Big Data.

However, digging into big data and doing social MDM may certainly also provide new opportunities, as these new sources may enable us to obtain (or close in on) a 360-degree view of various master data entity types. It is, as said and tweeted by Steve Jones of Capgemini, about looking outside-in.


The Country List

It’s the second day of the MDM Summit Europe 2013 in London today.

The last session I attended today was an expert panel on Reference Data Management (RDM).

I guess the list of countries on this planet is the prime example of reference data, and today’s session was no exception to that.

Even though a list of countries is fairly small and there shouldn’t be everyday changes to it, maintaining a country list isn’t as simple as you would think.

First of all, official sources for a country list aren’t in agreement. The range of countries with an ISO code isn’t the same as the range of countries where, for example, the Universal Postal Union (UPU) says you can make a delivery.

Another example I have had some challenges with: the D&B WorldBase (a large worldwide business directory) has four country codes for what is generally regarded as the United Kingdom, as if the D&B country reference data were defined by a soccer fan recognizing the distinct national soccer teams of England, Wales, Scotland and Northern Ireland.
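In practice this kind of discrepancy is often handled with a crosswalk table that maps directory-specific country codes onto ISO 3166-1 alpha-2 codes. A minimal sketch is below; the directory code values are assumptions for illustration, not the actual D&B codes.

```python
# Hypothetical crosswalk from directory-specific country codes to
# ISO 3166-1 alpha-2. The four UK entries illustrate the D&B situation
# described above; the code values are illustrative assumptions.
DIRECTORY_TO_ISO = {
    "ENGLAND": "GB",
    "SCOTLAND": "GB",
    "WALES": "GB",
    "NORTHERN IRELAND": "GB",
    "DENMARK": "DK",
}

def to_iso(directory_code: str) -> str:
    """Map a directory country code to ISO, failing loudly on unknown codes."""
    try:
        return DIRECTORY_TO_ISO[directory_code.strip().upper()]
    except KeyError:
        raise ValueError(f"No ISO mapping for country code: {directory_code!r}")

print(to_iso("Scotland"))  # GB
```

Failing loudly on unmapped codes is deliberate: silently passing unknown country codes through is exactly how reference data discrepancies leak into master data.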

The expert panel moderator, Aaron Zornes, went as far as suggesting that a graph database may be the best technology for reflecting the complexity in reference data. Oh yes, and in master data too, you might think, though I doubt that the relational database and hierarchy management will be out of fashion for a while.


Double Trouble with Social MDM and Big Data

Yesterday was the first day at the MDM Summit Europe 2013 in London.

One of the workshops I attended was called Master Data Governance for Cloud/Social MDM/Big Data. The workshop was led by Malcolm Chisholm, one of my favorite thought leaders within data management.

According to Malcolm Chisholm, and I totally agree, the rise of social networks and big data will have a tremendous impact on future MDM (Master Data Management) architecture. These new opportunities and challenges will not replace the old way of doing MDM; integration of social data and other big data will add new elements to the existing component landscape around MDM solutions.

Like it or not, things are going to be more complicated than before.

We will have different technologies and methodologies handling the old systems of record and the new systems of engagement at the same time, for example relational databases (as we know them today) for master data and columnar databases for big data.

Profiling results from analysis of big data will be added to the current identity resolution centric master data elements handled in current master data solutions. Furthermore, there will be new interfaces for social collaboration around master data maintenance on top of the current interfaces.

So, the question is whether taking on the double trouble is worth it. Doing nothing, in this case sticking to small data, is always a popular option. But will the organizations choosing that path exist in the next decade? Or will they be outsmarted by newcomers?



Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis in data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, having a “van” in bulks of names in the Netherlands or having loads of surname-like middle names in Denmark.
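The nickname issue is typically handled with a lookup table that folds nicknames onto a canonical given name before matching. A minimal sketch, keeping in mind that real nickname lists run to thousands of entries and the mapping is not always unambiguous:

```python
# A tiny, illustrative nickname table; a production list would be much
# larger and would need to handle ambiguous mappings.
NICKNAMES = {
    "bill": "william",
    "bob": "robert",
    "liz": "elizabeth",
    "dick": "richard",
}

def normalize_given_name(name: str) -> str:
    """Fold a nickname onto its canonical form (lowercased) for matching."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

print(normalize_given_name("Bill"))  # william
```

With this in place, “Bill Smith” and “William Smith” at the same address can be compared on equal terms.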

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.
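For company names, a common preprocessing step is to strip the legal form before matching, so that “Acme Ltd” and “Acme Limited” compare equal. A sketch under the assumption of a small, illustrative legal form list; a real list would be country-specific and far longer:

```python
import re

# Illustrative subset of legal forms; a production list would be
# country-specific and far longer.
LEGAL_FORMS = ["ltd", "limited", "gmbh", "a/s", "aps", "inc", "llc", "bv"]

def strip_legal_form(name: str) -> str:
    """Remove a trailing legal form so variants of the same company compare equal."""
    cleaned = name.strip().lower()
    # Try longer forms first so "limited" wins over "ltd" etc.
    for form in sorted(LEGAL_FORMS, key=len, reverse=True):
        cleaned = re.sub(r"[\s,.]*" + re.escape(form) + r"\.?$", "", cleaned)
    return cleaned.strip(" ,.")

print(strip_legal_form("Acme Ltd"))      # acme
print(strip_legal_form("Acme Limited"))  # acme
```

Note that stripping is only safe for matching purposes; the legal form should of course be kept in the stored record, since it carries real information about the entity.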

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations relate to, for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.
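A first line of defense is checking that a postal code at least matches the format of its country. The patterns below are illustrative; a syntactically valid code may still not exist, which is where the reference data sources discussed above come in.

```python
import re

# Illustrative postal code format patterns per ISO country code.
# Format checking is a weak test: a well-formed code may still not exist.
POSTAL_PATTERNS = {
    "DK": r"\d{4}",                              # Denmark: 4 digits
    "US": r"\d{5}(-\d{4})?",                     # USA: ZIP or ZIP+4
    "NL": r"\d{4}\s?[A-Z]{2}",                   # Netherlands: 1234 AB
    "GB": r"[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}",  # UK: e.g. SW1A 1AA
}

def postal_code_format_ok(country: str, code: str) -> bool:
    """Check a postal code against its country's format pattern."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        # Unknown country: don't reject, flag for data stewardship instead.
        return True
    return re.fullmatch(pattern, code.strip().upper()) is not None
```

For countries without postal codes at all, the country entry would simply be absent, which is itself a piece of reference data worth modeling explicitly.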

National Identification Numbers

The post called Business Entity Identifiers includes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

The same way there are different administrative practices for individuals, for example:

  • As I understand it, it is forbidden by the constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number that is in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.
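Where national identification numbers exist, they often carry a checksum that can be verified at data capture. As a sketch, here is the classic modulus-11 check for the Danish CPR number; note that Denmark stopped guaranteeing the check digit for numbers issued after 2007, so this can only serve as a soft signal, not a hard validation rule.

```python
# Classic Danish CPR modulus-11 check (DDMMYY-SSSS, 10 digits).
# Denmark no longer guarantees the check digit for newer numbers,
# so treat a failure as a flag for review, not a rejection.
CPR_WEIGHTS = [4, 3, 2, 7, 6, 5, 4, 3, 2, 1]

def cpr_modulus11_ok(cpr: str) -> bool:
    """Return True if the 10 digits satisfy the modulus-11 checksum."""
    digits = [int(c) for c in cpr if c.isdigit()]
    if len(digits) != 10:
        return False
    return sum(d * w for d, w in zip(digits, CPR_WEIGHTS)) % 11 == 0

print(cpr_modulus11_ok("070761-4285"))  # True (synthetic example)
```

Similar checksum schemes exist for many single-purpose business identifiers too, and wiring them into the onboarding flow catches typos at the cheapest possible point.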

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models, and you need a quite well-equipped toolbox for data quality prevention and exploiting external reference data.


How important is big data quality?

Along with the rise of big data, the question about the quality of big data, and the importance of taking data quality into consideration when analyzing big data, is raised again and again.

We had a poll in the LinkedIn Big Data Quality group. The results are as shown below:

[Image: poll results from the LinkedIn Big Data Quality group]

So, some people consider data quality to be more important for big data than for small data (the data we have analyzed until the rise of big data), some consider it less important with big data, but the majority of those who voted (including yours truly) consider the quality of big data to be equally as important as it has been with small data.

As expressed in some comments, voting “the same” is often an aggregate of some things being more important and other things being less important.

Also, some people voted “mu” (wrong question) and explained in the comments that you really can’t compare small data with big data.

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data as examined in the post Small Data with Big Impact.


Crap, Damned Crap, and Big Data

Lately Jim Harris wrote a thought-provoking post on the Mike2 blog. The post is called A Contrarian’s View of Unstructured Data.

Herein Jim wrote:

“My contrarian’s view of unstructured data is that it is, in large part, gigabytes of gossip and yottabytes of yada yada digitized, rumors and hearsay amplified by the illusion-of-truth effect and succumbing to the perception-is-reality effect until the noise amplifies so much that its static solidifies into a signal.”

Indeed, the sound of social data may be like that. Yesterday I wrote a post called Keep It Real, Stupid. Herein I mentioned an apparently fake quote attributed to Albert Einstein:

“If you can’t explain it simply, you don’t understand it well enough”.

Today I tried to see how the fake quote was doing on Twitter.

OMG: It’s going at more than one tweet per minute, along with some mutations of the quote saying:

“If you can’t explain it to a six-year-old, you don’t understand it yourself”.

“You do not really understand something unless you can explain it to your grandmother”.

OK folks: Sense-making of social data is not going to be simple. Not even relatively simple.

[Tweet screenshots: the “simply” quote and its mutations being retweeted]
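Grouping these mutations under one quote is itself a small data matching problem: exact string comparison fails, so you need a fuzzy similarity measure. A minimal sketch using the standard library; the similarity threshold you would pick in practice is an assumption that needs tuning against real data.

```python
from difflib import SequenceMatcher

# The fake quote and its two mutations, as observed on Twitter above.
quotes = [
    "If you can't explain it simply, you don't understand it well enough",
    "If you can't explain it to a six-year-old, you don't understand it yourself",
    "You do not really understand something unless you can explain it to your grandmother",
]

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1] via longest matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for other in quotes[1:]:
    print(round(similarity(quotes[0], other), 2))
```

The first mutation scores much higher than the grandmother variant, which shares sentiment but little surface text: a reminder that sense-making of social data needs semantics, not just string distance.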


Keep It Real, Stupid

One of my pet peeves is the KISS principle: Keep It Simple, Stupid.

Don’t get me wrong: It’s worth striving for simplicity wherever possible. But some problems are not simple and don’t have simple solutions. Sometimes KISS is the shortcut to getting it all wrong.

Another take on simplicity is a quote floating around in social media these days:

[Image: the “explain it simply” quote attributed to Einstein]

Oh, so Einstein said that. So you can’t argue with that.

Well, he probably didn’t, as Wikiquote reports:

[Image: Wikiquote entry disputing the attribution]

So let’s stick to a real Einstein quote:

“Everything should be as simple as it can be, but not simpler”

A great quote related to data quality and master data management by the way.


Multi-Channel Data Matching

Most data matching activities going on are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data in three different channel types:

  • Offline is the good old channel type, where we have the mother of all business cases for data matching: avoiding unnecessary costs by sending the same material with the postman twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still other factors to be considered, like:
    • Duplicate digital messages to the same recipient look like spam (even if the recipient provided different email addresses themselves).
    • You can’t measure a true response rate.
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

The concept of having a single customer view, or rather a single party view, involves matching identities across offline, online and social channels, and the typical elements used for data matching are not entirely the same for those channels, as seen in the figure to the right.
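A data model supporting this needs to carry channel-specific match keys side by side. A sketch of one way to shape it; the assignment of elements to channels here (name and postal address offline, email online, profile handle social) is an assumption based on the figure described above.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of a party record carrying channel-specific matching elements.
@dataclass
class PartyRecord:
    name: Optional[str] = None
    postal_address: Optional[str] = None   # offline channel
    email: Optional[str] = None            # online channel
    social_handle: Optional[str] = None    # social channel

def match_keys(record: PartyRecord) -> dict:
    """Collect the normalized keys available for matching, per channel."""
    keys = {}
    if record.name and record.postal_address:
        keys["offline"] = (record.name.lower(), record.postal_address.lower())
    if record.email:
        keys["online"] = record.email.lower()
    if record.social_handle:
        keys["social"] = record.social_handle.lstrip("@").lower()
    return keys
```

Two records then match on whichever channels they both have keys for, which is exactly why sparse records from a single channel are so much harder to consolidate.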

Most data matching procedures are, in my experience, quite simple, with only a few data elements and no history track taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.

When doing multi-channel data matching you can’t avoid going from the popular simple data matching environments to more identity-resolution-like environments.

Some advice for getting it right without too much complication:

  • Emphasize data capture by getting it right the first time. It helps a lot.
  • Get your data models right. Here, reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post instant Single Customer View.


Big Data and Data Matching

Data matching has been an established discipline for many years; most data quality tools have more or less sophisticated features for data matching, and many MDM (Master Data Management) platforms have data matching capabilities as well.

[Image: The LinkedIn Big Data Quality group]

In a way the data matching realm has become slightly dull in recent years. People don’t get excited anymore over a discussion about whether deterministic matching or probabilistic matching is the right way. Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.
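For readers who haven’t met it, the edit distance mentioned above is simple enough to sketch in a few lines: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[len(b)]

print(levenshtein("Jonsen", "Jensen"))  # 1
```

A distance of 1 between two surnames at the same address is the bread and butter of deduplication, whether you then wrap it in a deterministic rule or feed it into a probabilistic score.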

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that are bigger than traditional data matching can handle. You know: Dedoop’ing.

But it is also very much about matching big data with small data, first and foremost master data, and about having well matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching, for example proximity matching of locations.
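Proximity matching of locations typically boils down to a great-circle distance against a threshold. A sketch; the 50-metre threshold is an assumption that would depend on the precision of the coordinates involved.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius in km

def proximity_match(loc1, loc2, threshold_km=0.05):
    """Treat two location records as the same place if within ~50 metres.
    The threshold is an illustrative assumption, not a recommended value."""
    return haversine_km(*loc1, *loc2) <= threshold_km
```

Combined with attribute matching (a manhole ID, a street name), proximity matching is how location master data gets linked to the big data streams recorded around it.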
