The Country List

It’s the second day of the MDM Summit Europe 2013 in London today.

The last session I attended today was an expert panel on Reference Data Management (RDM).

Country ListI guess the list of countries on this planet is the prime example of what is reference data and today’s session provided no exception from that.

Even though a list of countries is fairly small and there shouldn’t be everyday changes to the list, maintaining a country list isn’t as simple as you should think.

First of all official sources for a country list aren’t in agreement. The range of countries given an ISO code isn’t the same as the range of countries where for example the Universal Postal Union (UPU) says you can make a delivery.

Another example I have had some challenges with is that for example the D&B WorldBase (a large word-wide business directory) has four country codes for what is generally regarded as the United Kingdom, as the D&B country reference data probably is defined by a soccer fan recognizing the distinct national soccer teams from England, Wales, Scotland and Northern Ireland.

The expert panel moderator, Aaron Zornes, went as far as suggesting that a graph database maybe the best technology for reflecting the complexity in reference data. Oh yes, and in master data too you should think then, though I doubt that the relational database and hierarchy management will be out of fashion for a while.

Bookmark and Share

Double Trouble with Social MDM and Big Data

Yesterday was the first day at the MDM Summit Europe 2013 in London.

One of the workshops I attended was called Master Data Governance for Cloud/Social MDM/Big Data. The workshop was lead by Malcolm Chisholm, one of my favorite thought leaders within data management.

According to Malcolm Chisholm, and I totally agree with that, the rise of social networks and big data will have a tremendous impact on future MDM (Master Data Management) architecture. We are not going to see that these new opportunities and challenges will replace the old way of doing MDM. Integration of social data and other big data will add new elements to the existing component landscape around MDM solutions.

Like it or not, things are going to be more complicated than before.

We will have some different technologies and methodologies handling the old systems of record and the new systems of engagement at the same time, for example relational databases (as we know it today) for master data and columnar databases for big data.

Profiling results from analysis of big data will be added to the current identity resolution centric master data elements handled in current master data solutions. Furthermore, there will be new interfaces for social collaboration around master data maintenance on top of the current interfaces.

So, the question is if taking on the double trouble is worth it. Doing nothing, in this case sticking to small data, is always a popular option. But will the organizations choosing that path exist in the next decade? – or will they be outsmarted by newcomers?

MDM Summit Europe 2013

Bookmark and Share

Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasize on data quality related to names has been on dealing with the most frequent issues be that heaps of nick names in the United States and other places, having a “van” in bulks of names in the Netherlands or having loads of surname like middle names in Denmark.

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.

UPU S42Addresses

Address formats varies between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers includes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

The same way there are different administrative practices for individuals, for example:

  • As I understand it is forbidden by constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a quite well equipped toolbox regarding data quality prevention and exploiting external reference data.

Bookmark and Share

Multi-Channel Data Matching

Most data matching activities going on are related to matching customer, other rather party, master data.

In today’s business world we see data matching related to party master data in those three different channels types:

  • Offline is the good old channel type where we have the mother of all business cases for data matching being avoiding unnecessary costs by sending the same material with the postman twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still some other factors to be considered, like:
    • Duplicate digital messages to the same recipient looks like spam (even if the recipient provided different eMail addresses him/her self).
    • You can’t measure a true response rate
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

Multi-channel data matchingThe concept of having a single customer view, or rather single party view, involves matching identities over offline, online and social channels, and typical elements used for data matching are not entirely the same for those channels as seen in the figure to the right.

Most data matching procedures are in my experience quite simple with only a few data elements and no history track taking into considering. However we do see more sophisticated data matching environments often referred to as identity resolution, where we have historical data, more data elements and even unstructured data taking into consideration.

When doing multi-channel data matching you can’t avoid going from the popular simple data matching environments to more identity resolution like environments.

Some advices for getting it right without too much complication are:

  • Emphasize on data capturing by getting it right the first time. It helps a lot.
  • Get your data models right. Here reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out here. They help a lot.

Read more about such a service in the post instant Single Customer View.

Bookmark and Share

Big Data and Data Matching

Data matching has been an established discipline for many years and most data quality tools have more or less sophisticated features for data matching as well as many MDM (Master Data Management) platforms have data matching capabilities.

BigDataQuality
The LinkedIn Big Data Quality group

In a way the data matching realm has become slightly dull the recent years. People don’t get excited anymore over a discussion about if deterministic matching or probabilistic matching is the right way.  Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that is bigger than traditional data matching can handle. You know: Dedoop’ing.

But it is also very much about matching big data with small data, first and foremost master data. And having well matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching like for example proximity matching of locations.

Bookmark and Share

Why You shouldn’t go to the MDM Summit Europe 2013

The weather in London has been awful this March. The forecast for the first week of April doesn’t meet historical standards either. The MDM Summit Europe 2013 will be in London 15th to 17th April. You shouldn’t go there because of the weather based on the trend in the weather forecast:

London Forecast April 2013

On the other hand, it could heat up indoor.

There are quite a lot of exciting sessions, including the ones about:

And hey, it has happened before that the weather has suddenly improved.

Bookmark and Share

Small Data with Big Impact

In an ongoing discussion on LinkedIn there are some good points on: How important is data quality for big data compared to data quality for small data?

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data.

The small data we are talking about here is first and foremost master data.

Master Data Challenges with Big Data

As with traditional transaction data master data is also describing the who, what, where and when of big data.

If we are having issues with completeness, timeliness and uniqueness in our master data any prediction based on big data matched with master data is going to be as chaotic as weather forecasts.

big small dataWe also need to expand the range of entities embraced by our master data management implementations as exemplified in the post Social MDM and Future Competitive Intelligence.

Matching Big Data with Master Data

Some of the issues in matching big data with master data I have stumbled upon are:

  • Who: How do we link the real world entities reflected in our traditional systems of record with the real world entities behind who’s talking in systems of engagement? This question was touched in post Making Sense with Social MDM.
  • What: How do we manage our product hierarchies and product descriptions so they fulfill both (different) internal purposes and external usage? More on this in the post Social PIM.
  • Where: How do we identify a given place? If you think this is easy, why not read the post Where is the Spot?
  • When: Date and time comes in many formats and relating events to the wrong schedule may have us  Going in the Wrong Direction.

How: You may for example follow this blog. Subscription is in the upper right corner 🙂

Bookmark and Share

Sharing is the Future of MDM

Over at the DataRoundtable blog Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time on trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be that every organization doesn’t have to do the work over and over and again. While self registration by customers is a way of letting off the burden on private enterprises and public sector bodies, we may even do better by not having the customer being the data entry clerk and typing in the same information over and over and again.

Today there are several available options for customer and other business partner reference data:

  • Public sector registries which are getting more and more open being that for example for the address part or even deeper in due respect of privacy considerations which may be different for business entities and individual entities.
  • Commercial directories often build on top of public registries.
  • Personal data lockers like the Mydex service mentioned by Dylan.
  • Social network profiles.

instant Single Customer ViewMy guess is that the future of MDM is going to be a mashup of exploiting the above options.

Oh, and as representatives of such a mashup service we recently at iDQ made sure we had the accurate, complete and timely information filled in on our Linkedin Company profile.

Bookmark and Share

Making sense with Social MDM

A few days ago Jeff Jonas of IBM made a new blog post called Master Data Management (MDM) vs. Sensemaking.

iDQ microscopeHerein Jeff Jonas ponders the differences in the data matching algorithms we use in traditional MDM, predominately name and address matching, and the kind of identity resolution we need when we for example try to listen to and make sense of the signals in the social media data streams.

Jeff Jonas says: “Different missions, different tools.  Some organizations will use one or the other; most organizations will want both.”  

I tend to disagree slightly with Jeff Jonas. As told in the post The New Year in Identity Resolution I think we will need a connection between the old systems of record and the new systems of engagement.

Indeed the algorithms will be used differently and indeed we need different thresholds of confidence for different tasks. But I think we will have to make the integration story a bit more complicated in order to make sensible decisions across the two missions.

Bookmark and Share

Data Management in the Cloud

We are seeing more and more data management services offered in the cloud.

dnblogo2As I have had a long time experience with data matching services around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud based data manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud around the WorldBase taking care of:

  • Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
  • Data enrichment embracing a wide range of data attributes
  • Data Maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (SalesForce.com).

As QlikTech is operating all over the world having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for having the right insight and control into achieving the needed level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.

Bookmark and Share