Master Data – Page 39 – Liliendahl on Data Quality

The Country List

16th April 2013, Henrik Gabs Liliendahl, 6 Comments

It’s the second day of the MDM Summit Europe 2013 in London today.

The last session I attended today was an expert panel on Reference Data Management (RDM).

I guess the list of countries on this planet is the prime example of reference data, and today’s session was no exception.

Even though a list of countries is fairly small and shouldn’t change every day, maintaining a country list isn’t as simple as you might think.

First of all, official sources for a country list don’t agree. The range of countries given an ISO code isn’t the same as the range of countries where, for example, the Universal Postal Union (UPU) says you can make a delivery.

Another example I have had some challenges with is the D&B WorldBase (a large world-wide business directory), which has four country codes for what is generally regarded as the United Kingdom. The D&B country reference data was probably defined by a soccer fan recognizing the distinct national soccer teams of England, Wales, Scotland and Northern Ireland.
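One way to live with such disagreement is a crosswalk from each source's country codes to the code set you treat as canonical. The sketch below assumes ISO 3166-1 alpha-2 as the canonical set. The source-side codes (`ENG`, `WAL`, `SCO`, `NIR`, `DNK`) are made up for illustration and are not the actual D&B codes.

```python
from typing import Optional

# Hypothetical crosswalk: source-specific country codes -> ISO 3166-1 alpha-2.
# The source codes below are invented for illustration only.
SOURCE_TO_ISO = {
    "ENG": "GB",  # England
    "WAL": "GB",  # Wales
    "SCO": "GB",  # Scotland
    "NIR": "GB",  # Northern Ireland
    "DNK": "DK",  # Denmark
}


def to_iso(source_code: str) -> Optional[str]:
    """Return the canonical ISO code, or None when the code is unmapped."""
    return SOURCE_TO_ISO.get(source_code.strip().upper())


def source_codes_for(iso_code: str) -> list:
    """Reverse lookup: all source codes that roll up to one ISO code."""
    return sorted(s for s, iso in SOURCE_TO_ISO.items() if iso == iso_code)
```

Returning None for unmapped codes, rather than guessing, keeps the gaps between the two lists visible so they can be handled by a data steward.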

The expert panel moderator, Aaron Zornes, went as far as suggesting that a graph database may be the best technology for reflecting the complexity in reference data. Oh yes, and for master data too, one would think, though I doubt that the relational database and hierarchy management will go out of fashion any time soon.
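To make the graph idea a bit more concrete, here is a minimal sketch that uses a plain dictionary rather than a real graph database. Nodes are places and edges are "part of" relations. The node names and the single relation type are assumptions for illustration.

```python
# Edges point from a place to the places it is part of.
PART_OF = {
    "England": ["United Kingdom"],
    "Wales": ["United Kingdom"],
    "Scotland": ["United Kingdom"],
    "Northern Ireland": ["United Kingdom"],
    "United Kingdom": ["Europe"],
    "Denmark": ["Europe"],
}


def ancestors(place: str) -> set:
    """Collect everything a place belongs to, directly or indirectly."""
    found = set()
    stack = [place]
    while stack:
        for parent in PART_OF.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def same_group(a: str, b: str, group: str) -> bool:
    """True when both places roll up to (or are) the given group."""
    return (group in (ancestors(a) | {a})) and (group in (ancestors(b) | {b}))
```

Because a place may list several parents, the same structure could also hold overlapping groupings (postal, political, sporting), which is where a strict single hierarchy starts to struggle.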

Names, Addresses and National Identification Numbers

11th April 2013, Henrik Gabs Liliendahl, 6 Comments

When working with customer, or rather party, master data management and related data quality improvement and prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis in data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, a “van” in bulks of names in the Netherlands or loads of surname-like middle names in Denmark.
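A minimal sketch of how these issues might be handled before names are compared: nicknames are mapped to a canonical given name, surname particles such as "van" are set aside, and only the last token is kept as the surname so that surname-like middle names do not get in the way. The nickname table and particle list are tiny illustrative assumptions, not a complete reference.

```python
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
PARTICLES = {"van", "de", "der", "den"}


def normalize_name(full_name: str) -> tuple:
    """Return (given name, surname core) in a comparable form."""
    tokens = full_name.lower().replace(".", "").split()
    if not tokens:
        return ("", "")
    given = NICKNAMES.get(tokens[0], tokens[0])
    # Drop particles so "van der Berg" and "Berg" share a surname core.
    rest = [t for t in tokens[1:] if t not in PARTICLES]
    # The last remaining token is taken as the surname; middle names are ignored.
    surname = rest[-1] if rest else ""
    return (given, surname)
```

The result is a comparison key only. The name as the person writes it should still be stored untouched.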

With company names there are some differences to consider, such as the inclusion of legal forms in company names, as told in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations relate to, for example:

Coverage: Is every part of the country included?
Depth: Is it street level, house number level or unit level?
Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings, the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers describes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

In the same way, there are different administrative practices for individuals, for example:

As I understand it, all-purpose identification numbers for individuals are forbidden by the constitution down under.
The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.
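Many national identification numbers carry a check digit, which allows a cheap sanity check at data entry. As one example, the Swedish personal identity number uses the Luhn algorithm over its ten digits. The sketch below checks only the check digit and does not validate the date part. The function name is an assumption for illustration.

```python
def luhn_valid_personnummer(number: str) -> bool:
    """Check the Luhn check digit of a 10-digit number (YYMMDD-NNNC)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) != 10:
        return False
    total = 0
    for index, digit in enumerate(digits):
        # Double every other digit, starting with the first.
        value = digit * 2 if index % 2 == 0 else digit
        # A two-digit product counts as the sum of its digits.
        total += value - 9 if value > 9 else value
    return total % 10 == 0
```

A passing check digit only says the number is well formed. Whether it belongs to the person in question still has to be verified against a registry.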

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a well-equipped toolbox for data quality prevention and for exploiting external reference data.

Multi-Channel Data Matching

4th April 2013, Henrik Gabs Liliendahl

Most data matching activities going on are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data in these three different channel types:

Offline is the good old channel type where we have the mother of all business cases for data matching: avoiding the unnecessary cost of sending the same material with the postman twice (or more) to the same recipient.
Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still some other factors to be considered, like:
- Duplicate digital messages to the same recipient look like spam (even if the recipient provided the different email addresses himself/herself).
- You can’t measure a true response rate.
Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

The concept of having a single customer view, or rather single party view, involves matching identities over offline, online and social channels, and typical elements used for data matching are not entirely the same for those channels as seen in the figure to the right.

Most data matching procedures are in my experience quite simple, with only a few data elements and no history tracking taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.

When doing multi-channel data matching you can’t avoid going from the popular simple data matching environments to more identity-resolution-like environments.
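A rough sketch of the idea: each channel contributes different matching elements (a postal address offline, an email address online, a profile handle for social), and records sharing any normalized element are linked into one party. The field names and the union-find linking are assumptions for illustration. Real identity resolution would add fuzzy comparison and history.

```python
def link_parties(records: list) -> list:
    """Group record indexes that share at least one matching element."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen = {}  # (field, normalized value) -> first record index using it
    for index, record in enumerate(records):
        for field in ("address", "email", "handle"):
            value = record.get(field)
            if not value:
                continue
            # Normalize case and whitespace before comparing.
            key = (field, " ".join(value.lower().split()))
            if key in seen:
                parent[find(index)] = find(seen[key])
            else:
                seen[key] = index

    groups = {}
    for index in range(len(records)):
        groups.setdefault(find(index), set()).add(index)
    return list(groups.values())
```

Note how an offline record and a social profile that share no element at all can still end up in the same party through an online record in between. That transitive linking is what makes a single party view across channels possible, and also what makes a false match expensive.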

Some pieces of advice for getting it right without too much complication are:

Emphasize data capture by getting it right the first time. It helps a lot.
Get your data models right. Here reflecting the real world helps a lot.
Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post instant Single Customer View.

Big Data and Data Matching

2nd April 2013, Henrik Gabs Liliendahl, 2 Comments

Data matching has been an established discipline for many years. Most data quality tools have more or less sophisticated features for data matching, and many MDM (Master Data Management) platforms have data matching capabilities as well.

BigDataQuality — The LinkedIn Big Data Quality group

In a way the data matching realm has become slightly dull in recent years. People don’t get excited anymore over a discussion about whether deterministic matching or probabilistic matching is the right way. Soundex is old, edit distance has been around for ages and matchcodes may have outlived themselves.
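For readers who haven't met these techniques, here is a minimal sketch of edit distance (Levenshtein) turned into a similarity score with a threshold. The threshold of 0.8 and the function names are arbitrary assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete
                               current[j - 1] + 1,       # insert
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Compare case-insensitively; a similarity of 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1 - edit_distance(a, b) / longest >= threshold
```

Dividing by the longer string's length keeps the score between 0 and 1, so one threshold can serve for both short and long names.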

So, it’s good to see a new beast turning up. Data matching with big data.

It may be about deduplicating (deduping) volumes that are bigger than traditional data matching can handle. You know: Dedoop’ing.

But it is also very much about matching big data with small data, first and foremost master data. And having well matched master data. Kimmo Kontra wrote a good post about that recently. The post is called Big Grease, Big Data, and Big Apple – manholes and MDM.

The case presented by Kimmo holds many exciting implementations of data matching, such as proximity matching of locations.
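Proximity matching of locations can be sketched with the haversine great-circle distance: two coordinates are treated as the same place when they lie within a chosen radius. The 50 metre radius and the mean earth radius of 6,371 km are assumptions for illustration and are not taken from Kimmo's post.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean earth radius, an approximation


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    h = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def same_place(p: tuple, q: tuple, radius_m: float = 50.0) -> bool:
    """True when two (lat, lon) points are within radius_m of each other."""
    return distance_m(p[0], p[1], q[0], q[1]) <= radius_m
```

The right radius depends on what is being matched: a manhole and a building entrance call for very different tolerances.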

Why You shouldn’t go to the MDM Summit Europe 2013

29th March 2013 (updated 2nd April 2013), Henrik Gabs Liliendahl, 2 Comments

The weather in London has been awful this March. The forecast for the first week of April doesn’t meet historical standards either. The MDM Summit Europe 2013 will be in London 15th to 17th April. You shouldn’t go there because of the weather based on the trend in the weather forecast:

On the other hand, it could heat up indoors.

There are quite a lot of exciting sessions, including the ones about:

And hey, it has happened before that the weather has suddenly improved.

Small Data with Big Impact

28th March 2013, Henrik Gabs Liliendahl, 1 Comment

In an ongoing discussion on LinkedIn there are some good points on: How important is data quality for big data compared to data quality for small data?

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data.

The small data we are talking about here is first and foremost master data.

Master Data Challenges with Big Data

As with traditional transaction data, master data also describes the who, what, where and when of big data.

If we have issues with completeness, timeliness and uniqueness in our master data, any prediction based on big data matched with master data is going to be as chaotic as weather forecasts.

We also need to expand the range of entities embraced by our master data management implementations as exemplified in the post Social MDM and Future Competitive Intelligence.

Matching Big Data with Master Data

Some of the issues in matching big data with master data I have stumbled upon are:

Who: How do we link the real world entities reflected in our traditional systems of record with the real world entities behind who’s talking in systems of engagement? This question was touched on in the post Making Sense with Social MDM.
What: How do we manage our product hierarchies and product descriptions so they fulfill both (different) internal purposes and external usage? More on this in the post Social PIM.
Where: How do we identify a given place? If you think this is easy, why not read the post Where is the Spot?
When: Dates and times come in many formats, and relating events to the wrong schedule may have us Going in the Wrong Direction.

How: You may for example follow this blog. Subscription is in the upper right corner 🙂
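On the "when" point, a small sketch of why formats matter: the same string can parse to different dates depending on the assumed convention. The list of formats is an assumption. A real pipeline would take the convention from the source's metadata rather than guess.

```python
from datetime import date, datetime

# Candidate formats, tried in order. Day-first vs month-first is ambiguous.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def possible_dates(text: str) -> set:
    """Return every distinct date the text could mean under FORMATS."""
    results = set()
    for fmt in FORMATS:
        try:
            results.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            continue  # the text does not fit this format
    return results


def is_ambiguous(text: str) -> bool:
    """True when the text parses to more than one distinct date."""
    return len(possible_dates(text)) > 1
```

For instance, "04/02/2013" is flagged as ambiguous because it could be 4th February or 2nd April, while "16/04/2013" is not, since there is no 16th month.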

Sharing is the Future of MDM

26th March 2013, Henrik Gabs Liliendahl, 2 Comments

Over at the DataRoundtable blog Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time on trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be that every organization doesn’t have to do the work over and over again. While self-registration by customers is a way of easing the burden on private enterprises and public sector bodies, we may do even better by not having the customer be the data entry clerk, typing in the same information over and over again.

Today there are several available options for customer and other business partner reference data:

Public sector registries, which are becoming more and more open, be that for the address part only or for deeper data, with due respect to privacy considerations that may differ between business entities and individuals.
Commercial directories, often built on top of public registries.
Personal data lockers like the Mydex service mentioned by Dylan.
Social network profiles.

My guess is that the future of MDM is going to be a mashup of exploiting the above options.
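A minimal sketch of what such a mashup could look like at attribute level: for each attribute, take the value from the most trusted source that has one, and remember where it came from. The source names, their ranking and the attribute names are all assumptions for illustration.

```python
# Hypothetical trust ranking, most trusted first.
SOURCE_PRIORITY = ("public_registry", "commercial_directory",
                   "data_locker", "social_profile")


def mashup(views: dict) -> dict:
    """Merge per-source views into one record of (value, source) pairs."""
    merged = {}
    for source in SOURCE_PRIORITY:
        for attribute, value in views.get(source, {}).items():
            # Keep the first non-empty value seen, i.e. the most trusted one.
            if value and attribute not in merged:
                merged[attribute] = (value, source)
    return merged
```

Keeping the source next to each value makes it possible to refresh or challenge a single attribute later without redoing the whole record.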

Oh, and as representatives of such a mashup service, we at iDQ recently made sure we had accurate, complete and timely information filled in on our LinkedIn Company profile.

Data Management in the Cloud

14th March 2013, Henrik Gabs Liliendahl, 2 Comments

We are seeing more and more data management services offered in the cloud.

As I have long experience with data matching services around the Dun & Bradstreet WorldBase, it was good to see a presentation yesterday in Stockholm featuring D&B Europe’s new cloud-based data manager service.

Managing World-Wide B2B Master Data

The D&B WorldBase is a business directory with 225 million business entities from all over the world.

D&B’s Data Manager is a self-service application in the cloud around the WorldBase taking care of:

Data matching with comprehensive functionality for manual inspection, approval and master data survivorship
Data enrichment embracing a wide range of data attributes
Data Maintenance subscription for keeping enriched data up to date

The data matching functionality is built on the good old D&B methodology with confidence codes and matchgrades.
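How a confidence code might drive the manual inspection step can be sketched as a simple routing rule. The 1 to 10 scale and both thresholds are assumptions for illustration, not D&B's documented behaviour or recommended settings.

```python
def route_match(confidence_code: int, auto_accept: int = 8,
                review_from: int = 5) -> str:
    """Decide what to do with a candidate match."""
    if not 1 <= confidence_code <= 10:
        raise ValueError("confidence code expected in the range 1..10")
    if confidence_code >= auto_accept:
        return "accept"           # good enough to link without inspection
    if confidence_code >= review_from:
        return "manual review"    # a data steward approves or rejects
    return "reject"               # too weak to be worth a look
```

Where to put the two thresholds is a business decision: lowering them trades inspection effort against the risk of linking a customer to the wrong business entity.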

Right for QlikTech

QlikTech is the Swedish firm (pretending to be American) behind the prominent business intelligence solution called QlikView.

At the Stockholm event QlikTech presented how and why they use the D&B Data Manager for ensuring the right data quality in their cloud based B2B CRM solution (SalesForce.com).

As QlikTech operates all over the world, having a consistent world-wide business directory as the reference for party master data is extremely important, and the self-service concept is a perfect match for having the right insight into, and control over, achieving the needed level of data quality in CRM master data.

From there the QlikTech CRM team takes its own medicine using QlikView for self-service business intelligence.
