Multi-Channel Data Matching

Most data matching activities are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data across three different channel types:

  • Offline is the good old channel type, where we have the mother of all business cases for data matching: avoiding the unnecessary cost of sending the same material with the postman twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still some other factors to be considered, like:
    • Duplicate digital messages to the same recipient look like spam (even if the recipient provided the different eMail addresses himself or herself).
    • You can’t measure a true response rate.
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

The concept of having a single customer view, or rather a single party view, involves matching identities over the offline, online and social channels, and the typical elements used for data matching are not entirely the same across those channels, as seen in the figure to the right.

Most data matching procedures are, in my experience, quite simple, with only a few data elements and no history track taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.

When doing multi-channel data matching you can’t avoid moving from the popular simple data matching environments towards more identity-resolution-like environments.
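
The step from simple matching towards identity resolution can be sketched minimally. In the sketch below (all field names are hypothetical), each channel contributes whichever matching elements it has — name and postal address offline, eMail address online, profile handle social — and records sharing any key become candidates for the same party:

```python
import re

def normalize(text):
    """Lowercase and strip non-alphanumerics for crude comparison."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def match_keys(record):
    """Build candidate match keys from whatever elements a channel provides.

    Offline records carry name + postal address, online records an eMail
    address, social records a profile handle; any overlap in keys makes
    two records candidates for being the same real world party.
    """
    keys = set()
    if record.get("name") and record.get("postal_address"):
        keys.add(("name+address",
                  normalize(record["name"]) + "|" + normalize(record["postal_address"])))
    if record.get("email"):
        keys.add(("email", normalize(record["email"])))
    if record.get("social_handle"):
        keys.add(("social", normalize(record["social_handle"])))
    return keys

offline = {"name": "John Doe", "postal_address": "1 Main St"}
online = {"name": "J. Doe", "email": "John.Doe@example.com"}
social = {"email": "john.doe@example.com", "social_handle": "@johndoe"}

# The online and social records share the eMail key,
# so they join the same identity cluster.
print(match_keys(online) & match_keys(social))
```

Note how the offline record shares no key with the others; resolving that last link is where history and more data elements come into play.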

Some advice for getting it right without too much complication:

  • Emphasize data capture by getting it right the first time. It helps a lot.
  • Get your data models right. Here, reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post instant Single Customer View.

Putting it Right

Data Governance (DG), Reference Data Management (RDM) and Master Data Management (MDM) are closely related disciplines.

Consequently, the Data Governance Conference Europe 2013 and the Master Data Management Summit Europe 2013 are co-located, and a hot topic this year is Reference Data Management.

The difficulty in putting the conference sessions in one right place may be seen in that the session called Establishing Reference Data Governance in the Large Enterprise is part of an MDM track but is actually mostly about data governance. The session is labeled Product MDM & Reference Data, but will be about governing reference data for multi-domain MDM, and the data governance program described was in fact based on a party master data challenge involving reference data for industry classification.

In the session Petter Larsen, Head of Data Governance at Norway’s largest financial services group called DNB, and Thomas T. Thykjaer, Lead MDM Consultant at Capgemini, will connect the dots in the landscape of business vocabularies, data models, the data governance toolbox, data domains and reference data architecture.

I certainly look forward to Petter and Thomas putting it right.

What’s so special about your party master data?

My last blog post was called Is Managing Master Data a Differentiating Capability? That post is an introduction to a conference session: a case story about managing master data at Philips.

During my years working with data quality and master data management, it has always struck me how differently organizations manage the party master data domain, while in fact the issues are almost the same everywhere.

First of all, party master data describe real world entities that are the same to everyone. Everyone is gathering data about the same individuals and the same companies residing at the same addresses and having the same digital identities. The real world also comes in hierarchies, as households, company families and contacts belonging to companies, which are the same to everyone. We may call that the external hierarchy.

Besides that, everyone has some kind of demand for intended duplicates, as a given individual or company may have several accounts for specific purposes and roles. We may call that the internal hierarchy.

A party master data solution will optimally reflect the internal hierarchy while most of the business processes around are supported by CRM-systems, ERP-systems and special solutions for each industry.

Reflecting the external hierarchy will be the same for everyone, and there is no need for anyone to reinvent the wheel here. There are already plenty of data models, data services and data sources out there.

Right now I’m working on a service called instant Data Quality that is capable of embracing and mashing up external reference data sources for addresses, properties, companies and individuals from all over the world.

The iDQ™ service already fits in at several places as told in the post instant Data Quality and Business Value. I bet it fits your party master data too.

Beyond True Positives in Deduplication

The most frequent data quality improvement process around is deduplication of party master data.

A core functionality of many data quality tools is the capability to find duplicates in large datasets with names, addresses and other party identification data.

When evaluating the result of such a process we usually divide the result of found duplicates into:

  • False positives: automated match results that do not actually reflect real world duplicates
  • True positives: automated match results reflecting the same real world entity

The difficulties in reaching the above result aside, you would think the rest is easy: take the true positives, merge them into a golden record and purge the unneeded duplicate records in your database.
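
As an illustration of that split between matches and non-matches, here is a minimal sketch using the standard library’s difflib; the similarity threshold, the naive survivorship rule of keeping the longest value, and the record layout are all assumptions made for the example:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_pair(rec_a, rec_b, threshold=0.8):
    """Classify a candidate pair as a match (possible true positive) or not."""
    score = similarity(rec_a["name"] + " " + rec_a["address"],
                       rec_b["name"] + " " + rec_b["address"])
    return "match" if score >= threshold else "non-match"

def merge_golden(records):
    """Merge matched records into one golden record, keeping the longest
    (assumed most complete) value per attribute -- a deliberately naive
    survivorship rule."""
    golden = {}
    for rec in records:
        for field, value in rec.items():
            if len(value) > len(golden.get(field, "")):
                golden[field] = value
    return golden

a = {"name": "Acme Corp", "address": "12 High Street, London"}
b = {"name": "Acme Corporation", "address": "12 High St, London"}
print(classify_pair(a, b))
print(merge_golden([a, b]))
```

Real data quality tools add phonetic codes, parsing and probabilistic weighting on top of this idea, but the match/merge/purge skeleton is the same.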

Well, I have seen so many well executed deduplication jobs ending just there, because there are a lot of reasons for not making the golden records.

Sure, a lot of duplicates “are bad” and should be eliminated.

But many duplicates “are good” and have actually been put into the databases for a good reason, supporting different kinds of business processes where one view is needed in one case and another view is needed in another case.

Many, many operational applications, including very popular ERP and CRM systems, do have inferior data models that are not able to reflect the complexity of the real world.

Only a handful of MDM (Master Data Management) solutions are able to do so, but even then things aren’t easy, as most enterprises have an IT landscape with all kinds of applications carrying other business relevant functionality that isn’t replaced by an MDM solution.

What I like to do when working on getting business value from true positives is to build a so-called Hierarchical Single Source of Truth.

Hierarchical Single Source of Truth

Most data quality and master data management gurus, experts and practitioners agree that a “single source of truth” is a nice term, but achieving it is not what data quality and master data management are really about, as expressed by Michele Goetz in the post Master Data Management Does Not Equal The Single Source Of Truth.

Even among those people, including me, who think that emphasis on real world alignment could help in getting better data and information quality, as opposed to focusing on fitness for multiple different purposes of use, there is acknowledgement that there is a “digital distance” between real world aligned data and the real world, as explained by Jim Harris in the post Plato’s Data. Also, different publicly available reference data sources that should reflect the real world for the same entity are often in disagreement.

When working with improvement of data quality in party master data, which is the master data domain most frequently having issues, you encounter the same issues over and over again, like:

  • Many organizations have a considerable overlap of real world entities that are a customer and a supplier at the same time. Expanding to other party roles, this intersection is even bigger. This calls for a 360° Business Partner View.
  • Most organizations divide activities into business-to-business (B2B) and business-to-consumer (B2C). But the great majority of businesses are small companies where business and private are a mixed case, as told in the post So, how about SOHO homes.
  • When doing B2C, including membership administration in non-profits, you often have a mix of single individuals and households in your core customer database, as reported in the post Household Householding.
  • As examined in the post Happy Uniqueness, there are a lot of good fit-for-purpose reasons why customer and other party master data entities are deliberately duplicated within different applications.
  • Lately, doing social master data management (Social MDM) has emerged as the new leg in mastering data within multi-channel business. Embracing a wealth of digital identities will become yet another challenge in getting a single customer view and reaching for the impossible, and not always desirable, single source of truth.

A way of getting some kind of structure into this possible, and actually very common, mess is to strive for a hierarchical single source of truth, where the concept of a golden record is implemented as a model with golden relations between real world aligned external reference data and internal fit-for-purpose master data.
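
A minimal sketch of that idea (the structures and identifiers below are hypothetical) could link internal fit-for-purpose records to a real world aligned external entity through golden relations, rather than merging and purging them:

```python
from dataclasses import dataclass, field

@dataclass
class ExternalEntity:
    """Real world aligned party from an external reference source,
    e.g. a business registry (source and identifier are made up here)."""
    source: str
    source_id: str
    legal_name: str

@dataclass
class InternalRecord:
    """Fit-for-purpose master data record in an internal application."""
    system: str
    record_id: str
    name_as_entered: str

@dataclass
class GoldenRelation:
    """The golden record implemented as relations rather than a merge:
    internal records are linked to, not replaced by, the external truth."""
    external: ExternalEntity
    internal: list = field(default_factory=list)

registry = ExternalEntity("business registry", "DK-12345678", "Acme Corporation A/S")
crm = InternalRecord("CRM", "C-001", "Acme Corp (sales)")
erp = InternalRecord("ERP", "V-042", "Acme Corporation, vendor")

# Both deliberate "duplicates" survive, each still fit for its own purpose,
# while the relation provides the single, hierarchical, source of truth.
golden = GoldenRelation(registry, [crm, erp])
print(golden.external.legal_name, [r.system for r in golden.internal])
```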

Right now I’m having an exciting time doing just that as described in the post Doing MDM in the Cloud.

The Big Tower of Babel

Three years ago one of the first blog posts on this blog was called The Tower of Babel.

That post was the first of many about multi-cultural challenges in data quality improvement. These challenges include not only language variations but also different character sets reflecting different alphabets and script systems, naming traditions, address formats, measure units, privacy norms and government registration practices, to name some of the ones I have experienced.

When organizations work internationally, it may be tempting to build a new Tower of Babel by imposing the same language for metadata (probably English) and the same standards for names, addresses and other master data (probably those of the country where the headquarters is).

However, building such a high tower may end up the same way as the Tower of Babel known from the old religious tales.

Alternatively, a mapping approach may be technically a bit more complex but much easier when it comes to change management.

The mapping approach is used in the Universal Postal Union’s (UPU) attempt to make a “standard” for worldwide addresses. The UPU S42 standard is mentioned in the post Down the Street. The S42 standard does not impose the same way of writing on envelopes all over the world, but facilitates mapping the existing ways into a common tagging mapped to a common structure.

Building such a mapping based “standard” for addresses, and other master data with international diversity, in your organization may be a very good way to cope with balancing the need for standardization and the risks in change management including having trusted and actionable master data.
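
A toy illustration of the mapping idea (the templates below are simplified inventions, not the real S42 templates) could tag address elements once and render them in each country’s own line order:

```python
# Each country keeps its own rendition order, mapped onto common tags --
# in the spirit of, but much simpler than, the UPU S42 approach.
RENDITION = {
    "DK": ["recipient", "street", "postcode town"],
    "GB": ["recipient", "street", "town", "postcode"],
}

def render_envelope(address, country):
    """Render tagged address elements in the country's own line order."""
    lines = []
    for template_line in RENDITION[country]:
        parts = [address[tag] for tag in template_line.split() if tag in address]
        lines.append(" ".join(parts))
    return lines

addr = {"recipient": "Acme A/S", "street": "Hovedgaden 1",
        "postcode": "2600", "town": "Glostrup"}
# The same tagged data renders differently per country.
print(render_envelope(addr, "DK"))
print(render_envelope(addr, "GB"))
```

The tagged data stays the same everywhere; only the rendition differs, which is exactly what makes the change management easier.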

The principle of embracing and mapping international diversity is a core element in the service I’m currently working on. It’s not that the instant Data Quality service doesn’t stretch into the clouds. Certainly it is a cloud service pulling data quality from the cloud. It’s not that it isn’t big. Certainly it is based on big reference data.

Broken Links

When passing the results of data cleansing activities back to source systems, I have often encountered what one might call broken links, which have called for designing data flows that don’t go by the book, don’t match the first picture of the real world and eventually prompt last-minute alternate ways of doing things.

I have had the same experience when passing some real (and not real) world bridges lately.

The Trembling Lady: An Unsound Bridge

When walking around in London a sign on the Albert Bridge caught my eye. The sign instructs troops to break steps when marching over.

In researching the Albert Bridge on Wikipedia I learned that the bridge has an unsound construction that makes it vibrate, not least when a bunch of troops marches across in rhythm. The bridge has therefore got the nickname “The Trembling Lady”.

It’s an old sign. The bridge is an old bridge. But it’s still standing.

The same way, we often have to deal with old systems running on unstable databases with unsound data models. That’s life. Though it’s not the way we want to see it, we must break the rhythm of otherwise perfectly cleansed data, as discussed in the post Storing a Single Version of the Truth.

The Øresund Bridge: The Sound Link

The sound between the city of Malmö in Sweden and København (Copenhagen) in Denmark can be crossed by the Øresund Bridge. If you look at a satellite picture, you may conclude that the bridge isn’t finished. That’s because a part of the link is in fact an undersea tunnel, as told in the post Geocoding from 100 Feet Under.

Your first image of what can be done and what can’t be done isn’t always the way of the world. Dig into some more sources, find some more charts, and you may find a way.

However, life isn’t always easy. Sometimes charts and maps can be deceiving.

Wodna: The Sound of Silence

As reported in the post Troubled Bridge over Water I planned a cycling trip last summer. The route would take us across the Polish river Świna by a bridge I found on Google Maps.

When, after a hard day’s ride in the saddle, we reached the river, the bridge wasn’t there. We had to take a ferry across the river instead.

I maybe should have known. The bridge on the map was named Wodna. That is Polish for (something with) water.

Small Business Owners

A challenge I encounter over and over again within Data Matching and customer Master Data Management is what to do with small business owners.

Examples of small business owners are:

  • Farmers
  • Healthcare professionals with their own clinic
  • Small family-driven shop owners
  • Modest membership organisation administrators
  • Local hospitality providers such as Basil Fawlty of Fawlty Towers
  • Independent Data Quality consultants such as myself

When handling customer master data we often like to divide it into business-to-consumer (B2C) and business-to-business (B2B). We may have different source systems, different data models and different data owners and data stewards for each of the two divisions.

But small business owners usually belong to both divisions. In some transactions they act as private persons (B2C) and in other transactions they act as a business contact (B2B). If you like to know your customer, have a single customer view, engage in social media and all that jazz, you must have a unique view of the person, the business and the household.
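
One way to sketch this (a simplified model, not a prescription) is a single party entity carrying multiple roles, instead of separate, disconnected B2C and B2B record copies:

```python
from dataclasses import dataclass, field

@dataclass
class Party:
    """One real world party; roles are attached to the party rather than
    modelled as separate customer/contact copies (names are illustrative)."""
    name: str
    roles: set = field(default_factory=set)

    def add_role(self, role):
        self.roles.add(role)

# Basil both buys privately (B2C) and acts for the hotel (B2B) --
# one party record, two roles.
basil = Party("Basil Fawlty")
basil.add_role("consumer")            # B2C transactions
basil.add_role("business contact")    # B2B transactions for Fawlty Towers
print(sorted(basil.roles))
```

Links from the party to a household entity and to a business entity would complete the unique view of person, business and household.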

In several industries the small business owner, the business and the household together form a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.

So there are plenty of business cases for multi-domain Master Data Management embracing customer master data and product master data.

The capability to handle a single customer view of small business owners is, in my experience, very poorly fulfilled in the Data Quality and Master Data Management solutions around. There is certainly room for improvement and entrepreneurship here.

Party, Product, Place. Period.

In a recent post here on this blog the Master Data Management domain usually called locations was examined, followed by excellent comments.

Also in the DAMA International LinkedIn group there was a great discussion around the location domain.

The comments touched two subjects:

  • Are locations just geographic locations, or can we deal with “digital locations” such as eMail addresses, phone numbers, websites, go-to-meeting IDs, social network IDs and so on as locations as well?
  • How do we model the relations between parties, products and locations?

Sometimes I like to use the word places instead of locations as we then have a P-trinity of parties, products and places.

I’m not sure if places have a stronger semantic link to geography than locations have. Anyway, my thoughts on the location domain were merely connected to geography. The digital locations mentioned are also, in my eyes, more related to parties and not so much to products. The same is true for another good old substitute for a location or address: the mailbox (like “Postbox 1234”), which is a valid notion for the destination of a letter or small package, and often seen in database columns otherwise filled with geographic locations.

So, sticking to places being physical, geographic locations: How do we model parties, products and places?

First of all, it’s important that we are able to model different concepts within each domain in one single way. A very common situation in many enterprise data landscapes is that different forms of parties exist with different models, like a model for customers, a model for suppliers, a model for employees and other models for other business partner roles.

The association between a party entity and a location entity is in most cases a time-dependent relation, like: this consumer was billed at this address in this period. The relation between the party and the product is the good old basic data model: we invoiced this product on that date. The product and place relation is very industry specific. One example would be that an on-site service contract applies to this address in this period.

Time, often handled as a period, will indeed add a fourth P to the P-trinity of party, product and place.
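
A minimal sketch of such a time-dependent party-place relation (entity names and the helper function are illustrative) could look like this:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PartyAtPlace:
    """Time-dependent relation: this party was at this place in this period."""
    party_id: str
    place_id: str
    valid_from: date
    valid_to: Optional[date] = None   # None = the relation is still current

def place_of(relations, party_id, on_date):
    """Find where a party was on a given date."""
    for rel in relations:
        if (rel.party_id == party_id
                and rel.valid_from <= on_date
                and (rel.valid_to is None or on_date <= rel.valid_to)):
            return rel.place_id
    return None

# A consumer who moved: the period (the fourth P) keeps both addresses valid,
# each in its own time frame.
history = [
    PartyAtPlace("consumer-1", "addr-old", date(2005, 1, 1), date(2010, 6, 30)),
    PartyAtPlace("consumer-1", "addr-new", date(2010, 7, 1)),
]
print(place_of(history, "consumer-1", date(2008, 3, 15)))
```

The party-product and product-place relations follow the same pattern, each with their own period attributes.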

The Location Domain

When talking master data management we usually divide the discipline into domains, where the two most prominent domains are:

  • Customer, or rather party, master data management
  • Product, sometimes also named “things”, master data management

One of the most frequently mentioned additional domains is locations.

But despite locations being all around, we seldom see a business initiative aimed at enterprise-wide location data management under a slogan of having a 360 degree view of locations. Most often locations are seen as a subset of either the party master data or, in some cases, the product master data.

Industry diversity

The need for having locations as a focus area varies between industries.

In some industries like public transit, where I have been working a lot, locations are implicit in the delivered services. Travel and hospitality is another example of a tight connection between the product and a location. Also some insurance products have a location element. And do I have to mention real estate: Location, Location, Location.

In other industries the location has a more moderate relation to the product domain. There may be some considerations around plant and warehouse locations, but that’s usually not high volume and complex stuff.  

Locations, as a main factor in exploiting demographic stereotypes, are important in retailing and other business-to-consumer (B2C) activities. When doing B2C you often want to see your customer as the household, where the location is a main, but treacherous, factor in doing so. We had a discussion on the householding dilemma in the LinkedIn Data Matching group recently.

Whenever you, or a partner of yours, are delivering physical goods or a physical letter of any kind to a customer, it’s crucial to have high quality location master data. The impact of not having that is of course dependent on the volume of deliveries.   

Globalization

If you ask me about London, I will instinctively think about the London in England. But there is a pretty big London in Canada too, that would be top of mind to other people. And there are other smaller Londons around the world.

Master data with location attributes increasingly come in populations covering more than one country. It’s not that ambiguous place names don’t exist in single-country sets; ambiguous place names were the main driver behind many countries getting a postal code system. However, the British, and the Canadians, invented systems including letters, as opposed to most other systems only having numbers, typically with an embedded geographic hierarchy.
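
The difference between numeric and letter-bearing systems can be illustrated with a few deliberately simplified validation patterns (real postal code rules, especially the British ones, are far more intricate than this sketch):

```python
import re

# Simplified, illustrative patterns only -- not authoritative rules.
POSTCODE_PATTERNS = {
    "DK": re.compile(r"^\d{4}$"),                             # 4 digits
    "DE": re.compile(r"^\d{5}$"),                             # 5 digits
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),  # letters + digits
    "CA": re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$"),           # alternating
}

def valid_postcode(country, code):
    """Per-country validation; one worldwide pattern would be useless."""
    pattern = POSTCODE_PATTERNS.get(country)
    return bool(pattern and pattern.match(code.strip().upper()))

print(valid_postcode("GB", "SW1A 1AA"))   # True
print(valid_postcode("DK", "2600"))       # True
print(valid_postcode("DK", "SW1A 1AA"))   # False
```

The point is that validation, like rendition, has to be driven per country rather than by one global rule.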

Apart from the different standards used around, the possibilities for exploiting external reference data are very different concerning data quality dimensions such as timeliness, consistency, completeness, conformity – and price.

Handling location data from many countries at the same time ruins many best practices that have worked for handling location data for a single country.

Geocoding

Instead of identifying locations in a textual way by country codes, state/province abbreviations, postal codes and/or city names, street names and types or blocks, and house numbers and names, it has become increasingly popular to use geocoding as a supplement or even an alternative.

There are different types of geocodes out there suitable for different purposes. Examples are:

  • Latitude and longitude (commonly on the WGS84 datum), picturing a round world
  • UTM X,Y coordinates, picturing the world in peels (zones)
  • Projected planar X,Y coordinates, picturing a world as flat as your computer screen

While geocoding has a lot to offer in identification and global standardization, we of course have a gap between geocodes and everyday language. If you want to learn more, then come and visit me at N 55°38′47″, E 12°32′58″.
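
Bridging that gap often starts with the small arithmetic of converting degrees, minutes and seconds into the decimal degrees most software expects, as in this sketch:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees;
    south and west hemispheres are negative."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# The coordinates from the text: N 55°38'47", E 12°32'58"
lat = dms_to_decimal(55, 38, 47, "N")
lon = dms_to_decimal(12, 32, 58, "E")
print(round(lat, 4), round(lon, 4))
```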
