Process of consolidating Master Data


In my previous blog post “Multi-Purpose Data Quality” we examined a business challenge involving multiple purposes for party master data.

The comments suggested some form of consolidation should be done with the data.

How do we do that?

I have made a PowerPoint show “Example process of consolidating master data” with a suggested way of doing that.

The process uses the party master data types explained here.

The next questions in solving our business challenge will include:

  • Is it necessary to have master data in optimal shape in real time – or is it OK to make periodic consolidations?
  • How do we design processes for maintaining the master data when:
    • New members and customers are inserted?
    • We update existing members and customers?
    • External reference data changes?   
  • What changes must be made to the existing applications handling the member database and the eShop?

The question of which style of Master Data Hub is suitable also comes up very often in these kinds of implementations.


Multi-Purpose Data Quality

Say you are an organisation within charity fundraising. For many years you have had a membership database, and recently you also introduced an eShop with related accessories.

The membership database holds the following record (Name, Address, City, YearlyContribution):

  •  Margaret & John Smith, 1 Main Street, Anytown, 100 Euro

The eShop system has the following accounts (Name, Address, Place, PurchaseInAll):

  • Mrs Margaret Smith, 1 Main Str, Anytown, 12 Euro
  • Peggy Smith, 1 Main Street, Anytown, 218 Euro
  • Local Charity c/o Margaret Smith, 1 Main Str, Anytown, 334 Euro

Now the new management wants to double contributions from members and triple eShop turnover. Based on the recommendations from “The One Truth Consulting Company” you plan to do the following:

  • Establish a platform for 1-1 dialogue with your individual members and customers
  • Analyze member and customer behaviour and profiles in order to:
    • Support the 1-1 dialogue with existing members and customers
    • Find new members and customers who are like your best members and customers

As the new management intends to stay for many years ahead, the solution must not be a one-shot exercise but must be implemented as business process reengineering with a continuous focus on best-fit data governance, master data management and data (information) quality.

So, what are you going to do with your data so they are fit for action with both the old purposes and the new purposes?
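To see why automatic consolidation is hard here, a minimal token-overlap comparison (purely illustrative, not a recommended matching algorithm) can score the membership record against the eShop accounts; note how the shared address tokens help while the nickname “Peggy” contributes nothing:

```python
# Illustrative sketch: Jaccard similarity on name+address tokens.
# A real solution would need nickname tables ("Peggy" = "Margaret"),
# address standardisation and external reference data.
def tokens(text):
    """Split a record into a set of lower-cased tokens, dropping punctuation."""
    return {t.strip(".,&").lower() for t in text.split() if t.strip(".,&")}

member = "Margaret & John Smith, 1 Main Street, Anytown"
eshop_accounts = [
    "Mrs Margaret Smith, 1 Main Str, Anytown",
    "Peggy Smith, 1 Main Street, Anytown",
    "Local Charity c/o Margaret Smith, 1 Main Str, Anytown",
]

for account in eshop_accounts:
    overlap = tokens(member) & tokens(account)
    score = len(overlap) / len(tokens(member) | tokens(account))
    print(f"{account!r}: Jaccard = {score:.2f}")
```

The abbreviation “Str” versus “Street” and the nickname issue both depress the scores, which is exactly the kind of gap the data quality work has to close.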

Recently I wrote some posts related to these challenges:

Any other comments on these issues and how to tackle them are welcome.


Upstream prevention by error tolerant search

Fuzzy matching techniques were originally developed for batch processing in order to find duplicates and consolidate database rows that have no unique identifiers linking them to the real world.

These processes have traditionally been implemented for downstream data cleansing.

As we know that upstream prevention is much more effective than tidying up downstream, real-time data entry checking is becoming more common.

But we are able to go further upstream by introducing error tolerant search capabilities.

A common workflow when in-house personnel are entering new customers, suppliers, purchased products and other master data is that first you search the database for a match. If the entity is not found, you create a new entity. When the search fails to find an actual match, we have a classic and frequent cause of either introducing duplicates or challenging the real-time checking.

An error-tolerant search is able to find matches despite spelling differences, alternatively arranged words, various concatenations and many other challenges we face when searching for names, addresses and descriptions.

Implementation of such features may be as embedded functionality in CRM and ERP systems or, to use my favourite term, as SOA components. So besides classic data quality elements for monitoring and checking, we can add error-tolerant search to the component catalogue needed for a good MDM solution.
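As a minimal sketch of what such an error-tolerant search component might do (assuming a small in-memory record list; Python’s difflib stands in for a real fuzzy-matching engine):

```python
import difflib

# Hypothetical existing master data records.
customers = [
    "Margaret Smith, 1 Main Street, Anytown",
    "John Smith, 1 Main Street, Anytown",
    "Peter Jones, 7 High Street, Othertown",
]

def error_tolerant_search(query, records, cutoff=0.6):
    """Return candidate matches ranked by similarity despite misspellings."""
    scored = [
        (difflib.SequenceMatcher(None, query.lower(), r.lower()).ratio(), r)
        for r in records
    ]
    return [r for score, r in sorted(scored, reverse=True) if score >= cutoff]

# A misspelled query still surfaces the intended record first, so the
# data entry clerk can update it instead of creating a duplicate.
print(error_tolerant_search("Margret Smyth, 1 Main Str, Anytown", customers))
```

Exposed as a SOA component, the same service could back both the data entry workflow and a customer-facing search box.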

Bookmark and Share

So, how about SOHO homes

This post is the 3rd in a series of challenges in Data Matching with Party Master Data hierarchies.

80% of all business entities are one-man bands operated from so-called SOHOs (Small-Office-Home-Office). The home part very often appears as a business sharing a private residence address with a household.


Examples are:

  • Farmers
  • Healthcare professionals
  • Small shops
  • Small membership organisation administrations
  • Fawlty Towers
  • Independent Data Quality consultants

Here we have a 3 layer relationship:

  • An ADDRESS occupied by a HOUSEHOLD and a BUSINESS (if not several)
  • The HOUSEHOLD consists of one or several CONSUMERS
  • Each BUSINESS has an EMPLOYEE being the Business Owner / Representative

One of the CONSUMERs and the EMPLOYEE are the same real-world individual.

(About party master data entity types please have a look here.)
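The three-layer relationship above can be sketched as a small data model; the class and field names are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    name: str

@dataclass
class Employee:
    name: str
    role: str

@dataclass
class Household:
    members: list          # of Consumer

@dataclass
class Business:
    name: str
    owner: Employee

@dataclass
class AddressOccupancy:
    address: str
    household: Household
    businesses: list       # of Business - there may be several

# A farm as a SOHO example: Margaret is both a CONSUMER in the
# household and the EMPLOYEE owning the business at the same address.
occupancy = AddressOccupancy(
    address="1 Farm Lane, Anytown",
    household=Household(members=[Consumer("Margaret Smith"),
                                 Consumer("John Smith")]),
    businesses=[Business("Smith Farm",
                         owner=Employee("Margaret Smith", "Business Owner"))],
)

# The same real-world individual appears in two different party roles:
assert occupancy.household.members[0].name == occupancy.businesses[0].owner.name
```

Keeping the consumer role and the employee role as separate records linked to one individual is what lets you include or exclude each role per the B2B/B2C scenarios below.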

This very, very common construction creates some challenges in Data Matching and Master Data hierarchy building such as:

  • If you focus on B2B (Business-to-Business) you want to include the Business and Owner in that role, but not the same individual in the consumer role.
  • If you focus on B2C (Business-to-Consumer) you want to include the consumer role of that individual, but not the business (owner) role.
  • If you do both B2B and B2C you may want to assign either a B2B or a B2C category, and that’s tricky with those individuals.
  • In several industries the business owner, the business and the household are a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.

As in my previous posts on B2B (E2E) and B2C hierarchies, the methods for solving this are fuzzy matching, exploiting external reference data and other investigations – and so it is with this challenge as well. This makes Data Matching and Master Data hierarchy building a very exciting profession where you need both business and technology skills – and a real-world perspective – to go all the way.


Household Householding

When doing B2C (business-to-consumer) activities, you often really want to do B2H (business-to-household). But sometimes you actually do want B2C, having a dialogue with the individual customer. So yet again we have a Party Master Data hierarchy, here households each consisting of one or several consumers (typically a nuclear family). In Data Model language there is a parent-child relationship between households and consumers.

The classic reason for wanting to identify households is that it’s a waste of money sending several printed catalogues and other offline mailings to the same household. But a lot of other good reasons based on a shared household budget exist too.

Data captured about consumers could look like this (name, address, city):

  • Margaret Smith, 1 Main Street, Anytown
  • Margaret & John Smith, 1 Main Str, Anytown
  • John Smith, 1 Main Street, Anytown
  • Peggy Smith, 1 Main Street, Anytown
  • Mr. J. Smith, 1 Main Street, Anytown

Here it seems fair to assume that we have:

  • A HOUSEHOLD being the Smith family consisting of
  • A CONSUMER being Margaret nicknamed Peggy
  • And a CONSUMER being John

(About party master data entity types please have a look here.)

But this is an easy example compared to what you see when working with names and addresses. Among the complications I have seen are:

  • Households consisting of individuals with separate family names
  • Multi adult generation households and other kinds of households
  • Not having unique addresses may cause the formation of non-existing households
  • Some addresses are not for traditional households, but are nursing homes, campus residence halls and the like
  • The time dimension: unsynchronised relocation capture, marriage (couples forming), divorce (households splitting)

In other words: the real world is not that simple, and the picture of how households are formed does change.

Available composable methods for maintaining household information are:

  • Ask your customers. An obvious choice but not easy to keep going – your ROI may not be positive.
  • Fuzzy Data Matching. The higher the percentage of all citizens in a given region you have in your database, the better your matching may be aligned with the real world.
  • Exploiting external reference data. Having knowledge about public address data helps a lot. Such data may tell you about the uniqueness of addresses and the attributes of the buildings there. Availability differs around the world, but the trend towards open government data may help.
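A minimal householding sketch along these lines groups consumer records on a crudely normalized address; the normalization rules here are illustrative assumptions, far from production-grade:

```python
from collections import defaultdict

def normalize_address(address):
    """Crude normalization: lower-case, collapse whitespace, expand 'Str'."""
    addr = " ".join(address.lower().split())
    return addr.replace(" str,", " street,")

# The consumer records from the example above (name, address).
consumers = [
    ("Margaret Smith", "1 Main Street, Anytown"),
    ("Margaret & John Smith", "1 Main Str, Anytown"),
    ("John Smith", "1 Main Street, Anytown"),
    ("Peggy Smith", "1 Main Street, Anytown"),
    ("Mr. J. Smith", "1 Main Street, Anytown"),
]

# Group consumers into candidate households by normalized address.
households = defaultdict(list)
for name, address in consumers:
    households[normalize_address(address)].append(name)

for address, members in households.items():
    print(address, "->", members)
```

All five records collapse into one candidate household, which still leaves the harder problems from the list above (non-unique addresses, nursing homes, the time dimension) untouched.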

This is the second post in a series about hierarchies in Party Master Data and how they must be handled in data matching. The previous post was about B2B (E2E) data. The next post planned is about SOHOs.


Echoes in the Database

A basic structure of B2B (Business-to-Business) Party Master Data is that you have accounts being business entities each having one or several contacts being employees in each business entity. These employees act in the roles of decision makers, gate keepers, invoice receivers and so on. In Data Model language there is a parent-child relationship between accounts and contacts.

When doing deduplication with such data you aim to make a golden copy with unique business entities having unique contacts.

After achieving that you may gaze at the data and stumble over rows in the golden copy like these (function, contact name, account name, address):

  • HR, John Smith, Smashing Estates Ltd, Same Place in Anytown
  • HR, John Smith, Smashing Solicitors Ltd, Same Place in Anytown
  • IT, Tushnelda von Keine-Mustermann, The Old Treadmill Ltd, Anytown
  • IT, Tushnelda von Keine-Mustermann, Brand New Brands Ltd, Anytown

Duplicates? Probably they are the same real-world individuals.

John Smith is the ultimate common Anglo name, but if your favourite external business directory tells you that the 2 companies have the same mother company and are modest-sized organisations, the possibility of John Smith being the same person having the same role at the same time in 2 companies is very high.

Tushnelda has a very distinctive name, so here there is a high probability that she has got a new job in a new company, which makes one of the entries inactive. If one is to be selected as the active survivor, it may be chosen by newest update, found in external reference data or investigated otherwise.
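A minimal sketch of that survivorship rule, assuming each row carries a last-updated date (the records and dates are made up for illustration):

```python
from datetime import date

# Two golden-copy rows believed to be the same real-world individual.
contacts = [
    {"name": "Tushnelda von Keine-Mustermann",
     "account": "The Old Treadmill Ltd", "last_updated": date(2008, 3, 1)},
    {"name": "Tushnelda von Keine-Mustermann",
     "account": "Brand New Brands Ltd", "last_updated": date(2009, 6, 15)},
]

def select_survivor(duplicates):
    """Keep the most recently updated record active; mark the rest inactive."""
    ranked = sorted(duplicates, key=lambda c: c["last_updated"], reverse=True)
    return ranked[0], ranked[1:]

active, inactive = select_survivor(contacts)
print(active["account"])  # the newer employment survives as the active entry
```

In practice "newest update" would be only one input; external reference data or manual investigation may override it.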

B2B is often not actually Business-to-Business but rather E2E – Employee-to-Employee – as the relationship exists between employees in the selling and buying business entities, and it is not unusual for the relationship to follow the employees when they change employer.

So striving for “one version of the truth” through a “360 degree view of the customer” is not a one-layer exercise. This fact must be modeled in the Master Data structure, supported by functionality and safeguarded by feasible data quality implementations.

It’s my plan to do some blog posts around hierarchies in Party Master Data and how this must be handled in data matching. Next post will be about B2C data.


Master Data Audit

In this year’s paramount cycling event, ”Le Tour de France”, one of the leading teams was “Team Saxo Bank”. The name should actually have been “Team Saxo Bank IT Factory”. But “IT Factory” is gone.

In recent years IT Factory was a comet in the Danish IT industry, with fast-increasing turnover and revenues verified by leading auditors. Only a few people, led by a (now) well-known blogger, asked about the customer base. On 1st December 2008 it all blew up, and it was revealed that 99% of the turnover was a fairy tale. More details on wiki.

If the auditors had spent 10 minutes (or so) on the Master Data besides looking at the Transaction Data making up the Financial Statements, they would have found the mismatch between the customer base (and linked products) and the real world – and several banks and others would not have lost a lot of money.

Master Data Management is first of all a benefit to the organisation having these data. But as shown in the above example, it is, as with financial statements, also of interest to the surrounding world that the Master Data have a reasonable data quality and alignment with the real world. Often financial statements are followed by market and other assessments built on the Master Data of the organisation.

Without comparing it to the IT Factory case, I remember another case from Denmark this year where Telia, a leading telco in the Nordics, in addition to the Financial Statement stated that they had 44,000 more customers in the database during the year. Asked how this was counted, the answer revealed that it was actually 44,000 more active SIM cards. So it could have been 1 new customer with 1 SIM card and 43,999 existing customers getting additional SIM cards. Link in Danish here.

We already know SOX and EuroSOX as compliance approaches for financial statements, and Basel II also affects the data quality and real-world alignment of Master Data in banking. My guess is that we will see more focus on the Data Quality and real-world alignment of Master Data from outside the organisation, adding to the ongoing awareness of the subject that already exists inside many organisations.


Sweden meets United States


Finding duplicate customers may be a very different task depending on which country you are from and which country the data originates from.

Besides all the various character sets, naming traditions and address formats, the differing availability of external reference data also makes some things easy – and other things very hard.

Most technology, descriptions and presented examples around are from the United States.

But say you are a Swedish company with Swedish persons in your database, and among them these 2 rows (name, address, postal code and city):

  • Oluf Palme, Sveagatan 67, 10001 Stockholm
  • Oluf Palme, Savegatan 76, 10001 Stockholm

What you do is plug into the government-provided citizen master data hub and ask for a match. The outcome can be:

  • The same citizen ID is returned because the person has relocated. It’s a duplicate.
  • Two different citizen IDs are returned. It’s not a duplicate.
  • Either only one or no citizen ID is returned. Leave it or do fuzzy matching.

If you go for fuzzy matching, you had better be good, because all the easy ones have been handled and you are left with the ones where false positives and false negatives are most likely. Often you will only do fuzzy matching if you have phone numbers, email addresses or other data to support the match.
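The decision flow above can be sketched like this; lookup_citizen_id is a hypothetical stand-in for the government-provided hub service, with canned responses for the two sample rows:

```python
def lookup_citizen_id(name, address):
    """Hypothetical hub lookup; returns a citizen ID or None."""
    hub = {
        ("Oluf Palme", "Sveagatan 67, 10001 Stockholm"): "CIT-001",
        ("Oluf Palme", "Savegatan 76, 10001 Stockholm"): "CIT-001",  # relocated
    }
    return hub.get((name, address))

def classify_pair(row_a, row_b):
    """Apply the three outcomes: duplicate, not a duplicate, or unresolved."""
    id_a, id_b = lookup_citizen_id(*row_a), lookup_citizen_id(*row_b)
    if id_a and id_b:
        return "duplicate" if id_a == id_b else "not a duplicate"
    return "unresolved - leave it or do fuzzy matching"

print(classify_pair(
    ("Oluf Palme", "Sveagatan 67, 10001 Stockholm"),
    ("Oluf Palme", "Savegatan 76, 10001 Stockholm"),
))
```

Only the unresolved branch falls through to fuzzy matching, which is why the remaining pairs are the hard ones.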

Another angle is that it is almost only Swedish companies who use this service with the government-provided reference data – but anyone holding Swedish data may use it upon approval.

Data quality solutions for party master data are not only about fuzzy matching but also about integrating with external reference data, exploiting all the various worldwide possibilities and supporting the logic and logistics of doing so. Also, we know that upstream prevention as close to the root as possible is better than downstream cleansing.

Deployment of such features as composable SOA components is described in a previous post here.

Master Data meets the Customer

In the old days Master Data was predominantly created, maintained and used by the staff of the organisation having these data. In many cases this is no longer true. Besides exchanging data with partners in doing business, today the customer – and prospect – has become an important person to be considered when doing Data Governance and implementing technology around Master Data.

In the online world the customer works with your Master Data when:

  • The customer creates and maintains name, address and communication information by using registration functions
  • The customer searches for and reads product information on web shops and information sites

Having prospects and customers help with the name and address (party) data is apparently great news for lowering costs in the organisation. But in the long run you have got yourself another data silo, and your Data Quality issues have become even more challenging.

The first thing to do is to optimise your registration forms. An important thing to consider here is that online is worldwide (unless you restrict your site to visitors from a single country). When doing business online with multinational customers, take care that the sequence, formats and labels are useful to everyone and that mandatory checks and other validations are in line with the rules for the country in question.

External reference data may be used for lookup and validation integrated in the registration forms.

The concept of “one version of the truth” is a core element in most Master Data Management solutions. Doing deduplication within online registration has privacy considerations. When asking for personal data you can’t prompt “Possible duplicate found” and then present the data about someone else. Here you need more than one data quality firewall.

Many organisations are not just either offline or online but operate in both worlds. To maintain the 360-degree view of the customer in this situation you need strong data matching techniques capable of working with both offline and online captured data. As the business case for online registration is very much about reducing staff involvement, this is about using technology and keeping human interaction to a minimum.

When a prospect comes to your site and tries to find information about your products, very often the first thing they do is use the search function. From deduplication of names and addresses we know that spelling is difficult and that sometimes we use synonyms other than those used in the Master Data descriptions. Add to that the multicultural aspect. The solution here is to use the same fuzzy search techniques that we use for data matching. This is a kind of reuse. I like that.


Follow Friday Master Data Hub

Social Networking needs Master Data Management.

A recurring event every Friday on Twitter is #FollowFriday, with the acronym #FF, where people on Twitter tweet about whom to follow.

I do it too, and like everyone else I sometimes forget someone, and then (s)he gets angry and doesn’t #FF me, and that’s bad. Bad Data Management. Bad #mdm.

So now I have started building a Master Data Hub fit for the purpose of doing consistent #FF. I see other purposes for this as well, as I recognize the advantages of combining data sources, so I did some #datamatching with LinkedIn connections to improve #dataquality through Identity Resolution.

This is as far as I have got for now (very convenient that WordPress lets me edit my blog posts):

@ReferenceData where http://www.linkedin.com/pub/carla-mangado/11/467/239 is Staff Writer

@KenOConnorData is http://www.linkedin.com/in/kenoconnor00

@ocdqblog is a blog where http://www.linkedin.com/in/jimharris is blogger-in-chief

@dataqualitypro is a community founded by http://www.linkedin.com/in/dylanjones

Dylan was a @Datanomic partner where @SteveTuck is http://www.linkedin.com/in/stevetuck

@InitiateSystems has a CTO = @wmmarty who is http://www.linkedin.com/pub/marty-moseley/0/57/43b

@VishAgashe is http://www.linkedin.com/in/vishagashe

@KeithMesser is http://www.linkedin.com/in/keithmesser running @GlobalMktgPros

@fionamacd is at @TrilliumSW as seen here http://www.linkedin.com/in/fionamacd

So is @stevesarsfield being http://www.linkedin.com/pub/steve-sarsfield/2/675/47a

Trillium is owned by Harte-Hanks where @MarkGoloboy also was http://www.linkedin.com/in/markgoloboy

@biknowledgebase is operated by http://www.linkedin.com/in/barryharmsen

@Dataexperts has a managing director who is http://www.linkedin.com/pub/gary-holland/1/101/135

@IDResolution (Infoglide) has several Data Matching members in http://www.linkedin.com/groups?gid=2107798 including http://www.linkedin.com/in/dougwood

@rdrijsen is http://www.linkedin.com/in/rdrijsen with possible duplicate http://www.linkedin.com/pub/resa-drijsen/1/389/58

@grahamrhind is http://www.linkedin.com/in/grahamrhind

@omathurin is http://www.linkedin.com/in/oliviermathurin

@zzubbuzz is probably http://www.linkedin.com/pub/charles-proctor/14/591/31

@CharlesBurleigh is http://www.linkedin.com/in/charlesburleigh

@wesharp is http://www.linkedin.com/in/williamesharp doing @dqchronicle

@decisionstats has an editor being http://www.linkedin.com/in/ajayohri

@jeric40 is my colleague at Omikron as shown here http://www.linkedin.com/in/janerikingvaldsen