Data Matching – Page 22 – Liliendahl on Data Quality

Process of consolidating Master Data

27th September 2009 (updated 6th July 2010) by Henrik Gabs Liliendahl


In my previous blog post “Multi-Purpose Data Quality” we examined a business challenge where party master data serves multiple purposes.

The comments suggested that some form of consolidation of the data is needed.

How do we do that?

I have made a PowerPoint presentation, “Example process of consolidating master data”, suggesting a way of doing that.

The process uses the party master data types explained here.

The next questions in solving our business challenge will include:

Is it necessary to have master data in optimal shape in real time, or is periodic consolidation OK?
How do we design processes for maintaining the master data when:
- New members and customers are inserted?
- We update existing members and customers?
- External reference data changes?
What changes must be made to the existing applications handling the member database and the eShop?

The question of which style of Master Data Hub is suitable also comes up very often in these kinds of implementations.
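As a minimal sketch of periodic consolidation, assuming two hypothetical sources (a member database and an eShop), a deliberately naive exact match key and a "most recently updated wins" survivorship rule, source records can be grouped into golden records that keep references back to their sources. None of these names or rules come from the presentation; real matching would be fuzzy and use reference data.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SourceRecord:
    source: str      # hypothetical source name, e.g. "member_db" or "eshop"
    source_id: str
    name: str
    address: str
    updated: date


@dataclass
class GoldenRecord:
    name: str
    address: str
    sources: list = field(default_factory=list)  # (source, source_id) pairs


def _norm(text: str) -> str:
    # Lower-case and collapse whitespace; a stand-in for real fuzzy matching.
    return " ".join(text.lower().split())


def match_key(rec: SourceRecord) -> tuple:
    return (_norm(rec.name), _norm(rec.address))


def consolidate(records: list[SourceRecord]) -> list[GoldenRecord]:
    groups: dict[tuple, list[SourceRecord]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    golden = []
    for members in groups.values():
        # Survivorship rule (assumption): the most recently updated record wins.
        survivor = max(members, key=lambda r: r.updated)
        golden.append(GoldenRecord(
            name=survivor.name,
            address=survivor.address,
            sources=[(r.source, r.source_id) for r in members],
        ))
    return golden
```

Run as a batch job, this is the "periodic consolidation" option; the real-time option would apply the same match and survivorship logic at the moment a member or customer is inserted or updated.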

The new face of Data Matching

28th August 2009 (updated 19th June 2010) by Henrik Gabs Liliendahl

When matching database records holding data about a person, we traditionally use string attributes such as Citizen/Tax ID, Name, Address, Phone and Email.

Today I stumbled over a company called Polar Rose that specializes in recognizing people’s faces in pictures. The current use is tagging people in Facebook pictures, but really, this technology could make Data Matching, Identity Resolution and Deduplication better.

We already know that fuzzy matching with names and addresses has plenty of challenges with false positives and false negatives. I would surely expect the same issues with facial recognition. But we also know from comparing strings that the more different pieces of information we can gather, the better we are at avoiding false matches. So combining fuzzy string matching and facial recognition (where a picture is available) could make matching technology mimic human judgement more closely and become more reliable.
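To illustrate how an extra, optional piece of evidence can be folded into a match score, here is a small sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in for a real fuzzy name/address matcher, and treats `face_similarity` as a hypothetical score in [0, 1] from some external face recognition service; the weights are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def string_similarity(a: str, b: str) -> float:
    # Character-based similarity in [0, 1]; a stand-in for a proper fuzzy matcher.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(name_a: str, name_b: str, addr_a: str, addr_b: str,
                   face_similarity: Optional[float] = None,
                   weights: tuple = (0.4, 0.3, 0.3)) -> float:
    """Weighted average over whatever evidence is available.

    face_similarity is a hypothetical score in [0, 1] from an external face
    recognition service; None means no picture was available, in which case
    only the string evidence is used.
    """
    scores = [string_similarity(name_a, name_b), string_similarity(addr_a, addr_b)]
    used = list(weights[:2])
    if face_similarity is not None:
        scores.append(face_similarity)
        used.append(weights[2])
    return sum(s * w for s, w in zip(scores, used)) / sum(used)
```

Normalizing by the weights actually used keeps records with and without pictures on the same 0 to 1 scale, so a single match threshold can serve both.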

Right now I am considering whether to add this feature to Data Quality 2.0 or leave it for Data Quality 3.0.

So, how about SOHO homes

23rd August 2009 (updated 1st February 2012) by Henrik Gabs Liliendahl

This post is the third in a series about challenges in Data Matching with Party Master Data hierarchies.

80% of all business entities are one-man bands operated from so-called SOHOs (Small Office/Home Office). The home part very often shows up as the business sharing a private residence address with a household.


Examples are:

Farmers
Healthcare professionals
Small shops
Small membership organisation administrations
Fawlty Towers
Independent Data Quality consultants

Here we have a three-layer relationship:

An ADDRESS occupied by a HOUSEHOLD and a BUSINESS (if not several)
The HOUSEHOLD consists of one or several CONSUMERS
Each BUSINESS has an EMPLOYEE who is the Business Owner / Representative

One of the CONSUMERS and the EMPLOYEE are the same real-world individual.

(About party master data entity types please have a look here.)

This very, very common construction creates some challenges in Data Matching and Master Data hierarchy building such as:

If you focus on B2B (Business-to-Business) you want to include the Business and Owner in that role, but not the same individual in the consumer role.
If you focus on B2C (Business-to-Consumer) you want to include the consumer role of that individual, but not the business (owner) role.
If you do both B2B and B2C, you may want to assign either a B2B or a B2C category, and that’s tricky with those individuals.
In several industries, business owners together with their business and household are a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.
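The three-layer structure can be sketched as a small data model where one real-world individual carries two roles. The class names, the `PartyRole` record and the role codes below are illustrative assumptions, not a standard schema; the two functions show the B2B/B2C selection and how to find the tricky SOHO individuals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    address_id: str
    text: str


@dataclass(frozen=True)
class Household:
    household_id: str
    address_id: str      # the ADDRESS the household occupies


@dataclass(frozen=True)
class Business:
    business_id: str
    name: str
    address_id: str      # often the same ADDRESS as the household


@dataclass(frozen=True)
class PartyRole:
    individual_id: str   # the real-world person behind the role
    role: str            # "CONSUMER" or "EMPLOYEE"
    parent_id: str       # household_id for CONSUMER, business_id for EMPLOYEE


def select_roles(roles: list, focus: str) -> list:
    """B2B addresses the EMPLOYEE (owner) role, B2C the CONSUMER role."""
    wanted = {"B2B": "EMPLOYEE", "B2C": "CONSUMER"}[focus]
    return [r for r in roles if r.role == wanted]


def soho_individuals(roles: list) -> set:
    """Individuals appearing both as CONSUMER and EMPLOYEE: the SOHO cases."""
    consumers = {r.individual_id for r in roles if r.role == "CONSUMER"}
    employees = {r.individual_id for r in roles if r.role == "EMPLOYEE"}
    return consumers & employees
```

Keeping the individual separate from the roles is what lets the same person be included in a B2B selection and excluded from a B2C one (or the other way round) without creating duplicates.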

As with the B2B (E2E) and B2C hierarchies in my previous posts, the methods for solving this challenge are fuzzy matching, exploiting external reference data and other investigations. This makes Data Matching and Master Data hierarchy building a very exciting profession where you need both business and technology skills, as well as a real-world perspective, to go all the way.

Household Householding

13th August 2009 (updated 23rd June 2010) by Henrik Gabs Liliendahl

When doing B2C (business-to-consumer) activities, you often really want to do B2H (business-to-household). But sometimes you actually do want B2C, having a dialogue with the individual customer. So yet again we have a Party Master Data hierarchy, here households each consisting of one or several consumers (typically a nuclear family). In Data Model language there is a parent-child relationship between households and consumers.

The classic reason for wanting to identify households is that it’s a waste of money sending several printed catalogues and other offline mailings to the same household. But a lot of other good reasons based on a shared household budget exist too.

Data captured about consumers could look like this (name, address, city):

Margaret Smith, 1 Main Street, Anytown
Margaret & John Smith, 1 Main Str, Anytown
John Smith, 1 Main Street, Anytown
Peggy Smith, 1 Main Street, Anytown
Mr. J. Smith, 1 Main Street, Anytown

Here it seems fair to assume that we have:

A HOUSEHOLD being the Smith family consisting of
A CONSUMER being Margaret nicknamed Peggy
And a CONSUMER being John
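The reasoning behind the Smith example can be sketched in code: normalize street abbreviations, map nicknames to given names, split "Margaret & John" into two persons, then group by family name and address and resolve bare initials. The lookup tables are tiny illustrative assumptions, not real reference data, and grouping on family name plus address is exactly what breaks down in the complications listed next.

```python
import re
from collections import defaultdict

# Tiny illustrative lookup tables (assumptions, not real reference data).
STREET_ABBREVIATIONS = {"str": "street", "st": "street", "rd": "road"}
NICKNAMES = {"peggy": "margaret", "jack": "john"}
TITLES = {"mr", "mrs", "ms", "dr"}


def normalize_address(address: str) -> str:
    words = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)


def split_persons(name: str) -> list[tuple[str, str]]:
    """Turn a captured name into (given name, family name) pairs.

    "Margaret & John Smith" -> [("margaret", "smith"), ("john", "smith")]
    """
    words = [w for w in re.findall(r"[a-z]+|&", name.lower()) if w not in TITLES]
    if len(words) < 2:
        return []
    family = words[-1]
    givens = " ".join(words[:-1]).split("&")
    return [(NICKNAMES.get(g.strip(), g.strip()), family) for g in givens if g.strip()]


def build_households(rows):
    """Group (name, address, city) rows into households of given names."""
    households = defaultdict(set)  # (family, (address, city)) -> given names
    for name, address, city in rows:
        place = (normalize_address(address), city.strip().lower())
        for given, family in split_persons(name):
            households[(family, place)].add(given)
    result = {}
    for key, givens in households.items():
        full = {g for g in givens if len(g) > 1}
        resolved = set(full)
        for initial in givens - full:
            # Resolve "j" to a full given name only if exactly one candidate fits.
            candidates = [f for f in full if f.startswith(initial)]
            resolved.add(candidates[0] if len(candidates) == 1 else initial)
        result[key] = resolved
    return result
```

Applied to the five captured rows above, this yields one household at "1 main street, anytown" with the consumers margaret and john.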

(About party master data entity types please have a look here.)

But this is an easy example compared to what you meet when working with real names and addresses. Among the complications I have seen are:

Households consisting of individuals with separate family names
Multi adult generation households and other kinds of households
Addresses that are not unique may cause households that do not exist to be formed
Some addresses are not for traditional households, but are nursing homes, campus residence halls and the like
The time dimension: asynchronous capture of relocations, marriage (couples), divorce (split)

In other words: the real world is not that simple, and the picture of how households are formed does change.

Available composable methods for maintaining household information are:

Ask your customers. An obvious choice, but not easy to sustain; your ROI may not be positive.
Fuzzy Data Matching. The higher the percentage of all citizens in a given region you have in your database, the better your matching may align with the real world.
Exploiting external reference data. Having knowledge about public address data helps a lot. Such data may tell you about uniqueness of addresses and the attributes of the buildings there. Availability differs around the world, but the trend in open government data may help.

This is the second post in a series about hierarchies in Party Master Data and how they must be handled in data matching. The previous post was about B2B (E2E) data. The next post planned is about SOHOs.

Echoes in the Database

11th August 2009 (updated 30th June 2010) by Henrik Gabs Liliendahl

A basic structure of B2B (Business-to-Business) Party Master Data is that you have accounts, being business entities, each having one or several contacts, being employees of that business entity. These employees act in roles such as decision makers, gatekeepers, invoice receivers and so on. In Data Model language there is a parent-child relationship between accounts and contacts.

When doing deduplication with such data you aim to make a golden copy with unique business entities having unique contacts.

After achieving that, you may look over the data and stumble over rows in the golden copy such as these (function, contact name, account name, address):

HR, John Smith, Smashing Estates Ltd, Same Place in Anytown
HR, John Smith, Smashing Solicitors Ltd, Same Place in Anytown
…
IT, Tushnelda von Keine-Mustermann, The Old Treadmill Ltd, Anytown
IT, Tushnelda von Keine-Mustermann, Brand New Brands Ltd, Anytown

Duplicates? Probably they are the same real-world individuals.

John Smith is the ultimate common Anglo name, but if your favorite external business directory tells you that the two companies have the same parent company and are modest-sized organizations, the probability that John Smith is the same person holding the same role at the same time in both companies is very high.

Tushnelda has a very rare name, so here there is a high probability that she has got a new job at a new company, which makes one of the entries inactive. If one entry is to be selected as the active survivor, it may be chosen by the newest update, found in external reference data or investigated otherwise.
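The two cases can be expressed as a small decision rule. This is a sketch under stated assumptions: `name_frequency` is a hypothetical share of the population carrying the name, `related_small_companies` stands for what an external business directory might tell you (same parent company, modest size), and the rarity threshold is arbitrary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContactRow:
    function: str
    contact_name: str
    account_name: str
    last_updated: date


def classify_echo(a: ContactRow, b: ContactRow, name_frequency: float,
                  related_small_companies: bool, rare_threshold: float = 1e-5):
    """Classify two golden-copy rows with the same contact at different accounts.

    name_frequency: hypothetical share of the population carrying this name.
    related_small_companies: hypothetical directory fact that both accounts
        share a parent company and are modest-sized organizations.
    Returns (verdict, surviving row or None).
    """
    if a.contact_name != b.contact_name or a.function != b.function:
        return ("different", None)
    if related_small_companies:
        # Even a common name is probably one person in two concurrent roles.
        return ("same_person_concurrent_roles", None)
    if name_frequency < rare_threshold:
        # Rare name, unrelated companies: likely a job change.
        # Survivorship by newest update, one of the options mentioned above.
        survivor = max((a, b), key=lambda r: r.last_updated)
        return ("job_change", survivor)
    return ("undecided", None)
```

The "undecided" outcome is where external reference data or manual investigation takes over.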

B2B is often not only Business-to-Business but also E2E (Employee-to-Employee), as the relationship exists between employees in the selling and buying business entities, and it is not unusual for the relationship to follow the employees when they change employer.

So striving for “one version of the truth” through a “360 degree view on customer” is not a one-layer exercise. This fact must be modeled in the Master Data structure, supported by functionality and safeguarded by feasible data quality implementations.

It’s my plan to do some blog posts around hierarchies in Party Master Data and how this must be handled in data matching. Next post will be about B2C data.

Sweden meets United States

5th August 2009 (updated 19th June 2010) by Henrik Gabs Liliendahl


Finding duplicate customers can be a very different task depending on which country you are in and which country the data originates from.

Besides the various character sets, naming traditions and address formats, the differing possibilities for using external reference data make some things easy and other things very hard.

Most of the technology, descriptions and examples around come from the United States.

But say you are a Swedish company with Swedish persons in your database, and among them these two rows (name, address, postal code and city):

Oluf Palme, Sveagatan 67, 10001 Stockholm
Oluf Palme, Savegatan 76, 10001 Stockholm

What you do is plug into the government-provided citizen master data hub and ask for a match. The outcome can be:

The same citizen ID is returned because the person has relocated. It’s a duplicate.
Two different citizen IDs are returned. It’s not a duplicate.
Either only one or no citizen ID is returned. Leave it or do fuzzy matching.
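The three outcomes translate into a simple rule. In this sketch `lookup` is a hypothetical stand-in for a call to the citizen master data hub (the real service's interface is not described here); it returns a citizen ID, or None when no unique citizen is found.

```python
from typing import Callable, Optional


def classify_by_citizen_id(row_a: dict, row_b: dict,
                           lookup: Callable[[dict], Optional[str]]) -> str:
    """Apply the three outcomes of a citizen register lookup to a candidate pair.

    lookup: hypothetical function returning a citizen ID for a row,
            or None when the register gives no unique answer.
    """
    id_a, id_b = lookup(row_a), lookup(row_b)
    if id_a is not None and id_b is not None:
        # Same ID: the person has relocated. Different IDs: two people.
        return "duplicate" if id_a == id_b else "not_duplicate"
    # Only one or no ID returned: leave it, or fall back to fuzzy matching.
    return "leave_or_fuzzy_match"
```

Only the last branch needs fuzzy matching, which is why that matching has to be good: it only sees the hard cases.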

If you go for fuzzy matching, you had better be good, because all the easy ones have been handled and you are left with the ones where false positives and false negatives are most likely. Often you will only do fuzzy matching if you have phone numbers, email addresses or other data to support the match.

Another angle is that almost only Swedish companies use this service with government-provided reference data, but anyone holding Swedish data may use it upon approval.

Data quality solutions for party master data are not only about fuzzy matching but also about integrating external reference data, exploiting all the various worldwide possibilities and supporting the logic and logistics of doing that. We also know that upstream prevention, as close to the root as possible, is better than downstream cleansing.

Deployment of such features as composable SOA components is described in a previous post here.

Master Data meets the Customer

2nd August 2009 (updated 1st July 2010) by Henrik Gabs Liliendahl

In the old days Master Data was predominantly created, maintained and used by the staff of the organisation holding the data. In many cases this is no longer so. Besides exchanging data with partners in doing business, today the customer, and the prospect, has become an important party to consider when doing Data Governance and implementing technology around Master Data.

In the online world the customer works with your Master Data when:

The customer creates and maintains name, address and communication information by using registration functions
The customer searches for and reads product information on web shops and information sites

Having prospects and customers help with the name and address (party) data is apparently great news for lowering costs in the organisation. But in the long run you have got yourself another data silo, and your Data Quality issues have become even more challenging.

The first thing to do is to optimise your registration forms. An important thing to consider here is that online is worldwide (unless you restrict your site to visitors from a single country). When doing business online with multinational customers, take care that the sequence, formats and labels of the fields are useful to everyone, and that mandatory checks and other validations are in line with the rules for the country in question.
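As a minimal sketch of country-dependent validation, assuming a registration form delivered as a dict with `country`, `postal_code` and `state` fields (hypothetical names), the rules can be looked up per country instead of hard-coding one national format. The table covers only a handful of countries for illustration.

```python
import re

# Illustrative per-country rules (an assumption, far from complete).
POSTAL_CODE_PATTERNS = {
    "US": r"\d{5}(-\d{4})?",   # ZIP or ZIP+4
    "SE": r"\d{3} ?\d{2}",     # e.g. "100 01" or "10001"
    "DK": r"\d{4}",
    "NL": r"\d{4} ?[A-Z]{2}",  # e.g. "1234 AB"
}
STATE_REQUIRED = {"US"}


def validate_registration(form: dict) -> list[str]:
    """Return validation errors for a registration form, using the country's rules."""
    errors = []
    country = form.get("country", "").strip().upper()
    postal = form.get("postal_code", "").strip().upper()
    pattern = POSTAL_CODE_PATTERNS.get(country)
    # Unknown countries are not rejected: we cannot judge formats we do not know.
    if pattern is not None and not re.fullmatch(pattern, postal):
        errors.append(f"postal_code: invalid format for {country}")
    if country in STATE_REQUIRED and not form.get("state"):
        errors.append("state: required")
    return errors
```

The design choice here is to be lenient where no rule is known, since a mandatory check that is wrong for a foreign visitor tends to produce either abandoned registrations or made-up data.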

External reference data may be used for lookup and validation integrated in the registration forms.

The concept of “one version of the truth” is a core element in most Master Data Management solutions. Doing deduplication within online registration has privacy implications. When asking for personal data, you can’t prompt “Possible duplicate found” and then present data about someone else. Here you need more than one data quality firewall.
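One way to picture "more than one firewall" is a registration step that never reveals a match to the visitor but queues suspected duplicates for internal review. This is a sketch, not a recommended product design: the `Registry` class, the similarity threshold and the use of `difflib` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Registry:
    parties: list = field(default_factory=list)       # existing (name, email) pairs
    review_queue: list = field(default_factory=list)  # suspected duplicates for staff

    def register(self, name: str, email: str, threshold: float = 0.85) -> str:
        """Register a party without exposing anyone else's data.

        Suspected duplicates go to an internal review queue (the second
        firewall); the visitor gets the same neutral reply either way.
        """
        for existing_name, existing_email in self.parties:
            same_email = existing_email.lower() == email.lower()
            similar_name = SequenceMatcher(
                None, existing_name.lower(), name.lower()).ratio() >= threshold
            if same_email or similar_name:
                self.review_queue.append(((name, email), (existing_name, existing_email)))
                break
        self.parties.append((name, email))
        return "Thank you for registering."
```

Because the reply is identical whether or not a match was found, the form cannot be used to probe who else is registered.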

Many organisations are not just either offline or online but are operating in both worlds. To maintain the 360 degree view on customer in this situation you need strong data matching techniques capable of working with offline and online captured data. As the business case for online registration is very much about reducing staff involvement, this is about using technology and keeping human interaction to a minimum.

When a prospect comes to your site and tries to find information about your products, the first thing they do is very often to use the search function. From deduplication of names and addresses we know that spelling is difficult and that people sometimes use synonyms other than those used in the Master Data descriptions. Add to that the multicultural aspect. The solution here is to use the same fuzzy search techniques that we use for data matching. This is a kind of reuse. I like that.
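A small sketch of that reuse: synonyms are mapped first, then each query word is snapped to the closest word in the product descriptions with `difflib.get_close_matches`, tolerating misspellings. The product list and synonym table are hypothetical, and a production search would use a proper search engine rather than this loop.

```python
from difflib import get_close_matches

# Hypothetical product master data and synonym list.
PRODUCTS = {
    "P1": "cotton t-shirt",
    "P2": "wool sweater",
    "P3": "rain jacket",
}
SYNONYMS = {"jumper": "sweater", "pullover": "sweater", "tee": "t-shirt", "anorak": "jacket"}


def search(query: str, cutoff: float = 0.75) -> list[str]:
    """Return product IDs whose descriptions contain all (fuzzily matched) query words."""
    vocabulary = sorted({w for desc in PRODUCTS.values() for w in desc.split()})
    terms = []
    for word in query.lower().split():
        word = SYNONYMS.get(word, word)
        # Snap a possibly misspelled word to the closest vocabulary word, if close enough.
        terms += get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return [pid for pid, desc in PRODUCTS.items()
            if terms and all(t in desc.split() for t in terms)]
```

For example, "jumper" finds the wool sweater via the synonym table, and "sweatr" finds it via fuzzy matching.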

Master Data Quality: The When Dimension

28th July 2009 (updated 1st July 2010) by Henrik Gabs Liliendahl

Often we use the terms who, what and where to define master data as opposed to transaction data, saying something like:

Transaction data accurately identifies who, what, where and when and…
Master data accurately describes who, what and where

Who is easily related to our business partners, what to the products we sell, buy and use, and where to the locations of the events.

In some industries when is also easily related to master data entities, such as a timetable in public transportation that is valid for a given period. A fiscal year in financial reporting also belongs to the when side of things.

But when is also a factor in improving data quality, and in preventing data quality issues, related to our business partners, products, locations and assigned categories, because the descriptions of these entities change over time.

This is known as “slowly changing dimensions” when building data warehouses and attempting to make sense of data with business intelligence.

But the “when” dimension also matters in matching, deduplication and identity resolution. Having data with the best actuality doesn’t necessarily lead to a good match, as you may be comparing with data that does not have the same actuality. Here history tracking is a solution: storing former names, addresses, phones, e-mail addresses, descriptions, roles and relations.
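History tracking for matching can be sketched with validity periods in the style of a type 2 slowly changing dimension: each address version carries `valid_from` and `valid_to`, and an incoming record is compared with the address valid when it was captured, then with any former address. The field names and verdict labels are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressVersion:
    address: str
    valid_from: date
    valid_to: Optional[date]  # None means still current


def address_at(history: list[AddressVersion], when: date) -> Optional[str]:
    """Return the address that was valid on a given date."""
    for v in history:
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.address
    return None


def matches_history(history: list[AddressVersion], incoming_address: str,
                    captured_on: date) -> str:
    """Compare an incoming address with the master's history, not just its current value."""
    if address_at(history, captured_on) == incoming_address:
        return "match_at_capture_time"
    if any(v.address == incoming_address for v in history):
        return "match_on_former_address"
    return "no_match"
```

Without the history, an old record captured before a relocation would simply fail to match the current master data, even though it describes the same party.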

Such complexity is often not handled in the master data containers around, and even less so in matching environments.

My guess is that the future will bring publicly accessible reference data in the cloud, describing our master data entities with a rich complexity that includes the when (time) dimension, along with capable matching environments.

The art of Business Directory Matching

22nd July 2009 (updated 1st September 2010) by Henrik Gabs Liliendahl

A business directory is a list of companies in a given area and perhaps a given industry. One very useful type of such directory related to data quality is a list of all companies in a given country. In many countries the authorities maintain such a list; elsewhere it is a matter of assembling local lists or other forms of data capture. Many private service providers offer such lists, often with added information value of different kinds.

If you take the customer/prospect master table from an enterprise doing B2B in a given country, one would expect the rows in that table to match 100% to the business directory of that country. I am not talking about all data being spelled exactly as in the directory, but “only” about the same real-world object being reflected.

During many years of providing solutions for business directory matching and tuning them, as well as handling such match services from colleagues in the business, I have very, very seldom seen a 100% match; even 90% matches are very rare.

Why is that so? Some of the reasons I have stumbled over, related to the classic data quality dimensions, have been:

Completeness of business directories varies from country to country and between the lists provided by vendors. Some countries, such as those of the former Czechoslovakia, some English-speaking countries in the Pacific, the Nordics and others, have tight registration, while it is less tight in North America, other European countries and the rest of the world.

Actuality in business directories also differs a lot. It is also important whether the business directory covers dissolved entities and includes history tracking such as former names and addresses. Then take the actuality of the customer/prospect table to be matched into account, and once again the time dimension has a lot to say.

Validity, accuracy and consistency, both in the directory and in the table to be matched, are natural causes of mismatch. Also, many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.

Uniqueness may be defined differently in the directory and in the table to be matched. This includes the perception of hierarchies of legal entities and branches; not least, governmental and local authority bodies are a fuzzy crowd. Different roles, such as those of a small business owner, also create challenges. The same is true of roles such as franchisees and the use of trading styles.

Then, of course, the automated match technique applied and the human interaction performed are factors in the resulting match rate and in the quality of the match, measured as the frequency of false positives.
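The two measures can be made concrete with a little arithmetic: the match rate is matched rows over all rows, and the false positive frequency is usually estimated from a manually reviewed sample of matches. The function below is a sketch with hypothetical parameter names; the sample-based estimate assumes the reviewed matches are representative of all matches.

```python
def match_statistics(total_rows: int, matched_rows: int,
                     reviewed_matches: int, false_positives: int) -> dict:
    """Match rate over all rows, plus a false positive rate from a reviewed sample."""
    if total_rows <= 0 or reviewed_matches <= 0:
        raise ValueError("need rows to match and a non-empty review sample")
    fp_rate = false_positives / reviewed_matches
    return {
        "match_rate": matched_rows / total_rows,
        "false_positive_rate": fp_rate,
        # Extrapolated count of correct matches, assuming the sample is representative.
        "estimated_true_matches": round(matched_rows * (1 - fp_rate)),
    }
```

With made-up figures, 8,500 matches out of 10,000 rows and 10 false positives in a review of 200 matches give a match rate of 0.85, a false positive rate of 0.05 and roughly 8,075 correct matches.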
