Household – Liliendahl on Data Quality

Multi-Occupancy

26th January 2012 | 29th May 2012 | Henrik Gabs Liliendahl | 2 Comments

The fact that many people don’t live in a single-family house but in a flat, sharing the same building number on a street with people living in other flats in the same building, is a common challenge in data quality and data matching.

The same challenge also applies to companies sharing the same building number with other companies, not to mention when companies and households are in the same building. So this is a common party master data issue.

Address verification and geocoding are seen as important methods for improving data quality where the pain is greatest everywhere: the quality of party master data, with the aim of getting a single customer view.

Multi-occupancy is a pain in the (you know) getting there.

My pain

I have had some personal experiences living at multi-occupancy addresses lately.

One and a half years ago I was living a painless life in a single-family house in a Copenhagen suburb.

Then I moved closer to downtown Copenhagen, to a flat, as mentioned in the post Down the Street.

The tradition in Denmark is to send letters, make deliveries and register master data using a common format for units within a building, and to have separate mailboxes marked with flat ID and names for each flat. I have received most of my post since then and got all the deliveries I’m aware of.

Then I moved to a flat in London. Here the flats in my building have numbers. But the postman delivers the letters in one batch through the street door, and there are no names on the doorbells in front of the door.

So now I sense I don’t get many letters, and today I had to order the same stuff thrice from amazon.co.uk because I haven’t received the first two packages, despite their state-of-the-art online package tracking system telling me that delivery was successful.

Master data pains unresolved

Address reference data at building number level and related geocodes are becoming commonly available in many places these days.

But having reference data and real world aligned location and related party master data at the unit level is still a challenge most places. Therefore we are still struggling with using address verification and geocoding for single customer view where a given building number has more than a single occupancy.
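As a minimal sketch of why building-level reference data falls short, the Python snippet below groups a few made-up party records first by building number and then by unit. The record layout, the `unit` field and both key functions are assumptions made for this illustration, not the format of any particular address directory.

```python
from collections import defaultdict

# Hypothetical party records: (name, street, building number, unit).
records = [
    ("Anna Jensen", "Main Street", "1", "Flat 1"),
    ("Anna Jensen", "Main Street", "1", "Flat 1"),
    ("Bob Brown", "Main Street", "1", "Flat 2"),
    ("One Truth Consultants Ltd", "Main Street", "1", "Ground floor"),
]


def building_key(record):
    """Key an occupancy at building-number level only."""
    _, street, number, _ = record
    return (street.lower(), number)


def unit_key(record):
    """Key an occupancy at unit level (building number plus unit)."""
    _, street, number, unit = record
    return (street.lower(), number, unit.lower())


def group_by(key_func, rows):
    """Collect the distinct party names found under each address key."""
    groups = defaultdict(set)
    for row in rows:
        groups[key_func(row)].add(row[0])
    return dict(groups)


by_building = group_by(building_key, records)
by_unit = group_by(unit_key, records)

print(len(by_building))  # number of occupancies seen at building level
print(len(by_unit))      # number of occupancies seen at unit level
```

At building level everybody at 1 Main Street ends up in one group, while the unit-level key keeps the two flats and the office apart. The catch, as described above, is that the unit is often missing from both the reference data and the party master data.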

New Eyes on Iceland

24th May 2011 | 7th June 2011 | Henrik Gabs Liliendahl | Leave a comment

This eighth Data Quality World Tour blog post is about Iceland.

Patronymics

Rather than using family names, the Icelanders use patronymics. This means that the first Icelandic President, Sveinn Björnsson, must have been the son of Björn, and I guess current Prime Minister Jóhanna Sigurðardóttir is the daughter of Sigurður. This must create some havoc for well-proven algorithms for finding households. (Add to that the fact that the Prime Minister is in a same-sex marriage.)

Volcanoes

In the good old days air traffic wasn’t concerned with the recurring volcanic eruptions in Iceland. Today they seem to be a repeating cause of travel havoc. It is a bit like poor data quality, which wasn’t taken seriously in the good old days, whereas today dirty data creates havoc in business intelligence implementations.

Relational Data Quality

20th May 2010 | 19th June 2010 | Henrik Gabs Liliendahl | 2 Comments

Most of the work related to data quality improvement I do is done with data in relational databases and is aimed at creating new relations between data. Examples (from party master data) are:

Make a relation between a postal address in a customer table and a real world address (represented in an official address dictionary).
Make a relation between a business entity in a vendor table and a real world business (represented in a business directory most often derived from an official business register).
Make a relation between a consumer in one prospect table and a consumer in another prospect table because they are considered to represent the same real world person.

When striving for multi-purpose data quality it is often necessary to reflect further relations from the real world like:

Make a relation in a database reflecting that two (or more) persons belong to the same household (at the same real world address).
Make a relation in the database reflecting that two (or more) companies have the same (ultimate) mother.

Having these relations done right is fundamental for any further data quality improvement endeavors and all the exciting business intelligence stuff. In doing that you may continue to have more or less fruitful discussions on say the classic question: What is a customer?

But in my eyes, in relation to data quality, it doesn’t matter whether that discussion ends with the conclusion that a given row in your database is a customer, an old customer, a prospect or something else. Building the relations may even help you realize what that someone really is. It could be that a sporadic lead is recognized as belonging to the same household as a good customer. It could be that a vendor is recognized as being a daughter company of a hot prospect. It could be that someone is recognized as being fake. And you may even have some business intelligence that, based on the relations, reports a given row in a customer role in one context and in another role in another context.
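A relation of this kind can be sketched with two small tables. The example below uses Python’s built-in `sqlite3` module; the table names, column names and sample rows are all assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE party (
        id INTEGER PRIMARY KEY,
        name TEXT,
        role TEXT,
        household_id INTEGER REFERENCES household(id)
    );
    INSERT INTO household VALUES (1, '1 Main Street, Anytown');
    INSERT INTO party VALUES (1, 'Margaret Smith', 'customer', 1);
    INSERT INTO party VALUES (2, 'John Smith', 'lead', 1);
    INSERT INTO party VALUES (3, 'Bob Brown', 'lead', NULL);
""")

# Find leads that share a household with an existing customer.
# Rows without a household relation (NULL) never satisfy the join.
rows = conn.execute("""
    SELECT lp.name, cp.name
    FROM party AS lp
    JOIN party AS cp
      ON cp.household_id = lp.household_id
     AND cp.role = 'customer'
    WHERE lp.role = 'lead'
    ORDER BY lp.name
""").fetchall()

print(rows)
```

Once the household relation exists, the query does not need to settle what a customer is. It simply reports that the lead John Smith lives with a customer, which is the kind of insight described above.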

Enterprise Data Mashup and Data Matching

6th April 2010 | 7th July 2010 | Henrik Gabs Liliendahl | 3 Comments

A mashup is a web page or application that uses or combines data or functionality from two or more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups were recently touched upon in the post “MDM Mashups: All the Taste with None of the Calories” by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two often separated contexts:

Matching internal data like deduplicating and consolidating
Matching internal data against an external source like address correction and business directory matching

Increased utilization of multiple functions and multiple sources, as in a mashup, will help make matching better. Some examples I have tried include:

If you know whether an address is unique or not, this information can be used to set the confidence of an individual or household duplicate.
If you know whether an address is a single residence or a multiple residence (like a nursing home or campus), this information can be used to set the confidence of an individual or household duplicate.
If you know the frequency of a name (in a given country), this information can be used to set the confidence of a private, household or contact duplicate.
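As a rough sketch of how such signals could feed a confidence score, the Python function below scales a name similarity score by the three pieces of reference information listed above. The function name, its parameters and every weight are assumptions chosen for illustration, not tuned values from a real matching engine.

```python
def duplicate_confidence(name_similarity, address_is_unique,
                         is_multiple_residence, name_frequency):
    """Blend name similarity with three reference-data signals.

    name_similarity: 0.0-1.0 score from some fuzzy matcher (not shown).
    name_frequency: share of the population carrying the name, 0.0-1.0.
    All weights below are illustrative guesses.
    """
    confidence = name_similarity
    if not address_is_unique:
        confidence *= 0.8   # several occupancies may share the address
    if is_multiple_residence:
        confidence *= 0.7   # nursing home, campus: many unrelated residents
    if name_frequency > 0.01:
        confidence *= 0.9   # a common name is weaker evidence
    return round(confidence, 3)


# Same name similarity, different reference-data context.
strong = duplicate_confidence(0.9, True, False, 0.0001)
weak = duplicate_confidence(0.9, False, True, 0.02)
print(strong, weak)
```

The point of the design is that the same name similarity leads to different decisions depending on what the external sources say about the address and the name.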

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may help during data entry, like:

An address may be suggested from an external source.
A business entity may be picked from an external business directory.
Various rules exist in different countries for using consumer/citizen directories – why not use the best available where you do business.

Also the rise of social media adds new possibilities for mashup content during data entry, data maintenance and for other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not least LinkedIn are going to be matched and mashed up.

55 reasons to improve data quality

22nd November 2009 | 28th June 2010 | Henrik Gabs Liliendahl | 9 Comments

The business value in data quality improvement is an ever recurring topic in the realm of data quality.

In the following I will list the first 55 reasons that come to my mind for improving data quality related to the single most frequent data quality issue around, which is duplicates (and unresolved hierarchies) in party master data – names and addresses.

It goes like this:

1. It’s a waste of money sending the same printed material twice or more times to the same individual consumer.

2. Allowing the same customer to enter twice or more times for an introduction offer challenges the return on investment in such campaigns.

3. When measuring churn and win-back two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

4. Sending the same promotion eMail twice or more times to the same individual consumer looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

5. It’s probably a waste of money sending the same printed material with presentation and offerings to a household already having a customer.

6. Assigning different credit terms for two or more unrelated accounts for the same business hierarchy will create uncontrolled financial risk.

7. When measuring cross selling results two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

8. When measuring life time value two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

9. It’s probably a waste of money sending the same printed material twice or more times to the same household.

10. When measuring life time value two or more unrelated accounts for the same individual being a consumer and a business owner will produce an incomplete result leading to a wrong decision.

11. When wanting a 1-1 dialogue two or more unrelated accounts for the same individual consumer will not lead to a 1-1 dialogue.

12. Having companies represented in two or more unrelated accounts for the same company with a different line-of-business assigned will produce an incomplete segmentation.

13. When trying to point at your best customers being households in order to find similar households two or more unrelated accounts for the same household will produce an incomplete segmentation.

14. When measuring cross selling results two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

15. It’s a waste of money sending printed material with presentation and offerings to an individual consumer already being a customer.

16. When wanting a 1-1 dialogue two or more unrelated accounts for the same business hierarchy will not lead to a complete 1-1 dialogue.

17. When measuring life time value two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

18. Assigning different credit terms for two or more unrelated accounts for the same individual consumer will increase financial risk.

19. When measuring cross selling results two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

20. When wanting a 1-1 dialogue two or more unrelated accounts for the same household will not lead to a true 1-1 dialogue.

21. Assigning different credit terms for two or more unrelated accounts for the same business entity could increase financial risk.

22. Having activities related to companies attached to two or more unrelated accounts for the same company will show an incomplete customer history with the risk of taking damaging actions.

23. It’s a waste of money and credibility sending printed material with presentation and offerings to an individual business decision maker in a business entity already being a customer.

24. When buying from a supplier having two or more unrelated accounts despite being the same business entity you may miss discount opportunities.

25. Having companies represented in two or more unrelated accounts for the same company with a different lead source assigned will produce a false measure of marketing and sales performance.

26. Sending the same promotion eMail or newsletter twice or more times to the same individual business decision maker looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

27. When measuring churn and win-back two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

28. Having activities related to influencers attached to two or more unrelated business contact records for the same person will show an incomplete business partner history with the risk of repeating actions already taken.

29. When buying from a supplier having two or more unrelated accounts despite their belonging to the same business hierarchy you could miss discount opportunities.

30. Having activities related to households attached to two or more unrelated accounts for the same household will show an incomplete customer history with the risk of taking insufficient actions.

31. When trying to point at your best customers being individual consumers in order to find similar individuals two or more unrelated accounts for the same individual consumer will produce a wrong segmentation.

32. Having companies represented in two or more unrelated accounts for the same company with a different address assigned will produce an incomplete segmentation.

33. When measuring life time value two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

34. Having activities related to decision makers in companies attached to two or more unrelated contacts for the same person will show an incomplete customer contact history with the risk of not taking appropriate actions.

35. When wanting a 1-1 dialogue two or more unrelated accounts for the same business entity will not lead to a real 1-1 dialogue.

36. When trying to point at your best customers being companies in order to find similar companies two or more unrelated accounts for the same company will produce a false segmentation.

37. Maintaining data related to two or more unrelated accounts for the same real world entity will probably be more costly than necessary when exploiting external reference data.

38. It’s probably a waste of money sending printed material with presentation and offerings to a business entity already being a customer at a higher or lower hierarchy level.

39. Having individual consumers represented in two or more unrelated accounts for the same individual consumer with a different lead source assigned will produce a wrong measure of marketing and sales performance.

40. Allowing the same customer to re-enter for an offer already turned down (e.g. credit services) will create unnecessary double validation work.

41. When measuring churn and win-back two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

42. When wanting a 1-1 dialogue two or more unrelated accounts for the same individual being a consumer and a business owner will not lead to a sensible 1-1 dialogue.

43. When measuring cross selling results two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

44. Having activities related to individual consumers attached to two or more unrelated accounts for the same individual consumer will show an incomplete customer history with the risk of taking wrong actions.

45. When measuring life time value two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

46. Having activities related to customers attached to two or more unrelated accounts for the same real world entity may lead to different sales representatives working against each other.

47. Allowing sales representatives to create new accounts for already existing customers may create time-consuming commission disputes.

48. Having households represented in two or more unrelated accounts for the same household with a different lead source assigned will produce an incomplete measure of marketing and sales performance.

49. Maintaining data related to two or more unrelated accounts for the same real world entity will consume more manual work than necessary.

50. When measuring churn and win-back two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

51. When buying from a supplier having two or more unrelated accounts despite being the same business entity you may have multiple unnecessary inventory costs.

52. It’s a waste of money and credibility sending the same printed material twice or more times to the same individual business decision maker.

53. When measuring churn and win-back two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

54. Assigning different credit terms for two or more unrelated accounts for the same household may increase financial risk.

55. When measuring cross selling results two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

Mu

7th October 2009 | 5th January 2011 | Henrik Gabs Liliendahl | Leave a comment

The term “Mu” has several meanings, including being a lost continent. In this post I will use the meaning of “mu” as the answer to a question that can’t be answered with a simple “yes” or “no” or even “unknown”, as explained on Wikipedia here.

When working with data quality you often encounter situations where the answer to a simple question must be “mu”.

Let’s say you are looking for duplicates in a customer file and have these two rows (Name, Address, City):

Margaret Smith, 1 Main Street, Anytown

Margaret & John Smith, 1 Main Street, Anytown

Is this a duplicate situation?

In a given context like preparing for a direct mail the answer could be “yes”. But in most other contexts the answer is “mu”. Here the question should be something like: How do you handle hierarchy management with these two rows? And the answer could be something like the process presented in my recent post here.

Similar considerations apply to this example (Name, Address, City):

One Truth Consultants att: John Smith, 3 Main Street, Anytown

One Truth Consultants Ltd, 3 Main Street, Anytown

And this (Contact, Company, Address, City):

John Smith, One Truth Consultants, 3 Main Street, Anytown

John Smith, One Truth Services, 3 Main Street, Anytown

The latter example is explained in more detail in this post.
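One way to make room for “mu” in a matching routine is to return more than a boolean. The Python sketch below is only an illustration under stated assumptions: the `Verdict` enum, the context names and the crude token comparison are all made up for this example and are no substitute for real hierarchy management.

```python
from enum import Enum


class Verdict(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"
    MU = "mu"   # the question itself does not fit the data


def is_duplicate(row_a, row_b, context):
    """Answer 'is this a duplicate?' for two (name, address, city) rows."""
    if row_a[1:] != row_b[1:]:
        return Verdict.NO            # different address or city
    if row_a[0] == row_b[0]:
        return Verdict.YES           # identical rows
    tokens_a = set(row_a[0].replace("&", " ").split())
    tokens_b = set(row_b[0].replace("&", " ").split())
    if tokens_a & tokens_b:
        # Overlapping parties at one address: fine to merge for a mailing,
        # but otherwise a hierarchy question rather than a yes/no question.
        return Verdict.YES if context == "direct_mail" else Verdict.MU
    return Verdict.UNKNOWN


row_1 = ("Margaret Smith", "1 Main Street", "Anytown")
row_2 = ("Margaret & John Smith", "1 Main Street", "Anytown")

print(is_duplicate(row_1, row_2, "direct_mail"))
print(is_duplicate(row_1, row_2, "single_customer_view"))
```

For the two Smith rows the sketch answers “yes” when preparing a direct mail and “mu” in the other context, which is the signal to hand the pair over to hierarchy handling instead of forcing a merge or a split.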

Process of consolidating Master Data

27th September 2009 | 6th July 2010 | Henrik Gabs Liliendahl | 4 Comments

In my previous blog post “Multi-Purpose Data Quality” we examined a business challenge where we have multiple purposes with party master data.

The comments suggested some form of consolidation should be done with the data.

How do we do that?

I have made a PowerPoint show “Example process of consolidating master data” with a suggested way of doing that.

The process uses the party master data types explained here.

The next questions in solving our business challenge will include:

Is it necessary to have master data in optimal shape real time – or is it OK to make periodic consolidation?
How do we design processes for maintaining the master data when:
- New members and customers are inserted?
- We update existing members and customers?
- External reference data changes?
What changes must be made with the existing applications handling the member database and the eShop?

Also the question of what style of Master Data Hub is suitable is indeed very common in these kinds of implementations.

Multi-Purpose Data Quality

24th September 2009 | 24th September 2011 | Henrik Gabs Liliendahl | 3 Comments

Say you are an organisation within charity fundraising. For many years you have had a membership database, and recently you also introduced an eShop selling related accessories.

The membership database holds the following record (Name, Address, City, YearlyContribution):

Margaret & John Smith, 1 Main Street, Anytown, 100 Euro

The eShop system has the following accounts (Name, Address, Place, PurchaseInAll):

Mrs Margaret Smith, 1 Main Str, Anytown, 12 Euro
Peggy Smith, 1 Main Street, Anytown, 218 Euro
Local Charity c/o Margaret Smith, 1 Main Str, Anytown, 334 Euro

Now the new management wants to double contributions from members and triple eShop turnover. Based on the recommendations from “The One Truth Consulting Company” you plan to do the following:

Establish a platform for 1-1 dialogue with your individual members and customers
Analyze member and customer behaviour and profiles in order to:
- Support the 1-1 dialogue with existing members and customers
- Find new members and customers who are like your best members and customers

As the new management wants to stay for many years to come, the solution must not be a one-shot exercise but must be implemented as business process reengineering with a continuous focus on the best fit data governance, master data management and data (information) quality.

So, what are you going to do with your data so they are fit for action with the old purposes and the new purposes?

Recently I wrote some posts related to these challenges:

Any other comments on the issues in how to do it are welcome.

So, how about SOHO homes

23rd August 2009 | 1st February 2012 | Henrik Gabs Liliendahl | 3 Comments

This post is the 3rd in a series on challenges in Data Matching with Party Master Data hierarchies.

80 % of all business entities are one-man-bands operated from so-called SOHOs (Small-Office-Home-Office). The home part very often means that a business shares a private residence address with a household.

Examples are:

Farmers
Healthcare professionals
Small shops
Small membership organisation administrations
Fawlty Towers
Independent Data Quality consultants

Here we have a 3 layer relationship:

An ADDRESS occupied by a HOUSEHOLD and a BUSINESS (if not several)
The HOUSEHOLD consists of one or several CONSUMERS
The BUSINESS(s) has an EMPLOYEE being the Business Owner / Representative

One of the CONSUMERs and the EMPLOYEE are the same real world individual.

(About party master data entity types please have a look here.)

This very, very common construction creates some challenges in Data Matching and Master Data hierarchy building such as:

If you focus on B2B (Business-to-Business) you want to include the Business and Owner in that role, but not the same individual in the consumer role.
If you focus on B2C (Business-to-Consumer) you want to include the consumer role of that individual, but not the business (owner) role.
If you do both B2B and B2C you may want to assign either a B2B or a B2C category, and that’s tricky with those individuals
In several industries the business owner, the business and the household together form a special target group with unique product requirements. This is true for industries such as banking, insurance, telco, real estate and law.
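The three-layer relationship and the role selection described above can be sketched as a small data model. The Python classes below reuse the entity names from this post, but the field names, the `targets` function and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str


@dataclass
class Household:
    consumers: list          # Individuals in their CONSUMER role


@dataclass
class Business:
    name: str
    owner: Individual        # the EMPLOYEE being owner / representative


@dataclass
class Address:
    text: str
    household: Household
    businesses: list = field(default_factory=list)


def targets(address, focus):
    """Return (role, name) pairs to include for a 'B2B' or 'B2C' focus."""
    if focus == "B2B":
        result = []
        for business in address.businesses:
            result.append(("BUSINESS", business.name))
            result.append(("EMPLOYEE", business.owner.name))
        return result
    if focus == "B2C":
        return [("CONSUMER", person.name)
                for person in address.household.consumers]
    raise ValueError(f"unknown focus: {focus}")


john = Individual("John Smith")
margaret = Individual("Margaret Smith")
home = Address(
    "3 Main Street, Anytown",
    Household([margaret, john]),
    [Business("One Truth Consultants", john)],
)

print(targets(home, "B2B"))
print(targets(home, "B2C"))
```

The same `Individual` object sits behind both the CONSUMER and the EMPLOYEE role, so a B2B selection and a B2C selection can each pick the role they need without creating a second record for John.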

In my previous posts on B2B (E2E) and B2C hierarchies, the methods for solving this were fuzzy matching, exploiting external reference data and other investigations, and so it is with this challenge as well. This makes Data Matching and Master Data hierarchy building a very exciting profession where you need both business and technology skills, as well as a real world perspective, to go all the way.

Household Householding

13th August 2009 | 23rd June 2010 | Henrik Gabs Liliendahl | 11 Comments

When doing B2C (business-to-consumer) activities often you really want to do B2H (business-to-household). But sometimes you also actually want B2C, having a dialogue with the individual customer. So yet again we have a Party Master Data hierarchy, here households each consisting of one or several consumers (typically a nuclear family). In Data Model language there is a parent-child relationship between households and consumers.

The classic reason for wanting to identify households is that it’s a waste of money sending several printed catalogues and other offline mailings to the same household. But a lot of other good reasons based on a shared household budget exist too.

Data captured about consumers could look like this (name, address, city):

Margaret Smith, 1 Main Street, Anytown
Margaret & John Smith, 1 Main Str, Anytown
John Smith, 1 Main Street, Anytown
Peggy Smith, 1 Main Street, Anytown
Mr. J. Smith, 1 Main Street, Anytown

Here it seems fair to assume that we have:

A HOUSEHOLD being the Smith family consisting of
A CONSUMER being Margaret nicknamed Peggy
And a CONSUMER being John

(About party master data entity types please have a look here.)
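The reasoning behind that assumption can be sketched in a few lines of Python. The nickname, abbreviation and title tables below are tiny hypothetical stand-ins, and the rules (same family name plus same normalised address, an initial matching any given name that starts with it) are deliberate simplifications, not a production householding method.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables; real ones would be far larger.
NICKNAMES = {"peggy": "margaret"}
STREET_ABBREVIATIONS = {"str": "street"}
TITLES = {"mr", "mrs", "ms"}


def normalise_address(address):
    words = re.findall(r"\w+", address.lower())
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)


def parse_name(name):
    """Return (set of given names, family name) for a simple name line."""
    words = [w for w in re.findall(r"\w+", name.lower()) if w not in TITLES]
    family, given = words[-1], words[:-1]
    return {NICKNAMES.get(g, g) for g in given}, family


def same_person(known, given):
    """Treat a single initial as matching a full name with that initial."""
    return (known == given
            or (len(given) == 1 and known.startswith(given))
            or (len(known) == 1 and given.startswith(known)))


def build_households(rows):
    households = defaultdict(list)   # (family, address, city) -> given names
    for name, address, city in rows:
        givens, family = parse_name(name)
        consumers = households[(family, normalise_address(address), city.lower())]
        for given in sorted(givens):
            match = next((i for i, known in enumerate(consumers)
                          if same_person(known, given)), None)
            if match is None:
                consumers.append(given)
            elif len(given) > len(consumers[match]):
                consumers[match] = given   # keep the fullest form
    return dict(households)


rows = [
    ("Margaret Smith", "1 Main Street", "Anytown"),
    ("Margaret & John Smith", "1 Main Str", "Anytown"),
    ("John Smith", "1 Main Street", "Anytown"),
    ("Peggy Smith", "1 Main Street", "Anytown"),
    ("Mr. J. Smith", "1 Main Street", "Anytown"),
]
households = build_households(rows)
print(households)
```

With these five rows the sketch ends up with one Smith household holding two consumers, Margaret and John. Its shortcuts are easy to break: an initial “J.” would just as happily match a Jane, and anyone with a different family name falls outside the household.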

But this is an easy example compared to what you see when working with names and addresses. Among complications I have seen are:

Households consisting of individuals with separate family names
Multi adult generation households and other kinds of households
Non-unique addresses may cause non-existent households to be formed
Some addresses are not for traditional households, but are nursing homes, campus residence halls and the like
The time dimension: asynchronous relocation capture, marriage (couples), divorce (split)

In other words: The real world is not that simple and the picture of how households are forming does change.

The available methods for maintaining household information, which can be combined, are:

Ask your customers. An obvious choice, but not easy to keep going, and your ROI may not be positive.
Fuzzy Data Matching. The higher the percentage of all citizens in a given region you have in your database, the better your matching may be aligned with the real world.
Exploiting external reference data. Having knowledge about public address data helps a lot. Such data may tell you about uniqueness of addresses and the attributes of the buildings there. Availability differs around the world, but the trend in open government data may help.

This is the second post in a series around hierarchies in Party Master Data and how this must be handled in data matching. Previous post was about B2B (E2E) data. Next post planned is about SOHO’s.
