What’s in an eMail Address?

When you are deduping, consolidating or doing identity resolution with party master data the elements that may be used includes names, postal addresses and places, phone numbers, national ID’s and eMail addresses.

Types of eMail addresses

In this post I will look closer into eMail addresses based on a general list of types of party master data.

You may divide eMail addresses into these types:

CONSUMER/CITIZEN:

This is a private eMail address belonging to an individual person.

Typical formats are myname@hotmail.com and nickname@gmail.com and name123@anymail.com

You may change your eMail address as a private person as time goes or have several such addresses at a time depending on your favourite providers of eMail services and other reasons to split your personality.

HOUSEHOLD:

A household/family may choose to have a shared eMail Address for private use.

Typical format will be xyz-family@anymail.com where the word family of course could be in a lot of different languages like famiglia-italiano@email.it

A special usage is the GROUP where two (or more) names are included like mary-and-john@anymail.com

EMPLOYEE:

This is the eMail address you are assigned as an employee (including owner) at a company.

Common formats are abc@company.com and name.name@company.com

When you change employer you also change eMail address and you may have several employers or other assignments at the same time. Also different formats like initials and full name may point to the same inbox.

DEPARTMENT:

Here the eMail address is not pointed at a particular person but some sort of a team within a company.

Formats are like sales@company.com and salg@firma.dk and vertrieb@firma.de choosing the sales team in some different languages.

Some eMail are referring to a specific FUNCTION like webmaster@company.com

BUSINESS:

This is an eMail address for the entire company.

Most common formats are info@company.com and company@company.com

INVALID:

Often a field designed for an eMail address is populated with invalid values going from obvious wrong values like XXX to harder detectable syntax errors and not existing domains.

Real world duplication

Many online services are based on registration via an eMail address assuming that one eMail represents one real world entity which of course is not the case.

Even on a service like LinkedIn where you may attach several eMail addresses to one profile you do encounter persons with obvious duplicate profiles.

Multi-channel marketing and sales

An increasing number of organisations are doing both offline and online operations today and when building enterprise wide master data hubs the eMail address becomes an more and more important element in matching party master data.

In such matching activities the eMail address can not stand alone but must be combined with the other elements as names, postal addresses, phone numbers and national ID’s upon availability.

Success in automating such processes is based on advanced algorithms in flexible and configurable solutions.

Comment or eMail me

If you also have been battling here I will be glad to have your comments here or by mail. My mail is hlsgr@mail.tele.dk and hls@omikron.net and hls@locus.dk and hls@dmpartner.dk and nordic@omikron.net

Bookmark and Share

55 reasons to improve data quality

The business value in data quality improvement is an ever recurring topic in the realm of data quality.

In the following I will list the first 55 reasons that comes to my mind for improving data quality related to the single most frequent data quality issue around, which is duplicates (and unresolved hierarchies) in party master data – names and addresses.

It goes like this:

1.  It’s a waste of money sending the same printed material twice or more times to the same individual consumer.

2.  Allowing the same customer enter twice or more times for an introduction offer challenges the return of investment in such campaigns.

3.  When measuring churn and win-back two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

4.  Sending the same promotion eMail twice or more times to the same individual consumer looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

5.  It’s probably a waste of money sending the same printed material with presentation and offerings to a household already having a customer.

6.  Assigning different credit terms for two or more unrelated accounts for the same business hierarchy will make uncontrolled financial risk.

7.  When measuring cross selling results two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

8.  When measuring life time value two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

9.  It’s probably a waste of money sending the same printed material twice or more times to the same household.

10.  When measuring life time value two or more unrelated accounts for the same individual being a consumer and a business owner will produce an incomplete result leading to a wrong decision.

11.  When wanting a 1-1 dialogue two or more unrelated accounts for the same individual consumer will not lead to a 1-1 dialogue.

12.  Having companies represented in two or more unrelated accounts for the same company with a different line-of-business assigned will produce an incomplete segmentation.

13.  When trying to point at your best customers being households in order to find similar households two or more unrelated accounts for the same household will produce an incomplete segmentation.

14.  When measuring cross selling results two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

15.  It’s a waste of money sending printed material with presentation and offerings to an individual consumer already being a customer.

16.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business hierarchy will not lead to a complete 1-1 dialogue.

17.  When measuring life time value two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

18.  Assigning different credit terms for two or more unrelated accounts for the same individual consumer will increase financial risk.

19.  When measuring cross selling results two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

20.  When wanting a 1-1 dialogue two or more unrelated accounts for the same household will not lead to a true 1-1 dialogue.

21.  Assigning different credit terms for two or more unrelated accounts for the same business entity could increase financial risk.

22.  Having activities related to companies attached to two or more unrelated accounts for the same company will show an incomplete customer history with the risk of taking damaging actions.

23.  It’s a waste of money and credibility sending printed material with presentation and offerings to an individual business decision maker in a business entity already being a customer.

24.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may miss discount opportunities.

25.  Having companies represented in two or more unrelated accounts for the same company with a different lead source assigned will produce a false measure of marketing and sales performance.

26.  Sending the same promotion eMail or newsletter twice or more times to the same individual business decision maker looks like spam even if different eMail addresses are used. Spam has more offending than selling power.

27.  When measuring  churn and win-back two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

28.  Having activities related to influencers attached to two or more unrelated business contact records for the same person will show an incomplete business partner history with the risk of retaking already made actions.

29.  When buying from a supplier having two or more unrelated accounts despite they are belonging the same business hierarchy you could miss discount opportunities.

30.  Having activities related to households attached to two or more unrelated accounts for the same household will show an incomplete customer history with the risk of taking insufficient  actions.

31.  When trying to point at your best customers being individual consumers in order to find similar individuals two or more unrelated accounts for the same individual consumer will produce a wrong segmentation.

32.  Having companies represented in two or more unrelated accounts for the same company with a different address assigned will produce an incomplete segmentation.

33.  When measuring life time value two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

34.  Having activities related to decision makers in companies attached to two or more unrelated contacts for the same person will show an incomplete customer contact history with the risk of not taking appropriate actions.

35.  When wanting a 1-1 dialogue two or more unrelated accounts for the same business entity will not lead to a real 1-1 dialogue.

36.  When trying to point at your best customers being companies in order to find similar companies two or more unrelated accounts for the same company will produce a false segmentation.

37.  Maintaining data related to two or more unrelated accounts for the same real world entity will probably be more costly than necessary when exploiting external reference data.

38.  It’s probably a waste of money sending printed material with presentation and offerings to a business entity already being a customer at a higher or lower hierarchy level.

39.  Having individual consumers represented in two or more unrelated accounts for the same individual consumer with a different lead source assigned will produce a wrong measure of marketing and sales performance.

40.  Allowing the same customer re-enter for an offer already turned down (e.g. credit services) will create unnecessary double validation work.

41.  When measuring churn and win-back two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

42.  When wanting a 1-1 dialogue two ore more unrelated accounts for the same individual being a consumer and a business owner will not lead to a sensible 1-1 dialogue.

43.  When measuring cross selling results two or more unrelated accounts for the same business entity will produce a false result leading to a wrong decision.

44.  Having activities related to individual consumers attached to two or more unrelated accounts for the same individual consumer will show an incomplete customer history with the risk of taking wrong actions.

45.  When measuring life time value two or more unrelated accounts for the same household will produce an incomplete result leading to a wrong decision.

46.  Having activities related to customers attached to two or more unrelated accounts for the same real world entity may lead to that different sales representatives are working against each other.

47.  Allowing sales representatives creating new accounts for already existing customers may create time consuming commission disputes.

48.  Having households represented in two or more unrelated accounts for the same household with a different lead source assigned will produce an incomplete measure of marketing and sales performance.

49.  Maintaining data related to two or more unrelated accounts for the same real world entity will consume more manual work than necessary.

50.  When measuring churn and win-back two or more unrelated accounts for the same individual consumer will produce a wrong result leading to a wrong decision.

51.  When buying from a supplier having two or more unrelated accounts despite being the same business entity you may have multiple unnecessary inventory costs.

52.  It’s a waste of money and credibility sending the same printed material twice or more times to the same individual business decision maker.

53.  When measuring churn and win-back two or more unrelated accounts for the same individual being a consumer and a business owner will produce only an incoherent result leading to a wrong decision.

54.  Assigning different credit terms for two or more unrelated accounts for the same household may increase financial risk.

55.  When measuring cross selling results two or more unrelated accounts for the same business hierarchy will produce an incomplete result leading to a wrong decision.

Bookmark and Share

Master Data Survivorship

A Master Data initiative is often described as making a “golden view” of all Master Data records held by an organization in various databases used by different applications serving a range of business units.

In doing that (either in the initial consolidation or the ongoing insertion and update) you will time and again encounter situations where two versions of the same element must be merged into one version of the truth.

In some MDM hub styles the decision is to be taken at consolidation time, in other styles the decision is prolonged until the data (links) is consumed in a given context.

In the following I will talk about Party Master Data being the most common entity in Master Data initiatives.

mergeThis spring Jim Harris made a brilliant series of articles on DataQualityPro on the subject of identifying duplicate customers ending with part number 5 dealing with survivorship. Here Jim describes all the basic considerations on how some data elements survives a merge/purge and others will be forgotten and gives good examples with US consumer/citizens.

Taking it from there Master Data projects may have the following additional challenges and opportunities:

  • Global Data adds diversity into the rule set of consolidation data on record level as well as field level. You will have to comprise on simple global rules versus complex optimized rules (and supporting knowledge data) for each country/culture.
  • Multiple types of Party Master Data must be handled when Business Partners includes business entities having departments and employees and not at least when they are present together with consumers/citizens.
  • External Reference Data is becoming more and more common as part of MDM solutions adding valid, accurate and complete information about Business Partners. Here you have to set rules (on field level) of whether they override internal data, fills in the blanks or only supplements internal data.
  • Hierarchy building is closely related to survivorship. Rules may be set for whether two entities goes into two hierarchies with surviving parts from both or merges as one with survivorship. Even an original entity may be split into two hierarchies with surviving parts.

What is essential in survivorship is not loosing any valuable information while not creating information redundancy.

An example of complex survivorship processing may be this:

A membership database holds the following record (Name, Address, City):

  • Margaret & John Smith, 1 Main Street, Anytown

An eShop system has the following accounts (Name, Address, Place):

  • Mrs Margaret Smith, 1 Main Str, Anytown
  • Peggy Smith, 1 Main Street, Anytown
  • Local Charity c/o Margaret Smith, 1 Main Str, Anytown

A complex process of consolidation including survivorship may take place. As part of this example the company Local Charity is matched with an external source telling it has a new name being Anytown Angels. The result may be this “golden view”:

ADDRESS in Anytown on Main Street no 1 having
• HOUSEHOLD having
– CONSUMER Mrs. Margaret Smith aka Peggy
– CONSUMER Mr. John Smith
• BUSINESS Anytown Angels having
– EMPLOYEE Mrs. Margaret Smith aka Peggy

Observe that everything survives in a global applicable structure in a fit hierarchy reflecting local rules handling multiple types of party entities using external reference data.

But OK, we didn’t have funny names, dirt, misplaced data…..

Bookmark and Share

Splitting names

When working through a list of names in order to make a deduplication, consolidation or identity resolution you will meet name fields populated as these:

  • Margaret & John Smith
  • Margaret Smith. John Smith
  • Maria Dolores St. John Smith
  • Johnson & Johnson Limited
  • Johnson & Johnson Limited, John Smith
  • Johnson Furniture Inc., Sales Dept
  • Johnson, Johnson and Smith Sales Training

SplitSome of the entities having these names must be split into two entities before we can do the proper processing.

When you as a human look at a name field, you mostly (given that you share the same culture) know what it is about.

Making a computer program that does the same is an exiting but fearful journey.

What I have been working with includes the following techniques:

  • String manipulation
  • Look up in list of words as given names, family names, titles, “business words”, special characters. These are country/culture specific.
  • Matching with address directories, used for checking if the address is a private residence or a business address.
  • Matching with business directories, used for checking if it is in fact a business name and which part of a name string is not included in the corresponding name.
  • Matching with consumer/citizen directories, used for checking which names are known on an address.
  • Probabilistic learning, storing and looking up previous human decisions.

As with other data quality computer supported processes I have found it useful having the computer dividing the names into 3 pots:

  • A: The ones the computer may split automatically with an accepted failure rate of false positives
  • B: The dubious ones, selected for human inspection
  • C: The clean ones where the computer have found no reason to split (with an accepted failure rate of false negatives)

For the listed names a suggestion for the golden single version of the truth could be:

  • “Margaret & John Smith” will be split into CONSUMER “Margaret Smith” and CONSUMER “John Smith”
  • “Margaret Smith. John Smith” will be split into CONSUMER “Margaret Smith” and CONSUMER “John Smith”
  • “Maria Dolores St. John Smith” stays as CONSUMER “Maria Dolores St. John Smith”
  • “Johnson & Johnson Limited” stays as BUSINESS “Johnson & Johnson Limited”
  • “Johnson & Johnson Limited, John Smith” will be split into BUSINESS “Johnson & Johnson Limited” having EMPLOYEE “John Smith”
  • “Johnson Furniture Inc., Sales Dept” will be split into “BUSINESS “Johnson Furniture Inc.” having “DEPARTMENT “Sales Dept”
  • “Johnson, Johnson and Smith Sales Training” stays as BUSINESS “Johnson, Johnson and Smith Sales Training”

For further explanation of the Master Data Types BUSINESS, CONSUMER, DEPARTMENT, EMPLOYEE you may have a look here.

Bookmark and Share

Business Rules and Duplicates

When finding or avoiding duplicates or doing similar kind of consolidation with party master data you will encounter lots of situations, where it is disputable what to do.

The “political correct” answer is: Depends on your business rules.

Yea right. Easier said than done.

Often you face the following:

  • Business rules doesn’t exist. Decisions are based on common sense.
  • Business rules differs between data providers.

Lets have an example.

We have these business rules (Owner, Brief):

Finance, No sales and deliveries to dissolved business entities
Logistics, Access to premises must be stated in Address2 if different from Address1
Sales, Every event must be registered with an active contact
Customer Service, In case of duplicate contacts the contact with the first event date wins

In a CRM system we have these 2 accounts (AccountID, CompanyName, Address1, Address2, City):

1, Restaurant San Remo, 2 Main Street, entrance thru no 4, Anytown
2, Ristorante San Remo, 2 Main Street, , Anytown

Also we have some contacts (AccountID, ContactID, JobTitle, ContactName, Status, StartYear. EventCount):

1, 1, Manager, Luigi Calda, Inactive, 2001, 2
1, 2, Chef de la Cusine, John Hothead, Active, 2002, 87
2, 1, Chef de la Cuisine, John Hothead, Duplicate, 2008, 2
2, 2, Owner, Gordon Testy, Active, 2008, 7

We are so lucky that a business directory is available now. Here we have (NationalID, Name, Address, City, Owner, Status):

3, Ristorante San Remo, 2 Main Street, Anytown, Luigi Calda, Dissolved
4, Ristorante San Remo, 2 Main Street, Anytown, Gordon Testy, Active

Under New ManagementSo, I don’t think we will produce a golden view of this business relationship based on the data (structure) available and the business rules available.

Building and aligning business rules and data structures to solve this example – and a lot of other examples with different challenges – may seem difficult and are often omitted in the name of simplicity. But:

  • Master data – not at least business partners – is a valuable asset in the enterprise, so why treat it with simplicity while we do complex handling with a lot of other (transaction) data.
  • Common sense may help you a lot. Many of these questions are not specific to your business but are shared among most other enterprises in your industry and many others in the whole real world.
  • I guess the near future will bring increased number of available services with software and external data support that helps a lot in selecting common business rules and apply these in the master data processing landscape.

Bookmark and Share

Mu

muThe term ”Mu” has several meanings including being a lost continent. In this post I will use the meaning of “mu” being the answer to a question that can’t be answered with a simple “yes” or “no” or even “unknown” as explained on Wikipedia here.

When working with data quality you often encounter situations where the answer to a simple question must be “mu”.

Let’s say you are looking for duplicates in a customer file and have these two rows (Name, Address, City):

Margaret Smith, 1 Main Street, Anytown
Margaret & John Smith, 1 Main Street, Anytown

Is this a duplicate situation?

In a given context like preparing for a direct mail the answer could be “yes”. But in most other contexts the answer is “mu”. Here the question should be something like: How do you handle hierarchy management with these two rows? And the answer could be something like the process presented in my recent post here.

Similar considerations apply to this example (Name, Address, City):

One Truth Consultants att: John Smith, 3 Main Street, Anytown
One Truth Consultants Ltd, 3 Main Street, Anytown

And this (Contact, Company, Address, City):

John Smith, One Truth Consultants, 3 Main Street, Anytown
John Smith, One Truth Services, 3 Main Street, Anytown

The latter example is explained in more details in this post.

Bookmark and Share

Process of consolidating Master Data

stormp1

In my previous blog post “Multi-Purpose Data Quality” we examined a business challenge where we have multiple purposes with party master data.

The comments suggested some form of consolidation should be done with the data.

How do we do that?

I have made a PowerPoint show “Example process of consolidating master data” with a suggested way of doing that.

The process uses the party master data types explained here.

The next questions in solving our business challenge will include:

  • Is it necessary to have master data in optimal shape real time – or is it OK to make periodic consolidation?
  • How do we design processes for maintaining the master data when:
    • New members and customers are inserted?
    • We update existing members and customers?
    • External reference data changes?   
  • What changes must be made with the existing applications handling the member database and the eShop?

Also the question of what style of Master Data Hub is suitable is indeed very common in these kinds of implementations.

Bookmark and Share

So, how about SOHO homes

This post is the 3rd in a series of challenges in Data Matching with Party Master Data hierarchies.

80 % of all business entities are one-man-bands operated from so called SOHO’s (Small-Office-Home-Office). The home part is very often seen as a business is sharing a private residence address with a household.

farm

Examples are:

  • Farmers
  • Healthcare professionals
  • Small shops
  • Small membership organisation administrations
  • Fawlty Towers
  • Independent Data Quality consultants

Here we have a 3 layer relationship:

  • An ADDRESS occupied by a HOUSEHOLD and a BUSINESS (if not several)
  • The HOUSEHOLD consists of one or several CONSUMERS
  • The BUSINESS(s) has an EMPLOYEE being the Business Owner / Representative

One of the CONSUMERs and the EMPLOYEE is the same real world individual.

(About party master data entity types please have a look here.)

This very, very common construction creates some challenges in Data Matching and Master Data hierarchy building such as:

  • If you focus on B2B (Business-to-Business) you want to include the Business and Owner in that role, but not the same individual in the consumer role.
  • If you focus on B2C (Business-to-Consumer) you want to include the consumer role of that individual, but not the business (owner) role.
  • If you do both B2B and B2C you may want to assign either a B2B or a B2C category, and that’s tricky with those individuals
  • In several industries business owners, the business and the household is a special target group with unique product requirements. This is true for industries as banking, insurance, telco, real estate, law.

In my previous post on B2B (E2E) and B2C hierarchies methods for solving this is fuzzy matching, exploiting external reference data and other investigations – and so it is with this challenge as well. This makes Data Matching and Master Data hierarchy building a very exciting profession were you need both business and technology skills – and a real world perspective – to go all the way.

Bookmark and Share

Household Householding

When doing B2C (business-to-consumer) activities often you really want to do B2H (business-to-household). But sometimes you also actually want B2C, having a dialogue with the individual customer. So yet again we have a Party Master Data hierarchy, here households each consisting of one or several consumers (typically a nuclear family). In Data Model language there is a parent-child relationship between households and consumers.

The classic reason for wanting to identify households is that it’s a waste of money sending several printed catalogues and other offline mailings to the same household. But a lot of other good reasons based on a shared household budget exist too.

Data captured about consumers could look like this (name, address, city):

  • Margaret Smith, 1 Main Street, Anytown
  • Margaret & John Smith, 1 Main Str, Anytown
  • John Smith, 1 Main Street, Anytown
  • Peggy Smith, 1 Main Street, Anytown
  • Mr. J. Smith, 1 Main Street, Anytown

Here it seems fair to assume that we have:

  • A HOUSEHOLD being the Smith family consisting of
  • A CONSUMER being Margaret nicknamed Peggy
  • And a CONSUMER being John

(About party master data entity types please have a look here.)

But this is an easy example compared to what you see when working with names and addresses. Among complications I have seen are:

  • Households consisting of individuals with separate family names
  • Multi adult generation households and other kinds of households
  • Not having unique addresses may cause forming not existing households
  • Some addresses are not for traditional households, but are nursing homes, campus residence halls and the like
  • The time dimension: un-synchronous relocation capture, marriage (couples), divorce (split)

Families_USIn other words: The real world is not that simple and the picture of how households are forming does change.

Available composable methods for maintaining household information are:

  • Ask your customers. An obvious choice but not easy to keep on going – your ROI may not be positive.
  • Fuzzy Data Matching. The higher percent of all citizens in a given region you have in your database the better your matching may be aligned with the real world.
  • Exploiting external reference data. Having knowledge about public address data helps a lot. Such data may tell you about uniqueness of addresses and the attributes of the buildings there. Availability differs around the world, but the trend in open government data may help.

This is the second post in a series around hierarchies in Party Master Data and how this must be handled in data matching. Previous post was about B2B (E2E) data. Next post planned is about SOHO’s.

Bookmark and Share

Echoes in the Database

A basic structure of B2B (Business-to-Business) Party Master Data is that you have accounts being business entities each having one or several contacts being employees in each business entity. These employees act in the roles of decision makers, gate keepers, invoice receivers and so on. In Data Model language there is a parent-child relationship between accounts and contacts.

When doing deduplication with such data you aim to make a golden copy with unique business entities having unique contacts.

After achieving that you may gaze the data and stumble over rows in the golden copy as these (function, contact name, account name, address):

  • HR, John Smith, Smashing Estates Ltd, Same Place in Anytown
  • HR, John Smith, Smashing Solicitors Ltd, Same Place in Anytown
  • IT, Tushnelda von Keine-Mustermann, The Old Treadmill Ltd, Anytown
  • IT, Tushnelda von Keine-Mustermann, Brand New Brands Ltd, Anytown

Duplicates? Probably it’s the same real world individuals.

Chang-eng-bunker-PDJohn Smith is the ultimate Anglo common name, but if your favorite external business directory tells you that the 2 companies has the same mother and are modest size organizations, the possibility of John Smith being the same person having the same role at the same time in 2 companies is very high.

Tushnelda has a very unique name, so here there is a high possibility that she has got a new job in a new company, which makes one of the entries inactive. If one is going to be selected as the active survivor it may be chosen from newest update, found in external reference data or investigated otherwise.

B2B is often not actually Business-to-Business but also E2E – Employee-to-Employee – as the relationship exists between employees in the selling and buying business entities and it is not unusual that the relation may follow the employees when they change employer.

So striving for “one version of the truth” through “360 degree view on customer” is not a one layer exercise. This fact must be modeled in the Master Data structure, supported by functionality and prevented by feasible data quality implementations.

It’s my plan to do some blog posts around hierarchies in Party Master Data and how this must be handled in data matching. Next post will be about B2C data.

Bookmark and Share