What’s In a Given Name?

18th June 2010 (updated 8th January 2011), Henrik Gabs Liliendahl

I use the term “given name” here for the part of a person name that in most Western cultures is called a “first name”.

When working with automated data quality, master data management and data matching, you will encounter many situations where you would like to mimic what we humans do when we look at a given name. And when you have done this a few times, you also learn the risks of doing so.

Here is some of what I have learned:

Gender

Most given names are either male or female, so most of the time you instinctively know whether a person is male or female when you look at the name. You probably also know the given names in your own culture that may be used for both. What often creates havoc is applying the rules of one culture to data coming from a different culture. The subject has been discussed on DataQualityPro.
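The following is a minimal Python sketch of culture-aware gender derivation. It assumes a reference table keyed by country. The table GENDER_REFERENCE and the function derive_gender are hypothetical, and the few entries are illustrative rather than real reference data. The sketch shows one idea: a name such as “Andrea” is looked up only in the table for the record's own country. If the table or the name is missing, the result is “U” (unknown) instead of a guess borrowed from another culture.

```python
from typing import Dict

# Hypothetical reference data: lower-cased given name -> "F", "M" or "U",
# one table per country. Entries are illustrative only; real tables come
# from national sources and are far larger.
GENDER_REFERENCE: Dict[str, Dict[str, str]] = {
    "DE": {"angela": "F", "andrea": "F", "jan": "M"},
    "IT": {"angelo": "M", "andrea": "M"},
    "DK": {"jan": "M"},
    "US": {"jan": "F"},
}


def derive_gender(given_name: str, country: str) -> str:
    """Return "F", "M" or "U" (unknown) for a given name.

    Only the table for the record's own country is consulted, so an Italian
    "Andrea" is never read with German rules (or the other way around).
    """
    table = GENDER_REFERENCE.get(country.strip().upper())
    if table is None:
        # No reference data for this culture: admit it instead of guessing.
        return "U"
    return table.get(given_name.strip().lower(), "U")


print(derive_gender("Andrea", "IT"))  # M
print(derive_gender("Andrea", "DE"))  # F
print(derive_gender("Andrea", "FR"))  # U: no French table in this sketch
```

Returning “U” rather than forcing a choice lets later steps, such as building a salutation, fall back to a neutral form.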

Salutation

In some cultures the salutation is paramount, not least in Germany. A correct salutation may depend on knowing the gender, and the gender may be derived from the given name. But you should not use the given name itself in the greeting.

So a letter to “Angela Merkel” opens with “Sehr geehrte Frau Merkel”, which translates to “Very honored Mrs. Merkel”.

A small mistake, such as the name being recorded as “Angelo Merkel”, turns into a big mistake when you write “Sehr geehrter Herr Merkel” (Very honored Mr. Merkel) to her.
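The salutation rule above could be sketched like this. The sketch assumes the reference data gives a probability that a given name is female. The function german_salutation, the p_female input and the 0.95 threshold are all assumptions for illustration. The given name only feeds the gender estimate and never appears in the greeting. An uncertain or missing estimate falls back to the neutral “Sehr geehrte Damen und Herren” (Dear Sir or Madam).

```python
from typing import Optional


def german_salutation(p_female: Optional[float], surname: str,
                      threshold: float = 0.95) -> str:
    """Build a German letter opening from the surname and an estimated P(female).

    p_female is assumed to come from given-name reference data; None means the
    name was not found. The default threshold is an illustrative assumption.
    """
    if p_female is not None:
        if p_female >= threshold:
            return f"Sehr geehrte Frau {surname}"
        if p_female <= 1 - threshold:
            return f"Sehr geehrter Herr {surname}"
    # Unknown or ambiguous gender: a neutral opening is safer than a wrong one.
    return "Sehr geehrte Damen und Herren"


print(german_salutation(0.99, "Merkel"))  # Sehr geehrte Frau Merkel
print(german_salutation(0.50, "Merkel"))  # Sehr geehrte Damen und Herren
```

A threshold does not catch a mistyped name. “Angelo” would still come out confidently male, so the recorded given name itself has to be correct.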

Age

In a recent post on the DataFlux Community of Experts, Jim Harris wrote about how he received tons of direct mail that assumed he was retired, based on where he lives.

I have worked a bit with market segmentation and data (information) quality. I don’t know how it is with first names in the United States, but in Denmark you can estimate a person’s age with a good probability from the given name, because the statistical bureau provides statistics for each name by birth year. Combining that with location-based demographics gives you a better response rate in direct marketing.
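Here is a rough sketch of the age estimate. It assumes counts of people per birth year for a given name, like the statistics mentioned above, plus a similar birth-year distribution for the person's area. The helper functions and all numbers below are hypothetical. The two distributions are multiplied year by year and renormalized. This is a naive Bayes style combination, and it assumes two things: that name and location are independent given birth year, and that the prior over birth years is uniform. Both are simplifications.

```python
from typing import Dict


def normalize(counts: Dict[int, float]) -> Dict[int, float]:
    """Turn raw counts per birth year into probabilities."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("distribution has no mass")
    return {year: c / total for year, c in counts.items()}


def combine(name_dist: Dict[int, float], area_dist: Dict[int, float]) -> Dict[int, float]:
    """Multiply name-based and location-based distributions year by year.

    Assumes independence given birth year and a uniform prior over years.
    """
    years = name_dist.keys() & area_dist.keys()
    return normalize({y: name_dist[y] * area_dist[y] for y in years})


def expected_age(dist: Dict[int, float], reference_year: int) -> float:
    return sum(p * (reference_year - year) for year, p in dist.items())


def most_likely_birth_year(dist: Dict[int, float]) -> int:
    return max(dist, key=dist.get)


# Made-up counts per birth decade, not real statistics.
name_counts = {1950: 10, 1960: 30, 1970: 60}
area_counts = {1950: 50, 1960: 30, 1970: 20}

name_only = normalize(name_counts)
both = combine(name_only, normalize(area_counts))
print(round(expected_age(name_only, 2010), 1))  # 45.0
print(round(expected_age(both, 2010), 1))       # 47.3
print(most_likely_birth_year(both))             # 1970
```

With these made-up counts, the name alone gives an expected age of 45 in 2010. Adding the location moves the estimate up a little, because the area skews older.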

Nicknames

Nicknames are used very differently in various cultures. In Denmark we don’t use them that much, and very seldom in business transactions. If you meet a Dane called Jim, his name actually is Jim. If you have a clever piece of software correcting/standardizing the name to James, well, that’s not very clever.
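One way to keep nickname standardization from overreaching has two parts. First, make the nickname table country-specific. Second, store the standardized value separately from what was entered. In this sketch, NICKNAMES, GivenName and standardize_for_matching are hypothetical names, and the table entries are illustrative only. The Danish table intentionally has no Jim to James mapping.

```python
from typing import Dict, NamedTuple

# Hypothetical nickname tables per country; entries are illustrative only.
# Denmark deliberately has no "jim" -> "james" entry: a Dane called Jim is named Jim.
NICKNAMES: Dict[str, Dict[str, str]] = {
    "US": {"jim": "james", "bill": "william", "bob": "robert"},
    "GB": {"jim": "james", "bill": "william"},
    "DK": {},
}


class GivenName(NamedTuple):
    original: str   # exactly as entered; never overwritten
    match_key: str  # standardized form, used only for matching


def standardize_for_matching(given_name: str, country: str) -> GivenName:
    key = given_name.strip().lower()
    # Countries without a table get no nickname expansion at all.
    table = NICKNAMES.get(country.strip().upper(), {})
    return GivenName(original=given_name, match_key=table.get(key, key))


print(standardize_for_matching("Jim", "US"))  # GivenName(original='Jim', match_key='james')
print(standardize_for_matching("Jim", "DK"))  # GivenName(original='Jim', match_key='jim')
```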

Jim Harris 18th June 2010 / 14:19

Sehr geehrter Herr Sørensen,

Excellent post about the common data quality challenge represented by the diversity of given names.

Unfortunately, in most implementations in the United States, assumptions are made based on the English language, and therefore “Jim” would always be standardized as “James” for matching purposes. This is usually done in a separate matching field that retains the original value for survivorship: if Jim was the original given name on three or more of five matching James records, then we would probably assume that the customer preferred to be called Jim.

I am not aware of any similar statistical tables combining given name and birth year available in the United States. However, it certainly would make sense. For example, I would assume that the 1960s had a disproportionately large distribution of Moonchild, Starflower, and Aquarius — however, probably for both genders.

Best Regards,

Non-Danish Jim, whose given name is James, not to be confused with Danish Jim, whose given name is Jim

🙂
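The survivorship rule Jim describes can be sketched as a majority vote over the original given names in a cluster of matched records. The function preferred_given_name is hypothetical. Falling back to the standardized value when no spelling has a majority is an assumption of this sketch, not part of the rule as stated in the comment.

```python
from collections import Counter
from typing import List


def preferred_given_name(originals: List[str], standardized: str) -> str:
    """Choose the given name to keep for a cluster of matched records.

    originals: the given names as entered on each matched record.
    standardized: the value in the separate matching field, e.g. "James".
    If one original spelling appears on more than half of the records,
    assume the customer prefers it; otherwise fall back to the standardized
    value (the fallback is an assumption of this sketch).
    """
    if not originals:
        return standardized
    name, count = Counter(n.strip() for n in originals).most_common(1)[0]
    return name if count * 2 > len(originals) else standardized


cluster = ["Jim", "Jim", "James", "Jim", "Jimmy"]
print(preferred_given_name(cluster, "James"))  # Jim (3 of 5 records)
```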

Henrik Liliendahl Sørensen 18th June 2010 / 15:05

Jezus, Jim, for a moment I thought my German CEO was commenting on a blog 🙂

Dario Bezzina 18th June 2010 / 18:58

A little late reply maybe but this is how SNL (Saturday Night Live) tackled this problem:
http://en.wikipedia.org/wiki/Pat_(Saturday_Night_Live)

🙂

Henrik Liliendahl Sørensen 18th June 2010 / 19:10

Thanks for sharing Dario.

The Wiki article says: The central aspect of sketches featuring Pat was the inability of others to determine the character’s sex.

This reminds me of a metadata pet peeve of mine. I am in no way opposed to “sex”, but I don’t like it when a column is labeled “Sex”. I think “Gender” is better for data modeling.

William Sharp 18th June 2010 / 20:44

Great post on one of the issues that plagues data quality tools at the moment. Each of the subcategories (nickname conversion, gender identification and salutation derivation) is a case for one of your favorite tools/plugins … reference data! External reference data is the key to overcoming each of these variables!
Thanks for the reminder of how important external reference data is to the data quality / data matching realm!

Henrik Liliendahl Sørensen 18th June 2010 / 20:57

Thanks William. Yes, you need external reference data that is specific to each culture.

Crysta Anderson 18th June 2010 / 20:57

Great points all around. Have you worked with data from some of the Slavic countries, where surnames differ based on gender? I have a female Czech friend whose surname is Keleova; her father and brother use a surname of simply Kele. I believe Russian follows the same format. How does that affect DQ tools?

Henrik Liliendahl Sørensen 18th June 2010 / 21:07

Crysta, indeed, many challenges arise when dealing with global data. Some of these are explained in the product sheet for the Omikron WorldMatch tool, for example in Russian:

Михаил Горбачёв = Michail Gorbatschow
Раиса Горбачёва = Raissa Gorbatschowa
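Below is a crude sketch of how feminine surname forms could be folded onto the masculine form for matching. It covers only the patterns mentioned in this exchange and a few close relatives:

- Russian -ова/-ева/-ина/-ына and -ская/-цкая endings, with ё treated as е (Горбачёва vs Горбачёв).
- The Czech -ová ending (Keleova vs Kele), plus -á/-ý and a trailing -a.

The function surname_match_key and its suffix rules are assumptions. They do not describe how Omikron WorldMatch or any other tool works. Real surnames have many exceptions, so a production system would lean on curated reference data instead.

```python
import unicodedata


def surname_match_key(surname: str, country: str) -> str:
    """Fold feminine surname forms onto the masculine form for matching.

    Heuristic suffix rules only; names outside these patterns are returned
    lower-cased and otherwise unchanged.
    """
    s = unicodedata.normalize("NFC", surname).strip().lower()
    country = country.strip().upper()
    if country == "RU":
        s = s.replace("ё", "е")  # Горбачёв is often written Горбачев
        if s.endswith(("ская", "цкая")):
            return s[:-2] + "ий"  # достоевская -> достоевский
        if s.endswith(("ова", "ева", "ина", "ына")):
            return s[:-1]  # горбачева -> горбачев
        return s
    if country == "CZ":
        if s.endswith(("ová", "ova")):
            return s[:-3]  # keleova -> kele, nováková -> novák
        if s.endswith("á"):
            return s[:-1] + "ý"  # černá -> černý
        if s.endswith("a"):
            return s[:-1]  # svoboda -> svobod, the same key as svobodová
        return s
    return s


print(surname_match_key("Горбачёва", "RU") == surname_match_key("Горбачёв", "RU"))  # True
print(surname_match_key("Keleova", "CZ") == surname_match_key("Kele", "CZ"))  # True
```

The key is meant for a separate matching field, as with nicknames, so the surname as entered is kept untouched.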

Dario Bezzina 19th June 2010 / 09:43

I would like to add to Henrik’s response. There are some tools on the market that are country and culture “aware”. This is a great feature. I would like to see it for domains other than names and addresses too, but I guess that will be hard to develop?

Postcode Anywhere (@pca_plus) 18th June 2012 / 14:08

Great article, and great points raised both in the content and in the comments above.


Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
