I remember some years ago, when I started SMS’ing, I had an old mobile phone that defaulted to upper case text. After a while my son answered back: “Why are you always yelling at me in SMSes?”
So I learned that you can use lower case in SMSes as well, and only using all caps in SMSes, as in any other writing, usually means that YOU ARE YELLING.
Examining a text for upper case use can, together with polarity classifiers and all that jazz, be used today in sentiment analysis, for example on social media data.
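As a rough illustration of how upper case use could become a number to feed into sentiment analysis, here is a minimal Python sketch. The function name shouting_ratio is made up for this example, and the result is only one possible signal, not a classifier in itself.

```python
def shouting_ratio(text: str) -> float:
    """Return the share of letters in the text that are upper case (0.0 to 1.0)."""
    # Only letters count; digits, spaces and punctuation are ignored.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)
```

A message in all caps scores 1.0 and a message in all lower case scores 0.0. Where to draw the line for "yelling" is a judgment call that would depend on the data.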
In data parsing, words in upper case in person names may tell you something too. Especially in France it is common to indicate a surname with only upper case characters, so for example in the name “AUGUST Michel” the first word is the surname and the last word is the given name.
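The surname convention could be sketched in Python along these lines. The function split_name and its return format are assumptions made for illustration. It only trusts the casing when a name mixes all caps words with other words, and gives up otherwise.

```python
def split_name(full_name: str):
    """Split a name into given name and surname using all caps words as the surname.

    Returns None when the casing gives no hint (all words in caps, or none).
    """
    tokens = full_name.split()

    def is_caps_word(token: str) -> bool:
        # At least two letters, all in upper case; this skips initials like "J."
        return sum(ch.isalpha() for ch in token) >= 2 and token.isupper()

    flags = [is_caps_word(t) for t in tokens]
    if not any(flags) or all(flags):
        return None
    surname = " ".join(t for t, caps in zip(tokens, flags) if caps)
    given = " ".join(t for t, caps in zip(tokens, flags) if not caps)
    return {"given": given, "surname": surname}
```

With this sketch, “AUGUST Michel” gives surname “AUGUST” and given name “Michel”, while “JOHN SMITH” returns None because the casing tells us nothing.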
When matching company names a word in upper case may indicate an abbreviation. So “THE Ltd” and “The Happy Entrepreneur Ltd” may be a good match despite a horrible edit distance.
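One way to catch such a match is to compare an upper case word in one name with the initials of the other name. The Python sketch below is only an illustration: the LEGAL_FORMS list and the function names are assumptions, and a real matching tool would need a much longer list and more checks.

```python
# Hypothetical, deliberately short list of legal forms to ignore.
LEGAL_FORMS = {"ltd", "inc", "plc", "llc"}


def initials(name: str) -> str:
    """Build the initials of a company name, skipping legal forms."""
    words = [w for w in name.split() if w.lower().strip(".") not in LEGAL_FORMS]
    return "".join(w[0].upper() for w in words)


def acronym_match(a: str, b: str) -> bool:
    """True if an all caps word in one name equals the initials of the other."""
    for short, long_name in ((a, b), (b, a)):
        for word in short.split():
            if len(word) > 1 and word.isupper() and word == initials(long_name):
                return True
    return False
```

Here acronym_match("THE Ltd", "The Happy Entrepreneur Ltd") is True, while the mixed case “The Ltd” does not trigger the rule, so the casing is what carries the hint.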
In data migration, when handling names from older systems where all caps have been used, it is common to try to make better looking names. “JOHN SMITH” will be “John Smith” and “SAM MCCLOUD” should be “Sam McCloud”. In environments with alphabets other than English, national characters may be reintroduced as well. For example in a German context “JURGEN VON LOW” may come out as “Jürgen von Löw”.
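A minimal Python sketch of such a clean-up could look like the following. Everything in it is an assumption for illustration: the particle list, the “Mc” rule and the small lookup table for national characters are made up, and real names will break any of these rules sooner or later.

```python
# Hypothetical rule data; a real migration would need country specific lists.
LOWERCASE_PARTICLES = {"von", "van", "der", "de"}
RESTORED_SPELLINGS = {"JURGEN": "Jürgen", "LOW": "Löw"}


def proper_case(name: str) -> str:
    """Turn an all caps name into a best guess mixed case name."""
    result = []
    for position, token in enumerate(name.split()):
        if token in RESTORED_SPELLINGS:
            # Reintroduce national characters from the lookup table.
            result.append(RESTORED_SPELLINGS[token])
        elif position > 0 and token.lower() in LOWERCASE_PARTICLES:
            # Keep particles like "von" in lower case inside the name.
            result.append(token.lower())
        elif token.startswith("MC") and len(token) > 2:
            # "MCCLOUD" becomes "McCloud".
            result.append("Mc" + token[2:].capitalize())
        else:
            result.append(token.capitalize())
    return " ".join(result)
```

With these assumed rules “SAM MCCLOUD” comes out as “Sam McCloud” and “JURGEN VON LOW” as “Jürgen von Löw”. The output remains a guess, so the original value is worth keeping alongside it.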
What about you? Have you stumbled upon some fun with upper case in data management?
Casing can be “fun” when you encounter two byte forenames.
Mrs JO Brown (Janet Olivia) vs Mrs Jo Brown (Joanne?)
Don’t get me started on Ng….
The one that occurs to me straight away is MS SOCIETY being interpreted as a female “Ms Society” but there are many similar pitfalls…
I don’t know if it is typically Belgian, but in translating all caps to mixed case names, some people are very sensitive to the way it is written. Writing VAN LAERE as ‘Van Laere’ will not be appreciated by some people (nobility) as their name is often written as ‘van Laere’. The little ‘v’ can be oh so sensitive.
Yes – in Holland and Belgium we’d usually case “van” in lower case, as in “Piet van der Valk”. The equivalent in French is lower case as well (“de”) but I believe that “La” and “Le” are capitalized so you could have “Pierre de La Haye”, I guess. Does anyone have a definite answer? Most Scots care about the capital letter following Mac, e.g. MacDonald, but I once worked with a Mr Macfarlane with a lower case “f” 🙂
As a general rule The Netherlands (Holland? Where’s that?) has lower case prepositions (van de) and Belgium upper case (Van De) because it’s part of the name and not a preposition in Belgium, as illustrated by Van De being under V in the telephone book in Belgium but under the letter that follows in The Netherlands.
However, if you use the Dutch name without a given name but with a form of address, the V is upper case (Mr Van den Broek).
The trick is to write it as the owner of that name wants – which is why I counsel against processing names in any way and collecting that data correctly at source!
End of lesson 🙂
Oh, and … I suspect “de la” and not “de La”, but your example includes the name of a city (La Haye – The Hague) – which may be why it is cased in that way.
A fount of useless knowledge, me!
or even den Haag or ‘s Gravenhage to carry on casing… I never knew about the differences in Dutch and Belgian casing and indexing!
I completely agree about trying to collect data correctly at source, but sometimes we have to “start from here” and use either our best guess or try and adopt a rule which is least likely to offend.
One rule unlikely to offend is to leave the name alone. Sometimes easier to explain all caps than to explain why the name has been wrecked. But I’ve never found out how many people are offended, and how many people just accept that some people can’t spell their surname (I certainly fall into that category fairly often).
Thanks Andrew, Steve, Lynn, Graham and Oliver for adding in.