This blog is written in English. Therefore the letters used are normally restricted to A to Z.
The English alphabet is one of many alphabets using Latin (or Roman) letters. Other alphabets like the Russian uses Cyrillic letters. Then there are other script systems in the world which besides alphabets are abjads, abugidas, syllabic scripts and symbol scripts. Learn more about these in the post Script Systems.
My last surname is “Sørensen”. This word contains the lower case letter “ø” which is “Ø” in upper case. This letter is part of two alphabets: The Danish/Norwegian and the Faroese. Sometimes data has to be transformed into the English alphabet. Then the letter “ø” may be transformed to either “o” or “oe”. So my last surname will be either “Sorensen” or “Soerensen” in the English alphabet.
The town part of my address is “København”. The word “København” is what we call an endonym, which is the local word for place or a person. The opposite of endonym is exonym. The English exonym for “København” is “Copenhagen” which of course only has letters from the English alphabet. The Swedish exonym for “København” is “Köpenhamn”. Here we have a variant of “Ø” being “Ö”. The letter “Ö” exists in a lot of alphabets as Swedish, German, Hungarian and Turkish.
Usually “Ø” is transformed to “Ö” between Danish/Norwegian and these alphabets. The other way we usually accept the letter “Ö” in Danish/Norwegian master data.
These issues are of course a problem area in data quality, data matching and master data management. And with the complexity only between alphabets using Latin characters there is of course much more land to cover when including Cyrillic and Greek letters and then the other scripts systems with their hierarchical elements.