The Letter Æ

This blog is written in English. Therefore the letters used are normally restricted to A to Z.

The English alphabet is one of many alphabets using Latin (or Roman) letters. Other alphabets, like the Russian, use Cyrillic letters. Besides alphabets, the world's script systems include abjads, abugidas, syllabic scripts and symbol scripts. Learn more about these in the post Script Systems.

Æ, which in lower case is æ, was part of the Old English alphabet. For example, an old English king was called Æthelred the Unready.

The letter Æ is a combined AE and is pronounced in English as the first letter in Edmund and Edward.

Today Æ exists in a few alphabets: the Danish/Norwegian, the Faroese and the Icelandic. People and places from the corresponding Viking territories may have the letter Æ/æ as part of their name. For example, the home of Microsoft Dynamics AX and NAV is the town of Vedbæk, north of Copenhagen. When represented in the English alphabet, the town name will be Vedbaek.

So Vedbæk and Vedbaek should be a 100% match when doing data matching. And so should Vedbæk and Vedb%C3%A6k, when systems are as bad as Æthelred the Unready was at handling the Vikings.
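
As a rough illustration, here is a minimal Python sketch of how the three spellings could be reduced to one matching key. It assumes the broken variant is a percent-encoded UTF-8 form such as Vedb%C3%A6k (C3 A6 being the UTF-8 bytes for æ). The function name `match_key` and the one-entry table are made up for this example, not taken from any particular matching tool.

```python
import unicodedata
from urllib.parse import unquote

# Hypothetical transliteration table. Only æ is covered here;
# a real table would carry more letters.
TRANSLITERATIONS = {"æ": "ae"}


def match_key(name: str) -> str:
    """Reduce a name to a key that can be compared for equality."""
    decoded = unquote(name)  # turns %C3%A6 back into æ (UTF-8 assumed)
    folded = unicodedata.normalize("NFC", decoded).casefold()
    return "".join(TRANSLITERATIONS.get(ch, ch) for ch in folded)


for spelling in ("Vedbæk", "Vedbaek", "Vedb%C3%A6k"):
    print(spelling, "->", match_key(spelling))
```

All three spellings end up with the same key, so an exact comparison of the keys treats them as one town.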

And oh, Æthelred wasn’t actually unready. He was unræd meaning bad-counseled.

5 thoughts on “The Letter Æ”

  1. Tirthankar Ghosh 16th November 2012 / 11:08

    Hi,
    This is interesting.
    In fact, not just Æ, but all the diacritical marks (such as é, è, ç, ê etc.) need proper replacement during the cleansing phase before matching takes place.

    Regards,
    Tirthankar Ghosh

  2. Henrik Liliendahl Sørensen 16th November 2012 / 11:37

    Thanks for commenting, Tirthankar.

    I have worked with two different approaches to this.

    The first one is transliteration. Here you for example replace æ with ae and é with e before matching.

    The second one is embedding where the possible correspondence between æ and ae and é and e is taken into consideration within matching.

    The same goes for transcription (transforming from one script system to another script system; for example Arabic to Roman). An alternative to transcription before matching is embedding in matching. Hereby you avoid mismatches caused by the many possible transcriptions. For example, the transcription from Arabic to Roman of the name of the former Libyan dictator could be Gaddafi, Gadhafi, Kadafi and many more outcomes.

    The same actually goes for handling nicknames. If you standardize Peggy to Margaret before matching you miss the match between Peggy and the typo Pegy.

  3. Tirthankar Ghosh 16th November 2012 / 13:52

    You are right. Transliteration will not cover everything though it may be a simple solution.

    I like the example of “Peggy” and “Pegy”

  4. Graham Rhind 18th November 2012 / 20:23

    Henrik, I’m not a linguist, so may be on dodgy ground here, but æ in English is not quite extinct, though it’s not in the alphabet, being a digraph/diphthong/ligature (make your choice!). Those of us of a certain age might still write encyclopædia, for example, though computer hardware has made this rather more troublesome than it used to be when we could still use pens!

  5. Henrik Liliendahl Sørensen 19th November 2012 / 11:39

    Thanks for adding in Graham. I also always thought that the new source of truth should be called wikipædia.
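
The embedding approach discussed in the comments above can be sketched in Python as an edit distance that treats æ/ae and é/e as equal during the comparison, instead of replacing them beforehand. This is only one possible reading of "embedding". The function name `embedded_distance` and the `EQUIVALENTS` table are assumptions made for the example, and accented letters are assumed to compare in their precomposed (NFC) form.

```python
import unicodedata

# Hypothetical equivalence table: each pair is treated as "the same"
# while matching, instead of being replaced before matching.
EQUIVALENTS = [("æ", "ae"), ("é", "e")]


def embedded_distance(a: str, b: str) -> int:
    """Levenshtein distance in which equivalent spellings cost nothing."""
    a = unicodedata.normalize("NFC", a).casefold()
    b = unicodedata.normalize("NFC", b).casefold()
    pairs = EQUIVALENTS + [(y, x) for x, y in EQUIVALENTS]
    # d[i][j] is the distance between the first i chars of a and first j of b
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:
                options.append(d[i - 1][j] + 1)  # delete from a
            if j > 0:
                options.append(d[i][j - 1] + 1)  # insert into a
            if i > 0 and j > 0:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                options.append(d[i - 1][j - 1] + cost)  # keep or substitute
            for x, y in pairs:
                # equivalent spellings at the end of both prefixes are free
                if a[:i].endswith(x) and b[:j].endswith(y):
                    options.append(d[i - len(x)][j - len(y)])
            d[i][j] = min(options)
    return d[len(a)][len(b)]


print(embedded_distance("Vedbæk", "Vedbaek"))  # 0: treated as the same name
print(embedded_distance("Peggy", "Pegy"))      # 1: a single missing letter
```

Because nothing is rewritten before the comparison, Peggy and the typo Pegy stay one edit apart, which is the match that is lost if Peggy is standardized to Margaret first.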
