The Letter Æ

15th November 2012, Henrik Gabs Liliendahl

This blog is written in English. Therefore the letters used are normally restricted to A to Z.

The English alphabet is one of many alphabets using Latin (or Roman) letters. Other alphabets, like the Russian, use Cyrillic letters. Besides alphabets, the world's script systems include abjads, abugidas, syllabic scripts and symbol scripts. Learn more about these in the post Script Systems.

Æ, which in lower case is æ, was part of the Old English alphabet. For example, an Old English king was called Æthelred the Unready.

The letter Æ is a combined AE and is pronounced in English as the first letter in Edmund and Edward.

Today Æ exists in a few alphabets: the Danish/Norwegian, the Faroese and the Icelandic. People and places from the corresponding Viking territories may have the letter Æ/æ as part of their names. For example, the home of Microsoft Dynamics AX and NAV is the town of Vedbæk, north of Copenhagen. When represented in the English alphabet, the town name becomes Vedbaek.

So Vedbæk and Vedbaek should be a 100% match when doing data matching. And so should Vedbæk and Vedb%C3%A6k, for when systems are as bad at handling æ as Æthelred the Unready was at handling the Vikings.
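As a rough illustration of that expectation, here is a minimal Python sketch. It assumes the mangled form is the percent-encoded UTF-8 spelling of æ (%C3%A6); `match_key` and `same_place` are hypothetical names, not taken from any particular matching tool.

```python
from urllib.parse import unquote


def match_key(name: str) -> str:
    """Build a comparison key: undo percent-encoding, fold Æ/æ to ae, ignore case."""
    decoded = unquote(name)  # decodes "%C3%A6" to "æ" (UTF-8 is the default)
    return decoded.replace("Æ", "AE").replace("æ", "ae").casefold()


def same_place(a: str, b: str) -> bool:
    """Treat two spellings as the same place when their keys are identical."""
    return match_key(a) == match_key(b)


print(same_place("Vedbæk", "Vedbaek"))
print(same_place("Vedbæk", "Vedb%C3%A6k"))
```

Both comparisons come out as matches under these assumptions. Other kinds of encoding damage would need their own repair step before the key is built.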

And oh, Æthelred wasn’t actually unready. He was unræd, meaning badly counseled.

Tirthankar Ghosh 16th November 2012 / 11:08

Hi,
This is interesting.
In fact, not just Æ, but all letters with diacritical marks (such as é, è, ç, ê etc.) need proper replacement during the cleansing phase before matching takes place.

Regards,
Tirthankar Ghosh

Henrik Liliendahl Sørensen 16th November 2012 / 11:37

Thanks for commenting, Tirthankar.

I have worked with two different approaches to this.

The first one is transliteration. Here you replace, for example, æ with ae and é with e before matching.
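A minimal sketch of the transliteration approach in Python, using only the standard library. The `SPECIAL` table and the `transliterate` name are illustrative assumptions; the ø entries follow a common Danish convention and are not from the discussion above.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit table.
# Illustrative only, not exhaustive.
SPECIAL = {"æ": "ae", "Æ": "AE", "ø": "oe", "Ø": "OE"}


def transliterate(text: str) -> str:
    """Replace special letters, then strip diacritical marks (é -> e)."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in text)
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate("Vedbæk"))  # Vedbaek
print(transliterate("café"))    # cafe
```

The transliterated values are then compared with whatever matching routine is already in place.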

The second one is embedding, where the possible correspondence between æ and ae, and between é and e, is taken into consideration within the matching itself.
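One way to picture embedding is an edit distance that knows about such correspondences, so the stored values are never altered. The sketch below is an assumption about how this could look, not a description of any specific product: the `EQUIVALENTS` table, the `PENALTY` value and the `distance` function are all hypothetical.

```python
# Hypothetical correspondence table: pairs the matcher treats as near-equal.
EQUIVALENTS = [("æ", "ae"), ("é", "e")]
PENALTY = 0.1  # assumed cost of a known correspondence; 0.0 would mean a full match


def distance(a: str, b: str) -> float:
    """Edit distance in which known correspondences cost PENALTY instead of 1."""
    a, b = a.casefold(), b.casefold()
    pairs = EQUIVALENTS + [(y, x) for x, y in EQUIVALENTS]  # both directions
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            best = float("inf")
            if i > 0:
                best = min(best, d[i - 1][j] + 1)  # delete a character from a
            if j > 0:
                best = min(best, d[i][j - 1] + 1)  # insert a character from b
            if i > 0 and j > 0:
                # exact match costs 0, substitution costs 1
                best = min(best, d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
            for x, y in pairs:
                # a known correspondence ending at this position in both strings
                if a[:i].endswith(x) and b[:j].endswith(y):
                    best = min(best, d[i - len(x)][j - len(y)] + PENALTY)
            d[i][j] = best
    return d[len(a)][len(b)]


print(distance("Vedbæk", "Vedbaek"))  # small: only a known correspondence differs
print(distance("Vedbæk", "Vedbok"))   # larger: a genuine difference
```

Because the correspondence is scored inside the comparison, it can be weighted differently from an ordinary typo, which is not possible once the values have been transliterated up front.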

The same goes for transcription (transforming from one script system to another; for example Arabic to Roman). An alternative to transcription before matching is embedding in matching. That way you avoid mismatches caused by the many possible transcriptions. For example, the transcription from Arabic to Roman of the name of the former Libyan dictator could be Gaddafi, Gadhafi, Kadafi and many more.

The same actually goes for handling nicknames. If you standardize Peggy to Margaret before matching you miss the match between Peggy and the typo Pegy.
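The Peggy/Pegy case can be sketched in Python with the standard library's `difflib`. The `NICKNAMES` lookup and the function names are hypothetical, and `SequenceMatcher` merely stands in for whatever fuzzy comparison the matching tool really uses.

```python
from difflib import SequenceMatcher

# Hypothetical nickname lookup; a real one would be far larger.
NICKNAMES = {"peggy": "margaret"}


def standardize(name: str) -> str:
    """Replace a known nickname with its formal name; leave anything else alone."""
    key = name.casefold()
    return NICKNAMES.get(key, key)


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity between 0 and 1."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def embedded_score(a: str, b: str) -> float:
    """Consider both the raw and the standardized forms and keep the best score."""
    return max(similarity(a, b), similarity(standardize(a), standardize(b)))


# Standardizing first compares "margaret" with "pegy" and loses the match.
print(similarity(standardize("Peggy"), standardize("Pegy")))
# Embedding still sees that the raw forms are close.
print(embedded_score("Peggy", "Pegy"))
# Peggy and Margaret still match through the lookup.
print(embedded_score("Peggy", "Margaret"))
```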

Tirthankar Ghosh 16th November 2012 / 13:52

You are right. Transliteration will not cover everything, though it may be a simple solution.

I like the example of “Peggy” and “Pegy”

Graham Rhind 18th November 2012 / 20:23

Henrik, I’m not a linguist, so may be on dodgy ground here, but æ in English is not quite extinct, though it’s not in the alphabet, being a digraph/diphthong/ligature (make your choice!). Those of us of a certain age might still write encyclopædia, for example, though computer hardware has made this rather more troublesome than it used to be when we could still use pens!

Henrik Liliendahl Sørensen 19th November 2012 / 11:39

Thanks for adding in Graham. I also always thought that the new source of truth should be called wikipædia.



Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
