Diversity – Page 10 – Liliendahl on Data Quality

Happy Days

23rd December 2010, Henrik Gabs Liliendahl, 4 Comments

Whether you are celebrating Christmas or not, whether you say Merry Christmas, Feliz Navidad, Frohe Weihnachten, Joyeux Noël, God Jul or plenty of other greetings from around the world: May these days be a wonderful time for you and yours and thanks for reading this blog.

Matching Down Under

17th December 2010, Henrik Gabs Liliendahl, 4 Comments

As a data matching geek I always love reading about how others have made the great but fearful journey into the data matching world.

This week Wayne Colless of the Australian Attorney-General’s Department kindly made a document about data matching public on the DataQualityPro site. The full title is “Improving the Integrity of Identity Data – Data Matching Better Practice Guidelines, 2009”. Link here.

As Wayne explains in a discussion in the LinkedIn Data Matching group: Australia has no national unique identifier for individuals (such as the US SSN or the number recorded on national ID cards used in many other countries) that can be used, so the matching has to involve only non-unique values such as name, address and dates of birth.

The document gives very thorough step-by-step guidance on matching individuals’ names, addresses and dates of birth. As the document says, you may either build all the logic yourself or buy commercial software that does the same. Either way, you have to understand what the software does in order to tune the processes and set thresholds that are meaningful to you.

As Australia is a nation mainly born through immigration, the challenges of adapting the ruling Anglo-Saxon naming conventions to the reality of name formats coming from all over the world are very apparent. I like that the diversity issues are given good thought in the document.
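To make the idea of tuning thresholds concrete, here is a minimal sketch of weighted matching on non-unique values. It is not the method from the guidelines: the weights, the two thresholds and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Assumed field weights and thresholds; real values must be tuned on your own data.
WEIGHTS = {"name": 0.4, "address": 0.3, "date_of_birth": 0.3}
MATCH_THRESHOLD = 0.85   # at or above: treat as the same individual
REVIEW_THRESHOLD = 0.65  # between the two thresholds: send to manual review


def similarity(a, b):
    """Return a 0..1 similarity of two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of the per-field similarities of two records."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def classify(score):
    """Turn a score into a decision using the two thresholds."""
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "non-match"


a = {"name": "Jon Smith", "address": "1 High Street", "date_of_birth": "1970-01-01"}
b = {"name": "John Smith", "address": "1 High St", "date_of_birth": "1970-01-01"}
print(round(match_score(a, b), 2), classify(match_score(a, b)))
```

Moving either threshold changes how many record pairs end up in manual review, which is exactly the kind of tuning you need to understand regardless of whether you build or buy.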

I also like that the document addresses a subject not mentioned as often as it should be, namely the challenge of taking historical values into account when settling a match, as illustrated by a figure in the document.

Whether or not you think you already know the dos and don’ts of data matching (and I guess you never really do), I find the document well worth reading.

Hell in Norway

27th November 2010, Henrik Gabs Liliendahl

Looking for inappropriate words in customer data is always a risky business. Most of the time there is a legitimate name or place somewhere containing that word.

For example, you may see a city called “Hell”.

Outside the English-speaking parts of the world you will find “Hell” in Norway. It’s a village with its own postal code (NO-7517) situated in the Trondheim metropolitan area. Not least at this time of year, with winter in the Northern Hemisphere, it is surely considerably colder than the religious “Hell”.

But even in the English-speaking world you will find a semi-legitimate “Hell” in Michigan, United States.
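One way to reduce the risk is to check a flagged word against a list of known legitimate places before raising an alarm. The sketch below assumes a hypothetical one-word suspect list and a tiny allow-list holding only the two places mentioned above; the function name `needs_review` is made up for the example.

```python
# Hypothetical suspect list; a real one would be longer and language specific.
SUSPECT_WORDS = {"hell"}

# Known legitimate places as (lowercased city, ISO country code).
# Only the two places from this post are listed here.
KNOWN_PLACES = {("hell", "NO"), ("hell", "US")}


def needs_review(city, country):
    """Flag a city value only if it is a suspect word and not a known place."""
    key = city.strip().lower()
    if key not in SUSPECT_WORDS:
        return False
    return (key, country.strip().upper()) not in KNOWN_PLACES


print(needs_review("Hell", "NO"))  # known Norwegian village, not flagged
print(needs_review("Hell", "DK"))  # no known place in this list, flagged
```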

Despite Best Intentions

26th November 2010, Henrik Gabs Liliendahl

Sometimes you have the best intentions of improving things such as data quality and a lot of other things, but somewhere you failed to see the big picture and it is too late to correct.

From the sports world this apparently happened to the Singapore water polo team at the current Asian Games.

They have newly designed speedos honoring the nation’s flag.

But now some ministry tells them that the swimsuit is inappropriate, and you can’t change outfits during the games.

By the way: I also work at a company with this logo:

Fortunately we haven’t got company speedos.

Legal Forms from Hell

27th October 2010 (updated 9th October 2014), Henrik Gabs Liliendahl, 2 Comments

When doing data matching with company names, a basic challenge is that a proper company name in most cultures, in most cases, has two elements:

The actual company name
The legal form

Some worldwide examples:

Informatica Corporation
Talend SA
SAP Deutschland AG & Co. KG
Sony Kabushiki Kaisha
LEGO A/S

There are hundreds of different legal forms, in full and abbreviated versions. Wikipedia has a list here (there called types of business entity).

However, when typing in company names in databases the legal form is often omitted. And even where legal forms are present they may be represented differently in full or abbreviated forms, with varying spelling and punctuation and so on. As the actual company names also suffer from this fuzziness, the complexity is overwhelming.

A common way of handling this issue in data matching is to separate the legal form and then focus on comparing the remaining part, the actual company name. This has to be done in a country-specific way, or else you may remove the entire name of a company, as with an Italian company called Société Anonyme, which is a French legal form.
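A minimal sketch of such a country-specific split could look like this. The per-country lists are deliberately tiny and assumed for illustration, and the function name `split_legal_form` is made up; a real solution needs far more complete reference data and must cope with spelling and punctuation variants.

```python
import re

# Assumed, incomplete lists of legal forms per ISO country code.
LEGAL_FORMS = {
    "US": ["Corporation", "Corp", "Inc", "LLC"],
    "FR": ["Société Anonyme", "SA", "SARL"],
    "DE": ["AG & Co. KG", "GmbH", "AG", "KG"],
    "JP": ["Kabushiki Kaisha", "KK"],
    "DK": ["A/S", "ApS"],
    "IT": ["S.p.A.", "S.r.l."],
}


def split_legal_form(name, country):
    """Return (actual_name, legal_form), where legal_form may be None.

    Only forms listed for the given country are considered, and a form is
    only recognised as a trailing token after some other text, so a name
    that consists solely of a legal form is left untouched.
    """
    name = name.strip()
    # Try the longest forms first so "AG & Co. KG" wins over "KG".
    for form in sorted(LEGAL_FORMS.get(country, []), key=len, reverse=True):
        pattern = r"\s+" + re.escape(form) + r"\.?$"
        found = re.search(pattern, name, flags=re.IGNORECASE)
        if found:
            return name[:found.start()].rstrip(" ,"), form
    return name, None


print(split_legal_form("SAP Deutschland AG & Co. KG", "DE"))
print(split_legal_form("Société Anonyme", "IT"))
```

Because the lookup is keyed by country, the Italian company named Société Anonyme keeps its full name, while the same words at the end of a French company name would be split off as the legal form.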

While the practice of having legal forms in company names may serve well for the original purpose of knowing the risk of doing business with that entity, it is certainly not serving the purpose of having the uniqueness data quality dimension solved.

One should think that it is time to change the bad (legally demanded) practice of mixing legal forms with company names and serve the original purpose in another, more data quality friendly way.

Magic Quadrant Diversity

12th October 2010 (updated 22nd July 2011), Henrik Gabs Liliendahl, 2 Comments

The Magic Quadrants from Gartner Inc. rank the tool vendors within a lot of different IT disciplines. Related to my work, the quadrants for data quality tools and master data management are the most interesting ones.

However, the quadrants examine the vendors in a global scope. But, how are the vendors doing in my country?

I tried to look up a few of the vendors in a local business directory for Denmark provided (free to use on the web) by the local Experian branch.

DataFlux

First up is DataFlux, the (according to Gartner) leading data quality tool vendor.

Result: No hits.

Knowing that DataFlux is owned by SAS Institute will however, with a bit of patience, finally bring you to information about the DataFlux product deep down on the SAS local website.

PS: Though SAS is better known here as the main airline (Scandinavian Airlines System), SAS Institute is actually very successful in Denmark, having a much larger part of the Business Intelligence market here than in most other places.

Informatica

Next up is Informatica, a well positioned company in both the quadrant for data quality tools and customer master data management.

Result: No hits.

Here you have to know that Informatica is represented in the Nordic area by a company called Affecto. You will find information about the Informatica products deep down on the Affecto website – along with the competing product FirstLogic owned by Business Objects (owned by SAP) also historically represented by Affecto.

Stibo Systems

Stibo Systems may not be as well known as the two above, but is tailing the mega vendors in the quadrant for Product Master Data Management, as mentioned recently in a blog post by Dan Power.

Result: Hit.

They are here with over 500 employees – at least in the legal entity called Stibo, where Stibo Systems is an alternate name and brand. And I’m not kidding; I visited them last month at the impressive headquarters near Århus (the second largest city in Denmark).

Follow Friday Diversity

1st October 2010 (updated 7th September 2011), Henrik Gabs Liliendahl, 8 Comments

Every Friday on Twitter people are recommending other tweeps to follow using the #FollowFriday (or simply #FF) hashtag.

So do I.

Below please find my follow Friday recommendations grouped by global region:

Canada: @carrni @datamartist @sheezaredhead @andrewsinfotech @aniagl @DQamateur @bivcons @projmgr @DQStudent @datachick; United States: @GarnieBolling @stevesarsfield @UtopiaInc @bbreidenbach @fionamacd @RobertsPaige @BIMarcom @IDResolution @FirstSanFranMDM @dan_power @merv @NISSSAMSI @jilldyche @howarddresner @GartnerTedF @RobPaller @marc_hurst @dcervo @datamentors @VishAgashe @IBMInitiate @RamonChen @JackieMRoberts @philsimon @Nick_Giuliano @DataInfoCom @juliebhunt @Futureratti @dqchronicle @jonrcrowell @elc @Experian_QAS @paulboal @im4infomgt @WinstonChen @ocdqblog @KeithMesser @murnane @BrendaSomich @alanmstein @JGoldfed @jaimefitzgerald @tedlouie @bslarkin

Venezuela: @pigbar

Ireland: @daraghobrien @KenOConnorData @MapMyBusiness; United Kingdom: @SteveTuck @VeeMediaFactory @mktginsightguy @Daryl70 @Teresacottam @AnishRaivadera @ExperianQAS_UK @DataQualityPro @SarahBurnett @faropress @jschwa1 @mikeferguson1 @jtonline @Master_OBASHI @Nicola_Askham; France: @DataChannel @mydatanews @jmichel_franco @ydemontcheuil; Switzerland: @alexej_freund @openmethodology; Austria: @omathurin; Germany: @stiebke @dwhp @dakoller @marketingBOERSE; Belgium: @guypardon; Netherlands: @harri00413 @GrahamRhind; Denmark: @jeric40 @eobjects @StiboSystems; Norway: @Orvei; Sweden: @MrPerOlsson @DarioBezzina; Finland: @JoukoSalonen; Lithuania: @googlea; Italy: @Stray__Cat

Algeria: @aboussaidi; South Africa: @MarkGStacey

Pakistan: @monisiqbal; India: @MDMAnswers @twitrvenky @ashwinmaslekar; Indonesia: @VaiaTweets

Australia: @emx5 @vmcburney; New Zealand: @JohnIMM @Intelligentform

It’s my hope that in the future I will be able to interact even more diversely.

Military Intelligence

2nd September 2010, Henrik Gabs Liliendahl

Many data quality issues may be prevented by having some intelligent (error tolerant) search going on. I wrote a post about it called Upstream prevention by error tolerant search.

Intelligent search may have a lot of other advantages too.

A scam related to the Danish Military has been going on for a while. The short story is:

A member of the Special Forces wrote a book about combat actions in Afghanistan. The Military tried to stop it, because it could help the enemy. In that process they for some reason made an Arabic translation and by some mistake leaked it to the press. The key person at the military involved in this has the surname “Sønderskov”.

Police “experts” were assigned to find the leak. For a month they unsuccessfully searched for an e-mail address including “Sønderskov”, only to realize: Oh, e-mail addresses can’t have the national character “ø”. It must be either “oe” or “o” instead, as in “Soenderskov” or “Sonderskov”.

The story (in Danish) here from the online computer media Version2.
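An error tolerant search for this case could expand the surname into its likely ASCII spellings before looking through the addresses. The sketch below is only an illustration: the transliteration table is an assumed, partial mapping for Danish and Norwegian letters, the function names are made up, and the addresses use the reserved example.com domain.

```python
from itertools import product

# Assumed, partial table of ASCII replacements for national characters.
REPLACEMENTS = {"ø": ["oe", "o"], "æ": ["ae"], "å": ["aa", "a"]}


def ascii_variants(name):
    """Return every ASCII spelling of a name under the table above."""
    options = [REPLACEMENTS.get(ch, [ch]) for ch in name.lower()]
    return sorted({"".join(parts) for parts in product(*options)})


def find_addresses(surname, addresses):
    """Return the addresses containing any ASCII variant of the surname."""
    variants = ascii_variants(surname)
    return [a for a in addresses if any(v in a.lower() for v in variants)]


addresses = [
    "j.soenderskov@example.com",
    "Sonderskov.J@example.com",
    "someone.else@example.com",
]
print(ascii_variants("Sønderskov"))
print(find_addresses("Sønderskov", addresses))
```

A search built this way would have found both spellings on the first day instead of after a month.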
