As a data matching geek I always love reading about how others have made the great but fearful journey into the data matching world.
This week Wayne Colless of the Australian Attorney-General’s Department kindly made a document about data matching public on the DataQualityPro site. The full title is “Improving the Integrity of Identity Data – Data Matching Better Practice Guidelines, 2009”. Link here.
As Wayne explains in a discussion in the LinkedIn Data Matching group: Australia has no national unique identifier for individuals (such as the US SSN or the number recorded on national ID cards used in many other countries) that can be used, so the matching has to involve only non-unique values such as name, address and dates of birth.
The document gives very thorough step-by-step guidance on matching individuals’ names, addresses and dates of birth. As the document says, you may either build all the logic yourself or buy commercial software that does the same. Either way, you have to understand what the software does in order to tune the processes and set thresholds that are meaningful to you.
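To make the point about thresholds a bit more concrete, here is a minimal sketch in Python. It is not taken from the Guidelines: the field names, the weights and both threshold values are my own assumptions for illustration, and the string comparison is simply the standard library's `difflib.SequenceMatcher` rather than any purpose-built name or address matcher.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, chosen only for illustration.
# In practice they have to be tuned against your own data.
WEIGHTS = {"name": 0.5, "dob": 0.3, "address": 0.2}
MATCH_THRESHOLD = 0.85   # at or above: treat as the same individual
REVIEW_THRESHOLD = 0.65  # between the two thresholds: send to clerical review


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after lowercasing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of the per-field similarities for two records."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def classify(rec_a: dict, rec_b: dict) -> str:
    """Label a pair of records as 'match', 'review' or 'non-match'."""
    score = match_score(rec_a, rec_b)
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "non-match"


if __name__ == "__main__":
    a = {"name": "Catherine O'Brien", "dob": "1975-03-12", "address": "12 High St, Carlton"}
    b = {"name": "Katherine OBrien", "dob": "1975-03-12", "address": "12 High Street, Carlton"}
    print(round(match_score(a, b), 3), classify(a, b))
```

Moving `MATCH_THRESHOLD` up or down trades false matches against missed matches, which is exactly the kind of tuning you can only do sensibly if you understand how the score is produced.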
As Australia is a nation mainly born through immigration, the challenges of adapting the ruling Anglo-Saxon naming conventions to the reality of name formats coming from all over the world are very apparent. I like that the diversity issues are given careful thought in the document.
I also like that the document addresses a subject not mentioned as often as it should be, namely the challenges with embracing historical values in settling a match as seen in this figure taken from the document:
Whether or not you think you already know the dos and don’ts of data matching (and I guess you never really do), I find the document well worth reading.
Thanks for spreading news of Wayne’s contribution. I think it is so important for government departments to open their research in this way and share their approaches. By fostering collaboration between the public and private sectors on a global level, we can create some excellent conversations and new thinking around problems like data matching.
It all starts with great mentions like this so thanks again.
This seems simplistic but it is anything but obvious in a country with very diverse ancestry, e.g. Australia or the U.S.A. (and probably many others, actually). Even more of a problem, no matter the ethnic diversity or lack thereof of the population, is integration of legacy data sources. Data matching on as few as 4 fields, if not done correctly, can completely befoul a data repository.
Thank you Dylan and Ellie for the comments and kind support for my blog.
Thanks Henrik for mentioning the Guidelines. My colleagues and I put a lot of effort into their development. They were the product of cooperation among a number of Australian government data matching practitioners with many years of combined experience. Hopefully even experienced data matching people will find some gem of insight in there somewhere.
They were also produced with the knowledge gained from cross-agency data matching exercises designed to produce quantifiable evidence upon which practical advice could be developed and shared.
We sincerely hope that the Data Matching Better Practice Guidelines prove useful to all who access them.