In my last blog post the term “single version of the truth” was discussed. Some prerequisites for having raw data stored in one version that meets all known purposes are that:
- They are kept with the granularity needed for all purposes
- They have the highest precision needed by any of the purposes
- They reflect all time states asked for regarding all purposes
In the following I will go through some challenges with postal addresses. Don’t take this as an attempt to list all challenges in the world around this subject – it is only what I have been up to.
The country is the highest level in the address hierarchy. A source of truth may be a list of ISO two-character country codes. But there are other lists, and between these lists there are different perceptions of the fact that even countries sit in internal hierarchies. Some examples, related to the Olympic theme of my last blog post, are:
- York (the old one) is placed in England – or is it Great Britain – or is it United Kingdom?
- Referring to the United States of America may or may not include Puerto Rico, US Virgin Islands, Guam, American Samoa and Northern Mariana Islands.
- The Kingdom of Denmark is not Denmark but Denmark, Faroe Islands and Greenland.
An example of a very slowly changing dimension here is that the US Virgin Islands were part of the Kingdom of Denmark until 1917.
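A minimal sketch of how such hierarchies could be stored with validity periods, so that the same raw data answers both "today" and "in 1900". The names Membership, MEMBERSHIPS and ancestors are hypothetical, the rows are a simplified illustration, and the exact transfer day in 1917 is my assumption (the text above only gives the year).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Membership:
    """One 'child belongs to parent' fact, valid in [valid_from, valid_to)."""
    child: str
    parent: str
    valid_from: Optional[date] = None  # None means "as far back as we care"
    valid_to: Optional[date] = None    # None means "still valid"


# Illustrative rows only; real reference data would be far richer.
MEMBERSHIPS = [
    Membership("England", "Great Britain"),
    Membership("Great Britain", "United Kingdom"),
    Membership("Denmark", "Kingdom of Denmark"),
    Membership("Faroe Islands", "Kingdom of Denmark"),
    Membership("Greenland", "Kingdom of Denmark"),
    # Assumed transfer date; only the year 1917 comes from the post.
    Membership("US Virgin Islands", "Kingdom of Denmark", valid_to=date(1917, 3, 31)),
    Membership("US Virgin Islands", "United States", valid_from=date(1917, 3, 31)),
]


def is_valid(m: Membership, on: date) -> bool:
    """True if the membership holds on the given day."""
    starts_ok = m.valid_from is None or m.valid_from <= on
    ends_ok = m.valid_to is None or on < m.valid_to
    return starts_ok and ends_ok


def ancestors(entity: str, on: date) -> List[str]:
    """Walk up the hierarchy as it looked on the given day."""
    chain: List[str] = []
    current = entity
    while True:
        parent = next(
            (m.parent for m in MEMBERSHIPS if m.child == current and is_valid(m, on)),
            None,
        )
        if parent is None or parent in chain:  # stop at the top or on a cycle
            return chain
        chain.append(parent)
        current = parent
```

With this shape, York can be stored once as being in England, and ancestors("England", some_date) yields Great Britain and United Kingdom on demand, while the US Virgin Islands resolve to different parents before and after 1917.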
I had a great deal of fun with country codes and names when setting up a data matching solution around the D&B WorldBase, comparing the world picture kept in there with what is contained in other data samples.
Some countries have states, some countries have provinces and some other countries don’t have states or provinces. In some countries, like the US, the state is a mandatory part of a postal address. In other countries that have states, like Germany, the state is not a part of a printed address, but you may have other purposes for storing the data anyway.
Postal codes and districts
Often local postal code systems are translated to the term ZIP-code – but ZIP code is actually the name of the US system.
The granularity of postal code systems differs a lot around the world. The UK postal codes are very specific while a postal code in other countries may refer to a large city. In most countries the postal code system is a hierarchy of numbers. The UK system is different. The Irish is very different – no postal codes until now.
In many countries companies are assigned a postal code of their own. The same goes for post office box addresses. In France the name of the referring district is followed by the word CEDEX for these addresses. So, be careful when matching or grouping city names in French addresses. Paris not Cedex is the centre of the universe in that country.
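A small sketch of one way to neutralise the CEDEX marker before matching or grouping French city names. The function name city_match_key is hypothetical, and the pattern assumes the marker appears at the end of the city line, optionally followed by a number, which will not cover every real-world variant.

```python
import re

# Trailing "CEDEX" marker with an optional number, e.g. "PARIS CEDEX 08".
_CEDEX_SUFFIX = re.compile(r"\s+CEDEX(\s+\d+)?\s*$", re.IGNORECASE)


def city_match_key(city: str) -> str:
    """Return a comparison key for a city name with any CEDEX suffix removed."""
    stripped = _CEDEX_SUFFIX.sub("", city.strip())
    # Collapse repeated whitespace and ignore case for comparison purposes.
    return " ".join(stripped.split()).upper()
```

The original city line should still be kept as raw data; the key is only for comparing, so "Paris Cedex 08" and "PARIS" end up in the same group.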
Locations, streets, blocks, house names, whatever
A lot of different hierarchies in various levels exist around the world – and the customary sequence also varies. This is too complex and comprehensive a subject for a blog post. So I will only emphasise a few selected subjects:
- Vanity addressing is a phenomenon, not least in the UK, where keeping up appearances rules. Here you may have to include a lie in the single version of truth.
- Coding rules in my home country Denmark as we have a way of assigning a unique code to every real world entity. It helps with automated taxation. So a main road in central Copenhagen may be known to people as “H.C. Andersens Boulevard” but is stored in any mature database as “1010148”.
- When matching party entities, don’t produce a false negative just because one record holds a visit (geographical) address and the other holds a mail address.
Entrance – most often referred to as house number – is where addressing meets geocoding. Here, by using geocodes, you can point to an exact value identifying an address. When comparing with other addresses you just have to make sure whether you are talking latitude/longitude in a round world (such as WGS84) or projected x-y coordinates (such as UTM) in a flat world, and whether the point is the centre of the building, the door, the spot where a public road is reachable, or an interpolated value.
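As a rough illustration of comparing two geocodes, here is a sketch that assumes both points are already latitude/longitude in degrees on the same datum. It uses the haversine formula on a spherical Earth, which is an approximation, and the 25 metre tolerance is purely an assumption that would depend on whether the points refer to the door, the building centre or an interpolated position.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_entrance(a: Tuple[float, float], b: Tuple[float, float],
                  tolerance_m: float = 25.0) -> bool:
    """Treat two (lat, lon) geocodes as the same entrance if they are close enough."""
    return haversine_m(a[0], a[1], b[0], b[1]) <= tolerance_m
```

Coordinates in a projected system would first have to be converted to latitude/longitude (or both sides compared in the same projection) before a check like this makes sense.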
Larger buildings, high-rise buildings and skyscrapers are usually not one address but an entrance with multiple family apartments and/or multiple business addresses. These may be presented in many formats and in many depths including floors, sides, door numbers, you name it.
Large business entities may occupy a range of entrances.
Some entrances may in first impression look like a single address occupied by a nuclear family, but are in fact a nursing home or a campus occupied by a number of named individuals living on the same address.
The postal (geographical and mailing) address elements are in many data models just some of the attributes in a party entity. By separating the postal address elements into a specific entity with granular attributes you will be more aligned with the real world and thereby have a better chance of fulfilling all purposes with the raw data. One of the most obvious advantages will be history tracking, as businesses and consumers/citizens relocate from time to time.
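One possible shape for such a model, sketched with hypothetical class and attribute names: the address is its own entity with granular attributes, and a party is linked to addresses through dated assignments, one per purpose (visit or mail). This is an assumption about how the separation could look, not a reference design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PostalAddress:
    """Address as its own entity with granular attributes."""
    country_code: str
    postal_code: str
    city: str
    street: str
    house_number: str
    unit: Optional[str] = None  # floor, side, door number and the like


@dataclass
class AddressAssignment:
    """Links a party to an address for a purpose, valid in [valid_from, valid_to)."""
    address: PostalAddress
    purpose: str  # "visit" or "mail"
    valid_from: date
    valid_to: Optional[date] = None  # None means current


@dataclass
class Party:
    name: str
    assignments: List[AddressAssignment] = field(default_factory=list)

    def relocate(self, new_address: PostalAddress, purpose: str, moved_on: date) -> None:
        """Close the current assignment for this purpose and open a new one."""
        for a in self.assignments:
            if a.purpose == purpose and a.valid_to is None:
                a.valid_to = moved_on
        self.assignments.append(AddressAssignment(new_address, purpose, moved_on))

    def address_on(self, purpose: str, on: date) -> Optional[PostalAddress]:
        """Return the address that applied for this purpose on the given day."""
        for a in self.assignments:
            if a.purpose == purpose and a.valid_from <= on and (a.valid_to is None or on < a.valid_to):
                return a.address
        return None
```

Because old assignments are closed rather than overwritten, the history survives a relocation, and a visit address and a mail address can coexist on the same party without being mistaken for two different parties.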
Good analogies. I admit to having a lot of issues trying to understand this need to create a single version of truth, when data which is accurate, relevant, complete and up-to-date (or, rather, of the correct time) can be used to create any version of the truth which may be required for any purpose.
If your address data is complete, then yes, whilst York is indeed in the nation of England on the island of Great Britain and in the country of The United Kingdom of Great Britain and Northern Ireland, you don’t need to fix that information – you can apply the required level depending on your need. It’s dead easy to identify an address as being in Puerto Rico if your data has quality, so use that information as required … or not. The “single version of the truth” issue strikes me too often as being yet another attempt to sell a “solution” to companies which doesn’t tackle the root cause of their problems: data quality.
Anyway …. ISO 2 country codes are certainly not a version of the truth on the planet I inhabit. Still no code for Kosovo, for example (highlighting ISO’s essentially political brief), and for many years there were no codes for Jersey, Guernsey and the Isle of Man, all of which are NOT part of the United Kingdom and require a code for addressing purposes. I have to maintain my own coding system, and advise companies to do the same.
Added complications can be an organisation with multiple offices, but only one central postal address. Alternatively, the post code may refer to the local postal sorting office and not the office/factory itself (which causes problems when trying to get to meetings using SatNav).
Thanks Graham and Julian.
Graham, what prompted me about the UK was that I noticed that British athletes are competing in the Olympics under the name Great Britain and the letters GB. GB is also the ISO country code for the UK. In football (world sport no 1, known to some as soccer) British players form the separate national teams England, Wales, Scotland or Northern Ireland. I think the same goes for Rugby. In the D&B WorldBase there are also separate country codes for England, Wales, Scotland and Northern Ireland, but the international enterprise party master data tables I have seen use either UK or GB or other coding or free text. My point is that even at the highest hierarchy level of reference data we have identification issues.
Julian, precisely, visiting the wrong office is a real life experience facilitated by data that is not basically bad – but not structured well enough for that purpose. Been there, done that, bought the fool T-shirt.
Yep, your point was well taken – as a British man myself I have to struggle with the issue on a daily basis (I have to correct every person who calls me English – a tiring affair). And there was, understandably, a lot of protest from athletes and supporters from Northern Ireland when the team that was fielded in the Beijing Olympics was named “Team GB”.
But my point remains the same. If you know that an address is in York, you can use that data to provide the information required according to purpose – be that Yorkshire, North Yorkshire, Northern England, England, Great Britain, United Kingdom, Europe, European Union, Western Hemisphere, Northern Hemisphere …. it just doesn’t matter. There’s no need to store just the one tag/label/category. Accurate and complete data (at, as you say, the right granularity) lets you use that data for any information purpose.
We’re actually probably agreeing with each other ….
And then there’s potential address mapping. I have had the USPS.com website map a post office box to an actual street address.
Thanks Anthony. Do I read this correctly: the USPS has a service that will convert a post office box to a geographical address of the box owner?
Hi Henrik. I was using USPS.com to prepare a shipping label and entered a post office box. I guess I wasn’t paying much attention because it wasn’t until I was walking to the post office that I noticed the address had been changed to an actual street address. BTW, it was a state tax authority- I cannot recall which one. -AJG
Kudos for your observations. May I offer a few more insights to yours, as someone who has pondered the data modeling implications of the complex geographic world.
Containment: some geographic areas are contained in hierarchies: a state is contained in a country, a county is contained in a state. A city is contained in a state but not necessarily in a county (Aurora, CO is split between Adams and Arapahoe County). States are usually a single bounded, albeit amorphous, shape, except Michigan (Upper Peninsula) and Hawaii (islands). In the US, postal codes are bound within a state but not necessarily within a county or city. Houses within the boundaries of a zipcode are not necessarily in the city limits of the city that zipcode serves (Niwot, CO). Zipcodes have a proximity to a single city they serve but are not necessarily bound by that city. Area codes used to refer to a geography, but now with electronic switching, cell phones and telephone number portability, that reference is increasingly tentative. A building can only be in one county, but it can be in multiple area codes (720, 303 in Denver) thanks to overlay. But only certain states have adopted area code overlay, so in other states area codes do not overlap. Bring in congressional districts, tax jurisdictions and voting precincts and things get really dicey from a data modeling standpoint. And they can change with the stroke of a legislative or regulatory pen.
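One way to read the modeling consequence of the above: a single "parent" column per area cannot hold these facts, so the relations have to be many-to-many. A minimal sketch, with a hypothetical Overlaps class and only the examples mentioned above as data:

```python
from collections import defaultdict
from typing import Dict, Set


class Overlaps:
    """Many-to-many 'overlaps' relation between two kinds of geographic area."""

    def __init__(self) -> None:
        self._forward: Dict[str, Set[str]] = defaultdict(set)
        self._backward: Dict[str, Set[str]] = defaultdict(set)

    def add(self, left: str, right: str) -> None:
        """Record that the left area overlaps the right area."""
        self._forward[left].add(right)
        self._backward[right].add(left)

    def rights_of(self, left: str) -> Set[str]:
        return set(self._forward.get(left, ()))

    def lefts_of(self, right: str) -> Set[str]:
        return set(self._backward.get(right, ()))

    def is_strict_hierarchy(self) -> bool:
        """True only if every left area maps to at most one right area."""
        return all(len(rights) <= 1 for rights in self._forward.values())


# Examples taken from the comment above.
city_county = Overlaps()
city_county.add("Aurora, CO", "Adams County")
city_county.add("Aurora, CO", "Arapahoe County")

city_area_code = Overlaps()
city_area_code.add("Denver, CO", "303")
city_area_code.add("Denver, CO", "720")
```

Checking is_strict_hierarchy() on each relation is a cheap way to find out which levels can safely be modeled as simple containment and which cannot. Boundaries that change over time would additionally need validity dates on each pair.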
Thanks a lot Bob. I have been tooling around in the same circles when modeling detailed real world geographical entities in Denmark.
It’s only very few organizations that might get ROI from building a detailed model of the world in in-house databases. But when such models are available in the cloud my guess is that many organizations will find ROI in attaching own master data to such complex models.
@bob: “A building can only be in one county …”. Really? Is that because of planning regulations in the USA?
In Europe, with our somewhat stormier history, country borders can run through buildings, never mind other administrative boundaries. In Baarle-Hertog (http://en.wikipedia.org/wiki/Baarle-Hertog) / Baarle-Nassau, for example, women in houses on the border have been known to have asked to be moved to a different room in their house when giving birth so that their child is born with a different nationality, and I have an interesting photograph of a house where the border runs through the front door, so it has two building numbers and two street addresses, depending on the country of the address being used.