In my last blog post the term “single version of the truth” was discussed. Some prerequisites for having raw data stored in one version that meets all known purposes are that:
- They are kept with the granularity needed for all purposes
- They have the most advanced precisions with all purposes
- They reflect all time states asked for regarding all purposes
In the following I will go through some challenges with postal addresses. Don’t take this as an attempt to list all challenges in the world around this subject – it is only what I have been up to.
The country is the highest level in the address hierarchy. A source of truth may be a list of ISO 2 character country codes. But there are other lists and between these lists there a different perceptions of the fact that even countries are internally in hierarchies. Some examples related to the Olympic contest as my last blog post was part of are:
- York (the old one) is placed in England – or is it Great Britain – or is it United Kingdom?
- Referring to United States of America may or may not include Puerto Rico, US Virgin Islands, Guam, Samoa and Northern Mariana Islands.
- The Kingdom of Denmark is not Denmark but Denmark, Faroe Islands and Greenland.
An example of a very slow changing dimension in here is that US Virgin Islands was part of the Kingdom of Denmark until 1917.
I had a great deal of fun with country codes and names when setting up a data matching solution around the D&B WorldBase and the world picture kept in there opposite to what is contained in other data samples.
Some countries have states, some countries have provinces and some other countries don’t have states or provinces. In some countries the state is a mandatory part of a postal address like in the US. In other countries having states the state is not a part of a printed address like in Germany, but you may have other purposes for storing the data anyway.
Postal codes and districts
Often local postal code systems are translated to the term ZIP-code – but ZIP code is actually the name of the US system.
The granularity of postal code systems differs a lot around the world. The UK postal codes are very specific while a postal code in other countries may refer to a large city. In most countries the postal code system is a hierarchy of numbers. The UK system is different. The Irish is very different – no postal codes until now.
In many countries companies are assigned a postal code of their own. The same goes for post office box addresses. In France the name of the referring district is followed by the word CEDEX for these addresses. So, be careful when matching or grouping city names in French addresses. Paris not Cedex is the centre of the universe in that country.
Locations, streets, blocks, house names, whatever
A lot of different hierarchies in various levels exist around the world – and the custom sequence also varies. This is a too complex and comprehensive subject for a blog post. So I will only emphasis a few selected subjects:
- Vanity addressing is a phenonemen not at least in the UK where keeping up appearances rules. Here you may have to include a lie in the single version of truth.
- Coding rules in my home country Denmark as we have a way of assigning a unique code to every real world entity. It helps with automated taxation. So a main road in central Copenhagen may be known to people as “H.C. Andersens Boulevard” but is stored in any mature database as “1010148”.
- When matching party entities don’t make a false negative with an entity having a visit (geographical) address versus an entity having a mail address.
Entrance – most often referred to as house number – is where addressing meets geocoding. Here you by using geocodes can point to an exact value identifying an address. When comparing with other addresses you just have to make sure whether you are talking latitude/longitude in a round world or WGS84 x-y coordinates or other geographic coordinate systems in a flat world and whether we are pointing at the centre of the building, at the door, at the spot where a public road is reachable or it is interpolated values.
Larger buildings, high rising buildings and skyscrapers are usually not one address but is an entrance having multiple family apartments and/or multiple business addresses. These may be presented in many formats and in many depths including floors, sides, door numbers, you name it.
Large business entities may occupy a range of entrances.
Some entrances may in first impression look like a single address occupied by a nuclear family, but are in fact a nursing home or a campus occupied by a number of named individuals living on the same address.
The postal (geographical and mailing) address elements are in many data models just some of the attributes in a party entity. By separating the postal address elements in a specific entity with granulated attributes you will be more aligned with the real world and thereby have a better chance of fulfilling all purposes with the raw data. One of the most obvious advantages will be history tracking as business’ and consumers/citizens relocates from time to time.