Where the Streets have Two Names

As told in the post The Art in Data Matching, a common challenge in matching names and addresses is that in some parts of the world streets have more than one name at the same time, because more than one language is in use.

We have the same challenge when building functionality for rapid addressing, that is, functionality that facilitates fast and quality-assured entry of addresses, supported by reference data that knows about postal codes, cities and street names.

The below example is taken from the instant Data Quality tool address form:

[Screenshot: address form with Finnish and Swedish street name suggestions]

The Finnish capital Helsinki also has an official name in Swedish, Helsingfors, and the streets in Helsinki/Helsingfors have both Finnish and Swedish names. So when you start typing, suggestions may appear in both Finnish and Swedish.
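A minimal sketch of how such bilingual suggestions could work, assuming a small in-memory list of streets where each street carries one name per language. The `STREETS` sample and the `suggest` function are illustrative names, not the instant Data Quality implementation.

```python
# Illustrative sample: each street holds its name per language code.
STREETS = [
    {"fi": "Mannerheimintie", "sv": "Mannerheimvägen"},
    {"fi": "Aleksanterinkatu", "sv": "Alexandersgatan"},
    {"fi": "Bulevardi", "sv": "Bulevarden"},
]


def suggest(prefix, streets):
    """Return (name, language) pairs whose name starts with the typed prefix."""
    needle = prefix.casefold()
    hits = []
    for street in streets:
        for lang, name in street.items():
            # Every language variant is searchable, so one street can
            # produce a Finnish and a Swedish suggestion at the same time.
            if name.casefold().startswith(needle):
                hits.append((name, lang))
    return sorted(hits)


print(suggest("Ale", STREETS))
```

Because both names point to the same street record, whichever suggestion the user picks can be stored against one and the same street.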

What challenges have you encountered with street names in multiple languages?


Rapid Addressing, Structured or Unstructured Approach

Systems that support faster and more accurate registration of addresses are becoming more and more common, and they are getting better and better as well.

I have noticed a structured and an unstructured approach to rapid addressing – and hybrids of course.

Structured Approach

The general concept is that you home in on the address like this:

  • First you choose a country from a country list (unless it’s always the same country).
  • Then you select a state or province if a state or province is a mandatory part of an address in that country, as it is in the United States, Canada, Australia and India.
  • Then you type a postal code if the country has a postal code system. It may be suggested as you write.
  • Then you type a street if the country has thoroughfare-based addressing. It may be suggested as you write. In some countries, like the United Kingdom, or in parts of a country, the street is uniquely given by the postal code.
  • Then you type a building number. May be suggested if present in reference data.
  • Then you type a unit or other section of building where applicable. May be suggested if present in reference data.
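The sequence above can be sketched as a per-country list of entry steps. This is a minimal sketch under stated assumptions: `COUNTRY_RULES` and `entry_steps` are hypothetical names, and the flags are illustrative examples rather than authoritative reference data.

```python
# Hypothetical per-country rules; the flags are illustrative only.
COUNTRY_RULES = {
    "US": {"state": True, "postal_code": True, "thoroughfare": True},
    "GB": {"state": False, "postal_code": True, "thoroughfare": True},
    "HK": {"state": False, "postal_code": False, "thoroughfare": True},
}


def entry_steps(country_code):
    """List the fields to ask for, in order, once the country is chosen."""
    rules = COUNTRY_RULES[country_code]
    steps = []
    if rules["state"]:
        steps.append("state_or_province")
    if rules["postal_code"]:
        steps.append("postal_code")
    if rules["thoroughfare"]:
        steps.append("street")
    # Building number and unit are always offered; suggestions for them
    # depend on what the reference data holds.
    steps += ["building_number", "unit"]
    return steps


print(entry_steps("US"))
print(entry_steps("HK"))
```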

Unstructured Approach

You type the address in a single string, in whatever sequence suits you, and the system figures out along the way what matches and makes suggestions.

This approach may better fit the way the address is known to you, but on the other hand it sometimes requires you to start again, and then some of the rapidness disappears.
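A minimal sketch of the unstructured idea, assuming every typed token only has to be a prefix of some word in a candidate address, in any order. The `ADDRESSES` rows and the `match` function are made up for illustration; a real service would use an index rather than a linear scan.

```python
# Made-up sample rows standing in for address reference data.
ADDRESSES = [
    "2nd Floor, Berkeley Square House, Berkeley Square, London W1J 6BD",
    "Mannerheimintie 5, 00100 Helsinki",
]


def match(query, addresses):
    """Return the addresses where each typed token prefixes some word."""
    tokens = query.casefold().split()
    results = []
    for address in addresses:
        words = address.casefold().replace(",", " ").split()
        # Order does not matter: each token just needs one word to start with it.
        if all(any(word.startswith(token) for word in words) for token in tokens):
            results.append(address)
    return results


print(match("w1j berk", ADDRESSES))
print(match("helsinki manner", ADDRESSES))
```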

Hybrid Approach

A common hybrid solution is that you select the country before going unstructured. That cures the worst system glitches.

What’s Your Approach?

What are your experiences as a user? Maybe you are developing rapid addressing and have had your considerations. Where do you stand?


Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis on data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, having a “van” in a large share of names in the Netherlands, or having loads of surname-like middle names in Denmark.

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers includes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

In the same way there are different administrative practices for individuals, for example:

  • As I understand it, it is forbidden by the constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models, and you need a well-equipped toolbox for preventing data quality issues and exploiting external reference data.


The Dangers of being a Global Shopper

The global shopper is a multi-channel beast.

A global shopper may be a tourist or a business traveler buying goods in exciting cities around the world, in shops most probably operated by the very same brands that occupy his local high street. The global shopper may also do his business from his living room by shopping online on sites with strange foreign privacy rules and unusual registration forms.

Being a global shopper is risky business.

For example, it’s unbelievable that Oxford Street in London wasn’t made into a pedestrian street a long time ago, like any other respectable high street in major cities. But no, global shoppers on Oxford Street are constantly in danger of being hit by a red double-decker bus when crossing the street for a good bargain while looking to the wrong side.

And how about shoe sizes? Measuring systems and standards around the world are a jungle, and as a global shopper you will in 8 ½ out of 10 trials pick the wrong number 42.

Going online isn’t any better.

When registering your home address on a foreign site you are on very slippery ground.

If the site is from the United States, and you are not, you have to choose to live in one of 50 different states that mean nothing to you. There is no way around it. My favorite state then is Alaska, usually being at the top of the list.

Having a postal code with letters in it can be a no go. Not having a postal code is much like not existing at all.

But don’t give up. As a global shopper you will be able to find sites online with absolutely no clue about what an address looks like. The only question, of course, is whether you will actually get your goods or have to settle for the credit card withdrawal only.


The Real Estate Domain

In the comments on the recent blog post about multidomain MDM (Master Data Management) it was discussed to what degree multidomain MDM is much more than CDI (Customer Data Integration) and PIM (Product Information Management).

While customer (or rather party) and product are important master entity types, there are of course a lot of other master entity types. The location domain is often mentioned as the third domain in MDM, and then there are some entity types most relevant for specific industries like an insurance policy or a vehicle in public transit, and in public transit we also have the calendar as an important master entity type.

One of the entity types that doesn’t belong to party and in many ways is a different thing than a product is real estate (or real property or just property if you like).

For a realtor, real estate looks like a product, of course. And it’s all about location, location, location.

Right now I’m working with the instant Data Quality framework. Here we are embracing the party domain by having access to external reference sources about individuals and companies, we are embracing the location domain by having access to external reference sources about addresses and then we are also embracing the real estate domain by having access to external reference sources about properties.

Real properties have addresses in many cases and are therefore close to the location domain. For some business processes a property is a product with a product key, as mentioned for realtors. For some business processes it is a security, often identified by keys other than the postal address. It is related to different party roles, like an occupier (or several) and an owner (or several), who may or may not be the same party (or parties).
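A minimal sketch of such a real estate entity, assuming a property is identified by its own key, may or may not have a postal address, and is linked to parties through named roles. The `Party` and `Property` classes and all sample values are hypothetical, not the instant Data Quality data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Party:
    name: str


@dataclass
class Property:
    property_key: str               # identifies the property, not the address
    address: Optional[str] = None   # not every property has a postal address
    roles: List[Tuple[str, Party]] = field(default_factory=list)

    def parties_in_role(self, role):
        """Return all parties holding the given role for this property."""
        return [party for held_role, party in self.roles if held_role == role]


# Made-up example: one party is both owner and occupier, another only occupies.
ann = Party("Ann")
bob = Party("Bob")
house = Property("PROP-0001", "1 Example Street")
house.roles += [("owner", ann), ("occupier", ann), ("occupier", bob)]

print([p.name for p in house.parties_in_role("occupier")])
```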

What about you? Do you feel at home with the real estate entity type?


Postal Code Musings

When working with master data management and data quality, including data matching, one of the most frequent pieces of information you work with is a postal code.

Wikipedia has a good article about postal codes.

Some of the data quality issues related to the datum postal code are:

Metadata

Around the world different words are used for a postal code:

  • ZIP code, the United States implementation of a postal code, is often used synonymously for a postal code in many databases and user interfaces. This is not seriously wrong, but not right either.
  • In India a postal code (in English) is called a PIN Code (Postal Index Number). This could definitely trick me.

Format

There are basically two different formats of postal codes around:

  • Numeric postal codes are the most common ones. The number of digits does, however, differ between countries. And there may be some additional considerations:
    • For example the 9 digit United States ZIP code is split into the original 5 digits and the additional 4 digits implemented later.
    • Postal codes may begin with 0, which may create formatting errors when treated as numeric.
  • Some countries, for example the United Kingdom, the Netherlands, Canada and Argentina, have alphanumeric postal codes.
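The two numeric pitfalls above can be sketched as follows: keep the code as text so a leading zero survives, and split a 9 digit ZIP code into its two parts. This is a minimal sketch; `normalize_us_zip` is a hypothetical helper, and it assumes that a value of up to 5 digits is a basic ZIP code that may have lost leading zeros.

```python
import re


def normalize_us_zip(raw):
    """Return (five_digit_zip, plus_four_or_None) as strings."""
    digits = re.sub(r"\D", "", str(raw))  # drop hyphens, spaces and the like
    if not digits:
        raise ValueError("no digits in postal code")
    if len(digits) == 9:
        # ZIP+4: the original 5 digits plus the 4 digits added later.
        return digits[:5], digits[5:]
    if len(digits) <= 5:
        # A code stored as a number loses leading zeros; pad them back.
        return digits.zfill(5), None
    raise ValueError("ambiguous length: " + digits)


print(normalize_us_zip(2134))          # a ZIP code that was stored as a number
print(normalize_us_zip("02134-1000"))
```

Storing postal codes as text in the first place avoids the need for this kind of repair.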

Embedded Information

Numeric postal codes usually form some kind of hierarchy in which you can guess the geographical position within the country and make ranges representing smaller or larger geographical areas. But you never know.

This also goes for Dutch (you know, the ones in the Netherlands) postal codes as the first 4 characters are numeric.

The UK postal codes usually start with a mnemonic of the main city in the area, except in a lot of cases.

Precision

Some postal code systems have postal codes covering larger areas with many streets and some postal code systems are very granular where each street, or part of a street, has a distinct postal code.

The UK postal code system is very granular, which has paved the way for using rapid addressing, as told in a recent article in the UK Database Marketing Magazine.

Coverage

Utilizing rapid addressing requires that reference data for postal codes practically covers every spot in the country and updates are available on a near real time basis.

Some countries have postal code systems that do not cover every corner, and some countries have no postal code system at all.

Uniqueness

The main reason for implementing postal code systems is that a town or city name in many cases isn’t unique within a country.

But that doesn’t mean that uniqueness works the other way as well. A postal code may in many countries cover several town names. France is an example.
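A minimal sketch of checking uniqueness in both directions, assuming reference data arrives as (postal code, town name) pairs. The rows below are invented placeholders, not real postal data, and `index_both_ways` is a hypothetical helper.

```python
from collections import defaultdict

# Invented placeholder rows: (postal_code, town_name).
ROWS = [
    ("10000", "Springfield"),
    ("20000", "Springfield"),  # same town name, different code
    ("30000", "Oakville"),
    ("30000", "Elmwood"),      # same code, different town name
]


def index_both_ways(rows):
    """Build code -> towns and town -> codes lookups."""
    towns_by_code = defaultdict(set)
    codes_by_town = defaultdict(set)
    for code, town in rows:
        towns_by_code[code].add(town)
        codes_by_town[town].add(code)
    return dict(towns_by_code), dict(codes_by_town)


towns_by_code, codes_by_town = index_both_ways(ROWS)

# Neither element determines the other on its own.
print(sorted(codes_by_town["Springfield"]))
print(sorted(towns_by_code["30000"]))
```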

Consistency

While we basically have granular and not so granular postal code systems we of course also have hybrids.

In Denmark, for example, there is a granular system in the capital Copenhagen with a postal code for each street, named by the street, and a system in the rest of the country with a postal code for an area, named by the suburb or town.

Fit for purpose

A postal code is a hierarchical element in a postal address. We basically have two forms of postal addresses:

  • A geographical address, where the postal address including the postal code points to a place you can also visit to meet the people receiving the things sent there
  • A post-office box, which may have more or less geographical connection to where the people receiving the things sent there are

Penetration of post-office boxes differs around the world. In Namibia it is mandatory. In Sweden most companies have a post-office box address.

Trying to compare data with these different concepts is like comparing apples and oranges, which often goes bananas.


Multi-Domain MDM, Santa Style

What would a Multi-Domain Master Data Management (MDM) solution look like at Santa Claus’s organization?

I think it may look like this:

Santa’s MDM solution covers all 4 classic domains:

  • Party
  • Product
  • Location
  • Calendar

Party

A main business improvement achieved through Santa’s MDM solution is better Nice or Naughty management. The old CRM system didn’t have a dedicated field for Nice or Naughty assignment, so this information was found in many different fields used during the years including as part of a street address or as a “send Christmas card” check mark. Today Santa handles Nice and Naughty information including historical tracking as a kid may be Nice one year but Naughty the next. This also helps with predictive analysis for future present demand. Ho ho ho.

Party master data management at Santa’s also includes keeping track of all the business partners, such as manufacturers of toys and other stuff, the shopping malls where Santa has to sit in December and so on. A given legal entity may have different roles in different business processes. For example a reindeer insurance company may also require Santa’s presence at the company’s Christmas tree family party.

Product

Product Information Management (PIM) has always been a complex operation at Santa’s. In Wish List Fulfillment (Wishful) you may have kids wishing for the same thing with different wording. The new MDM solution’s flexible hierarchy management features help a lot when the wishes are matched with specifications obtained by the purchase elves. At Santa’s they increasingly work with the suppliers in sharing complete and timely product descriptions and specifications.

Location

Handling location information relates to the different locations where Santa is supposed to live, be that at the North Pole, in Greenland, in Lapland or wherever else people believe, as discussed in the post Notes about the North Pole.

Also, when it comes to knowing where to deliver all the presents, Santa has realized that maintaining an address as part of the record for each boy and girl isn’t the best way. Today each boy and girl record has a relation, with a start and end date, to a location entity where location-specific information, including precise chimney positions, is kept.
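A minimal sketch of that dated relation between a child record and a location entity, assuming an open end date means the relation is still current. The class names, the `location_on` helper and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Location:
    address: str
    chimney_position: str  # location-specific data lives on the location


@dataclass
class Residence:
    child: str
    location: Location
    start: date
    end: Optional[date] = None  # None means the relation is still current


def location_on(residences, child, day):
    """Find where a child lives on a given day, or None if unknown."""
    for r in residences:
        if r.child == child and r.start <= day and (r.end is None or day <= r.end):
            return r.location
    return None


# Made-up example: a child who moved house in mid-2011.
old_home = Location("1 Old Road", "north gable")
new_home = Location("2 New Street", "centre of roof")
residences = [
    Residence("Kim", old_home, date(2005, 1, 1), date(2011, 6, 30)),
    Residence("Kim", new_home, date(2011, 7, 1)),
]

print(location_on(residences, "Kim", date(2011, 12, 24)).address)
```

Keeping the chimney position on the location means it is maintained once, no matter how many children live at or move into that address.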

Calendar

Christmas present delivery timing is crucial for Santa. In some countries Christmas morning, the 25th of December, is the right time for the stuff to be there. In other countries Christmas evening, the 24th of December, is the right time. Add to that doing present delivery across all time zones. Ho ho ho.

The MDM implementation at Santa’s has indeed helped a lot with Santa Quality. But it is an ongoing journey.

Right now Santa is looking for a smart information management firm to help with defining to what time zone the North Pole belongs.

Anyone out there?


Some Kinds of Reference Data

The term “reference data” and the related Reference Data Management (RDM) are commonly used in the data quality and Master Data Management (MDM) realm.

As with most terms it may be used with slightly different meanings. Usually, but not necessarily always, reference data are core data entities defined outside a given organization.

I have come across the kinds of reference data discussed below:

Reference Data in Investment Banking

The term “reference data” is well established in investment banking. Reference data are core master data entities such as counterparties, securities and currencies. These are the things you deal with in investment banking. They are not made up for a given bank or other single financial institution but are shared across the whole market and should optimally be the same to every institution at exactly the same point in time.

Small Reference Data

In Master Data Management in general we usually see reference data as value lists that help describe and standardize internal master data.

One example is a country list. A list of countries should be the same for every organization in the world. However, the available lists do differ, though most variations don’t have any business impact, like the academic question of whether Antarctica should be in the list or not.

A list of codes describing to which industry a given company belongs is another example of reference data. As examined in the post What are they doing? you may choose to standardize on SIC codes, standardize on NACE codes or develop your own set of codes for that purpose.

Big Reference Data

In geography a country list is in the top levels of defining locations. Further down we may have postal code systems within each country, such as ZIP codes in the United States, PLZ codes in Germany and PIN codes in India. Yet further down we have every single valid postal address, eventually all over the world. This is what I call big reference data.

A way of sourcing industry codes for your customers, suppliers and other business partners is picking from or enriching from a business directory, for example the D&B WorldBase or any other of the many business directories around. Such directories may also be seen as big reference data.

With the dramatic increase in the use of social media, social network profiles have emerged as a new kind of big reference data, serving as links to our internal master data.


Addressing Digital Identity

A physical address has traditionally been a core element of doing identity resolution. Stating a name and an address is the most widespread way of telling with which person or which company we are having (or aiming at having) a business or other form of relationship.

However, during the last 25 years a lot of things have moved from the physical world to the online world. Not least, a lot of things start in the online world and in many cases end up in the physical world. Today selling, the smart way, starts in social media. Final delivery may be digital or may be sending a package or a consultant to a physical address. A thing like dating most often starts in the online world today, but surely the aim is a physical encounter.

This new way of life has a tremendous effect on data quality and master data management. Within quality of contact data, the most frequent domain for data quality issues, we have traditionally dealt with verifying names and addresses and deduplicating names and addresses.

As the best way of preventing data quality issues is looking at the root we must address that onboarding of contact data often starts with a digital identity where a physical address isn’t present in the first place but often will be updated at a later stage.

As described in the post Social MDM and Systems of Engagement a new trend in master data management is to establish a link between the new systems of engagement and the old systems of record.

In the same way data quality prevention and improvement will have to cover establishing a link between a new discipline being digital identity resolution and the good old address verification stuff.


Rapid and Vanity Addressing – and the Apple Hotel

Mid next month iDQ will move our London office to a new address:

iDQ A/S
2nd Floor
Berkeley Square House
Berkeley Square
London
W1J 6BD
United Kingdom

It’s a good old English address including a lot of lines on an envelope.

The address could be either shorter or longer.

The address below will in fact be enough to have a letter delivered:

iDQ A/S
2nd Floor
W1J 6BD
UK

Due to the granular UK postal code system, a single postcode may cover a single address, a part of a long road or a small street.

This structure is also what is exploited in what is called rapid addressing, where you only type in the needed data and the rest is supplied by a (typically cloud) service.
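A minimal sketch of that completion step, assuming a tiny in-memory lookup keyed by postcode that stands in for the service. `REFERENCE`, `normalize_postcode` and `complete_address` are hypothetical names, and the single reference row is taken from the address in this post. Where a postcode covers several premises, a real service would return a pick list instead.

```python
# Stand-in for the reference data service; one row taken from this post.
REFERENCE = {
    "W1J 6BD": {
        "building": "Berkeley Square House",
        "street": "Berkeley Square",
        "town": "London",
    },
}


def normalize_postcode(raw):
    """Uppercase and put a single space before the last three characters."""
    compact = "".join(raw.split()).upper()
    return compact[:-3] + " " + compact[-3:]


def complete_address(organisation, sub_building, postcode):
    """Expand the few typed elements into full address lines, or None."""
    code = normalize_postcode(postcode)
    ref = REFERENCE.get(code)
    if ref is None:
        return None
    return [
        organisation,
        sub_building,
        ref["building"],
        ref["street"],
        ref["town"],
        code,
        "United Kingdom",
    ]


print(complete_address("iDQ A/S", "2nd Floor", "w1j6bd"))
```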

But sometimes people want their addresses presented in a different way than the official way. Maybe I want our address to be:

iDQ A/S
2nd Floor
Berkeley Square House
Berkeley Square
Mayfair
London
W1J 6BD
United Kingdom

Mayfair is a nice part of London. Insisting on including this element in the address is an example of vanity addressing.

Here’s the map of the area:

Notice the place in the upper right corner of the Google Map: Apple Store Regent Street, with an icon showing a bed. This means it’s a hotel. Is the Apple Store really a hotel? No, except for a while ago, when people slept in front of the store waiting for a product with a notable map service, as reported by Richard Northwood (aka The Data Geek) in the post Data Quality Failure – Apple Style.

Well Google, you can’t win them all.
