Cleansing International Addresses

A data cleansing problem I have come across several times is a set of name and address records where it is uncertain which country each address belongs to.

Many address-cleansing tools and services require a country code as the first parameter in order to utilize external reference data for address cleansing and verification. Most business cases for address cleansing are indeed about a large number of business-to-consumer (B2C) addresses within a particular country. But sometimes you have a batch of typical business-to-business (B2B) addresses with no clear country registration.

The problem is that many location names apply to many different places. That is true even within a given country, which was the main reason postal codes were introduced. If a non-interactive tool or service has to look for a location all over the world, it gets really difficult.

For example, I’m in Richmond today. That could actually be a lot of places all over the world, as seen on Wikipedia.

I am actually in the Richmond in the London, England, UK area. If I were in the state capital of the US state of Virginia, I could have written I’m in “Richmond, VA”. If an international address-cleansing tool looked at that address, I guess it would first look for a country code, quickly find VA as a two-character country code at the end of the string and firmly conclude I’m at something called Richmond in the Vatican City State.
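The “Richmond, VA” trap can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up for the example, and the two lookup tables are tiny hand-picked subsets, where a real tool would load complete reference tables. It shows a naive guess that trusts the trailing token as a country code, next to a more cautious version that reports every interpretation of the token.

```python
# Tiny illustrative subsets; a real tool would load complete reference tables.
ISO_COUNTRY_CODES = {
    "VA": "Vatican City State",
    "GB": "United Kingdom",
    "US": "United States",
    "CA": "Canada",
}
US_STATE_CODES = {
    "VA": "Virginia",
    "CA": "California",
    "NY": "New York",
}


def trailing_token(address):
    """Return the last token of the address, upper-cased ('' if empty)."""
    tokens = address.replace(",", " ").split()
    return tokens[-1].upper() if tokens else ""


def naive_country_guess(address):
    """Treat the trailing token as a country code without checking anything else."""
    return ISO_COUNTRY_CODES.get(trailing_token(address))


def candidate_interpretations(address):
    """List every way the trailing token could be read, instead of picking one."""
    token = trailing_token(address)
    candidates = []
    if token in ISO_COUNTRY_CODES:
        candidates.append(("country", ISO_COUNTRY_CODES[token]))
    if token in US_STATE_CODES:
        candidates.append(("us_state", US_STATE_CODES[token]))
    return candidates


print(naive_country_guess("Richmond, VA"))
print(candidate_interpretations("Richmond, VA"))
```

With more than one candidate, the address needs more evidence (a postal code, a street pattern, a phone prefix) before a country can be assigned.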

Have you tried using or constructing an international address cleansing process? Where did you end up?


Two Kinds of Business Rules within Data Governance

When laying out data policies and data standards within a data governance program, one of the most important inputs is the business rules that exist within your organization.

I have often found that it is useful to divide business rules into two different types:

  • External business rules, which are rules based on laws, regulations within industries and other rules imposed from outside your organization.
  • Internal business rules, which are rules made up within your organization in order to make your business more competitive than others in your industry.

Externally imposed business rules most often differ from country to country (or between groups of countries such as the EU). Internal business rules may do so too, but tend to apply worldwide within an organization.

The scope of external business rules tends to be fairly fixed, and so does the deadline for implementing the derived data policy and standard. With internal business rules you may narrow or widen the scope and be flexible about the timetable for bringing them into force and formalizing the data governance around the rules. It is often a matter of prioritizing against other short-term business objectives.

The distinction between these two kinds of business rules may not be so important in the first implementation of a data governance program, but it comes very much into play in the ongoing management of data policies and data standards.


Foreign Addresses

There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in.

The same phenomenon is often seen in data management as told in the post Foreign Affaires.

If we, for example, work with postal addresses, we tend to think that postal addresses in our own country have a well-known structure while foreign addresses are a total mess.

In Denmark, where I was born and raised and have worked most of my life, we have two ways of expressing an address:

  • The envelope way, where there is a certain range of possibilities, especially in how to spell a street name and how to write the exact unit within a high-rise building, though there is a structure more or less known to natives.
  • The code way, as every street has a code too and there is a defined structure for units (known as the KVHX code). This code is used by the public sector as well as in private sectors such as financial services and utility companies, and this helps tremendously with data quality.

But around 3.5 percent of Danes, including yours truly, have a foreign address. And until now the way of registering and storing those addresses in the public sector and elsewhere has been totally random.

This is going to change. The public authorities have, with a little help from yours truly, made the first standard and governance principles for foreign addresses as seen in this document (in Danish).

At iDQ A/S we have simultaneously developed Master Data Management (MDM) services that help utility companies, financial services and other industries get foreign addresses right as well.


Parkinson, Murphy, Finagle and Data Quality

One of the cleverest things ever said is, in my eyes, Parkinson’s Law, which states: “Work expands so as to fill the time available for its completion”.

There is even a variant for data that says: “Data expands to fill the space available for storage”. This is why we have big data today.

Another similar law that seems to be true is Murphy’s Law saying: “Anything that can go wrong will go wrong”. The sharper version of that is Finagle’s Law that warns: “Anything that can go wrong, will—at the worst possible moment”.

When I started working with data quality, the most common trigger for a data quality improvement initiative was a perfect storm encompassing these laws, something like: “The quality of data will decrease until everything goes wrong at the worst possible moment”.

Fortunately more and more organizations are becoming proactive about data quality these days. In doing that I recommend reversing Finagle, Murphy and Parkinson by doing this:


The Good, the Bad, and the Ugly Data Governance Role

More and more of my work within data quality and Master Data Management (MDM) is around data governance. One side of data governance is the organizational issues and the roles of people involved.

Some of the common roles are:

Data Steward: This is a good role in my eyes, and how you select and empower data stewards is in my experience often the difference between failure and success. Data stewards are in most cases already known in the organization as data champions and subject matter experts. A successful data governance program lays out the organizational structure for the work of data stewards and supplies the means for the data stewards in the daily struggle to maintain an optimal degree of data quality.

Data Owner: I don’t like the term data owner, as told and discussed several years ago in the post Bad Word:? Data Owner. The existence of data owners is unfortunately why we need data governance. Data owners are heads of data silos. Especially when it comes to master data, the problem is that data owners and data silos make it difficult to look at data as an enterprise asset.

Chief Data Officer (CDO): This is a relatively new term, but the concept has been around for many years, for example under the name data czar. We need such a person because data owners are bad for the idea of data being an enterprise asset. But how long will CDOs remain in office compared to data owners? Not long, I’m afraid.


Using External Data in Data Matching

One of the things that data quality tools do is data matching. Data matching is mostly related to the party master data domain. It is about comparing two or more data records that do not have exactly the same data but describe the same real-world entity.

A common approach is to compare data records in internal master data repositories within your organization. However, there are great advantages in bringing in external reference data sources to support the data matching.

The ways of doing that which I have worked with include these kinds of big reference data:

Business directories:

The business-to-business (B2B) world does not have privacy issues to the degree we see in the business-to-consumer (B2C) world. Therefore there are many business directories out there with a quite complete picture of which business entities exist in a given country, and even in regions and the whole world.

A common approach is to first match your internal B2B records against a business directory and obtain a unique key for each business entity. The next step, matching business entities on that unique key, is a no-brainer.

The problem is though that an automatic match between internal B2B records and a business directory most often does not yield a 100 % hit rate. Not even close as examined in the post 3 out of 10.
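The two-step approach can be sketched in Python as follows. Everything here is an assumption made for illustration: the record layout, the `directory_id` field, the list of legal forms and the exact-match rule on normalized name plus postal code are all hypothetical, and a real directory match would use far more tolerant comparison. The sketch also shows why the hit rate falls short of 100 %: a record with a misspelled name simply gets no key.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal-form tokens to ignore.
LEGAL_FORMS = {"inc", "ltd", "llc", "gmbh", "corp"}


def normalize_name(name):
    """Lower-case, drop punctuation and strip legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def build_index(directory):
    """Map (normalized name, postal code) to the directory's unique key."""
    return {
        (normalize_name(entry["name"]), entry["postal_code"]): entry["directory_id"]
        for entry in directory
    }


def assign_keys(records, index):
    """Step 1: look each internal record up in the directory (None = no hit)."""
    return [
        index.get((normalize_name(r["name"]), r["postal_code"])) for r in records
    ]


def group_by_key(records, keys):
    """Step 2: internal records sharing a directory key are the same entity."""
    groups = defaultdict(list)
    for record, key in zip(records, keys):
        if key is not None:
            groups[key].append(record["id"])
    return dict(groups)


# Invented sample data.
directory = [
    {"directory_id": "D-001", "name": "Acme Trading Ltd", "postal_code": "TW9 1AA"},
    {"directory_id": "D-002", "name": "Northwind Inc.", "postal_code": "23219"},
]
records = [
    {"id": 1, "name": "ACME Trading", "postal_code": "TW9 1AA"},
    {"id": 2, "name": "Acme Trading Ltd.", "postal_code": "TW9 1AA"},
    {"id": 3, "name": "Nortwind Incorporated", "postal_code": "23219"},
]

keys = assign_keys(records, build_index(directory))
hit_rate = sum(k is not None for k in keys) / len(keys)
print(keys, group_by_key(records, keys), hit_rate)
```

Records 1 and 2 end up under the same key, while record 3 stays unmatched and would need fuzzy matching or manual review.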

Address directories:

Address directories are mostly used in order to standardize postal address data, so that two addresses in internal master data that can be standardized to an address written in exactly the same way can be better matched.
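As a minimal sketch of the standardization idea, the Python below rewrites two differently written addresses into the same form so that they compare equal. The small abbreviation table is a hypothetical stand-in for a real address directory, which would validate against reference data rather than only expand abbreviations.

```python
import re

# Hypothetical stand-in for an address directory: a few abbreviation rules.
# Caveat: "st" can also mean "saint", which a real directory would resolve.
ABBREVIATIONS = {
    "st": "street",
    "rd": "road",
    "ave": "avenue",
    "apt": "apartment",
}


def standardize_address(address):
    """Lower-case, drop punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


first = "12 Main St., Apt 4"
second = "12, MAIN STREET APARTMENT 4"

# Raw strings differ, standardized forms are identical.
print(first == second)
print(standardize_address(first) == standardize_address(second))
```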

A deeper use of address directories is to exploit related property data. The probability of two records with “John Smith” at the same address being a true positive match is much higher if the address is a single-family house as opposed to a high-rise building, nursing home or university campus.

Relocation services:

A common cause of false negatives in data matching is that you have compared two records where one of the postal addresses is an old one.

Bringing in National Change of Address (NCOA) services for the countries in question will help a lot.
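How change-of-address data removes such false negatives can be sketched in Python. The `moves` table is a hypothetical in-memory stand-in for whatever a change-of-address service returns; it is not the interface of any actual NCOA product, and the record layout is invented for the example.

```python
def current_address(name, address, moves):
    """Follow recorded moves for (name, address) until no newer address is known."""
    seen = set()
    while (name, address) in moves and (name, address) not in seen:
        seen.add((name, address))  # guard against circular move data
        address = moves[(name, address)]
    return address


def same_party(a, b, moves):
    """Compare two records after bringing both addresses up to date."""
    return a["name"] == b["name"] and current_address(
        a["name"], a["address"], moves
    ) == current_address(b["name"], b["address"], moves)


# Invented sample data: one recorded move.
moves = {("John Smith", "1 Old Road"): "2 New Street"}
old_record = {"name": "John Smith", "address": "1 Old Road"}
new_record = {"name": "John Smith", "address": "2 New Street"}

# A plain address comparison misses the match; the move-aware one finds it.
print(old_record["address"] == new_record["address"])
print(same_party(old_record, new_record, moves))
```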

The optimal way of doing that (and utilizing business and address directories) is to make it a continuous element of Master Data Management (MDM) as explored in the post The Relocation Event.


Where to put Master Data?

The core of most Master Data Management (MDM) solutions is a master data hub. MDM solutions such as those appearing in analyst reports revolve around a store for master data that is a new place, separate from where master data usually resides, which is for example in CRM, SCM and ERP systems.

For large organizations with a complex IT landscape, having an MDM hub is usually the only sensible solution.

However, for many midsize and smaller organizations, and even for large organizations with a dominant ERP system, the choice is often to name one of the application databases as the main master data hub for a given master data domain such as customer, supplier, product and whatever else is considered a master data entity.

In such cases you may apply data quality services as described in the post Lean MDM and other master data related services as told in the post Service Oriented MDM.

There are arguments for and against both approaches. Probably the most used argument against the MDM hub approach is to ask why you should solve the issue of having X data silos by creating data silo X + 1. The argument against naming a given application as the place of master data is that an application is built for a specific purpose and therefore is not good for other purposes of master data use.

Where do you put your master data? Why?
