Omni-purpose Data Quality

A recent post on this blog was called Omni-purpose MDM. It discussed to what degree MDM solutions should cover all the business cases in which Master Data Management plays a part.

Master Data Management (MDM) is very much about data quality. A recurring question in the data quality realm is whether data quality should be seen as the degree to which data are fit for the purpose of use, or whether the degree of real-world alignment is a better measurement.

The other day Jim Harris published a blog post called Data Quality has a Rotating Frame of Reference. In a comment, Jim takes up the example of having a valid address in your database records, and how address validity may make no sense as a measure of how well data quality supports a certain business objective.

My experience is that if you look at one business objective at a time, measuring data quality against the purpose of use is of course sound. However, if several different business objectives use the same data, you will usually discover that aligning with the real world fulfills all the needs. This is explained further in the concept of Data Quality 3.0.

Using the example of a valid address, measurements, and actual data quality prevention, typically work with degrees of validity, notably:

  • The validity at different levels, such as area, entrance and specific unit, as examined in the post A Universal Challenge.
  • The validity of related data elements, as an address may be valid while the addressee is not, as examined in the post Beyond Address Validation.

Data quality needs for a specific business objective also change over time. A valid address may be irrelevant for invoicing if the mail carrier gets the invoice there anyway or if we invoice electronically. However, a valid address and addressee suddenly become necessary for fitness for the purpose of use if the invoice is not paid and we have to chase the debt.
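To make the idea of degrees of validity concrete, here is a minimal Python sketch. The level names, the purposes and the requirements per purpose are hypothetical. They were chosen only to mirror the invoicing and debt collection example above and are not a measurement method from any particular tool.

```python
from enum import IntEnum


class AddressValidity(IntEnum):
    """Hypothetical ordered levels of address validity."""
    INVALID = 0
    AREA = 1      # postal code and city verified
    ENTRANCE = 2  # street and building number verified
    UNIT = 3      # the specific flat or unit verified


def validity_level(area_ok: bool, entrance_ok: bool, unit_ok: bool) -> AddressValidity:
    """Return the deepest verified level; a level only counts if all levels above it are verified."""
    level = AddressValidity.INVALID
    for ok, candidate in ((area_ok, AddressValidity.AREA),
                          (entrance_ok, AddressValidity.ENTRANCE),
                          (unit_ok, AddressValidity.UNIT)):
        if not ok:
            break
        level = candidate
    return level


# Hypothetical requirements per purpose of use:
# (minimum address validity, whether the addressee must also be valid)
REQUIREMENTS = {
    "electronic_invoicing": (AddressValidity.INVALID, False),
    "postal_invoicing": (AddressValidity.ENTRANCE, False),
    "debt_collection": (AddressValidity.UNIT, True),
}


def fit_for_purpose(purpose: str, level: AddressValidity, addressee_ok: bool) -> bool:
    """Check a record's address and addressee validity against one purpose of use."""
    min_level, needs_addressee = REQUIREMENTS[purpose]
    return level >= min_level and (addressee_ok or not needs_addressee)


# The same record can be fit for invoicing yet unfit for chasing the debt
level = validity_level(area_ok=True, entrance_ok=True, unit_ok=False)
print(fit_for_purpose("postal_invoicing", level, addressee_ok=False))  # True
print(fit_for_purpose("debt_collection", level, addressee_ok=False))   # False
```

The point of the sketch is that the record itself does not change. Only the purpose it is measured against does.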

Sharing Big Location Reference Data

In the post Location Data Quality for MDM, the different ways of handling location master data within many companies were examined.

A typical “as is” picture could be this:

[Figure: typical “as is” handling of location data]

Location data are handled for different purposes using different kinds of systems. Customer data may be quality checked using address validation tools and services. These also serve as a prerequisite for better utilization of the data in a Geographical Information System (GIS), and for using internal customer master data in marketing research, for example by utilizing demographic classifications for current and prospective customers.

Often, additional external location data are used for enrichment and for supplementing internal master data downstream in these specialized systems. It may very well be that the external location reference data used at different points do not agree in terms of precision, timeliness, conformity and other data quality dimensions.
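One way to surface such disagreement is to compare the reference data used at two points, attribute by attribute. The sketch below assumes that each source can be keyed by a shared address identifier and carries a postal code, a city, coordinates and a snapshot date. The record layout, the keys, the sample values and the coordinate tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefEntry:
    """One address as seen in one external reference source (hypothetical layout)."""
    postal_code: str
    city: str
    lat: float
    lon: float
    as_of: str  # ISO date of the reference data snapshot


def compare_sources(a: dict, b: dict, tol_deg: float = 0.001) -> dict:
    """Map each address key to the list of attributes on which the two sources disagree."""
    issues = {}
    for key in sorted(a.keys() & b.keys()):
        x, y = a[key], b[key]
        diffs = []
        if x.postal_code != y.postal_code:
            diffs.append("postal_code")
        if x.city.casefold() != y.city.casefold():
            diffs.append("city")
        if abs(x.lat - y.lat) > tol_deg or abs(x.lon - y.lon) > tol_deg:
            diffs.append("coordinates")
        if x.as_of != y.as_of:
            diffs.append("as_of")
        if diffs:
            issues[key] = diffs
    # Keys known to only one of the sources
    for key in sorted(a.keys() ^ b.keys()):
        issues[key] = ["missing in one source"]
    return issues


gis_ref = {"ADDR-1": RefEntry("1000", "Sampletown", 55.0, 12.0, "2013-01-01"),
           "ADDR-2": RefEntry("2000", "Othertown", 56.0, 10.0, "2013-01-01")}
crm_ref = {"ADDR-1": RefEntry("1000", "SAMPLETOWN", 55.0005, 12.0, "2013-06-01"),
           "ADDR-3": RefEntry("3000", "Thirdtown", 57.0, 9.0, "2013-06-01")}
print(compare_sources(gis_ref, crm_ref))
```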

A desired “to be” picture could be this:

[Figure: desired “to be” handling of location data]

In this set-up, everything that can be shared across different purposes is kept as common (big) reference data and/or is accessible within a data-as-a-service environment maintained by third-party data providers.

Location Data Quality for MDM

After the customer (or rather party) domain and the product domain, the location domain is the most frequently addressed domain for Master Data Management (MDM).

In my recent work I have seen a growing interest in handling location data as part of an MDM program.

Traditionally location data in many organizations have been handled in two main ways:

  • As part of other domains, typically as address attributes for customer and other party entities
  • As a silo for special business processes that involve spatial data, using Geographic Information Systems (GIS), for example in engineering and demographic market research.

Handling location data most often involves using external reference data. Location data don’t carry the privacy considerations that party data tend to have, not least data describing natural persons. And, unlike product data, location data are pretty much the same for everyone.

MDM for the location domain is very much about bringing the two ways of working with locations mentioned above together while consistently exploiting external reference data.

As in all MDM work, data quality is the important factor, and the usual data quality dimensions are indeed in play here as well. Some challenges are:

  • Uniqueness and precision: Locations come in hierarchies. As told in the post The Postal Address Hierarchy, textual addresses have levels such as country, region, city or district, thoroughfare (street) or block, building number and unit within a building. Uniqueness may be defined within one of these levels. As discussed in the post Where is the Spot?, the precision and use case for coordinates may cause uniqueness issues too.
  • Timeliness and accuracy: Though it doesn’t happen too often, locations do change names, as reported in the post MDM in LED, and new locations do show up every day. I remember recent press coverage in the United Kingdom about people who couldn’t get car and other insurance because the address of their newly built house wasn’t in the insurance company’s database.
  • Completeness and conformity: Availability of all “points of interest” in reference data is an issue. The availability of all attributes of interest at the desired level is an issue too. The available formats and possible mappings between them are a usual challenge. Addresses in both local and standardized alphabets and script systems, using endonyms and exonyms, are a problem, as told in the posts Where the Streets have Two Names and Where the Streets have one Name but Two Spellings.
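As a rough illustration of the uniqueness point, the sketch below groups records by a key built down to a chosen hierarchy level. It also shows how rounding coordinates to fewer decimals makes distinct spots collapse into one. The field names and sample values are hypothetical.

```python
from collections import defaultdict

# Hierarchy levels from top to bottom, as in The Postal Address Hierarchy
LEVELS = ("country", "region", "city", "thoroughfare", "building", "unit")


def hierarchy_key(address: dict, level: str) -> tuple:
    """Normalized key of all levels down to and including `level`; absent levels become ''."""
    depth = LEVELS.index(level) + 1
    return tuple(str(address.get(name, "")).strip().casefold() for name in LEVELS[:depth])


def duplicates_at_level(addresses: list, level: str) -> dict:
    """Return key -> record indexes for keys shared by more than one record."""
    groups = defaultdict(list)
    for i, address in enumerate(addresses):
        groups[hierarchy_key(address, level)].append(i)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def coordinate_key(lat: float, lon: float, decimals: int) -> tuple:
    """Uniqueness key for a coordinate pair at a given precision (number of decimals)."""
    return (round(lat, decimals), round(lon, decimals))


records = [
    {"country": "DK", "city": "Sampletown", "thoroughfare": "Main Street", "building": "1", "unit": "1 tv"},
    {"country": "DK", "city": "Sampletown", "thoroughfare": "Main Street", "building": "1", "unit": "1 th"},
]
print(duplicates_at_level(records, "building"))  # the two flats collide at building level
print(duplicates_at_level(records, "unit"))      # {} - unique at unit level
```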

When High Quality Data doesn’t Yield High Quality Service

Better data quality is a prerequisite of better quality of service but unfortunately high quality data doesn’t necessarily lead to high quality service when the data flow is broken. This happened to me last night.

When landing at London Heathrow Airport I usually, economical as I am, take the train to reach my doorstep. However, when I have to catch an early morning flight I order a cab, which actually has a very reasonable price. So yesterday I decided to book a cab in order to cut 30 to 40 minutes off the journey home, at the expense of a minor amount of extra pounds.

Excellent data capture

Usually I just call the cab, but as I arrived by airplane and my local cab service is part of an online booking service, I used that service for the first time. The user interface is excellent. There is rapid addressing for entering the pick-up place, which quickly presented me with the possible terminals at Heathrow. The destination was just as smooth. As the pick-up place is an airport, they prompted me for the flight number. Very nice, as that makes tracking delays possible for them, and you can also check that the airline and terminal are a correct match.

They also have an app, which I geekily downloaded to my phablet.

Going down

Landing times at Heathrow are difficult to predict, as your flight often has to do a couple of circles over London before landing due to heavy traffic. Yesterday was good though, as we came directly down and were therefore ahead of schedule.

So it was OK that my name wasn’t on the signs held by drivers already waiting at the passenger exit. Actually, I was so early that I could have caught the not so frequent direct train home. But as I had already troubled the driver to go there, I of course waited while spending time on the app.

There was actually also driver tracking in the app. Marvelous. At first glance it seemed the driver was there. But then I noticed a message saying driver tracking wasn’t available, which meant the spot in the Terminal 3 building was my own position or the requested pick-up place.

Going crazy

5 minutes after the requested time the driver called:

“Where are you Mr. Sorensen?”

“I’m at the passenger exit where all drivers are waiting.”

“OK. I’m just parking the car. Go to the front of the coffee shop and I’ll be there in a few minutes.”

I spotted a coffee shop in front of the lifts to the short stay parking and went over there.

10 minutes later the driver called:

“Where are you Mr. Sorensen?”

“I am in front of the coffee shop”

“Costa Coffee?”

“No. It has a different name…”. After some ping-pong I mentioned terminal 3.

“Terminal 3?” the driver responded. “I’m at terminal 5. I was told to go here. I’ll be with you in 5 minutes”.

5 minutes by car, I wondered. That would mean crossing the runways or using the train tunnel.

Well, while I spent more happy time on the phablet, the clock approached the point where I would have been at my doorstep had I taken the slow train.

40 minutes after the requested time the driver arrived. I was waiting for the mandatory sorry that Brits use even when they are not sorry at all.

Instead the driver greeted me with: “Did you order the cab yourself Mr. Sorensen?”

“Yes I did. On the internet.”

“Internet?” the driver replied.

“Your company has an excellent online booking system,” I remarked in a friendly way.

“When I called you first I asked for confirmation about where you were”.

As I realized that he was trying to establish that everything was my fault, I presented the confirmation in the app.

We continued (without the usual small talk) to the destination. Here the driver, instead of offering a discount, presented a price higher than the one on the booking confirmation.

At that point it was too difficult to keep calm and carry on…

Anachronism and Data Quality

The term anachronism is used for something misplaced in time. An example is classical paintings in which a biblical event is shown with people wearing clothes from the time when the painting was made.

In data quality lingo, such a flaw is categorized as a lack of timeliness.

The most frequent example of a lack of timeliness, or should we say of anachronism, in data management today is having an old postal address attached to a party master data entity. A remedy for avoiding this kind of anachronism is explained in the post The Relocation Event.
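A simple, admittedly crude, way to put a number on this kind of anachronism is to look at when each party's address was last confirmed. The sketch assumes a hypothetical `address_confirmed` date on each record and an arbitrary one-year threshold. How confirmation is recorded will differ between systems.

```python
from datetime import date


def stale_addresses(records: list, as_of: date, max_age_days: int = 365) -> list:
    """Ids of records whose address was last confirmed more than max_age_days before as_of."""
    return [r["id"] for r in records
            if (as_of - r["address_confirmed"]).days > max_age_days]


def timeliness_score(records: list, as_of: date, max_age_days: int = 365) -> float:
    """Share of records whose address confirmation is recent enough (1.0 for an empty list)."""
    if not records:
        return 1.0
    return 1 - len(stale_addresses(records, as_of, max_age_days)) / len(records)


# Hypothetical party records with the date each address was last confirmed
parties = [
    {"id": 1, "address_confirmed": date(2013, 6, 1)},
    {"id": 2, "address_confirmed": date(2011, 1, 1)},
    {"id": 3, "address_confirmed": date(2013, 12, 1)},
]
print(stale_addresses(parties, date(2014, 1, 15)))  # [2]
```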

In a recent blog post called 3-2-1 Start Measuring Data Quality by Janani Dumbleton of Experian QAS, the timeliness dimension in data quality is examined along with five other important dimensions of data quality. As said there, an impact of anachronism could be:

“Not being aware of a change in address could result in confidential information being delivered to the wrong recipient.”

Hope you got it.

Everyday Year 2000 Problems

14 years ago, times were busy for computer professionals, including yours truly, because of the upcoming year 2000 apocalypse. The handling of the problem indeed had elements of hysteria, but all in all it was a joint effort by heaps of IT people to meet a non-postponable deadline for fixing date fields that were too short.

Data entry and data storage fields that are too short, have an inadequate format or are missing are frequent data quality issues. Some everyday issues are:

Too short name fields

Names can be very long. But even a moderately long name such as Henrik Liliendahl Sørensen can be a problem here and there. Not least on Twitter: the 20-character name field corresponds very well to the 140-character message length, and it forces many of us to shorten our names. I found a remedy from a fellow Sørensen, a workaround described in the post Getting around the real name length limit in Twitter. Not sure if I’m prepared to take the risk.

Too short and restricted postal code fields

When working with IT solutions in Denmark, you see a lot of postal code fields defined as 4 digits. That works fine with Danish addresses but is a real showstopper when you deal with the 5-digit postal codes of neighboring Sweden and Germany. It is even worse with postal codes containing letters, as in the Netherlands and the United Kingdom, and with most other postal codes from around the world.
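A sketch of the alternative is to store the postal code as free text and validate it per country. The patterns below are simplified assumptions, not the official rules of the postal operators. For example, the UK pattern only checks the general shape of a postcode.

```python
import re

# Simplified per-country patterns (assumptions, not the postal operators' full rules)
POSTAL_CODE_PATTERNS = {
    "DK": r"\d{4}",                               # e.g. 2100
    "SE": r"\d{3} ?\d{2}",                        # e.g. 114 55
    "DE": r"\d{5}",                               # e.g. 10115
    "NL": r"\d{4} ?[A-Z]{2}",                     # e.g. 1012 AB
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",    # general shape only, e.g. SW1A 1AA
}


def is_valid_postal_code(country: str, code: str) -> bool:
    """Validate a postal code stored as free text against its country's pattern."""
    code = code.strip().upper()
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # No rule for this country: accept any non-empty value up to an arbitrary 12 characters
        return 0 < len(code) <= 12
    return re.fullmatch(pattern, code) is not None


for country, code in [("DK", "2100"), ("SE", "114 55"), ("NL", "1012 ab"), ("GB", "SW1A 1AA"), ("DK", "10115")]:
    print(country, code, is_valid_postal_code(country, code))
```

Unknown countries are deliberately let through rather than rejected, so that a missing rule does not become a new "4 digits only" showstopper.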

Missing placeholder for social identities

The rise of social media has been incredible during the last years. However, IT systems are lagging behind in supporting this. Most systems don’t have a place where you can fill in a social handle. Recently James Taylor wrote the blog post Getting a handle on social MDM, in which he describes a workaround in an IBM MDM solution. Indeed we need ways to link the old systems of record with the new systems of engagement.

The Relocation Event

When maintaining party master data, one of the challenges is keeping the data about the physical address, and sometimes the physical addresses, of a registered party up to date.

You may learn in many ways that your customer, supplier, employee or whatever party you are keeping on record has moved. The most common are:

  • The person or organization in question is kind enough to tell you. For some purposes, for example in the utility sector, this is a future event that triggers a whole workflow of actions.
  • You get the message via a subscription to external reference data, for example using available National Change of Address (NCOA) services and services related to business directories and citizen registries.
  • Your mail to a person or organization is returned by postal services, often with no information about the new address, which means investigation work ahead.

The capability to handle this important issue in party master data management (MDM), embracing all the scenarios mentioned above, is essential for many enterprises. Doing it on an international scale, with the different sources and services available in different countries, is indeed a daunting task.
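The three scenarios can be sketched as one event handler. This is a hypothetical Python illustration of such a workflow, not how iDQ™ MDM Edition implements it. The names `Source`, `Party`, `handle_relocation` and `apply_due_moves` are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Source(Enum):
    SELF_REPORTED = "the party tells you, possibly ahead of the move"
    REFERENCE_DATA = "NCOA-style service, business directory or citizen registry"
    RETURNED_MAIL = "mail returned by postal services"


@dataclass
class Party:
    party_id: str
    address: str
    pending_moves: list = field(default_factory=list)  # (effective date, new address)
    needs_investigation: bool = False


def handle_relocation(party: Party, source: Source, today: date,
                      new_address: Optional[str] = None,
                      effective: Optional[date] = None) -> str:
    """Apply one relocation event and return the action taken."""
    if source is Source.RETURNED_MAIL or new_address is None:
        # No new address known: flag the record for investigation
        party.needs_investigation = True
        return "investigate"
    if effective is not None and effective > today:
        # A future move, e.g. announced to a utility, is scheduled rather than applied
        party.pending_moves.append((effective, new_address))
        return "scheduled"
    party.address = new_address
    party.needs_investigation = False
    return "updated"


def apply_due_moves(party: Party, today: date) -> None:
    """Apply scheduled moves whose effective date has been reached, oldest first."""
    due = sorted(m for m in party.pending_moves if m[0] <= today)
    for _, new_address in due:
        party.address = new_address
    party.pending_moves = [m for m in party.pending_moves if m[0] > today]


p = Party("P-1", "Old Street 1")
print(handle_relocation(p, Source.SELF_REPORTED, date(2014, 1, 10), "New Street 2", date(2014, 2, 1)))  # scheduled
apply_due_moves(p, date(2014, 2, 1))
print(p.address)  # New Street 2
```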

Handling the relocation event is a core functionality in the master data service (iDQ™ MDM Edition) I’m currently working with. There’s a lot to do in this quest, so I’d better move on.

The Postal Address Hierarchy

Using postal addresses is a core element in many data quality improvement and master data management (MDM) activities.

As touched upon many times on this blog, postal addresses are formatted very differently around the world. However, they may all be arranged in a sort of hierarchy with up to 6 general levels:

  • Country
  • Region
  • City or district
  • Thoroughfare (street) or block
  • Building number
  • Unit within building

In addition, the postal code (postcode or zip code) is part of many address formats. Seen in a hierarchical light, the postal code is a tricky concept, as it may identify a city, a district, a thoroughfare, a single building or even a given unit within, or section of, a building. The latter is true for my company address in the United Kingdom, where we have a very granular postcode system.
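One way to hold such addresses is a record with one optional field per level. The postal code is kept beside the hierarchy rather than inside it, since the level it identifies varies by country. This is a minimal sketch with hypothetical field names and sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostalAddress:
    """Six general levels, top to bottom; any level below country may be absent."""
    country: str
    region: Optional[str] = None
    city_or_district: Optional[str] = None
    thoroughfare_or_block: Optional[str] = None
    building_number: Optional[str] = None
    unit: Optional[str] = None
    # Held beside the hierarchy: depending on the country it may identify
    # a city, a thoroughfare, a single building or even a unit
    postal_code: Optional[str] = None


HIERARCHY = ("country", "region", "city_or_district",
             "thoroughfare_or_block", "building_number", "unit")


def deepest_level(address: PostalAddress) -> str:
    """Name of the lowest populated hierarchy level (the postal code is not considered)."""
    deepest = "country"
    for name in HIERARCHY:
        if getattr(address, name):
            deepest = name
    return deepest


a = PostalAddress("DK", city_or_district="Sampletown", thoroughfare_or_block="Main Street",
                  building_number="21 A", postal_code="1000")
print(deepest_level(a))  # building_number
```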

Country

As discussed in the post The Country List, even the top level of a postal address hierarchy isn’t a simple list fit for every purpose. Some issues are:

  • There are different sources with different perceptions of which countries exist on this planet
  • What we regard as countries comes in hierarchies
  • Several coding systems are available
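As an example of the coding issue, the ISO 3166-1 standard alone offers alpha-2, alpha-3 and numeric codes for the same country. The sketch below normalizes any of the three to alpha-2 using a tiny hand-picked excerpt. A real implementation would load a maintained list instead.

```python
from typing import Optional

# A small excerpt of ISO 3166-1: (alpha-2, alpha-3, numeric, common English name)
ISO_3166_1 = [
    ("DK", "DNK", "208", "Denmark"),
    ("SE", "SWE", "752", "Sweden"),
    ("DE", "DEU", "276", "Germany"),
    ("NL", "NLD", "528", "Netherlands"),
    ("GB", "GBR", "826", "United Kingdom"),
    ("US", "USA", "840", "United States"),
]

# Index every code form to the alpha-2 code
_TO_ALPHA2 = {code: row[0] for row in ISO_3166_1 for code in row[:3]}


def to_alpha2(code: str) -> Optional[str]:
    """Normalize an alpha-2, alpha-3 or numeric code to alpha-2; None if not in the excerpt."""
    code = code.strip().upper()
    if code.isdigit():
        code = code.zfill(3)  # numeric codes are zero-padded to three digits
    return _TO_ALPHA2.get(code)


print(to_alpha2("dnk"), to_alpha2("826"), to_alpha2("UK"))  # DK GB None
```

Note that the familiar "UK" is not the ISO alpha-2 code for the United Kingdom. It is a typical value to expect in real data and to map explicitly.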

Region

The region is an element in some address formats, like the states in the United States and the provinces in Canada. Other countries, like Germany, which is divided into quite independent Länder, do not have the region as a required part of the postal address. The same goes for the Swiss cantons.
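Whether a region is expected could be kept as a per-country rule. The table below holds only the four countries mentioned above and is an assumption for illustration.

```python
from typing import Optional

# Assumed rules, covering only the countries mentioned above
REGION_IN_POSTAL_ADDRESS = {"US": True, "CA": True, "DE": False, "CH": False}


def region_issue(country: str, region: Optional[str]) -> Optional[str]:
    """Return a remark about the region element, or None if nothing to remark (or no rule known)."""
    expected = REGION_IN_POSTAL_ADDRESS.get(country)
    if expected is None:
        return None
    if expected and not region:
        return "missing region"
    if not expected and region:
        return "region not needed"
    return None


print(region_issue("US", None))
print(region_issue("DE", "Bayern"))
```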

City or district

I once read that if you used the label city in a web form in Australia, you would get a lot of values like: “I do not live in a city”.

Anyway this level is often (but as mentioned certainly not always) where the postal code is applied. The postal code district may be a single town with surroundings, several villages or a district within a big city.

Thoroughfare (street) or block

Most countries use thoroughfares such as streets, roads, lanes, avenues, mews, boulevards and whatever else they are called around the world. Beware that the same street may have several spellings and even several names.

Japan is a counterexample, as there it’s the blocks between the thoroughfares that are part of the postal address.

Building number

Usually this element will be an integer. However, formats with a letter after the integer (example: 21 A) or a range of integers (example: 21-23) are most annoying. And then there is this British classic: One Main Grove. OMG.
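Parsing the common variants into a structured value is straightforward with a regular expression. Spelled-out numbers like the British example are deliberately left unparsed here, so that they can be handled separately. The `BuildingNumber` structure is a hypothetical illustration.

```python
import re
from typing import NamedTuple, Optional


class BuildingNumber(NamedTuple):
    first: int
    last: int     # equals first unless the number is a range such as 21-23
    suffix: str   # letter after the integer, e.g. "A" in "21 A"; "" if none


_BUILDING_NUMBER = re.compile(r"\s*(\d+)\s*(?:-\s*(\d+))?\s*([A-Za-z]?)\s*")


def parse_building_number(text: str) -> Optional[BuildingNumber]:
    """Parse forms like '21', '21 A', '21A' and '21-23'; return None for anything else."""
    match = _BUILDING_NUMBER.fullmatch(text)
    if match is None:
        return None  # e.g. a spelled-out number or a house name, left for separate handling
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    return BuildingNumber(first, last, match.group(3).upper())


for raw in ("21", "21 A", "21-23", "One"):
    print(repr(raw), parse_building_number(raw))
```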

Unit within a building

This element may or may not be present in a postal address. That depends on whether the building is a single-family house or company site, whether postal delivery treats it as such, and whether you can actually indicate where within the building the delivery goes or you go. The ups and downs of this level are examined in the post A Universal Challenge.

A Universal Challenge

Yesterday on The Postcode Anywhere blog Guy Mucklow wrote a nice piece called University Challenge. The blog post is about challenges with shared addresses and a remedy at least for addresses in the United Kingdom.

And sure, I also had my challenges with a shared address in the UK as reported in the post Multi-Occupancy.

But I guess the University Challenge is a universal challenge.

The postal formats and available reference data sources are of course very different around the world. Below is an example from the iDQ™ (instant Data Quality) tool handling a Danish address with multiple flats. Here the tool continuously displays which options are available to make the address unique:

[Figure: iDQ™ handling a Danish address with multiple flats]
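The narrowing behaviour can be approximated in a few lines. This is a hypothetical sketch, not the iDQ™ implementation. It assumes the reference data lists the units at a building as (floor, door) pairs using Danish designations such as st (ground floor), th (right), tv (left) and mf (middle). The sample units are made up.

```python
# Hypothetical reference data: units at one Danish building as (floor, door)
UNITS = [("st", "th"), ("st", "tv"), ("1", "th"), ("1", "tv"),
         ("2", "th"), ("2", "mf"), ("2", "tv")]


def remaining_options(units, floor=None, door=None):
    """Units still consistent with what has been entered so far."""
    return [(f, d) for f, d in units
            if (floor is None or f == floor) and (door is None or d == door)]


def next_prompt(units, floor=None, door=None) -> str:
    """Tell the user what is still needed to make the address unique."""
    options = remaining_options(units, floor, door)
    if not options:
        return "no match"
    if len(options) == 1:
        return "unique"
    floors = sorted({f for f, _ in options})
    if len(floors) > 1:
        return "choose floor: " + ", ".join(floors)
    return "choose door: " + ", ".join(sorted({d for _, d in options}))


print(next_prompt(UNITS))                        # choose floor: 1, 2, st
print(next_prompt(UNITS, floor="2"))             # choose door: mf, th, tv
print(next_prompt(UNITS, floor="2", door="mf"))  # unique
```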

Where the Streets have one Name but Two Spellings

Last week’s post called Where The Streets have Two Names attracted a lot of comments, both on this blog and in LinkedIn groups such as Data Quality Professionals and The Data Quality Association. Many of them gave examples from around the world of how this challenge actually exists more or less everywhere.

Recently I had the pleasure of experiencing a variant of the challenge when driving around in a rented car in the Saint Petersburg area in Russia. Here the streets usually have only one name, but it may be presented in two different alphabets. One is the local Cyrillic; the other is the Latin alphabet I’m used to, which was also included in the reference data on the Sat Nav. So while it was nice for me to type destinations in Latin letters, it was also nice to have directions in Cyrillic in order to follow the progress on road signs.

So here standardization (or standardisation) to one preferred language, alphabet or script system isn’t the best solution. Best of breed solutions for handling addresses must be able to handle several right spellings for the same address.

[Figure: street sign on Nevsky Prospekt, St Petersburg]
Street sign in Cyrillic with Latin subtitle
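A best of breed approach might index every accepted spelling of a street to one identifier and keep a preferred display spelling per script. That way, a destination typed in Latin letters can be shown in Cyrillic. The sketch below is a hypothetical illustration. The identifier, the script tags and the extra variant transliteration were chosen for the example.

```python
def _key(name: str) -> str:
    """Comparison key: case-insensitive, with runs of whitespace collapsed."""
    return " ".join(name.casefold().split())


class StreetIndex:
    """Map every accepted spelling of a street to one id, with a preferred spelling per script."""

    def __init__(self) -> None:
        self._ids_by_spelling: dict = {}
        self._preferred: dict = {}

    def add(self, street_id: str, script: str, spelling: str, preferred: bool = True) -> None:
        self._ids_by_spelling[_key(spelling)] = street_id
        if preferred:
            self._preferred.setdefault(street_id, {})[script] = spelling

    def lookup(self, spelling: str):
        """Street id for any accepted spelling, or None if unknown."""
        return self._ids_by_spelling.get(_key(spelling))

    def display(self, street_id: str, script: str):
        """Preferred spelling of the street in the given script, or None."""
        return self._preferred.get(street_id, {}).get(script)


streets = StreetIndex()
streets.add("SPB-1", "Cyrl", "Невский проспект")
streets.add("SPB-1", "Latn", "Nevsky Prospekt")
streets.add("SPB-1", "Latn", "Nevskiy prospekt", preferred=False)  # variant transliteration

# Type the destination in Latin letters, get directions in Cyrillic
street_id = streets.lookup("nevskiy  PROSPEKT")
print(street_id, streets.display(street_id, "Cyrl"))
```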
