Avoiding Contact Data Entry Flaws

Contact data is the data domain most often mentioned when talking about data quality. Names, addresses and other identification data are constantly spelled wrongly, or just differently, by the employees responsible for entering party master data.

Cleansing data a long time after it has been captured is a common way of dealing with this huge problem. However, preventing typos, mishearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.

I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:

  • Correction and
  • Assistance

Correction

With correction, the data entry clerk, sales representative, customer service professional or whoever is entering the data enters the name, address and other data into a form.

After the form is submitted, or in some cases after the user leaves each field on the form, the application checks the content against business rules and available reference data and returns a warning or error message and perhaps a correction to the entered data.

As duplicated data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning that a similar contact record already exists in the system.
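
A minimal sketch of such a correction-style check could look like the following Python. The field names, the sample records and the similarity threshold of 0.85 are assumptions made for illustration, not taken from any particular product.

```python
from difflib import SequenceMatcher

# Hypothetical existing records; a real system would query its party master data.
EXISTING_CONTACTS = [
    {"name": "John Smith", "postal_code": "SW1A 1AA"},
    {"name": "Anna Jensen", "postal_code": "2100"},
]


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def check_contact(entry, existing=EXISTING_CONTACTS, threshold=0.85):
    """Return a list of warnings for a submitted contact form."""
    name = normalize(entry.get("name", ""))
    postal_code = normalize(entry.get("postal_code", ""))
    warnings = []

    # Simple business rules: both fields are mandatory.
    if not name:
        warnings.append("Name is required.")
    if not postal_code:
        warnings.append("Postal code is required.")

    # Duplicate check: same postal code and a very similar name.
    for record in existing:
        same_area = normalize(record["postal_code"]) == postal_code
        similarity = SequenceMatcher(None, normalize(record["name"]), name).ratio()
        if same_area and similarity >= threshold:
            warnings.append(f"A similar contact already exists: {record['name']}")
    return warnings
```

Entering "Jon Smith" with the postal code "sw1a 1aa" would return the duplicate warning for John Smith, while a new name in a new postal code passes without warnings.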

Assistance

With assistance we try to minimize the number of keystrokes needed and interactively help with searching in available reference data.

For example, when entering address data, assistance-based data entry will start with the highest geographical level:

  • If we are dealing with international data, the country will set the context and determine whether a state or province is needed.
  • Where postal codes (like ZIP) exist, this is the fast path to the city.
  • In some countries the postal code only covers one street (thoroughfare), so that’s settled by the postal code. In other situations we will usually have a limited number of streets that can be picked from a list or settled with the first characters.

(I guess many people know this approach from navigation devices for cars.)
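
The top-down flow above can be sketched in a few lines of Python. The reference data here is made up (the postal code "9999", the city "Sampleby" and the street names are all hypothetical), and the function names are my own; a real service would query address directories for each country.

```python
# Hypothetical, made-up reference data for illustration only.
# A real service would query address directories per country.
COUNTRIES = {
    "US": {"needs_state": True},
    "DK": {"needs_state": False},
}
POSTAL_CODES = {
    ("DK", "9999"): {
        "city": "Sampleby",
        "streets": ["Main Street", "Market Street", "Mill Lane"],
    },
}


def needs_state(country):
    """The country sets the context: is a state or province needed?"""
    return COUNTRIES[country]["needs_state"]


def lookup_postal_code(country, postal_code):
    """The postal code is the fast path to the city (None if unknown)."""
    return POSTAL_CODES.get((country, postal_code.strip()))


def suggest_streets(country, postal_code, typed=""):
    """Narrow the streets within a postal code by the first characters typed."""
    area = lookup_postal_code(country, postal_code)
    if area is None:
        return []
    prefix = typed.casefold()
    return [s for s in area["streets"] if s.casefold().startswith(prefix)]
```

With this sample data, typing "ma" after the postal code leaves two streets to pick from, and typing "mi" settles the street with two keystrokes.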

When the valid address is known, you may find companies at that address in business directories and, depending on the country in question, citizens living there in phone directories and other sources, and of course in the internal party master data. That avoids entering names and other data that are already known.

When looking for business entities, a search for a name in a business directory often lets you pick a range of identification data and other valuable data and, not least, a reference key for future data updates.

Lately I have worked intensively with an assistance-based cloud service for business processes embracing contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.

Big Reference Data as a Service

This morning I read an article called The Rise of Big Data Apps and the Fall of SaaS by Raj De Datta on TechCrunch.

I think the first part of the title is right while the second part is misleading. Software as a Service (SaaS) will be a big part of Big Data Apps (BDA).

The article also includes a description of LinkedIn merely as a social recruitment service. While recruiters, as reported in the post Indulgent Moderator or Ruthless Terminator?, certainly are visible on this social network, LinkedIn is much more than that.

Among other things LinkedIn is a source of what I call big reference data as examined in the post Social MDM and Systems of Engagement.

Besides social network profiles, big reference data also includes big directory services, being services with large amounts of data about addresses, business entities and citizens/consumers, as told in the post The Big ABC of Reference Data.

Right now I’m working with a Software as a Service solution called instant Data Quality. It embraces Big (Reference) Data as a Service and is thus a Big Data App.

And hey, I have made a pin about that:

255 Reasons for Data Quality Diversity

255 is one source of truth about how many countries we have on this planet. Even with this modest list of reference data there are several sources of the truth. Another list may have 262 entries and a third list 240 entries.

As I wrote a blog post some years ago called 55 reasons to improve data quality, I think 255 fits nicely in the title of this post.

The 55 reasons to improve data quality in the former post revolve around name and address uniqueness. In the quest for uniqueness, and for fulfilling other data quality dimensions such as completeness and timeliness, I have often advocated using deep (or big) reference data sources such as address directories, business directories and consumer/citizen directories.

Doing so in a best-of-breed way involves dealing with a huge number of reference data sources. Services claimed to have worldwide coverage often fall a bit short compared to local services using local reference sources.

For example, when I lived in Denmark, a tiny place in one corner of the world, I was often amazed at how address correction services from abroad only had (sometimes outdated) street level coverage, while local reference data sources provided building number and even suite level validation.

Another example was discussed in the post The Art in Data Matching, where the multi-lingual capacities needed to do well in Belgium were stressed in the comments.

Every country has its own special requirements for getting name and address data quality right, the data quality dimensions for reference data are different, and governments have found 255 (or so) different solutions to balancing privacy and administrative effectiveness.

Right now I’m working on internationalization and internationalisation of a data and software service called instant Data Quality. This service makes big reference data from all over the world available in a single mashup. For that we need at least 255 partners.

Finding Me

Many people have many names and addresses. So have I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

Green T is positive in the Danish Telephone Books. Red C is negative in the Danish Citizen hub. Green C is positive in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom, I’m registered here under yet more names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom hasn’t a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly for me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.
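
Name variants like these can be brought together by a simple folding and token comparison. The following Python is only a sketch under my own assumptions: the mapping of ø to o follows the spelling the registration above ended up with (other conventions exist), and the rule that first and last name tokens must agree is a deliberately naive candidate test, not a full matching method.

```python
import unicodedata

# Letters such as ø and æ do not decompose under Unicode normalization,
# so they need an explicit mapping. Mapping ø to o is an assumption that
# mirrors the registered spelling mentioned above.
SPECIAL = str.maketrans({"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"})


def fold(name):
    """Return lower-cased name tokens with diacritics and special letters folded."""
    mapped = name.translate(SPECIAL)
    decomposed = unicodedata.normalize("NFKD", mapped)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().split()


def same_person_candidate(a, b):
    """True if first and last tokens agree and one token set contains the other."""
    tokens_a, tokens_b = fold(a), fold(b)
    if not tokens_a or not tokens_b:
        return False
    if tokens_a[0] != tokens_b[0] or tokens_a[-1] != tokens_b[-1]:
        return False
    return set(tokens_a) <= set(tokens_b) or set(tokens_b) <= set(tokens_a)
```

With this rule, a name written with Ø and a middle name becomes a match candidate for "Henrik Sorensen", while a different surname does not. A candidate is only a hint; confirming the match still needs an address or another identifier.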

How about you, do you also have several identities?

MDM Summit Europe 2012 Preview

I am looking forward to being at the Master Data Management Summit Europe 2012 next week in London. The conference runs in parallel with the Data Governance Conference Europe 2012.

Data Governance

As I live within a short walking distance of the venue, I won’t have as much time for thinking as Jill Dyché had when she recently was at a conference within driving distance, as reported in her blog post After Gartner MDM, in which Jill considers MDM and takes the road less traveled. In London Jill will be delivering a keynote called: Data Governance, What Your CEO Needs to know.

On the Data Governance tracks there will be a panel discussion called Data Governance in a Regulatory Environment with some good folks: Nicola Askham, Dylan Jones, Ken O’Connor and Gwen Thomas.

Nicola is currently writing an excellent blog post series on the Six Characteristics Of A Successful Data Governance Practitioner. Dylan is the founder of DataQualityPro. Ken was the star on the OCDQblog radio show today discussing Solvency II and Data Quality.

Gwen, being the founder of The Data Governance Institute, is chairing the Data Governance Conference while Aaron Zornes, the founder of The MDM Institute, is chairing the MDM Summit.

Master Data, Social MDM and Reference Data Management

The MDM Institute lately had an “MDM Alert” with Master Data Management & Data Governance Strategic Planning Assumptions for 2012-13 with the subtitle: Pervasive & Pandemic MDM is in Your Future.

Some of the predictions are about reference data and Social MDM.

Social master data management has been a favorite subject of mine the last couple of years, and I hope to catch up with fellow MDM practitioners and learn how far this has come outside my circles.

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and initially maintained outside a single enterprise. Examples from the customer master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend as I see it is that enterprises seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above. In the customer master data realm such big reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living at an address in a given country.

There is often no single source of truth for such data.

As I’m working on an international launch of a product called instant Data Quality (iDQ™), I look forward to exploring how MDM analysts and practitioners see this field developing.

Bat-and-ball Data Quality

Lately Jim Harris of the OCDQblog has written two excellent blog posts, or may I say home runs, discussing data quality with inspiration from baseball.

In the post Quality Starts and Data Quality, Jim explains that you may have a tough loss in business despite stellar data quality and a cheap win in business despite horrible data quality, but in the long run, by starting off with good data quality, your organization has a better chance to succeed.

In the follow-up post, called Pitching Perfect Data Quality, Jim ponders that business success is achievable without perfect data quality, but data quality has a role to play.

Now, although baseball is a very popular sport in the United States but largely unknown in the rest of the world, I think we all understand the metaphors.

Also, we have different but similar sports, with other rules, statistics and terms attached, around the world. The common name for these sports is bat-and-ball games.

In Britain, where I live now, cricket is huge and can be used to raise awareness of data issues. As late as yesterday the Ordnance Survey, a government body that has registries with addresses, coordinates and maps, published a blog post called Anyone for cricket? British blogger Peter Thomas has also written, among others, a post on cricket and data quality called Wager.

Before coming to Britain I lived in Denmark, where we don’t know baseball and don’t know cricket, but sometimes at family picnics, perhaps after a Carlsberg and a snaps or two, play a similar game called rundbold, with kid- and grandpa-friendly rules and scoreboard, usually using a tennis ball.

Data quality, not least data quality in relation to party master data, which is the most prominent domain within the discipline, is also a same-same-but-different game around the world, as told in the post Partnerships for the Cloud.

Understanding the rules, statistics and terms of baseball, cricket, rundbold and all the other bat-and-ball games of the world is a daunting task, even though we all know how to hit a ball with a bat.

Big Reference Data Musings

The term “big data” is huge these days. As Steve Sarsfield suggests in a blog post yesterday called Big Data Hype is an Opportunity for Data Management Pros, well, let’s ride on the wave (or is it tsunami?).

The definition of “big data” is, as with many buzzwords, not crystal clear, as examined in a post called It’s time for a new definition of big data on Mike2.0 by Robert Hillard. The post suggests that big may be about volume, but is actually more about big complexity.

As I have worked intensively with large amounts of rich reference data, I have a homemade term called “big reference data”.

Big Reference Data Sets

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and (initially) maintained outside a single organization. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organizations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

An example of a big reference data set is the Dun & Bradstreet WorldBase. This reference data set holds around 300 different attributes describing over 200 million business entities from all over the world.

This data set is at first glance well structured with a single (flat) data model for all countries. However, when you work with it you learn that the actual data is very different depending on the different original sources for each country. For example addresses from some countries are standardized, while this isn’t the case for other countries. Completeness and other data quality dimensions vary a lot too.
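
One simple way to see such differences is to profile completeness per country before relying on an attribute. The Python below is a sketch with made-up records and attribute names; it does not reflect the actual layout of any commercial data set.

```python
from collections import defaultdict

# Hypothetical sample records; "XX" stands for some other country.
SAMPLE_RECORDS = [
    {"country": "DK", "street": "Main Street 1"},
    {"country": "DK", "street": "Mill Lane 7"},
    {"country": "XX", "street": "Market Street 3"},
    {"country": "XX", "street": ""},
]


def completeness_by_country(records, attribute):
    """Return the share of records per country where the attribute is filled in."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for record in records:
        country = record["country"]
        totals[country] += 1
        value = record.get(attribute)
        # Treat missing, None and blank values as not filled in.
        if value is not None and str(value).strip():
            filled[country] += 1
    return {country: filled[country] / totals[country] for country in totals}
```

For the sample records, the street attribute is fully populated for one country and half populated for the other, which is exactly the kind of variation a flat data model hides.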

Another example of a large reference data set is the United Kingdom electoral roll that is mentioned in the post Inaccurately Accurate. As told in the post, there are fit-for-purpose data quality issues. The data set is pretty big, not least if you span several years, as there is a distinct roll for every year.

Big Reference Data Mashup

Complexity, and opportunity, also arise when you relate several big reference data sets.

Lately DataQualityPro had an interview called What is AddressBase® and how will it improve address data quality? Here Paul Malyon of Experian QAS explains a new combined address reference source for the United Kingdom.

Now, let’s mash up the AddressBase, the WorldBase and the Electoral Rolls – and all the likes.

Image called Castle in the Sky found on photobotos.

Know Your Foreign Customer

I’m not saying that Customer Master Data Management is easy. But the capabilities most companies have for handling domestic customer records are often stellar compared to their capabilities for handling foreign customer records.

It’s not that the knowledge, services and tools don’t exist. If you for example are headquartered in the USA, you will typically use best practice and services available there for domestic records. If you are headquartered in France, you will use best practice and services available there for domestic records. Using the best practices and services for foreign (seen from where you are) records is rarer and, if done, it is often done outside enterprise-wide data management.

This situation can’t, and will not, continue to exist. With globalization running at full speed and more and more enterprise-wide data management programs being launched, we will need best practices and services embracing worldwide customer records.

New regulatory compliance will also add to this trend. Taking effect next year, the US Foreign Account Tax Compliance Act (FATCA) will urge both US Companies and Foreign Financial Institutions to better know their foreign customers and other business partners.

In doing that, you have to know about addresses, business directories and consumer/citizen hubs for an often large range of countries as described in the post The Big ABC of Reference Data.

It may seem a daunting task for each enterprise to be able to embrace big reference data for all the countries where it has customers and other business partners.

My guess, well, actually plan, is that there will be services, based in the cloud, helping with that, as indicated in the post Partnerships for the Cloud.

The Big ABC of Reference Data

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data is data defined and (initially) maintained outside a single organisation. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organisations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

In the party master data realm such reference data may be core data about:

  • Addresses being every single valid address typically within a given country.
  • Business entities being every single business entity occupying an address in a given country.
  • Consumers (or Citizens) being every single person living at an address in a given country.

There is often no single source of truth for such data. Some of the challenges I have met for each type of data are:

Addresses

The depth (or precision if you like) of an address is a common problem. If the depth of address data is at the level of building numbers on streets (thoroughfares) or blocks, you have issues as described in the blog post called Multi-Occupancy.

Address reference data of course has issues with the common data quality dimensions, such as:

  • Timeliness, because for example new addresses will exist in the real world but not yet in a given address directory.
  • Accuracy, as you are always amazed when comparing two official sources which should have the same elements, but haven’t.

Business Entities

Business directories have been accessible for many years and are often used when handling business-to-business (B2B) customer master data and supplier master data management. Some hurdles in doing this are:

  • Uniqueness, as your view of what a given business entity is occasionally doesn’t match the view in the business directory, as discussed in the post 3 out of 10
  • Conformity, because for example an apparently simple exercise such as assigning an industry vertical can be a complex matter, as mentioned in the post What are they doing?

Consumers (or Citizens)

In business-to-consumer (B2C) or other activities involving citizens a huge challenge is identifying the individuals living on this planet as pondered in the post Create Table Homo Sapiens. Some troubles are:

  • Consistency isn’t easy, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative effectiveness.
  • Completeness, as the rules and traditions not only between countries, but also within different industries, certain activities and various channels, are different.

Big Reference Data as a Service

Even though I have emphasized some data quality dimensions for each type of data, all dimensions apply to all types of data.

For organisations operating multinationally and/or across multiple channels, exploiting the wealth and diversity of external reference data is a daunting task.

This is why I see reference data as a service embracing many sources as a good opportunity for getting data quality right the first time. There is more on this subject in the post Reference Data at Work in the Cloud.

Multi-Occupancy

The fact that many people don’t live in a single-family house but in a flat, sharing the same building number on a street with people living in other flats in the same building, is a common challenge in data quality and data matching.

The same challenge also applies to companies sharing the same building number with other companies, not to mention when companies and households are in the same building. So this is a common party master data issue.

Address verification and geocoding are seen as important methods for improving the quality of party master data, which is the top data quality pain everywhere, and for getting a single customer view.

Multi-occupancy is a pain in the (you know) getting there.

My pain

I have had some personal experiences living at multi-occupancy addresses lately.

One and a half years ago I was living a painless life in a single-family house in a Copenhagen suburb.

Then I moved closer to downtown Copenhagen, to a flat, as mentioned in the post Down the Street.

The tradition in Denmark is to send letters, make deliveries and register master data using a common format for units within a building, and to have separate mailboxes with flat ID and names for each flat. I have received most of my post since then and got all the deliveries I’m aware of.

Then I moved to London, to a flat. Here the flats in my building have numbers. But the postman delivers the letters in one batch through the street door, and there are no names on the doorbells in front of the door.

So now I sense I don’t get many letters, and today I had to order the same stuff thrice from amazon.co.uk, because I haven’t received the first two packages, despite their state-of-the-art, online-accessible package tracking system telling me that delivery was successful.

Master data pains unresolved

Address reference data at building number level and related geocodes are becoming commonly available in many places these days.

But having reference data and real-world-aligned location and related party master data at the unit level is still a challenge in most places. Therefore we are still struggling with using address verification and geocoding for single customer view where a given building number has more than a single occupancy.
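
To make the pain concrete, here is a small Python sketch that flags party records which can only be verified to building level even though the building has several occupants. The record layout, the field names and the sample address are hypothetical.

```python
from collections import defaultdict


def building_key(record):
    """Address key at building number level, ignoring any unit or flat."""
    return (
        record["country"],
        record["postal_code"],
        record["street"].casefold(),
        record["building_number"],
    )


def flag_multi_occupancy(records):
    """Map each shared building to the parties there that lack a unit."""
    by_building = defaultdict(list)
    for record in records:
        by_building[building_key(record)].append(record)

    flagged = {}
    for key, group in by_building.items():
        missing_unit = [r["name"] for r in group if not r.get("unit")]
        # Only buildings with more than one party are ambiguous.
        if len(group) > 1 and missing_unit:
            flagged[key] = missing_unit
    return flagged
```

Records flagged this way are the ones where verification at building level says "valid address" but a delivery, or a match against another party at the same building number, can still go wrong.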
