Contact data is the data domain most often mentioned when talking about data quality. Names, addresses and other identification data are constantly spelled wrong, or just differently, by the employees responsible for entering party master data.
Cleansing data long after it has been captured is a common way of dealing with this huge problem. However, preventing typos, mishearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.
I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:
- Correction and
- Assistance
Correction
With correction, the data entry clerk, sales representative, customer service professional or whoever is entering the data enters the name, address and other data into a form.
After the form is submitted, or in some cases after each field on the form is left, the application checks the content against business rules and available reference data and returns a warning or error message, perhaps with a suggested correction to the entered data.
As duplicated data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning that a similar contact record already exists in the system.
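A duplicate warning of this kind could be sketched as follows. This is only a minimal illustration, assuming records are plain dictionaries with `name` and `address` fields and using the standard library's `difflib` for fuzzy comparison; the function names and the 0.85 threshold are assumptions made for the example, not taken from any particular product.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not count."""
    return " ".join(text.casefold().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def possible_duplicates(new_record, existing, threshold=0.85):
    """Return (record, score) pairs that look like the record being entered.

    The score is the weaker of the name and address similarities, so both
    fields have to be close before a warning is raised. The threshold is an
    assumed value that would need tuning on real data.
    """
    hits = []
    for record in existing:
        score = min(
            similarity(new_record["name"], record["name"]),
            similarity(new_record["address"], record["address"]),
        )
        if score >= threshold:
            hits.append((record, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up records standing in for the internal party master data.
existing = [
    {"id": 1, "name": "John Smith", "address": "12 Main Street"},
    {"id": 2, "name": "Mary Jones", "address": "5 Oak Avenue"},
]
entered = {"name": "Jon  Smith", "address": "12 main street"}
for record, score in possible_duplicates(entered, existing):
    print(f"Warning: similar contact already exists (id {record['id']}, score {score:.2f})")
```

A real implementation would probably use matching tuned for names and addresses rather than generic string similarity, but the flow is the same: check on submit, then warn.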
Assistance
With assistance we try to minimize the number of keystrokes needed and interactively help with searching in available reference data.
For example, when entering address data, assistance-based data entry will start with the highest geographical level:
- If we are dealing with international data, the country sets the context and determines whether a state or province is needed.
- Where postal codes (like ZIP) exist, this is the fast path to the city.
- In some countries the postal code only covers one street (thoroughfare), so that’s settled by the postal code. In other situations we will usually have a limited number of streets that can be picked from a list or settled with the first characters.
(I guess many people know this approach from navigation devices for cars.)
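The top-down steps above can be sketched roughly like this. The reference data is entirely invented ("XA" and "XB" are placeholder country codes, and the cities and streets do not exist); a real service would query postal reference data instead of in-memory dictionaries, and the function name `assist` is just an assumption for the example.

```python
# Invented reference data for illustration only.
COUNTRY_RULES = {
    "XA": {"needs_region": True},   # a country where a state/province is required
    "XB": {"needs_region": False},  # a country without states/provinces
}
POSTAL_CODES = {
    # This postal code covers a single street, so the street is settled at once.
    ("XB", "1000"): {"city": "Old Town", "streets": ["Harbour Square"]},
    # This postal code covers several streets, so the user narrows by prefix.
    ("XB", "2000"): {
        "city": "Northfield",
        "streets": ["Mill Lane", "Mill Road", "Station Road"],
    },
}


def assist(country, postal_code, street_prefix=""):
    """Return what can be pre-filled after country, postal code and a few typed letters."""
    rules = COUNTRY_RULES.get(country)
    if rules is None:
        raise ValueError(f"No reference data for country {country!r}")

    entry = POSTAL_CODES.get((country, postal_code))
    if entry is None:
        # Unknown postal code: nothing to pre-fill, but the country rule still applies.
        return {"needs_region": rules["needs_region"], "city": None, "streets": []}

    prefix = street_prefix.casefold()
    streets = [s for s in entry["streets"] if s.casefold().startswith(prefix)]
    return {"needs_region": rules["needs_region"], "city": entry["city"], "streets": streets}


print(assist("XB", "1000"))        # the single street is settled by the postal code
print(assist("XB", "2000", "mi"))  # the first characters narrow the pick list
```

The point of the ordering is that each answer shrinks the search space for the next field, which is where the saved keystrokes come from.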
When the valid address is known, you may pick up companies at that address from business directories and, depending on the country in question, citizens living there from phone directories and other sources, as well as, of course, the internal party master data. This avoids re-entering what is already known about names and other data.
When capturing business entities, a search for a name in a business directory often lets you pick a range of identification data and other valuable data and, not least, a reference key for future data updates.
Lately I have worked intensively with an assistance-based cloud service for business processes involving contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.
Good stuff, Henrik.
I like the zip+4 in the US. If you can link to the right information repository, that 9-digit code will give you an exact address. Then you can compare it with the rest of the address information you have available to enter. If it matches, then you are done with the address. If there is a discrepancy, you can verify whether the zip+4 was entered incorrectly (or is just plain incorrect) or whether the rest of your contact information is bad.
Unfortunately, relatively few people memorize their zip+4, so it is usually derived from the actual address.
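The cross-check described here might look something like the sketch below. The "repository" is a hypothetical in-memory dictionary with made-up codes and addresses, standing in for whatever postal reference source is actually available; the function names and return values are assumptions for the example.

```python
import re

# Hypothetical repository: 9-digit code -> address (made-up values).
ZIP4_REPOSITORY = {
    "00000-0001": "1 Example Street",
    "00000-0002": "2 Example Street",
}


def normalize_address(text):
    """Case-fold, drop punctuation and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^\w\s]", "", text.casefold())
    return " ".join(cleaned.split())


def check_zip4(zip4, entered_address):
    """Compare the address implied by the code with the address that was entered."""
    expected = ZIP4_REPOSITORY.get(zip4)
    if expected is None:
        return "unknown_code"  # the code itself may have been mistyped
    if normalize_address(expected) == normalize_address(entered_address):
        return "match"         # done with the address
    return "discrepancy"       # either the code or the rest of the address is wrong


print(check_zip4("00000-0001", "1 Example Street."))
print(check_zip4("00000-0002", "1 Example Street"))
```

On a discrepancy the form cannot tell which side is wrong, so it can only prompt the person entering the data to verify both.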
Thanks Bryan.
Challenges are indeed different around the world.
In the UK, where I live now, the street-level postal code includes letters and is therefore shorter and a bit easier to remember; the system has also been in place for a long time.
In Denmark, where I lived before, only central Copenhagen has street-level postal codes, while each 4-digit postal code in the rest of the country covers several streets.
As you suggest above, Henrik, one simple change in input form design that would greatly reduce both key strokes and errors (both for data entry staff and online customers) would be to have the Country as the first line of the address.
This would immediately be able to pick up the relevant zip/post code rules for that country to enable this to be input next, which would in turn pick up suburb, street, etc. and populate them as candidate values – that could be overwritten if not the precise address.
The traditional postal address structure was designed in the mid nineteenth century for postmen walking house to house hand delivering mail – even then it was not the most efficient design! – so to carry it through to 21st century computerised systems may not be the most sensible thing to do.
That’s true John. And as you mention, the tips and tricks for address data entry will work for online self registration too.
Hi Henrik,
We have an ongoing discussion about how to capture country-specific data, especially the use of special characters. Each member of the (international) team has a different opinion: Should it be Köln, Koeln, Koln or Cologne? Is it Newcastle, Neuburg or Neuchâtel? What happens when two languages are valid in a city, as in Switzerland (Biel = Bienne)? OK, for cities we still have the postal code as an identifier, but big cities like Köln (this is the correct German spelling) might have several… I think your discussion only scratches the surface of the problem and can easily end up as a political one. I’d be glad to participate!
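One way to sidestep part of this argument is to accept any of the spellings at entry and resolve them to a single stored form. The sketch below assumes a small hand-made alias table (only the examples from this comment) and uses Python's `unicodedata` to fold diacritics; which form is treated as "canonical" is exactly the political decision mentioned above, and the choice made here is arbitrary.

```python
import unicodedata

# Common German transliterations of umlauts ("ß" is handled by casefold()).
GERMAN_TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

# Assumed canonical spelling -> other names in use (hand-made, illustrative).
CITIES = {
    "Köln": ["Cologne"],
    "Biel/Bienne": ["Biel", "Bienne"],
}


def strip_diacritics(text):
    """Remove combining marks, so 'köln' becomes 'koln'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_keys(name):
    """Return the lookup keys under which a name should be findable."""
    lowered = unicodedata.normalize("NFC", name).casefold()
    return {
        strip_diacritics(lowered),                             # Köln -> koln
        strip_diacritics(lowered.translate(GERMAN_TRANSLIT)),  # Köln -> koeln
    }


def build_index(cities):
    """Map every lookup key of every known name to its canonical spelling."""
    index = {}
    for canonical, aliases in cities.items():
        for name in [canonical, *aliases]:
            for key in match_keys(name):
                index[key] = canonical
    return index


INDEX = build_index(CITIES)


def resolve(entered):
    """Return the canonical city name for whatever spelling was typed, or None."""
    for key in match_keys(entered):
        if key in INDEX:
            return INDEX[key]
    return None


for spelling in ["Köln", "Koeln", "Koln", "Cologne", "Bienne"]:
    print(spelling, "->", resolve(spelling))
```

This only settles how input is matched; which name is displayed or printed on a letter can still differ per user or per language.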
Jutta, thanks a lot for adding in. The challenges are indeed plentiful and stretch across the data governance realm, including the fact that the issues may be political.
Will be glad to discuss further, it’s one of my favorite subjects.
Hi,
In Belgium we have the same problem. Even within one country, some addresses exist in French and in Dutch, while other addresses exist in French and German. And since, moreover, the Belgian postal system is not very structured or logical, it can be quite difficult to create a single ‘address’ view at the point of data entry of new client records. One solution can be to create a unique address key that identifies a street (or another geographic element) in a unique way, independent of the language it is written in. One of the Belgian companies that has created this unique view on Belgian addresses is WDM Belgium with its Road65 tool.
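The idea of a language-independent key can be illustrated with a small sketch. This is not a description of how Road65 works; the key format `BE-0001` and the structure are assumptions made for the example, with one bilingual Brussels street as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Street:
    key: str     # hypothetical language-independent identifier
    names: dict  # language code -> street name in that language


# Sample data: one street known under a French and a Dutch name.
STREETS = [
    Street("BE-0001", {"fr": "Rue de la Loi", "nl": "Wetstraat"}),
]

# Index every language variant to the same key.
NAME_TO_KEY = {
    name.casefold(): street.key
    for street in STREETS
    for name in street.names.values()
}
KEY_TO_STREET = {street.key: street for street in STREETS}


def find_key(entered_name):
    """Return the street key for a name typed in any supported language, or None."""
    return NAME_TO_KEY.get(entered_name.casefold())


def display_name(key, language):
    """Render the street in the requested language."""
    return KEY_TO_STREET[key].names[language]


key = find_key("wetstraat")
print(key, "->", display_name(key, "fr"))
```

Storing the key on the client record rather than the typed text means the same address entered in French and in Dutch ends up as one address, which is the single view described above.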
Thanks for joining Lynn. Just checked out the WDM Road 65.
With proper analysis and design, input screens should easily be able to deal with this problem. Once you have selected a) the country and b) the language you wish to use, the address can be verified against a database for that country in that language.
The data structures to support this functionality are relatively simple.
The real challenge is loading the correct domain data into the database or wiring the form into the cloud. The latter would be the preferred option, as it would keep all data current.
Unique keys are simply that, unique keys. They do not identify anything and, consequently, cannot uniquely identify anything. Their misuse as unique identifiers is one of the greatest causes of duplication in databases.