Avoiding Contact Data Entry Flaws

19th May 2012

Contact data is the data domain most often mentioned when talking about data quality. Names and addresses and other identification data are constantly spelled wrong, or just different, by the employees responsible of entering party master data.

Cleansing data long time after it has been captured is a common way of dealing with this huge problem. However, preventing typos, wrong hearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.

I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:

  • Correction and
  • Assistance

Correction

With correction the data entry clerk, sales representative, customer service professional or whoever is entering the data will enter the name, address and other data into a form.

After submitting the form, or in some cases leaving each field on the form, the application will check the content against business rules and available reference data and return a warning or error message and perhaps a correction to the entered data.

As duplicated data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning about that a similar contact record already exists in the system.

Assistance

With assistance we try to minimize the needed number of key strokes and interactively help with searching in available reference data.

For example when entering address data assistance based data entry will start with the highest geographical level:

  • If we are dealing with international data the country will set the context and know about if a state or province is needed.
  • Where postal codes (like ZIP) exists, this is the fast path to the city.
  • In some countries the postal code only covers one street (thoroughfare), so that’s settled by the postal code. In other situations we will usually have a limited number of streets that can be picked from a list or settled with the first characters.

(I guess many people know this approach from navigation devices for cars.)

When the valid address is known you may catch companies from business directories being on that address and, depending on the country in question, you may know citizens living there from phone directories and other sources and of course the internal party master data, thus avoiding entering what is already known about names and other data.

When catching business entities a search for a name in a business directory often leads to being able to pick a range of identification data and other valuable data and not at least a reference key to future data updates.

Lately I have worked intensively with an assistance based cloud service for business processes embracing contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.

Bookmark and Share


Deduplication vs Identity Resolution

15th May 2012

When working with data matching you often finds that there basically is a bright view and a dark view.

Traditional data matching as seen in most data quality tools and master data management solutions is the bright view: Being about finding duplicates and making a “single customer view”. Identity resolution is the dark view: Preventing fraud and catching criminals, terrorists and other villains.

These two poles were discussed in a blog post and the following comments last year. The post was called What is Identity Resolution?

While deduplication and identity resolution may be treated as polar opposites and seemingly contrary disciplines they are in my eyes interconnected and interdependent. Yin and Yang Data Quality.

At the MDM Summit in London last month one session was about the Golden Nominal, Creating a Single Record View. Here Corinne Brazier, Force Records Manager at the West Midlands Police in the UK told about how a traditional data quality tool with some matching capabilities was used to deal with “customers” who don’t want to be recognized.

In the post How to Avoid Losing 5 Billion Euros it was examined how both traditional data matching tools and identity screening services can be used to prevent and discover fraudulent behavior.

Deduplication becomes better when some element of identity resolution is added to the process. That includes embracing big reference data in the process. Knowing what is known in available sources about the addresses that is being matched helps. Knowing what is known in business directories about companies helps. Knowing what is known in appropriate citizen directories when deduping records holding data about individuals helps.

Identity Resolution techniques is based on the same data matching algorithms we use for deduplication. Here for example a fuzzy search technology helps a lot compared to using wildcards. And of course the same sources as mentioned above are a key to the resolution.

Right now I’m dipping deep into the world of big reference data as address directories, business directories, citizen directories and the next big thing being social network profiles. I have no doubt about that deduplication and identity resolution will be more yinyang than yin and yang in the future.

Bookmark and Share


How to Avoid Losing 5 Billion Euros

7th May 2012

Two years ago I made a blog post about how 5 billion Euros were lost due to bad identity resolution at European authorities. The post was called Big Time ROI in Identity Resolution.

In the carbon trade scam criminals were able to trick authorities with fraudulent names and addresses.

One way of possible discovery of the fraudster’s pattern of interrelated names and physical and digital locations was, as explained in the post, to have used an “off the shelf” data matching tool in order to achieve what is sometimes called non-obvious relationship awareness. When examining the data I used the Omikron Data Quality Center.

Another and more proactive way would have been upstream prevention by screening identity at data capture.

Identity checking may be a lot of work you don’t want to include in business processes with high volume of master data capture, and not at least screening the identity of companies and individuals on foreign addresses seems a daunting task.

One way to help with overcoming the time used on identity screening covering many countries is using a service that embraces many data sources from many countries at the same time. A core technology in doing so is cloud service brokerage. Here your IT department only has to deal with one interface opposite to having to find, test and maintain hundreds of different cloud services for getting the right data available in business processes.

Right now I’m working with such a solution called instant Data Quality (iDQ).

Really hope there’s more organisations and organizations out there wanting to avoid losing 5 billion Euros, Pounds, Dollars, Rupees, Whatever or even a little bit less.

Bookmark and Share


Social MDM and Systems of Engagement

28th April 2012

Social Master Data Management has been an interest of mine the last couple of years and last week I have tried to reach out to others in exploring this new era of Master Data Management by creating a group on LinkedIn called Social MDM.

When reading a nice blog with the slogan ”Welcome to the Real (IT) World!” by Max J. Pucher I came across a good illustration by John Mancini showing the history of IT and how the term “Systems of Record” is being replaced (or at least supplemented) by the term “Systems of Engagement”:

Master Data Management (MDM) includes having a System of Record (SOR) describing the core entities that takes part in the transactional systems of record that supports the daily business in every organization. For example a golden MDM record is describing the party that acts as a customer on an order record while the products in the underlying order lines are described in golden MDM records for the things dealt with within the organization.

Social Master Data Management (Social MDM) will be about supplementing that System of Record so we are able to further describe the parties taking part in the new Systems of Engagement and link with the old Systems of Records. These parties are reflected as social network profiles that are owned by the same human beings who are our (prospective) customers, part of the same household or are a contact for a company being a (prospective) customer or any other business partner.

For a guy like me who started in IT in the mainframe era (just after it had ended according to the above illustration) and went on with mini computers, PC’s and the internet it’s very exciting to be moving on into the social and cloud era.

It will be good to be joined by even more data quality and MDM practitioners and anyone else in the LinkedIn Social MDM group.

Bookmark and Share


Finding Me

19th April 2012

Many people have many names and addresses. So have I.

A search for me within Danish reference sources in the iDQ tool gives the following result:

Green T is positive in the Danish Telephone Books. Red C is negative in the Danish Citizen hub. Green C is positive in the Danish Citizen Hub.

Even though I have left Denmark I’m still registered with some phone subscriptions there. And my phone company hasn’t fully achieved single customer view yet, as I’m registered there with two slightly different middle (sur)names.

Following me to the United Kingdom I’m registered here with more different names.

It’s not that I’m attempting some kind of fraud, but as my surname contains The Letter Ø, and that letter isn’t part of the English alphabet, my National Insurance Number (kind of similar to the Social Security Number in the US) is registered by the name “Henrik Liliendahl Sorensen”.

But as the United Kingdom hasn’t a single citizen view, I am separately registered at the National Health Service with the name “Henrik Sorensen”. This is due to a sloppy realtor, who omitted my middle (sur)name on a flat rental contract. That name was taken further by British Gas onto my electricity bill. That document is (surprisingly for me) my most important identity paper in the UK, and it was used as proof of address when registering for health service.

How about you, do you also have several identities?

Bookmark and Share


The Taxman: Data Quality’s Best Friend

11th April 2012

Collection of taxes has always been a main driver for having registries and means of identifying people, companies and properties.

5,000 years ago the Egyptians made the first known census in order to effectively collect taxes.

As reported on the Data Value Talk blog, the Netherlands have had 200 years of family names thanks to Napoleon and the higher cause of collecting taxes.

Today the taxman goes cross boarder and wants to help with international data quality as examined in the post Know Your Foreign Customer. The US FATCA regulation is about collecting taxes from activities abroad and as said on the Trillium blog: Data Quality is The Core Enabler for FATCA Compliance.

My guess is that this is only the beginning of a tax based opportunity for having better data quality in relation to international data.

In a tax agenda for the European Union it is said: “As more citizens and companies today work and operate across the EU’s borders, cooperation on taxation has become increasingly important.”.

The EU has a program called FISCUS in the making. Soon we not only have to identify Americans doing something abroad but practically everyone taking part in the globalization.

For that we all need comprehensive accessibility to the wealth of global reference data through “cutting-edge IT systems” (a FISCUS choice of wording).

I am working on that right now:

Bookmark and Share


Updating a Social Business Directory

5th April 2012

Business directories have been around for ages. In the old days it was paper based as in the yellow pages for a phone book. The yellow pages have since made it to be online searchable. We also know commercial business directories as the Dun & Bradstreet WorldBase as well as government operated national wide directories of companies and industry specific business directories.

Such business directories often takes a crucial role in master data quality work as sources for data enrichment in the quest for getting as close as possible to a single version of the truth when dealing with B2B customer master data, supplier master data and other business partner master data.

A classic core data model for Master Data in CRM systems, SCM solutions and Master Data hubs when doing B2B is that you have:

  • Accounts being the BUSINESS entities who are your customers, suppliers, prospects and all kind of other business partners
  • Contacts being the EMPLOYEEs working there and acting in the roles as decision makers, influencers, gate keepers, users and so on

Today we also have to think about social master data management, being exploiting reference data in social media as a supplementary source of external data.

As all social activity this exercise goes two ways:

  • Finding and monitoring your existing and wanted business partners in the social networks
  • Updating your own data

Most business entities in this world are actually one-man-bands. So are mine. Therefore I went to the LinkedIn company pages this morning and updated data about my company Liliendahl Limited: Unlimited Data Quality and Master Data Management consultancy for tool and service vendors.

Bookmark and Share


Real World Identity

20th March 2012

How far do you have to go when checking your customer’s identity?

This morning I read an article on the Danish Computerworld telling about a ferry line now dropping a solution for checking if the passenger using an access card is in fact the paying customer by using a lightweight fingerprint stored on the card. The reason for dropping was by the way due to the cost of upgrading the solution compared to future business value and not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well as described in the post Real World Alignment. Here it was the question about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand of confirming and registering our identification around. Doing that in the fight against terrorism has been there for long. Regulatory compliance will add to that trend as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm we usually deal with strings of text as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but isn’t the real world.

We will however probably adapt more facial recognition as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customers picture on FaceBook and your B2B customer contacts picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprint is maybe a bit old fashioned, but as said, more and more biometric passports are issued and the technology for iris and retinal scanning is used around for access control even on mobile devices.

In the story starting this post the business value for reinvesting in a biometric solution wasn’t deemed positive. But looking from the print on my fingers down to my hand lines I foresee some more identity resolution going beyond name and address strings into things closer to the real world as facial recognition and biometrics.

Bookmark and Share


Wildcard Search versus Fuzzy Search

13th February 2012

My last post about search functionality in Master Data Management (MDM) solutions was called Search and if you are lucky you will find.

In the comments the use of wildcards versus fuzzy search was touched.

The problem with wildcards

I have a company called “Liliendahl Limited” as this is the spelling of the name as it is registered with the Companies House for England and Wales.

But say someone is searching using one of the following strings:

  • “Liliendahl Ltd”,
  • “Liliendal Limited” or
  • “Liljendahl Limited”

Search functionality should in these situations return with the hit “Liliendahl Limited”.

Using wildcard characters could, depending on the specific syntax, produce a hit in all combinations of the spelling with a string like this: “lil?enda*l l*”.

The problem is however that most users don’t have the time, patience and skills to construct these search strings with wildcard characters. And maybe the registered name was something slightly else not meeting the wildcard characters used.  

Matching algorithms

Tools for batch matching of name strings have been around for many years. When doing a batch match you can’t practically use wildcard characters. Instead matching algorithms typically rely of one, or in best case a combination, of these techniques:

The same techniques can be used for interactive search thus reaching a hit in one fast search.

Fuzzy search

I have worked with the Omkron FACT algorithm for batch matching. This algorithm has morphed into being implemented as a fuzzy search algorithm as well.

One area of use for this is when webshop users are searching for a product or service within your online shop. This feature is, along with other eCommerce capabilities, branded as FACT-Finder.

The fuzzy search capabilities are also used in a tool I’m involved with called iDQ. Here external reference data sources, in combination with internal master data sources, are searched in an error tolerant way, thus making data available for the user despite heaps of spelling possibilities.

Bookmark and Share


Small Business Owners

1st February 2012

A challenge I encounter over and over again within Data Matching and customer Master Data Management is what to do with small business owners.

Examples of small business owners are:

  • Farmers
  • Healthcare professionals with an own clinic
  • Small family driven shop owners
  • Modest membership organisation administrators
  • Local hospitality providers as Basil Fawlty of Fawlty Towers
  • Independent Data Quality consultants as myself

When handling customer master data we often like to divide those into Business-to-consumer (B2C) or Business-to-business (B2B). We may have different source systems, different data models and different data owners and data stewards for each of the two divisions.

But small business owners usually belong to both divisions. In some transactions they act as private persons (B2C) and in some other transactions they act as a business contact (B2B). If you like to know your customer, have a single customer view , engage in social media and all that jazz, you must have a unique view of the person, the business and the household.

In several industries small business owners, the business and the household is a special target group with unique product requirements. This is true for industries as banking, insurance, telco, real estate, law.

So here are plenty of business cases for multi-domain Master Data Management embracing customer master data and product master data.

The capability to handle a single customer view of small business owners is in my experience very poorly fulfilled in Data Quality and Master Data Management solutions around. Here is certainly room for improvement and entrepreneurship.

Bookmark and Share


Follow

Get every new post delivered to your Inbox.

Join 125 other followers