Staying in Doggerland

Currently I’m travelling a lot between my home in London, United Kingdom, and Copenhagen, Denmark, where I have most of my family and where the iDQ headquarters is.

When flying between London and Copenhagen you pass over the southern North Sea. In the old days (8,000 years ago) this area was dry land occupied by human beings. This ancient land is known today as Doggerland.

Sometimes I feel like a citizen of Doggerland, not really belonging in either the United Kingdom or Denmark.

I still have some phone subscriptions in Denmark that I and my family use there. The phone company seems to have a hard time getting a 360-degree customer view, as I have two different spellings of my name and two different addresses, as seen when I look myself up in the iDQ service.

Besides having a Customer Relationship Mess (CRM), the phone company has recently shifted its outsourcing partner (from CSC to TCS). This has caused a lot of additional mess, apparently including closing one of my subscriptions because they failed to register my payments. They say they did send a chaser, but to the oldest of the addresses, where I don’t pick up mail anymore.

I called to settle the matter and asked if they could correct the address that isn’t in use anymore. They couldn’t. The operator did some kind of query into the citizen hub, similar to what I can do in iDQ.

However, the customer service guy’s screen just showed that I have no address in Denmark in the citizen hub (called the CPR), so he couldn’t change the address.

Apparently the phone company correctly picked up an accurate address from the citizen hub when I got the subscription, but failed to update it (along with the other subscriptions) when I moved to another domestic address, and now doesn’t have an adequate business rule for when I’m registered at a foreign address.

So now I’m staying in Doggerland.


Mashing Up Big Reference Data and Internal Master Data

Right now I’m working on a cloud service called instant Data Quality (iDQ™).

It is basically a very advanced search engine capable of being integrated into business processes in order to get data quality right the first time while reducing the time needed for looking up and entering contact data.

With iDQ™ you are able to look up what is known about a given address, company and individual person in external sources (I call these big reference data) and what is already known in internal master data.

From a data quality point of view this mashup helps solve some of the core data quality issues almost every organization has to deal with, namely:

  • Avoiding duplicates
  • Getting data as complete as possible
  • Ensuring maximal accuracy

The mashup is also a very good foundation for making real-time decisions about master data survivorship.
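
As a minimal sketch of the idea (the lookup functions, source names and fields below are illustrative stand-ins, not the actual iDQ™ API), a mashup could flag potential duplicates from internal hits and pick surviving attribute values by source precedence:

```python
# Minimal, illustrative sketch of the mashup idea - not the actual iDQ(tm) API.
# The two lookup functions are hypothetical stand-ins returning stub data.

SOURCE_PRECEDENCE = {"national_registry": 1, "business_directory": 2, "manual_entry": 3}

def lookup_big_reference_data(name, address):
    # Stand-in for a call to external big reference data services.
    return [{"source": "national_registry", "name": "Henrik Larsen",
             "address": "Main Street 1, Copenhagen", "birth_date": "1970-01-01"}]

def lookup_internal_master_data(name, address):
    # Stand-in for a search in internal master data.
    return [{"source": "manual_entry", "name": "Henrik Larsen",
             "address": "Main Str 1, Copenhagen", "birth_date": None}]

def mashup(name, address):
    candidates = (lookup_big_reference_data(name, address)
                  + lookup_internal_master_data(name, address))
    # A hit in internal master data signals a potential duplicate at entry time.
    duplicate_warning = any(c["source"] == "manual_entry" for c in candidates)
    # Survivorship: for each attribute, keep the value from the most trusted
    # source, using less trusted sources only to fill gaps (completeness).
    survivor = {}
    for c in sorted(candidates, key=lambda r: SOURCE_PRECEDENCE.get(r["source"], 99)):
        for field, value in c.items():
            if field != "source" and value and field not in survivor:
                survivor[field] = value
    return {"duplicate_warning": duplicate_warning, "golden_record": survivor}

print(mashup("Henrik Larsen", "Main Street 1, Copenhagen"))
```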

The iDQ™ service helps with getting data quality right the first time. However, you also need Ongoing Data Maintenance in order to keep data at a high quality. Therefore iDQ™ is built to hook into subscription services for external reference data.

At iDQ we are looking for partners worldwide who see the benefit of having such a cloud-based master data service connected to providing business-to-business (B2B) and/or business-to-consumer (B2C) data services, data quality services and master data management solutions.

Here’s the contact data: http://instantdq.com/contact/


State of this Data Quality Blog

Today is a big day on this blog as it has been live for 3 years.

Success versus Failure

The first entry, called Qualities of Data Architecture, was a promise to talk about data quality success stories. The reason for emphasizing success stories related to data quality is a feeling that data quality improvement is too often promoted by horror stories about how badly your business may fare if you don’t pay attention to data quality.

The problem is that stories about failure usually aren’t taken too seriously. Jim Harris recently had a very good take on that in the post Data Quality and Chicken Little Syndrome.

So, I plan to tell even more success stories along with the inevitable stories about failure that so easily and obviously could have been avoided.

Getting Social

Using social networks to promote your blogging is quite natural.

At the same time social networks have emerged as a new source for doing master data management (I call this Social MDM).

Exploring this new discipline over the hype peak, down through the valley of disappointment and up to the plateau of productivity will for sure be a recurring subject on this blog.

People, Processes and Technology

Sometimes you see a statement like “Data Quality is not about technology, it’s all about people”.

Well, most things we can’t solve easily are not just about one thing. In my eyes the old cliché about addressing people, processes and technology surely also relates to getting data quality right.

There are many good blogs around about people and processes. On this blog I’ll try to write about my comfort zone, which is technology, without forgetting people and processes.

The Hidden Agenda

Most people who blog do it to promote their own (or their employer’s) expertise, services and tools, and I am no different.

Lately I have written a lot about a second-to-none cloud-based service for upstream data quality prevention. The wonder is called instant Data Quality.

While upstream prevention is the best approach to data quality, a lot of work must still be done every day in downstream cleansing, as told in the post Top 5 Reasons for Downstream Cleansing.

As I’m also working with a stellar new cloud-based platform for data quality improvement productivity, I will for sure share some props for that in the near future.


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as “data driven” and how the changing world of data must change our approach to data quality.

As said in the article the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement, especially in the party master data domain, has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the Wikipedia article about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry specific directories is another example.

In the post Business Contact Reference Data I have a prediction saying that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one haunted by complex privacy considerations.
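
As a rough sketch of how such a change-of-address feed could be applied (the record layout below is hypothetical, and the matching is simplified compared to a real NCOA process):

```python
# Illustrative sketch of applying a change-of-address style feed to internal
# master data; layout and matching are simplified for the example.
master_data = [
    {"name": "Henrik Larsen", "address": "Old Street 1, Copenhagen"},
]

change_feed = [
    {"name": "Henrik Larsen", "old_address": "Old Street 1, Copenhagen",
     "new_address": "New Avenue 2, Copenhagen"},
]

def apply_address_changes(master, feed):
    """Update master records whose name and old address match a change record."""
    changes = {(c["name"], c["old_address"]): c["new_address"] for c in feed}
    for record in master:
        new_address = changes.get((record["name"], record["address"]))
        if new_address:
            record["address"] = new_address
    return master

print(apply_address_changes(master_data, change_feed))
```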

Near-Real-Time Data Enrichment

Besides delivering updates to basic master data, business directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment used a lot in direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files back for uploading.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud, making it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data, and to execute the request as close to real time as the volume allows.

Such services also include the good old duplicate check, now much better informed by including third-party reference data.
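
As a rough illustration (the palette elements, dummy values and functions below are made up, not a specific product’s API), such a self-service configuration could boil down to picking enrichment steps and running them over a batch:

```python
# Illustrative only: a user-picked palette of enrichment elements and a batch
# runner. The appended values are dummies standing in for real source lookups.
ENRICHMENT_PALETTE = {
    "geo_codes": lambda rec: {**rec, "latitude": 55.68, "longitude": 12.57},
    "phone_number": lambda rec: {**rec, "phone": "+45 00 00 00 00"},
    "mailing_suppression": lambda rec: {**rec, "mail_ok": False},
}

def enrich(records, selected_elements):
    """Apply the user's selected enrichment elements to each record in the batch."""
    for element in selected_elements:
        step = ENRICHMENT_PALETTE[element]
        records = [step(rec) for rec in records]
    return records

batch = [{"name": "Example ApS", "address": "Main Street 1, Copenhagen"}]
print(enrich(batch, ["geo_codes", "mailing_suppression"]))
```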

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent on data entry work.

Not least in a globalized world, where names of people reflect the diversity of almost any nation today, where business names become more and more creative, and where data entry is done at shared service centers staffed by people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

When mashing up advanced search in third-party data and internal master data during data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.


Pulling Data Quality from the Cloud

In a recent post here on the blog the benefits of instant data enrichment were discussed.

In the contact data capture context these are some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.

However, if you are doing business in many countries it is a daunting task to connect with the best-of-breed sources of big reference data. Add to that the fact that many enterprises do both business-to-business (B2B) and business-to-consumer (B2C) activities, including interacting with small business owners. This means you have to link to the best sources available for addresses, companies and individuals.

A solution to this challenge is using Cloud Service Brokerage (CSB).

An example of a Cloud Service Brokerage suite for contact data quality is the instant Data Quality (iDQ™) service I’m working with right now.

This service can connect to big reference data cloud services from all over the world. Some services are open data services in the contact data realm, some are international commercial directories, some are the wealth of national reference data services for addresses, companies and individuals, and even social network profiles are on the radar.
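
As a minimal sketch of the brokerage pattern (all service and registry names below are made up; the point is the single interface in front of many sources):

```python
# Sketch of the cloud service brokerage pattern: callers see one interface,
# and the broker routes each request to the right source. Names are made up.

def dk_citizen_lookup(query):
    # Hypothetical wrapper around a Danish citizen reference data service.
    return {"source": "DK citizen hub", "query": query}

def gb_company_lookup(query):
    # Hypothetical wrapper around a UK business directory service.
    return {"source": "GB business directory", "query": query}

BROKER_REGISTRY = {
    ("DK", "person"): dk_citizen_lookup,
    ("GB", "company"): gb_company_lookup,
}

def broker_lookup(country, entity_type, query):
    """Single entry point: route one request to whichever source covers it."""
    service = BROKER_REGISTRY.get((country, entity_type))
    if service is None:
        raise LookupError("No reference data source for %s/%s" % (country, entity_type))
    return service(query)

print(broker_lookup("GB", "company", "Example Ltd"))
```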


The Secret Behind Good Data Quality

This post is inspired by a little tweet chat I had with Daragh O Brien this morning.

The data quality angle was that a simple data quality rule around age (or date of birth) for living persons would be a check that creates a warning if the age is above 122, because this would, if true, be a new entry in the book of records.

Jeanne Louise Calment of France had the longest confirmed human lifespan: 122 years.

Your data quality age check may even be refined, as the record for a male is 115 years.

Christian Mortensen, born in Denmark and deceased in the United States, holds that record.
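
A sketch of such a rule could look like this (the thresholds come from the records above; the function and field names are just for illustration):

```python
from datetime import date

# Thresholds from the records mentioned above: 122 years overall
# (Jeanne Calment), 115 years for a male (Christian Mortensen).
MAX_AGE = {"any": 122, "male": 115}

def age_warning(birth_date, gender="any", today=None):
    """Return a warning if a living person's age would beat the world record."""
    today = today or date.today()
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day))
    limit = MAX_AGE.get(gender, MAX_AGE["any"])
    if age > limit:
        return "Warning: age %d exceeds the confirmed record of %d years" % (age, limit)
    return None

print(age_warning(date(1880, 1, 1), "male"))  # way past 115: triggers the warning
```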

Both Jeanne Calment and Christian Mortensen have shared their secret behind a long life.

Surprisingly both recipes include what is usually not considered good for your health.

Jeanne Calment recommended a diet of port wine and she ate nearly one kilogram of chocolate every week.

Christian Mortensen on the other hand recommended lots of good water and no alcohol – but then a good cigar.

Even though there are lots of recipes and examples out there for good health and a long life, there is probably no single way, and as told in the post Miracle Food for Thought:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoon of salt.

Also, as discussed in the post Turning a Blind Eye to Data Quality, there are plenty of examples of business success despite poor data quality.

So, no, there is no single secret behind good data quality. But there is a wealth of good practices, tools and services to choose from out there.

For example I’m not sure I like instant oatmeal – but Instant Data Enrichment for instant Data Quality are good ones for you. I promise.


Instant Data Enrichment

Data enrichment is one of the core activities within data quality improvement. Data enrichment is about updating your data in order to make it more real-world aligned by correcting and completing it with data from external reference data sources.

Traditionally data enrichment has been a follow-up activity to data matching, and doing data matching as a prerequisite for data enrichment has been a good part of my data quality endeavors over the last 15 years, as reported in the post The GlobalMatchBox.

During the last couple of years I have tried to be part of the quest for doing something about poor data quality by moving the activities upstream. Upstream data quality prevention is better than downstream data cleansing wherever applicable. Doing the data enrichment at data capture is the fast track to improve data quality for example by avoiding contact data entry flaws.

It’s not that you have to enrich with all the possible data available from external sources at once. The most important thing is that you are able to link back to external sources without having to do (too much) fuzzy data matching later. Some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data at a later point.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot when you later have to match with other sources.

In that way your data will be fit for current and future multiple purposes.
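
A small sketch of the linking idea (the field names and keys below are illustrative): capture the external reference keys up front, and later enrichment becomes a simple key lookup rather than a fuzzy match.

```python
# Illustrative: capture external reference keys at data entry so that later
# enrichment becomes a key lookup instead of fuzzy matching.
contact = {
    "name": "Example Ltd",
    "address": "Main Street 1, London",
    # Keys captured at entry time from external sources (made-up values):
    "company_registration_number": "01234567",
    "address_id": "GB-ADDR-0001",
}

def enrich_later(contact, directory):
    """Much later: pull new attributes by key - no re-matching needed."""
    key = contact["company_registration_number"]
    return {**contact, **directory.get(key, {})}

directory = {"01234567": {"industry_code": "6201", "employees": 42}}
print(enrich_later(contact, directory))
```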


Avoiding Contact Data Entry Flaws

Contact data is the data domain most often mentioned when talking about data quality. Names, addresses and other identification data are constantly spelled wrongly, or just differently, by the employees responsible for entering party master data.

Cleansing data a long time after it has been captured is a common way of dealing with this huge problem. However, preventing typos, mishearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.

I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:

  • Correction and
  • Assistance

Correction

With correction the data entry clerk, sales representative, customer service professional or whoever is entering the data will enter the name, address and other data into a form.

After submitting the form, or in some cases when leaving each field on the form, the application will check the content against business rules and available reference data and return a warning or error message and perhaps a correction to the entered data.

As duplicated data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning that a similar contact record already exists in the system.
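
As a minimal sketch of the correction approach (the mandatory-field rule and the sample data are made up; a real implementation would use proper matching technology rather than a plain string comparison):

```python
import difflib

# Stand-in for what is already in internal party master data.
existing_contacts = ["Henrik Larsen, Main Street 1, Copenhagen"]

def check_on_submit(name, address):
    """Validate a submitted contact form; return warnings instead of silently accepting."""
    warnings = []
    if not address.strip():
        warnings.append("Error: address is mandatory.")
    # Fuzzy duplicate check against what is already in the system.
    entered = "%s, %s" % (name, address)
    matches = difflib.get_close_matches(entered, existing_contacts, n=1, cutoff=0.8)
    if matches:
        warnings.append("Warning: a similar contact already exists: " + matches[0])
    return warnings

print(check_on_submit("Henrik Larsen", "Main Str 1, Copenhagen"))
```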

Assistance

With assistance we try to minimize the number of keystrokes needed and interactively help with searching the available reference data.

For example, when entering address data, assistance-based data entry will start at the highest geographical level:

  • If we are dealing with international data, the country will set the context and determine whether a state or province is needed.
  • Where postal codes (like ZIP codes) exist, these are the fast path to the city.
  • In some countries a postal code may cover only one street (thoroughfare), so the street is settled by the postal code. In other situations we will usually have a limited number of streets that can be picked from a list or settled with the first characters typed.

(I guess many people know this approach from navigation devices for cars.)
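
Here is a toy sketch of that cascade (the tiny directory below is a hypothetical stand-in for a real postal directory service):

```python
# Toy stand-in for national postal directories; a real service is far larger.
POSTAL_DIRECTORY = {
    "DK": {"2100": {"city": "Copenhagen Ø",
                    "streets": ["Dag Hammarskjölds Allé", "Faksegade"]}},
    "GB": {"SW1A": {"city": "London", "streets": ["Downing Street"]}},
}

def suggest_streets(country, postal_code, typed=""):
    """Narrow the street pick list: country -> postal code -> first characters typed."""
    area = POSTAL_DIRECTORY.get(country, {}).get(postal_code)
    if area is None:
        return []
    if len(area["streets"]) == 1:
        return area["streets"]  # postal code covers a single street: already settled
    return [s for s in area["streets"] if s.lower().startswith(typed.lower())]

print(suggest_streets("DK", "2100", "Fa"))  # ['Faksegade']
```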

When the valid address is known, you may catch companies at that address from business directories and, depending on the country in question, you may know the citizens living there from phone directories, other sources and of course the internal party master data, thus avoiding entering what is already known about names and other data.

When catching business entities, a search for a name in a business directory often leads to being able to pick a range of identification data and other valuable data, and not least a reference key for future data updates.

Lately I have worked intensively with an assistance based cloud service for business processes embracing contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.


How to Avoid Losing 5 Billion Euros

Two years ago I wrote a blog post about how 5 billion Euros were lost due to bad identity resolution at European authorities. The post was called Big Time ROI in Identity Resolution.

In the carbon trade scam criminals were able to trick authorities with fraudulent names and addresses.

One possible way of discovering the fraudsters’ pattern of interrelated names and physical and digital locations was, as explained in the post, to use an “off the shelf” data matching tool in order to achieve what is sometimes called non-obvious relationship awareness. When examining the data I used the Omikron Data Quality Center.
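
As a simple sketch of the underlying idea (the normalization below is deliberately crude; a real matching tool does far more), grouping registrations by normalized address can surface such non-obvious relations:

```python
from collections import defaultdict

def normalize(text):
    """Deliberately crude normalization; real matching tools go much further."""
    return " ".join(text.lower().replace(".", "").replace(",", "").split())

def shared_locations(registrations):
    """Group registrations by normalized address to surface non-obvious relations."""
    by_address = defaultdict(list)
    for reg in registrations:
        by_address[normalize(reg["address"])].append(reg["name"])
    return {addr: names for addr, names in by_address.items() if len(names) > 1}

registrations = [  # made-up examples
    {"name": "Trader One Ltd", "address": "1 High Street, London"},
    {"name": "Trader Two Ltd", "address": "1 high street,, LONDON"},
]
print(shared_locations(registrations))
```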

Another and more proactive way would have been upstream prevention by screening identity at data capture.

Identity checking may be more work than you want to include in business processes with high volumes of master data capture, and not least screening the identity of companies and individuals at foreign addresses seems a daunting task.

One way to reduce the time spent on identity screening covering many countries is using a service that embraces many data sources from many countries at the same time. A core technology for doing so is cloud service brokerage. Here your IT department only has to deal with one interface, as opposed to having to find, test and maintain hundreds of different cloud services to get the right data available in business processes.

Right now I’m working with such a solution called instant Data Quality (iDQ).

I really hope there are more organisations and organizations out there wanting to avoid losing 5 billion Euros, Pounds, Dollars, Rupees, Whatever or even a little bit less.


Big Reference Data as a Service

This morning I read an article called The Rise of Big Data Apps and the Fall of SaaS by Raj De Datta on TechCrunch.

I think the first part of the title is right while the second part is misleading. Software as a Service (SaaS) will be a big part of Big Data Apps (BDA).

The article also includes a description of LinkedIn merely as a social recruitment service. While recruiters, as reported in the post Indulgent Moderator or Ruthless Terminator?, certainly are visible on this social network, LinkedIn is much more than that.

Among other things LinkedIn is a source of what I call big reference data as examined in the post Social MDM and Systems of Engagement.

Besides social network profiles, big reference data also includes big directory services: services with large amounts of data about addresses, business entities and citizens/consumers, as told in the post The Big ABC of Reference Data.

Right now I’m working with a Software as a Service solution embracing Big (Reference) Data as a Service thus being a Big Data App called instant Data Quality.

And hey, I have made a pin about that.
