Aadhar (or Aadhaar)

The solution to the single most frequent data quality problem, party master data duplicates, is actually very simple. Every person (and every legal entity) gets a unique identifier which is used everywhere by everyone.

Now India jumps on the bandwagon and starts assigning a unique ID to the 1.2 billion people living in India. As I understand it, the project has just been named Aadhar (or Aadhaar). Google Translate tells me this word (आधार) means base or root – please correct me if anyone knows better.

In Denmark we have had such an identifier (one for citizens and one for companies) for many years. It is not used by everyone everywhere – so you are still able to make money as a data quality professional specializing in data matching.

The main reason that the unique citizen identifier is not used all over is of course privacy considerations. As for the unique company identifier, the reason is that data quality is often defined as fitness for the immediate purpose of use.
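A minimal sketch in Python of the difference this makes (the records and the identifier format below are made up for illustration): with a shared unique identifier, finding duplicates is an exact key lookup; without one, you are back to fuzzy matching.

```python
# Sketch: deduplication with a shared unique identifier versus without one.
# The records and identifier format are made up for illustration.
from difflib import SequenceMatcher

parties = [
    {"id": "1234-5678-9012", "name": "Lars  Jensen", "city": "Copenhagen"},
    {"id": "1234-5678-9012", "name": "Lars Jensen", "city": "Kobenhavn"},
    {"id": "9876-5432-1098", "name": "Maria Hansen", "city": "Aarhus"},
]

# With a trusted identifier used everywhere, duplicates are an exact key lookup.
by_id = {}
for p in parties:
    by_id.setdefault(p["id"], []).append(p)
duplicates = {k: v for k, v in by_id.items() if len(v) > 1}
print(duplicates)

# Without it, you fall back to fuzzy matching on names, addresses and so on.
def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

candidate_pairs = [
    (p["name"], q["name"])
    for i, p in enumerate(parties)
    for q in parties[i + 1:]
    if similar(p["name"], q["name"])
]
print(candidate_pairs)
```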


A user experience

As a data quality professional, it is a learning experience when you are the user.

During the last few years I have worked for a data quality tool vendor headquartered in Germany. As part of my role serving partners, prospects and customers in Scandinavia, I have been a CRM system user. As a tool vendor we take our own medicine, which includes intelligent real-time duplicate checking, postal address correction, fuzzy search and other goodies built into the CRM system.

Sounds perfect? Sure, if it wasn’t for a few diversity glitches.

The address doesn’t exist

Postal correction is only activated for Germany. This actually makes some sense, since most activity is in Germany and postal correction is not that important in Scandinavia, where company (and citizen) information is more available and usually a better choice. Due to a less fortunate setup during the first years, my routine when inserting a new account was to pick correct data from a business directory, paste it into the CRM system and then angrily override the warning that the address doesn’t exist (in Germany).

Dear worshipful Mr Doctor Oetker

In Germany the salutation is paramount. In Scandinavia it is not common to use a prefixed salutation anymore – and if you do, you are regarded as very old-fashioned. So making the salutation field for a contact mandatory is an annoyance, and setting up an automated salutation generation mechanism is a complete waste of time.


Data Quality from the Cloud

One of my favorite data quality bloggers, Jim Harris, wrote a blog post this weekend called “Data, data everywhere, but where is data quality?”

I believe that data quality will be found in the cloud (not the current ash cloud, but to put it plainly: on the internet). Many of the data quality issues I encounter in my daily work with clients and partners are caused by adequate information not being available at data entry – or not being exploited. But the information needed will in most cases already exist somewhere in the cloud. The challenge ahead is how to integrate the information available in the cloud into business processes.

Using external reference data to ensure data quality is not new. Especially in Scandinavia, where I live, this has been practiced for a long time because of the tradition of the public sector recording data about addresses, citizens, companies and so on far more intensely than in the rest of the world. The Achilles heel, though, has always been how to smoothly integrate external data into data entry functionality and other data capture processes and, not to forget, how to ensure ongoing maintenance in order to avoid the otherwise inevitable erosion of data quality.

The drivers for increased exploitation of external data are mainly:

  • Accessibility, which is where the fast-growing (semantic) information store in the cloud helps – not least backed up by the worldwide tendency of governments to release public sector data
  • Interoperability, where the increased supply of Service Oriented Architecture (SOA) components will pave the way
  • Cost: the more subscribers to a certain source, the lower the price – plus many sources will simply be free

As said, smooth integration into business processes is key – or, sometimes even better, orchestrating business processes in a new way so that available and affordable information (from the cloud) is pulled into these processes using only a minimum of costly on-premise human resources.
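As a rough sketch of what pulling cloud information into a data entry process could look like (the endpoint URL and response fields below are hypothetical placeholders, not any specific vendor’s API):

```python
# Sketch: pulling reference data from a cloud service into the data entry
# process instead of typing (and mistyping) it on premise.
# The URL and JSON fields below are hypothetical placeholders.
import requests

def validate_address(street: str, postal_code: str, country: str) -> dict:
    """Ask a cloud address service whether the entered address exists."""
    response = requests.get(
        "https://example-address-service.com/v1/validate",  # hypothetical endpoint
        params={"street": street, "postal_code": postal_code, "country": country},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"valid": true, "standardized": {...}}

def on_account_save(form_data: dict) -> dict:
    """Called by the data entry screen before the record is committed."""
    result = validate_address(form_data["street"],
                              form_data["postal_code"],
                              form_data["country"])
    if result.get("valid"):
        form_data.update(result.get("standardized", {}))  # enrich instead of retyping
    else:
        form_data["needs_review"] = True  # flag instead of silently storing
    return form_data
```

The point of the sketch is the orchestration: the reference data is fetched at the moment of entry, so the costly human effort is reduced to handling the few records that are flagged for review.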


Royal Exceptions

I am not a royalist, but anyway: today, 16th April 2010, is the 70th birthday of Queen Margrethe II of Denmark. Congratulations, Your Majesty.

Having a queen (or king) and a royal family is a good example of the fact that there are always exceptions. As a matter related to data quality: I would say that every person in our country has a first (given) name and a last (family) name. But the royal family doesn’t have a last name – they only have first names, like those of Her Majesty being Margrethe Alexandrine Þórhildur Ingrid. (By the way: the third name is actually Icelandic; I guess that explains the ash cloud sent as a greeting from there.)

There are always exceptions. We may define data quality validation rules from here to doomsday – there will always be exceptions. We may write down business rules from now to eternity – tomorrow you will encounter the first exception. Data quality (and democracy) is never perfect – but it’s worth striving for.
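A minimal sketch of the point (the field names and the exception code are made up): a general “everyone has a last name” rule needs an escape hatch for documented exceptions, or the royal family ends up flagged as bad data.

```python
# Sketch: a validation rule with an escape hatch for documented exceptions.
# Field names and the exception code are made up for illustration.
def check_person(record: dict) -> list:
    """Return a list of validation issues for a person record."""
    issues = []
    if not record.get("first_name"):
        issues.append("missing first name")
    if not record.get("last_name") and "no_last_name" not in record.get("exceptions", set()):
        issues.append("missing last name")
    return issues

queen = {
    "first_name": "Margrethe Alexandrine Þórhildur Ingrid",
    "last_name": "",
    "exceptions": {"no_last_name"},  # a documented exception, not poor data quality
}
print(check_person(queen))  # [] – valid despite breaking the general rule
```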

Breaking through an open door

This is perhaps a road I have been down before, for example lately in the post The Myth about a Myth.

But it is a pet peeve of mine.

Why are some people always reminding us that this and that must be seen in a business context?

Of course everything we do in our professional life within data quality, master data management, business intelligence and so on must be seen in a business context. Again, I have never seen anyone taking the opposite stance.

I am aware that playing the “business context” card is a friendly reminder when, say, some people become too excited about a tool. But remember, every tool was originally made by people to solve a business challenge, and if the tool continues to exist it has probably done that several times.

It may be that tools are overexposed in our business issue discussions because some people are just doing their jobs:

  • Vendors are naturally pushing their tools – it’s a business issue
  • Analysts talk about tools and vendors – it’s a business issue
  • Conference organizers invite vendors to take sponsorships and tool exhibitions – it’s a business issue

But I don’t think you are breaking through anything when reminding anyone about the business context. Everyone knows that already. Take it to the next level.

What is Data Quality anyway?

The above question might seem a bit belated after I have been blogging about it for 9 months now. But from time to time I ask myself some questions like:

Is Data Quality an independent discipline? If it is, will it continue to be that?

Data Quality actually is (or should be) a part of a lot of other disciplines.

Data Governance as a discipline is probably the best place to include general data quality skills and methodology – that is, all the people and process sides of data quality practice. Data Governance is an emerging discipline with an evolving definition, says Wikipedia. I think there is a pretty good chance that data quality management as a discipline will increasingly be regarded as a core component of data governance.

Master Data Management is a lot about Data Quality, but MDM could be dead already. Just like SOA. In short: I think MDM and SOA will survive, getting new life from the semantic web and all the data resources in the cloud. For that, MDM and SOA need Data Quality components. Data Quality 3.0 it is.

You may then replace MDM with CRM, SCM, ERP and so on and thereby extend the use of Data Quality components from dealing only with master data to also dealing with transaction data.

Next questions: Are Data Quality tools an independent technology? If they are, will they continue to be that?

It’s clear that Data Quality technology is moving from stand-alone batch processing environments, via embedded modules, to, oh yes, SOA components.

If we look at what data quality tools actually do today, they in fact mostly support you with automation of data profiling and data matching, which probably covers only some of the data quality challenges you have.
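As a minimal illustration of the kind of automation meant here (the column names, sample rows and rules are just an example):

```python
# Sketch: the kind of column profiling a data quality tool automates –
# completeness, uniqueness and a simple format rule on a customer table.
import re

customers = [
    {"name": "Acme GmbH", "postal_code": "80331", "email": "info@acme.example"},
    {"name": "Beta A/S", "postal_code": "", "email": "sales@beta.example"},
    {"name": "Acme GmbH", "postal_code": "8O331", "email": None},
]

def profile(rows, column):
    values = [r.get(column) for r in rows]
    filled = [v for v in values if v]
    return {
        "completeness": len(filled) / len(values),
        "uniqueness": len(set(filled)) / len(filled) if filled else 0.0,
    }

for col in ("name", "postal_code", "email"):
    print(col, profile(customers, col))

# A simple format rule spots the letter O mistyped for zero in "8O331".
bad_postal = [r for r in customers
              if r["postal_code"] and not re.fullmatch(r"\d{5}", r["postal_code"])]
print(bad_postal)
```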

In recent years there has been a lot of consolidation in the market around Data Integration, Master Data Management and Data Quality, which certainly tells us that the market needs Data Quality technology as components in a bigger scheme along with other capabilities.

But some new pure Data Quality players are also being established – and I often see familiar folks from the acquired entities at these new challengers. So independent Data Quality technology is not dead and doesn’t seem to want to be.


Happy Pi Day

Today March 14 (or 3-14 when writing a date with the month before the day) is Pi Day.

Sometimes I use the analogy of squaring the circle when talking about how to get data quality right. In real life we have to make an approximate construction, similar to when we square the circle – sometimes 22/7 or 3.14 instead of pi with 15,000 decimals is OK, and anyway an exact fix is proven impossible.
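For the record, here is how close those approximations actually get (a tiny Python check):

```python
# How far off the usual pi approximations really are.
import math

for approx in (22 / 7, 3.14):
    error = abs(approx - math.pi) / math.pi
    print(f"{approx:.6f}  relative error {error:.4%}")
# 22/7 is off by about 0.04 %, 3.14 by about 0.05 % – often good enough.
```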

Striving for the ultimate 360° view was discussed in a post here on the blog last year called “360° Business Partner View”.

Who is Responsible for Data Quality?

No, I am not going to continue some of the recent fine debates on who within a given company is the data owner, accountable and responsible for data quality.

My point today is that many views on data ownership, the importance of upstream prevention and fitness for purpose of use in a business context are based on the assumption that the data in a given company is entered by that company, maintained by that company and consumed by that company.

In today’s business world this is not true in many cases.

Examples:

Direct marketing campaigns

Making a direct marketing campaign and sending out catalogues is often an eye-opener regarding the quality of the data in your customer and prospect master files. But such things are very often outsourced.

Your company extracts a file with, say, 100,000 names and addresses from your databases and you pay a professional service provider a fee per row for doing the rest of the job.

Now the service provider could do you the kind favour of carefully deduplicating the file, eliminating the 5,000 purge candidates and bringing you the pleasant message that the bill will be reduced by 5 %.

Yes, I know, some service providers actually include deduplication in their offerings. And yes, I know, they are not always that interested in using an advanced solution for it.

I see the business context here – but unfortunately it’s not your business.

Factoring

Sending out invoices is often a good test of how well customer master data is entered and maintained. But again, using an outsourced service for that, like factoring, is becoming more common.

Your company hands over the name and address, receives most of the money, and the data is out of sight.

Now the factoring service provider has a pretty good interest in ensuring the quality of the data and aligning it with a real-world entity.

Unfortunately this cannot be done upstream; it’s a downstream batch process, probably with no signalling back to the source.

Customer self service

Today data entry clerks are rapidly being replaced as customers do all the work themselves on the internet. Maybe the form is provided by you; maybe – as is often the case with hotel reservations – the form is provided by a service provider.

So here you basically either have to extend your data governance all the way into your customer’s living room or office, or to some degree (fortunately?) accept that the customer owns the data.


Unpredictable Inaccuracy

Let’s look at some statements:

• Business Intelligence and Data Mining are based on looking into historical data in order to make better decisions for the future.

• Some of the best results from Business Intelligence and Data Mining are achieved when looking at data in ways different from before.

• It’s a well-known fact that Business Intelligence and Data Mining are very much dependent on the quality of the (historical) data.

• We all agree that you should not start improving data quality (like anything else) without a solid business case.

• Upstream prevention of poor data quality is superior to downstream data cleansing.

Unfortunately the wise statements above have some serious interrelated timing issues:

• The business case can’t be established before we start to look at the data in a different way.

• Data is already stored downstream when that happens.

• Anyway, we don’t know precisely what data quality issues we have in that context before trying out possible new ways of looking at the data.

Solutions to these timing issues may be:

  • Always try to have the data reflect the real-world objects they represent as closely as possible – or at least include data elements that make enrichment from external sources possible (see the sketch after this list).

• Accept that downstream data cleansing will be needed from time to time and be sure to have the necessary instruments for that.
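A minimal sketch of the first point: keep an external key on the record (the registration number and registry extract below are made up) so that later enrichment against reference data stays possible, even when the need for it is only discovered downstream.

```python
# Sketch: storing an external key (a fictitious company registration number)
# so downstream enrichment from a reference registry remains possible.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CustomerRecord:
    name: str
    registration_no: Optional[str] = None  # link to the real-world entity
    attributes: dict = field(default_factory=dict)

def enrich(record: CustomerRecord, registry: dict) -> CustomerRecord:
    """Downstream enrichment from an external registry, keyed on the stored number."""
    if record.registration_no and record.registration_no in registry:
        record.attributes.update(registry[record.registration_no])
    return record

# Hypothetical registry extract, e.g. from a public business register.
registry = {"DK12345678": {"industry_code": "6201", "city": "Copenhagen"}}
print(enrich(CustomerRecord("Beta A/S", "DK12345678"), registry))
```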
