Happy Uniqueness

27th March 2011

When making the baseline for customer data in a new master data management hub you often involve heavy data matching in order to de-duplicate the current stock of customer master data, so you so to speak start with a cleansed duplicate free set of data.

I have been involved in such a process many times, and the result has never been free of duplicates. For two reasons:

  • Even with the best data matching tool and the best external reference data available you obviously can’t settle all real world alignments with the confidence needed and manual verification is costly and slowly.
  • In order to make data fit for the business purposes duplicates are required for a lot of good reasons.

Being able to store the full story from the result of the data matching efforts is what makes me, and the database, most happy.

The notion of a “golden record” is often not in fact a single record but a hierarchical structure that reflects both the real world entity as far as we can get and the instances of this real world entity in a form that are suitable for different business processes.

Some of the tricky constructions that exist in the real world and are usual suspects for multiple instances of the same real world entity are described in the blog posts:

The reasons for having business rules leading to multiple versions of the truth are discussed in the posts:

I’m looking forward to yet a party master data hub migration next week under the above conditions.

Bookmark and Share


Right the First Time

6th January 2011

Since I have just relocated (and we have just passed the new year resolution point) I have become a member of the nearby fitness club.

Guess what: They got my name, address and birthday absolutely right the first time.

Now, this could have been because the young lady at the counter is a magnificent data entry person. But I think that her main competency actually rightfully is being a splendid fitness instructor.

What she did was that she asked for my citizen ID card and took the data from there. A little less privacy yes, but surely a lot better for data quality – or data fitness (credit Frank Harland) you might say.

Bookmark and Share


The Snow Queen

11th December 2010

During the existence of this blog I have come to use two tags several times, namely the fairy tale author Hans Christian Andersen as an inspiration for data quality related subjects and the tag happy databases as a counterweight against that we may talk too much about all the bad data quality around.

In embracing these two tags the fairy tale The Snow Queen also starts in the very bad end.

An evil troll makes a magic mirror that has the power to distort the appearance of things reflected in it. It fails to reflect all the good and beautiful aspects of people and things while it magnifies all the bad and ugly aspects so that they look even worse than they really are; for example makes the loveliest landscapes look like “boiled spinach.” I think every child understands that metaphor.

We tend to do the same in the data quality realm. In order to make a case for data and information quality improvement we like to tell about trainwrecks like on the site edited by IAIDQ. And for the record, I am guilty as everyone else in reading, laughing and contributing to the mobbing when everyone else makes a mistake within data management.

Bookmark and Share


Snowman Data Quality

5th December 2010

Right now it is winter in the Northern Hemisphere and this year winter has come earlier than usual to Northern Europe where I live. We have already had a lot of snow.

One of the good things with snow is that you are able to build a snowman. Snowmen are beautiful pieces of art but very vulnerable.  Wind and not at least rising temperatures makes the snowman ugly and finally go away sooner or later.

Snowmen have this unfortunate fate common with many data quality initiatives.

Many articles, blog posts and so on in the data quality realm focuses on this fate related to technology based initiatives. The common practice of executing downstream cleansing of data using data quality tools is often criticized. As a practitioner in this field I have to admit that: Yes, I am often making the art of building snowman data quality.

An often stated alternative to using data quality tools is improving data quality through change management including relaying on changing the attitude of people entering and maintaining data. Though it’s not my area of expertise I have seen such initiatives too. And I am afraid that I am not convinced that such initiatives unfortunately also sooner or later have the same fate as the snowman.

As said, I’m not the expert here. I am only the little child watching how this snowman is exposed to the changing winds in many business environments and how it finally disappears when the business climate varies over time.

Now, this is supposed to be a cheerful blog about happy databases. I am ready for getting into some warm clothes and build a beautiful snowman of any kind.  

Bookmark and Share


Golden Copy Musings

22nd October 2010

In a recent blog post by Jim Harris called Data Quality is not an Act, it is a Habit the term “golden mean” was mentioned.   

As I commented, mentioning the “golden mean” made me think about the terms “golden copy” and “golden record” which are often used terms in data quality improvement and master data management.

In using these terms I think we mostly are aiming on achieving extreme uniqueness. But we should rather go for symmetry, proportion, and harmony.

The golden copy subject is very timely for me as I this weekend is overseeing the execution of the automated processes that create a baseline for a golden copy of party master data at a franchise operator for a major brand in car rental.

In car rental you are dealing with many different party types. You have companies as customers and prospects and you have individuals being contacts at the companies, employees using the cars rented by the companies and individuals being private renters. A real world person may have several of these roles. Besides that we have cases of mixed identities.

During a series of workshops we have worked with defining the rules for merge and survivorship in the golden copy. Though we may be able to go for extreme uniqueness in identifying real world companies and persons this may not necessary serve the business needs and, like it or not, be capable of being related back into the core systems used in daily business.

Therefore this golden copy is based on a beautiful golden mean exposing symmetry, proportion, and harmony.

Bookmark and Share


The Little Match Girl

30th September 2010

The short story (or fairy tale) The Little Match Girl (or The Litlle Match Seller) by Hans Christian Andersen is a sad story with a bad ending, so it shouldn’t actually belong here on this blog where I will try to tell success stories about data quality improvement resulting in happy databases.

However, if I look at the industry of making data matching tools (and data matching technology is a large part of data quality tools) I wonder if the future has ever been that bright.

There are many tools for data matching out there.

Some tool vendors have been acquired by big players in the data management realm as:

  • IBM acquired Accential Software
  • SAS Institute acquired DataFlux
  • Informatica acquired Similarity Systems and Identity Systems
  • Microsoft acquired Zoomix
  • SAP acquired Fuzzy Informatik and Business Objects that acquired FirstLogic
  • Experian acquired QAS
  • Tibco acquired Netrics

(the list may not be complete, just what immediately comes to my mind).

The rest of the pack is struggling with selling matches in the cold economic winter.

There is another fairy tale similar to The Little Match Girl called The Star Money collected by the Brothers Grimm. This story has a happy ending. Here the little girl gives here remaining stuff away for free and is rewarded with money falling down from above. Perhaps this is like The Coming of Age of Open Source as told in a recent Talend blog post?

Well, open source is first expected to break the ice in the Frozen Quadrant in 2012.

Bookmark and Share


Data Quality Tools: The Cygnets in Information Quality

11th September 2010

Since engaging in the social media community around data and information quality I have noticed quite a lot of mobbing going on pointed at data quality tools. The sentiment seems to be that data quality tools are no good and will play only a very little role, if any, in solving the data and information quality conundrum.

I like to think of data quality tools as being like the cygnet (the young swan) in the fairy tale “The Ugly Duckling” by Hans Christian Andersen. An immature clumsy flapper in the barnyard. And sure, until now tools have generally not been ready to fly, but been mostly situated in the downstream corner of the landscape.

Since last September I have been involved in making a new data quality tool. The tool is based on the principles described in the post Data Quality from the Cloud.

We have now seen the first test flights in the real world and I am absolutely thrilled about the testimonial sayings. Examples:

  • “It (the tool) is lean”.  I like that since lean is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful.
  • “It is gold”. I like to consider that as a calculated positive business case.
  • “It is the best thing happened in my period of employment”. I think happy people are essential to data quality.

Paraphrasing Andersen: I never dreamed there could be so much happiness, when I was working with ugly ducklings.

Bookmark and Share


The Ugly Duckling

9th September 2010

The title of the fairy tale “The Ugly Duckling” by Hans Christian Andersen was originally supposed to be the more positive “The Young Swan” (or “The Cygnet”) , but as Andersen did not want to spoil the element of surprise in the protagonist’s transformation, he discarded it for “The Ugly Duckling”.

In a blog post called “Why Isn’t Our Data Quality Worse?” posted today (or last night local Iowa time) Jim Harris examines the psychology term “negativity bias” that explains how bad evokes a stronger reaction than good in the human mind.

Surely, data quality improvement evangelism is most often based on the strong force of badness. Always describing how bad data is everywhere. Bashing executives who don’t get it. Only as a nice positive surprise in the end we tell how our product/consultancy will transform the ugly duckling into a beautiful swan.    

My latest blog post with a truly positive angle called “What a Lovely Day” is almost 2 months old. So I promise myself the next post will have the title “The Young Swan” (or “The Cygnet”) and will be extremely positive about data quality improvement.

Bookmark and Share


What a Lovely Day

14th July 2010

As promised earlier today, here is the first post in an endless row of positive posts about success in data quality improvement.

This beautiful morning I finished yet another of these nice recurring jobs I do from time to time: Deduplicating bunches of files ready for direct marketing making sure that only one, the whole one and nothing but one unique message reaches a given individual decision maker, be that in the online or offline mailbox.

Most jobs are pretty similar and I have a fantastic tool that automates most of the work. I only have the pleasure to learn about the nature of the data and configure the standardisation and matching process accordingly in a user friendly interface. After the automated process I’m enjoying looking for any false positives and checking for false negatives. Sometimes I’m so lucky that I have the chance to repeat the process with a slightly different configuration so we reach the best result possible.

It’s a great feeling that this work reduces the costs of mailings at my clients, makes them look more smart and professional and facilitates that correct measure of response rates that is so essential in planning future even better direct marketing activities.

But that’s not all. I’m also delighted to be able to have a continuing chat about how we over time may introduce data quality prevention upstream at the point of data entry so we don’t have to do these recurring downstream cleansing activities any more. It’s always fascinating going through all the different applications that many organisations are running, some of them so old that I didn’t dream about they existed anymore. Most times we are able to build a solution that will work in the given landscape and anyway soon the credit crunch is totally gone and here we go.

I’ll be back again with more success from the data quality improvement frontier very soon.

Bookmark and Share


Why do you watch it?

14th July 2010

Statler and Waldorf is a pair of Muppet characters. They are two ornery, disagreeable old men. Despite constantly complaining about the show and how terrible some acts were, they would always be back the following week in the best seats in the house. At the end of one episode, they looked at the camera and asked: “Why do you watch it?”.

This is a bit like blogging about data quality, isn’t it? Always describing how bad data is everywhere. Bashing executives who don’t get it. Telling about all the hard obstacles ahead. Explaining you don’t have to boil the ocean but might get success by settling for warming up a nice little drop of water.

Despite really wanting to tell a lot of success stories, being the funny Fuzzy Bear on the stage, well, I am afraid I also have been spending most time on the balcony with Statler and Waldorf.

So, from this day forward: More success stories.

This is the start of a series of 1.3 blog posts…. No, just kidding.

Bookmark and Share


Follow

Get every new post delivered to your Inbox.

Join 125 other followers