Real World Alignment and Continental Drift

You can find many great analogies for working with data quality and Master Data Management (MDM) in world maps. One example is in the post The Greenland Problem in MDM, which is about how different business units have different views of the same real-world entity.

Real-world alignment is of course not without challenges, not least because the real world itself changes, as reported in a Daily Mail article about how modern countries would be placed on the landmasses as they were 300 million years ago.

[Image: the world 300 million years ago]

The image above may very well show how many master data repositories today reflect the real world. Yep, we may have the country list covered well enough. We may even do quite well if we look at each geographical unit independently. However, the big picture doesn’t fit the world as it is today.

The Internet of Things and the Fat-Finger Syndrome

When coining the term “the Internet of Things” Kevin Ashton said:

“The problem is, people have limited time, attention and accuracy—all of which means they are not very good at capturing data about things in the real world.”

Indeed, many, many data quality flaws are due to a human typing the wrong thing. We usually don’t do that intentionally. We do it because we are human.

Typographical errors, and the sometimes dramatic consequences, are often referred to as the “fat-finger syndrome”.

As reported in the post Killing Keystrokes, avoiding typing is a way forward, for example by sharing data instead of typing in the same data (slightly differently) within every organization.
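To make that concrete, here is a minimal Python sketch of the idea: instead of free typing, the input is matched against a shared canonical list and the user picks a suggestion. The reference names are illustrative, and the standard library’s difflib stands in for a proper fuzzy search against shared reference data.

```python
import difflib

# Illustrative shared reference list; a real setup would use a common
# directory or registry service instead of a hardcoded list.
REFERENCE_COMPANIES = [
    "Informatica Corporation",
    "Stibo Systems A/S",
    "Postcode Anywhere Ltd",
]

def suggest_company(typed: str, cutoff: float = 0.6) -> list:
    """Suggest canonical names for a possibly fat-fingered input, so the
    user picks from shared data instead of typing it in once more."""
    return difflib.get_close_matches(typed, REFERENCE_COMPANIES, n=3, cutoff=cutoff)

print(suggest_company("Informatika Corp"))  # -> ['Informatica Corporation']
```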

The Internet of Things, providing common access to data from a huge number of well-defined devices, is another development that helps avoid typos.

It’s not that data coming from these devices can’t be flawed. As debated in the post Social Data vs Sensor Data, sensor data may still be challenged by errors made by the humans setting up the sensors.

Human misunderstandings when combining sensor data for analytics and predictions may also have consequences as bad as those caused by the traditional fat-finger syndrome.

All in all, I guess we won’t see a decrease in the need to address data quality in the future; we will just need different approaches, methodologies and tools to fight bad data and information quality.

Are you interested in what all this will be about? Why not join the Big Data Quality group on LinkedIn?

Data Quality Luxury

I am a bit of a map addict. So when planning a visit to the City of London today I turned to Google Maps. When zooming in I got this map:

[Image: Google Maps showing Louis Vuitton at the Royal Exchange]

The pink establishment in the lower middle is the Royal Exchange, which today is filled with luxury shops. My first guess is that Google Maps has overlaid the map with positions from a business directory, where Paul Smith was placed inside the building while Louis Vuitton, due to a precision issue, was placed outside in front of it.

But there may be other explanations.

As the list of shops in the Royal Exchange shows here, there apparently isn’t a Louis Vuitton shop there.

So maybe Google Maps is timely aligned with the real world, and Louis Vuitton was kicked out of the building (for being too cheap?) and now only has a booth on the steps in front of it?

Of course, being a data quality geek, yours truly made a real world alignment check.

My report:

  • There’s no booth with bags (fake or real) in front of the building.
  • Paul Smith is exactly at the position within the building shown on the map.
  • There’s no Louis Vuitton shop in the building.
  • There’s a Louis Vuitton shop, with only one bag with no price tag per window (so it must be real), in the next building behind the Royal Exchange.

Conclusion:

It’s a precision issue with business directory positions on a map, where one is randomly spot on and the other isn’t. You can’t expect data quality luxury.
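For what it’s worth, such precision issues can be flagged programmatically. Below is a minimal sketch, with purely illustrative coordinates and an arbitrary tolerance, that measures how far a directory pin sits from a building’s reference point:

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Purely illustrative coordinates: a building reference point and a directory pin.
building = (51.5134, -0.0870)
pin = (51.5131, -0.0875)

distance = haversine_m(*building, *pin)
if distance > 25:  # tolerance in metres; the threshold is an arbitrary choice
    print(f"Pin is {distance:.0f} m off the reference point - flag for review")
```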

Know Your Supplier

Social Responsibility for Retailers and Distributors is No Longer an Option is the title of a new blog post by Paul Sirface on the Stibo Systems Datafusion blog.

Herein Paul writes:

“While many companies know that they have to respond to consumers’ demands, those with an active Master Data Management strategy have the best chance of responding effectively. Multi-domain Master Data Management (MDM) is the perfect place to begin organizing and collecting the data on product related information within the supply chain, including supplier compliance.”

Know Your Customer (KYC) is a well-established term within data management, linked to fraud protection and anti-money laundering.

Know Your Supplier (KYS) is indeed an equally important side of party master data management.

While customer master data management is evolving from handling mostly domestic customer data quality issues to also handling international ones, supplier master data management has always been about international data quality challenges for most businesses.

As with customer master data, having supplier master data that is well aligned with the real world, and that can be maintained to reflect changes in the real world, is indeed the starting point.

Last Time Right

The “First Time Right” principle is a good principle for data quality, and indeed getting data right the first time is a fundamental concept in the instant Data Quality service I’m working with these days.

However, some blog posts in the data quality realm this week have pointed out that there is a life, and sometimes an end of life, after data has hopefully been captured right the first time.

The post From Cable to Grave by Guy Mucklow on the Postcode Anywhere blog examines the bad consequences of a case of chasing debt from a customer who is no longer among us.

Asset in, Garbage Out: Measuring data degradation is the title of a post by Rob Karel on Informatica Perspectives. Herein Rob goes through all the dangers data may encounter after being entered right the first time.

Some years ago I touched on the subject in the post Ongoing Data Maintenance. As told there, I’m convinced, after having seen it work, that a good approach to also getting it right the last time is to capture data in a way that makes it maintainable.

Some techniques for doing this, sketched in code after the list, are:

  • Where possible collect external identifiers
  • Atomize data instead of squeezing several different elements into one attribute
  • Make the data model reflect the real world
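As a minimal sketch of what such maintainable capture could look like in a data model (the attribute names, address and DUNS value are illustrative assumptions, not a prescription):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PostalAddress:
    # Atomized elements instead of one free-text "address" attribute
    street_name: str
    house_number: str
    postal_code: str
    city: str
    country_code: str  # ISO 3166-1 alpha-2

@dataclass
class Party:
    name: str
    # External identifiers make the record maintainable: when the source
    # registry changes, the record can be refreshed by key.
    external_ids: dict = field(default_factory=dict)
    address: Optional[PostalAddress] = None

# Illustrative values only
supplier = Party(
    name="Some Supplier Ltd",
    external_ids={"DUNS": "123456789"},  # placeholder identifier
    address=PostalAddress("High Street", "1", "EC3V 3LR", "London", "GB"),
)
```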

And oh, this is neither the first nor the last time I will touch on this subject. It needs constant attention.

Keep It Real, Stupid

One of my pet peeves is the KISS principle: Keep It Simple, Stupid.

Don’t get me wrong: it’s worth striving for simplicity wherever possible. But some problems are not simple and do not have simple solutions. Sometimes KISS is the shortcut to getting it all wrong.

Another take on simplicity is a quote floating around in social media these days:

[Image: quote attributed to Einstein]

Oh, so Einstein said that. So you can’t argue with that.

Well, he probably didn’t, as Wikiquote reports:

[Image: Wikiquote entry showing the quote is misattributed]

So let’s stick to a real Einstein quote:

“Everything should be as simple as it can be, but not simpler”

A great quote related to data quality and master data management by the way.

Multi-Channel Data Matching

Most data matching activities going on are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data in these three different channel types:

  • Offline is the good old channel type, home to the mother of all business cases for data matching: avoiding the unnecessary cost of sending the same material with the postman twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still other factors to consider, like:
    • Duplicate digital messages to the same recipient look like spam (even when the recipient provided the different email addresses himself or herself).
    • You can’t measure a true response rate.
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.

[Image: typical data matching elements per channel type]

The concept of having a single customer view, or rather a single party view, involves matching identities across offline, online and social channels, and the typical elements used for data matching are not quite the same across those channels, as seen in the figure above.

Most data matching procedures are, in my experience, quite simple, with only a few data elements and no history taken into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.

When doing multi-channel data matching you can’t avoid moving from the popular simple data matching environments towards more identity-resolution-like environments.
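A minimal sketch of what crossing that line could look like, with toy records and toy weights: exact identifiers from the online and social channels carry the weight, while a fuzzy name comparison alone never decides a match.

```python
import difflib

def name_similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Score two party records on the elements their channels share.
    Exact identifiers (email, social handle) dominate; the weights
    are illustrative, not tuned values."""
    score = 0.0
    if rec_a.get("email") and rec_a.get("email") == rec_b.get("email"):
        score += 0.5
    if rec_a.get("twitter") and rec_a.get("twitter") == rec_b.get("twitter"):
        score += 0.5
    score += 0.4 * name_similarity(rec_a.get("name", ""), rec_b.get("name", ""))
    return min(score, 1.0)

offline = {"name": "Jon Smith", "address": "1 Main St"}
social = {"name": "John Smith", "twitter": "@jsmith"}
print(match_score(offline, social))  # fuzzy name only -> a weak, undecided score
```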

Some advice for getting it right without too much complication:

  • Emphasize data capture by getting it right the first time. It helps a lot.
  • Get your data models right. Here reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post instant Single Customer View.

Sharing is the Future of MDM

Over at the DataRoundtable blog Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be one where every organization doesn’t have to do the same work over and over again. While self-registration by customers is one way of lifting the burden from private enterprises and public sector bodies, we may do even better by not making the customer the data entry clerk, typing in the same information over and over again.

Today there are several available options for customer and other business partner reference data:

  • Public sector registries, which are getting more and more open, for example for the address part or even deeper, with due respect for privacy considerations that may differ between business entities and individuals.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like the Mydex service mentioned by Dylan.
  • Social network profiles.

My guess is that the future of MDM is going to be a mashup exploiting the above options.
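As a rough illustration of such a mashup, here is a sketch where three hypothetical lookup functions stand in for a public registry, a commercial directory and a social network API, and results are merged with the earlier, more authoritative sources taking precedence:

```python
# All three lookup functions are hypothetical stand-ins for real registry,
# directory and social network APIs.
def lookup_public_registry(name: str) -> dict:
    return {"legal_name": name, "address": "..."}  # stub

def lookup_commercial_directory(name: str) -> dict:
    return {"company_id": "..."}  # stub

def lookup_social_profile(name: str) -> dict:
    return {"linkedin": "..."}  # stub

def mashup(name: str) -> dict:
    """Merge reference data, letting earlier (more authoritative) sources win."""
    record: dict = {}
    for source in (lookup_public_registry,
                   lookup_commercial_directory,
                   lookup_social_profile):
        for key, value in source(name).items():
            record.setdefault(key, value)  # never overwrite an earlier source
    return record

print(mashup("iDQ A/S"))
```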

Oh, and as representatives of such a mashup service, we at iDQ recently made sure we had accurate, complete and timely information filled in on our LinkedIn company profile.

Fuzzy Social Identities in the Data Quality Realm

Over the past years social networks have emerged as a new source of external reference data for Master Data Management (MDM). But surely, there are data quality challenges related to this source.

Let’s look at a few examples from inside the data quality tool vendor space.

Who is head of Informatica in the social sphere?

There is a Twitter account owned by Sohaib Abbasi:

[Image: Twitter profile of @sabbasi]

Informatica is one of the leading data quality tool vendors and the CEO there is Sohaib Abbasi.

So, is the real-world individual behind the Twitter handle @sabbasi the head of Informatica?

A social graph should indicate so: a bunch of Informatica accounts and people follow the handle (though that’s hardly worth the trouble, as no tweets are coming from there).

What about the one behind Data Ladder?

Data Ladder is another data quality tool provider, though with a fraction of the revenue of Informatica.

In a recent post I stumbled upon a strange situation around this company. In the social sphere the company has for the last seven years been represented by a guy called Simon, as seen here on LinkedIn:

[Image: LinkedIn profile of Simon, aka Nathan]

But I have reason to believe that his real-world identity is Nathan, as explored in the comments to this post.

Hmmmm….

Data Quality tool vendors: It’s time to get real.

instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud-based solution I’m working with right now does this by (a code sketch follows the list):

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting really new entities or updating current entities by picking as much data as possible from external sources
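A minimal sketch of that flow, with both searches reduced to hypothetical stand-ins (the real service searches external reference data error-tolerantly):

```python
from typing import Optional

def search_external_sources(query: dict) -> dict:
    """Hypothetical stand-in for searching external big reference data
    (individuals, companies, locations, properties, social networks)."""
    return {"name": query.get("name", ""), "address": "..."}  # stub

def search_internal_master_data(query: dict, repo: list) -> Optional[dict]:
    """Hypothetical stand-in for a lookup in data already known in-house."""
    for record in repo:
        if record.get("name") == query.get("name"):
            return record
    return None

def capture(query: dict, repo: list) -> dict:
    external = search_external_sources(query)
    existing = search_internal_master_data(query, repo)
    if existing:
        for key, value in external.items():
            existing.setdefault(key, value)  # update: fill gaps from outside
        return existing
    new = {**external, **query}  # really new entity, seeded from external data
    repo.append(new)
    return new

repo: list = []
print(capture({"name": "Jane Doe"}, repo))
```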

[Image: instant Single Customer View]

Some essential capabilities in doing this are (a sketch of the data model side follows the list):

  • Searching is error tolerant, so you will find entities even if the spelling differs
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives, as explained in the post called A Place in Time
    • You may have multiple means of contact attached, like several phones, email addresses and social identities
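A sketch of such a real-world aligned receiving model, with illustrative attribute names: party and location live separately, and a party can carry any number of contact methods and social identities.

```python
from dataclasses import dataclass, field

@dataclass
class Location:
    """Locations live their own lives, independent of who resides there."""
    location_id: str  # e.g. a key into an external address registry
    address: str

@dataclass
class Party:
    name: str
    # The party-location link carries the relationship (and may be timed);
    # the location itself is shared and reusable across parties.
    location_ids: list = field(default_factory=list)
    # Multiple means of contact may be attached.
    phones: list = field(default_factory=list)
    emails: list = field(default_factory=list)
    social_ids: list = field(default_factory=list)
```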

How do you achieve a Single Customer View?
