Data Quality X-mas Stories

Today is the 2nd of December and time for the second x-mas themed post on this blog this year, following up on the early yuletide post about The Shortcut to Lapland.

In a way, focusing too much on Christmas is not in line with diversity, a main subject on this blog, as I know that many readers may have, for example, Eid, Diwali or Chinese New Year as their main days of celebration during the year.

To me, the holidays are very much about having light at a time of year up north that would otherwise be very dark and even depressing. When I am in Copenhagen I live on a cosy square called Gråbrødretorv (Grey Friars Market). In summertime the square is filled with outdoor seating. Not so much in the winter. But then there is a fir tree with lights on it.


Anyway, there is lots of x-mas themed stuff you can relate to data quality. Some of the older posts on this blog were:


Falsus in Uno, Falsus in Omnibus

The title of this blog post is a Latin legal phrase meaning “false in one thing, false in everything”. It refers to the principle of regarding everything a witness says as not credible if one thing the witness said is proven untrue. This has been part of the plot in plenty of courtroom films and TV shows.

This principle applies to data quality too. An example from direct marketing would be the recipient of a direct mail saying: “If you can’t get my name right, how can I trust you to get anything right during a purchase?”

Figure: Some data quality dimensions

An example from the multi-channel world, or should we say omni-channel today, would be a shopper saying: “If you say one thing about the product in the shop and another thing on the website, how can I trust any of your product information?” Falsehood in omni-channel, so to speak.
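A shop-versus-website contradiction like this can often be caught mechanically before a shopper spots it. Below is a minimal sketch, assuming each channel's product information is available as a simple attribute-to-value dictionary; the function name `find_channel_conflicts` and the sample data are hypothetical. It flags every attribute whose value differs between channels, ignoring case and surrounding spaces.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: attribute name -> value, one dictionary per channel.
ProductInfo = Dict[str, str]


def find_channel_conflicts(
    channels: Dict[str, ProductInfo],
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (attribute, {channel: value}) for each attribute whose value
    differs between the channels that report it."""
    attributes = sorted({attr for info in channels.values() for attr in info})
    conflicts = []
    for attr in attributes:
        values = {ch: info[attr] for ch, info in channels.items() if attr in info}
        # Ignore trivial differences in case and surrounding whitespace.
        if len({v.strip().lower() for v in values.values()}) > 1:
            conflicts.append((attr, values))
    return conflicts


shop = {"colour": "Red", "weight": "1.2 kg", "warranty": "2 years"}
web = {"colour": "red ", "weight": "1.5 kg", "warranty": "2 years"}
for attr, values in find_channel_conflicts({"shop": shop, "web": web}):
    print(attr, values)  # weight {'shop': '1.2 kg', 'web': '1.5 kg'}
```

Comparing values as plain strings is the simplest choice; units such as "1.2 kg" versus "1200 g" would need normalizing before they could be compared fairly.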

Measuring the impact of such attitudes, and thereby the Return on Investment (ROI) of data quality improvement based on this principle, is very hard. We usually only have random, anecdotal evidence that this happens.

But what we can say is: Don’t lie in court and don’t neglect your data quality. It will hurt your credibility and, in the end, your creditworthiness.


Omni-purpose MDM

The terms omni-channel banking and omni-channel retailing are becoming popular within businesses these days.

In this context omni (meaning all) is considered to be something more advanced than multi (meaning many) as in multi-channel retailing.

Data management, including Master Data Management (MDM), is always a bit behind the newest business trends. In our discipline we have hardly even entered the multi stage yet.

Some moons ago I wrote about multi-channel data matching on the Informatica Perspectives blog in the post Five Future Data Matching Trends. Today, on the same blog, Stephan Zoder has the post asking: Is your social media investment hampered by your “data poverty”?

Herein Stephan examines the possible benefits of multi-channel data matching based on a business case within the gambling industry.

The use of omni in relation to MDM was seen in a vendor presentation at the Gartner MDM Summit in London last week, as reported in the post Slicing the MDM Space. Omnidomain MDM was the proposed term here.

The end goal should probably be something that could be coined omni-purpose MDM. This will be about advancing MDM capabilities to cover multiple domains and embrace multiple channels in order to obtain a single view of every core entity that can be used in every business process.



Multi-Channel Data Matching

Most data matching activities going on are related to matching customer, or rather party, master data.

In today’s business world we see data matching related to party master data in these three different channel types:

  • Offline is the good old channel type, where the mother of all business cases for data matching is avoiding the unnecessary cost of sending the same material by post twice (or more) to the same recipient.
  • Online has been around for some time. While the cost of sending the same digital message to the same recipient may not be a big problem, there are still some other factors to be considered, like:
    • Duplicate digital messages to the same recipient look like spam (even if the recipient provided different eMail addresses him/herself).
    • You can’t measure a true response rate.
  • Social is the new channel type for data matching. Most business cases for data matching related to social network profiles are probably based on multi-channel issues.
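The offline and online business cases above rest on different match keys. Here is a minimal sketch, assuming records are plain dictionaries with hypothetical `name`, `street`, `postcode` and `email` fields: offline duplicates share a normalized name plus postal address, while online duplicates share an e-mail address.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def offline_key(rec: Record) -> str:
    # Offline: the same name at the same postal address is one recipient.
    return "|".join(normalize(rec[f]) for f in ("name", "street", "postcode"))


def online_key(rec: Record) -> str:
    # Online: the e-mail address is the natural key (compared case-insensitively).
    return rec["email"].strip().lower()


def group_duplicates(records: List[Record], key: Callable[[Record], str]) -> List[List[int]]:
    """Return groups of record positions that share the same match key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[key(rec)].append(i)
    return [ids for ids in groups.values() if len(ids) > 1]


mailing = [
    {"name": "Jane Doe", "street": "12 Main St.", "postcode": "1000", "email": "jane@example.com"},
    {"name": "JANE DOE", "street": "12 Main St", "postcode": "1000", "email": "j.doe@example.org"},
    {"name": "J. Doe", "street": "PO Box 7", "postcode": "1000", "email": "Jane@Example.com "},
]
print(group_duplicates(mailing, offline_key))  # [[0, 1]]
print(group_duplicates(mailing, online_key))   # [[0, 2]]
```

Neither key alone sees that all three records may be the same person: the e-mail key misses the recipient who gave two different addresses, and the postal key misses the PO box.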

The concept of having a single customer view, or rather a single party view, involves matching identities across offline, online and social channels, and the typical elements used for data matching are not entirely the same for those channels, as seen in the figure to the right.
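One way to sketch a single party view across channels whose records carry different elements is to link records through whichever elements they happen to share. This minimal sketch assumes three hypothetical key combinations (name plus address, e-mail, name plus city), which are illustrative choices only, and clusters the linked records with union-find.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Assumed match-key definitions: equal values on any one combination link two records.
KEY_DEFS: List[Tuple[str, ...]] = [("name", "address"), ("email",), ("name", "city")]


def match_keys(rec: Record) -> List[tuple]:
    """Build one key per definition whose elements are all present in the record."""
    keys = []
    for fields in KEY_DEFS:
        if all(rec.get(f) for f in fields):
            keys.append((fields, tuple(rec[f].strip().lower() for f in fields)))
    return keys


def single_party_view(records: List[Record]) -> List[List[int]]:
    """Cluster records into parties with union-find over shared match keys."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen: Dict[tuple, int] = {}
    for i, rec in enumerate(records):
        for key in match_keys(rec):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two clusters
            else:
                first_seen[key] = i

    clusters: Dict[int, List[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


records = [
    {"channel": "offline", "name": "Jane Doe", "address": "12 Main St", "city": "Springfield"},
    {"channel": "online", "name": "Jane Doe", "email": "jane@example.com"},
    {"channel": "social", "name": "Jane Doe", "city": "Springfield", "email": "Jane@example.com"},
    {"channel": "offline", "name": "John Roe", "address": "3 High St", "city": "Shelbyville"},
]
print(single_party_view(records))  # [[0, 1, 2], [3]]
```

The offline and online records share no element at all; they end up in the same party only because the social profile bridges them. Transitive linking through weak keys such as name plus city can also over-merge common names, which is one reason identity resolution weighs evidence instead of linking on any single hit.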

In my experience, most data matching procedures are quite simple, using only a few data elements and taking no history into consideration. However, we do see more sophisticated data matching environments, often referred to as identity resolution, where historical data, more data elements and even unstructured data are taken into consideration.
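The difference history makes can be shown in a few lines. This is a minimal sketch with a hypothetical `Party` type holding an address history: simple matching compares only the current address, while the identity-resolution style also accepts any address the party has lived at before. Exact string comparison of addresses is a simplification.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Party:
    party_id: str
    name: str
    # Address history as (address, valid_from) pairs.
    addresses: List[Tuple[str, date]] = field(default_factory=list)

    def current_address(self) -> Optional[str]:
        # The most recent valid_from date marks the current address.
        if not self.addresses:
            return None
        return max(self.addresses, key=lambda entry: entry[1])[0]


def simple_match(parties: List[Party], name: str, address: str) -> List[str]:
    """Simple matching: name plus current address only, no history."""
    return [p.party_id for p in parties
            if p.name.lower() == name.lower() and p.current_address() == address]


def history_match(parties: List[Party], name: str, address: str) -> List[str]:
    """Identity-resolution style: the address may be any known past address."""
    return [p.party_id for p in parties
            if p.name.lower() == name.lower()
            and any(addr == address for addr, _ in p.addresses)]


jane = Party("P1", "Jane Doe", [("12 Main St", date(2015, 3, 1)),
                                ("7 Oak Ave", date(2021, 6, 1))])
print(simple_match([jane], "Jane Doe", "12 Main St"))   # []
print(history_match([jane], "Jane Doe", "12 Main St"))  # ['P1']
```

A record arriving through an older channel with the previous address is missed by simple matching but resolved once history is taken into account.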

When doing multi-channel data matching, you can’t avoid moving from the popular simple data matching environments towards environments more like identity resolution.

Some advice for getting it right without too much complication:

  • Emphasize data capture by getting it right the first time. It helps a lot.
  • Get your data models right. Here reflecting the real world helps a lot.
  • Don’t reinvent the wheel. There are services for this out there. They help a lot.

Read more about such a service in the post Instant Single Customer View.


The Dangers of being a Global Shopper

The global shopper is a multi-channel beast.

A global shopper may be a tourist or a business traveler buying goods in exciting cities around the world in shops most probably operated by the very same brands that occupy his local high street. The global shopper may also do his business from his living room by shopping online on sites with strange foreign privacy rules and unusual registration forms.

Being a global shopper is risky business.

For example, it’s unbelievable that Oxford Street in London wasn’t made into a pedestrian street a long time ago, like any other respectable high street in a major city. But no, global shoppers on Oxford Street are constantly in danger of being hit by a red double-decker bus when crossing the street for a good bargain while looking to the wrong side.

And how about shoe sizes? Measuring systems and standards around the world are a jungle, and as a global shopper you will, in 8 ½ out of 10 trials, pick the wrong number 42.

Going online isn’t any better.

When registering your home address on a foreign site you are on very slippery ground.

If the site is from the United States, and you are not, you have to choose to live in one of 50 different states that mean nothing to you. But there is no way around it. My favorite state then is Alaska, usually at the top of the list.

Having a postal code with letters in it can be a no-go. Not having a postal code is much like not existing at all.
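A registration form can avoid both traps by applying rules per country and accepting what it cannot check. Below is a minimal sketch with deliberately simplified postal code patterns for a handful of countries; the pattern table and the function `validate_address` are illustrative assumptions, not a complete or authoritative rule set. Letters are allowed where the country uses them, a state is required only for US addresses, and countries without postal codes (Hong Kong, for example) or without known rules are not blocked.

```python
import re
from typing import Dict, List, Optional

# Deliberately simplified postal code patterns keyed by ISO 3166-1 alpha-2 code.
# None marks a country without postal codes.
POSTCODE_PATTERNS: Dict[str, Optional[str]] = {
    "US": r"\d{5}(-\d{4})?",
    "DK": r"\d{4}",
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}",
    "CA": r"[A-Z]\d[A-Z] ?\d[A-Z]\d",
    "HK": None,
}


def validate_address(country: str, postcode: Optional[str],
                     state: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the address is accepted."""
    problems: List[str] = []
    # In this sketch only US addresses require a state.
    if country == "US" and not state:
        problems.append("state is required for US addresses")
    if country not in POSTCODE_PATTERNS:
        return problems  # Unknown country: don't reject what we cannot check.
    pattern = POSTCODE_PATTERNS[country]
    if pattern is None:
        return problems  # No postal codes in use, so an empty field is fine.
    if not postcode:
        problems.append("postal code is required")
    elif not re.fullmatch(pattern, postcode.strip().upper()):
        problems.append(f"postal code {postcode!r} does not look valid for {country}")
    return problems


print(validate_address("GB", "sw1a 1aa"))  # []
print(validate_address("US", "90210"))     # ['state is required for US addresses']
print(validate_address("HK", None))        # []
```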

But don’t give up. As a global shopper you will be able to find sites online with absolutely no clue about what an address looks like. The only question, of course, is whether you will actually get your goods or have to settle for the credit card withdrawal alone.


The Big ABC of Reference Data

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are those data defined and (initially) maintained outside a single organisation. Examples from the party master data realm are a country list, a list of states in a given country or postal code tables for countries around the world.

The trend is that organisations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

In the party master data realm such reference data may be core data about:

  • Addresses, meaning every single valid address, typically within a given country.
  • Business entities, meaning every single business entity occupying an address in a given country.
  • Consumers (or Citizens), meaning every single person living at an address in a given country.

There is often no single source of truth for such data. Some of the challenges I have met for each type of data are:

Addresses

The depth (or precision if you like) of an address is a common problem. If the depth of address data is at the level of building numbers on streets (thoroughfares) or blocks, you have issues as described in the blog post called Multi-Occupancy.

Address reference data of course have issues with common data quality dimensions such as:

  • Timeliness, because for example new addresses will exist in the real world but not yet in a given address directory.
  • Accuracy, as you are always amazed when comparing two official sources which should have the same elements, but don’t.
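Both dimensions show up when two address sources are compared side by side. This minimal sketch assumes, for simplicity, that the two hypothetical sources already share an address id (in practice the two directories would first have to be matched); ids present in only one source point to timeliness gaps, and shared fields that disagree point to accuracy issues.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def compare_sources(
    a: Dict[str, Record], b: Dict[str, Record]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str, str]]]:
    """Return ids only in a, ids only in b (timeliness gaps) and
    (id, field, value_a, value_b) tuples where shared fields disagree (accuracy)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatches: List[Tuple[str, str, str, str]] = []
    for addr_id in sorted(a.keys() & b.keys()):
        for fld in sorted(a[addr_id].keys() & b[addr_id].keys()):
            if a[addr_id][fld] != b[addr_id][fld]:
                mismatches.append((addr_id, fld, a[addr_id][fld], b[addr_id][fld]))
    return only_a, only_b, mismatches


postal = {"A1": {"street": "Main St", "number": "12", "postcode": "1000"},
          "A2": {"street": "High St", "number": "3", "postcode": "2000"}}
registry = {"A1": {"street": "Main St", "number": "12", "postcode": "1001"},
            "A3": {"street": "New Rd", "number": "1", "postcode": "3000"}}
print(compare_sources(postal, registry))
# (['A2'], ['A3'], [('A1', 'postcode', '1000', '1001')])
```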

Business Entities

Business directories have been accessible for many years and are often used in business-to-business (B2B) customer master data management and in supplier master data management. Some hurdles in doing this are:

  • Uniqueness, as your view of what a given business entity is occasionally doesn’t match the view in the business directory, as discussed in the post 3 out of 10.
  • Conformity, because for example an apparently simple exercise such as assigning an industry vertical can be a complex matter, as mentioned in the post What are they doing?

Consumers (or Citizens)

In business-to-consumer (B2C) or other activities involving citizens, a huge challenge is identifying the individuals living on this planet, as pondered in the post Create Table Homo Sapiens. Some troubles are:

  • Consistency isn’t easy, as governments around the world have found 240 (or so) different solutions to balancing privacy concerns and administrative effectiveness.
  • Completeness, as rules and traditions differ not only between countries, but also between industries, activities and channels.

Big Reference Data as a Service

Even though I have emphasized some data quality dimensions for each type of data, all dimensions apply to all types of data.

For organisations operating multinationally and/or across multiple channels, exploiting the wealth and diversity of external reference data is a daunting task.

This is why I see reference data as a service embracing many sources as a good opportunity for getting data quality right the first time. There is more on this subject in the post Reference Data at Work in the Cloud.
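A service embracing many sources has to decide what to return when the sources disagree. This minimal sketch treats each source as a hypothetical lookup function (here just in-memory dictionaries standing in for external providers), asks all of them, and returns the most common answer together with the sources that support it. Majority voting is one simple policy chosen for illustration; a real service might instead rank sources by trust or freshness.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# A source is any callable returning a value for a lookup key, or None if unknown.
Source = Callable[[str], Optional[str]]


def lookup(key: str, sources: Dict[str, Source]) -> Tuple[Optional[str], List[str]]:
    """Query every source and return the most common answer plus its supporters."""
    answers = {name: source(key) for name, source in sources.items()}
    answers = {name: value for name, value in answers.items() if value is not None}
    if not answers:
        return None, []
    # Counter.most_common keeps first-seen order on ties, so earlier sources win ties.
    best, _ = Counter(answers.values()).most_common(1)[0]
    return best, sorted(name for name, value in answers.items() if value == best)


# Hypothetical in-memory stand-ins for external reference data providers.
key = "12 main st, springfield"
sources = {
    "postal": {key: "1000"}.get,
    "cadastre": {key: "1000"}.get,
    "vendor": {key: "1001"}.get,
}
print(lookup(key, sources))                # ('1000', ['cadastre', 'postal'])
print(lookup("unknown address", sources))  # (None, [])
```

Returning the supporting sources alongside the value keeps the answer traceable, which matters when the data is later questioned.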


Nonprofit Data Quality

One of the industries where I have worked a lot with data quality issues is nonprofit organizations, such as charities and other forms of membership-based organizations.

A general characteristic of such organizations is that they have databases with as many “customers” as huge global enterprises; however, the number of employee records is only a fraction of that in those large companies.

So the emphasis is often not on creating well-staffed data governance structures but on implementing the best automation available in order to have optimal party master data management, where the parties involved are members and other roles played by individuals and companies with a common interest.

Many nonprofit organizations have several different fundraising activities going on at the same time. This means that real world individuals, households, organizations and their contacts are registered through different channels. The challenges of getting a “single view of customer” from the data streams created in these processes are discussed in the post Multi-Purpose Data Quality.
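A first step towards such a single view is often householding: grouping people registered through different fundraising channels who live at the same address. This minimal sketch assumes a hypothetical rule that the same normalized street and postcode means the same household; the field names and sample data are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def household_key(rec: Record) -> str:
    # Assumed rule: same normalized street address and postcode = same household.
    street = rec["street"].lower().replace(".", " ").replace(",", " ")
    street = " ".join(street.split())
    return f"{street}|{rec['postcode'].replace(' ', '').upper()}"


def build_households(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records from all channels by household key."""
    households: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        households[household_key(rec)].append(rec)
    return dict(households)


gifts = [
    {"name": "Jane Doe", "street": "12 Main St.", "postcode": "1000", "channel": "street collection"},
    {"name": "John Doe", "street": "12 main st", "postcode": "1000", "channel": "online donation"},
    {"name": "Ann Poe", "street": "3 High St", "postcode": "2000", "channel": "lottery"},
]
for key, members in build_households(gifts).items():
    print(key, [m["channel"] for m in members])
```

Grouping at street-address level over-merges in multi-occupancy buildings, so apartment or unit details would need to be part of the key where they are available.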

There are many nonprofit organizations working internationally. The often decentralized management structures in nonprofit organizations mean that the way of doing things will naturally differ between the countries where nonprofits operate. Differences in legislation and culture are also important. Some examples related to how to exploit master data are examined in the post Feasible Names and Addresses.

When it comes to creating business cases for data quality, nonprofits are of course basically no different from any other organization. The main goals are increased fundraising and lower administration costs. As mentioned, the low number of employees often leads to the use of technology, and the small amount of money available often leads to the use of agile technology.
