Big Data and Multi-Domain Master Data Management

The possible connection between today's hot IT buzzword, "big data", and the good old topic of master data management has been discussed a lot lately. An example from CIO UK today is the article called Big data without master data management is a problem.

As said in the article, there is a connection from big master data (and big reference data) to big transaction data. Big transaction data is what we usually call big data, because these are the really big ones.

The two most mentioned kinds of big transaction data are:

  • Social data and
  • Sensor data

I have also seen a lot of connections between these kinds of big data and master data in multiple domains.

Social Data

Connecting social data to Master Data Management (MDM) is an ongoing discussion I have been involved in for the last three years, most recently through the new LinkedIn group called Social MDM.

The customer master data domain is in focus here, as the immediate connection is how to relate the traditional systems of record holding customer master data to the systems of engagement where the big social data are waiting to be analyzed and eventually become part of day-to-day customer centric business processes.

However, being able to analyze, monitor and take action on what is being said about specific products in social data is another option, and eventually that has to be linked to product master data. In product master data management the focus has traditionally been on your own (resold) products. Effectively listening to social data will mean that you also have to manage data about competing products.

Attaching location to social data has been around for a long time. Connecting social data to your master data will also require that your location master data are well aligned with the real world.

Sensor Data   

During the past many years I have been involved in data management within public transportation, where we have big data coming in from sensors of different kinds.

The big problem has for sure been connecting these transactions correctly to master data. The challenges here are described in the post Multi-Entity Master Data Quality.

The biggest problem is that all the different equipment generating the sensor data in practice can't be at the same stage at the same time, and this will eventually create data that, if related without care, will show very wrong information about who the passenger(s) were, what kind of trip it was, where the journey happened and under which timetable.


The Cases for UPPER CASE in Data Management

I remember some years ago, when I started SMS'ing, I had an old mobile phone that defaulted the text to upper case. After a while my son answered back: "Why are you always yelling at me in SMSes?"

So I learned that you can use lower case in SMSes as well, and only using all caps in SMSes, as in any other writing, usually means that YOU ARE YELLING.

Examining a text for upper case use can, together with polarity classifiers and all that jazz, be used today in sentiment analysis for example within social media data.

Within data parsing, words in upper case in person names may tell you something too. Especially in France it is common to indicate a surname with upper case characters only, so for example in the name "AUGUST Michel" the first word is the surname and the second word is the given name.
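As a minimal sketch of that parsing rule (the helper name is mine, not from any particular tool), you could treat an all-caps token as the surname candidate:

```python
def split_french_name(full_name):
    """Guess surname vs. given name when the surname is written in all caps,
    following the French convention, e.g. "AUGUST Michel"."""
    surname_parts, given_parts = [], []
    for token in full_name.split():
        # A token of two or more letters entirely in upper case is taken as surname
        if token.isupper() and len(token) > 1:
            surname_parts.append(token)
        else:
            given_parts.append(token)
    return " ".join(surname_parts), " ".join(given_parts)

print(split_french_name("AUGUST Michel"))  # ('AUGUST', 'Michel')
```

A real parser would of course need exception handling for initials, particles and names written entirely in caps.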

When matching company names, a word in upper case may indicate an abbreviation. So "THE Ltd" and "The Happy Entrepreneur Ltd" may be a good match despite a horrible edit distance.
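One way to sketch such an abbreviation check (helper names are mine) is to compare an all-caps word in one name against the initials of the other name, ignoring the legal suffix:

```python
def initials(name, stop_suffixes=("LTD",)):
    """Build the initial letters of a company name, ignoring the legal suffix."""
    words = [w for w in name.split() if w.upper().rstrip(".") not in stop_suffixes]
    return "".join(w[0].upper() for w in words)

def abbreviation_match(short_name, long_name):
    """True if an all-caps word in one name spells the initials of the other."""
    caps = [w for w in short_name.split() if w.isupper() and len(w) > 1]
    return any(w == initials(long_name) for w in caps)

print(abbreviation_match("THE Ltd", "The Happy Entrepreneur Ltd"))  # True
```

In practice this would be just one of several candidate rules feeding a match score, not a standalone decision.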

In data migration, when handling names from older systems where all caps have been used, it is common to try to make better looking names. "JOHN SMITH" will become "John Smith" and "SAM MCCLOUD" should become "Sam McCloud". In environments with alphabets other than the English one, national characters may be reintroduced as well. For example in a German context "JURGEN VON LOW" may come out as "Jürgen von Löw".
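A crude sketch of such a case repair (the function name is mine, and real migrations need far larger exception lists, plus reference data to reintroduce national characters):

```python
def repair_name_case(name):
    """Title-case an all-caps legacy name, with a simple fix for the Mc prefix,
    e.g. "SAM MCCLOUD" -> "Sam McCloud"."""
    words = []
    for word in name.lower().split():
        if word.startswith("mc") and len(word) > 2:
            # Keep the capital after the Mc prefix
            word = "Mc" + word[2:].capitalize()
        else:
            word = word.capitalize()
        words.append(word)
    return " ".join(words)

print(repair_name_case("JOHN SMITH"))   # John Smith
print(repair_name_case("SAM MCCLOUD"))  # Sam McCloud
```

Note that the lowercase particle in "von Löw" already shows why a pure title-casing rule breaks down; exceptions have to be curated per culture.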

What about you? Have you stumbled upon some fun with upper case in data management?


Sometimes Big Brother is Confused

Google Maps knows a lot. It knows about addresses and it knows about companies on these addresses.

As with most services it seems that Google Maps gets the reference data from different sources.

The other day I went to visit "Channel 4", the British TV channel that hosted the UK "Big Brother" reality show until recently.

I typed in the address “124 Horseferry Road, London, United Kingdom” and got the point:

However, it seems that there is a large building up to the left called “Channel 4 Television”. Strange. Then I tried with “Channel 4, 124 Horseferry Road, London, United Kingdom”:

Oh, so I will find “Channel Four Television, 124 Horseferry Road” in the “Channel 4 Television” building only 0.2 miles west of “124 Horseferry Rd”:


State of this Data Quality Blog

Today is a big day on this blog as it has been live for 3 years.

Success versus Failure

The first entry, called Qualities of Data Architecture, was a promise to talk about data quality success stories. The reason for emphasizing success stories related to data quality is a feeling that data quality improvement is too often promoted by horror stories telling how badly your business may go if you don't pay attention to data quality.

The problem is that stories about failure usually aren’t taken too seriously. Jim Harris recently had a very good take on that in the post Data Quality and Chicken Little Syndrome.

So, I plan to tell even more success stories along with the inevitable stories about failure that so easily and obviously could have been avoided.

Getting Social

Using social networks to promote your blogging is quite natural.

At the same time social networks have emerged as a new source for doing master data management (I call this Social MDM).

Exploring this new discipline over the hype peak, down through the valley of disappointment and up to the plateau of productivity will for sure be a recurring subject on this blog.

People, Processes and Technology

Sometimes you see a statement like “Data Quality is not about technology, it’s all about people”.

Well, most things we can’t solve easily are not just about one thing. In my eyes the old cliché about addressing people, processes and technology surely also relates to getting data quality right.

There are many good blogs around about people and processes. On this blog I'll try to cover my comfort zone, being technology, without forgetting people and processes.

The Hidden Agenda

Most people blogging are doing it to promote their (employers') expertise, services and tools, and I am no different.

Lately I have written a lot about a second to none cloud based service for upstream data quality prevention. The wonder is called instant Data Quality.

While upstream prevention is the best approach to data quality still a lot of work must be done every day in downstream cleansing as told in the post Top 5 Reasons for Downstream Cleansing.

As I’m also working with a new stellar cloud based platform for data quality improvement productivity I will for sure share some props for that in the near future.


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as "data driven" and how the changing world of data must change our approach to data quality.

As said in the article the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement especially in the party master data domain has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the Wikipedia article about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry specific directories is another example.

In the post Business Contact Reference Data I have a prediction saying that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one haunted by complex privacy considerations.

Near-Real-Time Data Enrichment

Besides being used for updating changes to basic master data, business directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment used a lot related to direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files for uploading back from the service provider.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud, making it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data and to execute the request as close to real-time as the volume allows.

Such services also include the good old duplicate check now much better informed by including third-party reference data.

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent on data entry work.

Not least in a globalized world where names of people reflect the diversity of almost any nation today, where business names become more and more creative, and where data entry is done at shared service centers manned with people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

When mashing up advanced search in third-party data and internal master data when doing data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.


Goals are Important

A big thing going on in Europe right now is the Euro 2012 football (soccer) championship. 16 national teams are competing for the European Champion title.

People like me, not being subject matter experts, may have difficulties seeing past national preferences and evaluating who the best team is. Is it:

  • The team having the highest ball possession percentage,
  • the team with the most handsome legs (my wife says so) or
  • the team with the most expensive players?

Therefore TV channels have experts in the studio. Well, sometimes they also have difficulties seeing past national preferences, but otherwise they can provide you with analysis of a lot of facets of the game and why some things matter more than others. And sometimes an expert nails it and tells you: "It's important to score goals". Oh yes, I think most of us got that already.

It's the same with reading articles, blog posts and so on about data quality and master data management. Experts may have difficulties seeing past brand preferences, but anyhow there is a lot of good stuff about different facets of achieving high quality data and doing master data management the right way. And sometimes an expert nails it and tells you: "It's important to support business goals". Oh yes, …


Business Contact Reference Data

When working with selling data quality software tools and services I have often used external sources for business contact data, and not least when working with data matching and party master data management implementations in business-to-business (B2B) environments I have seen uploads of these data into CRM systems.

A typical external source for B2B contact data will look like this:

Some of the issues with such data are:

  • Some of the contact data names may refer to the same real world individual, as told in the post Echoes in the Database
  • People change jobs all the time. The external lists will typically have entries verified some time ago, and when you upload to your own databases, data will quickly become useless due to data decay.
  • When working with large companies in customer and other business partner roles you often won't interact with the top level people, but with people at lower levels not reflected in such external sources.

The rise of social networks has presented new opportunities for overcoming these challenges as examined in a post (written some years ago) called Who is working where doing what?

However, I haven't seen many attempts yet to automate and include working with social network profiles in business processes. Surely there are technical issues and not least privacy considerations in doing so, as discussed in the post Sharing Social Master Data.

Right now we have a discussion going on in the LinkedIn Social MDM group about examples of connecting social network profiles and master data management. Please add your experiences in the group here – and join if you aren’t already a member.


Pulling Data Quality from the Cloud

In a recent post here on the blog the benefits of instant data enrichment were discussed.

In the contact data capture context these are some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot with typical data quality issues such as uniqueness and consistency.

However, if you are doing business in many countries it is a daunting task to connect with the best of breed sources of big reference data. Add to that the fact that many enterprises are doing both business-to-business (B2B) and business-to-consumer (B2C) activities, including interacting with small business owners. This means you have to link to the best sources available for addresses, companies and individuals.

A solution to this challenge is using Cloud Service Brokerage (CSB).

An example of a Cloud Service Brokerage suite for contact data quality is the instant Data Quality (iDQ™) service I’m working with right now.

This service can connect to big reference data cloud services from all over the world. Some services are open data services in the contact data realm, some are international commercial directories, some are the wealth of national reference data services for addresses, companies and individuals and even social network profiles are on the radar.


The Secret Behind Good Data Quality

This post is inspired by a little tweet chat I had with Daragh O Brien this morning:

The data quality angle was that a simple data quality rule around age (or date of birth) for living persons would be a check creating a warning if age is above 122, because this would, if true, be a new entry in the book of records.

Jeanne Louise Calment of France had the longest confirmed human lifespan of 122 years.

Your data quality age check may even be refined, as the record for a male is 115 years.

Christian Mortensen, born in Denmark and deceased in the United States, holds that record.
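The rule discussed above could be sketched like this (the function and constant names are mine; the thresholds are the records just mentioned):

```python
from datetime import date

MAX_AGE = 122       # Jeanne Calment's confirmed record
MAX_AGE_MALE = 115  # Christian Mortensen's confirmed record for a male

def age_warning(date_of_birth, sex=None, today=None):
    """Return a warning string if a living person's recorded age exceeds
    the confirmed longevity records, else None."""
    today = today or date.today()
    # Subtract one year if the birthday has not yet occurred this year
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day))
    limit = MAX_AGE_MALE if sex == "M" else MAX_AGE
    if age > limit:
        return f"Warning: age {age} exceeds the confirmed record of {limit}"
    return None

print(age_warning(date(1880, 1, 1), sex="M", today=date(2012, 6, 1)))
```

Note it is a warning, not a hard rejection: a record-breaking age is improbable, not impossible.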

Both Jeanne Calment and Christian Mortensen have shared their secret behind a long life.

Surprisingly both recipes include what is usually not considered good for your health.

Jeanne Calment recommended a diet of port wine and she ate nearly one kilogram of chocolate every week.

Christian Mortensen on the other hand recommended lots of good water and no alcohol – but then a good cigar.

Even though there are lots of recipes and examples out there for a good health and a long life, there is probably no single one way and as told in the post Miracle Food for Thought:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoonful of salt.

Also, as discussed in the post Turning a Blind Eye to Data Quality, there are plenty of examples of business success despite poor data quality.

So, no, there is no single secret behind good data quality. But there is a wealth of good practices, tools and services to choose from out there.

For example I’m not sure I like instant oatmeal – but Instant Data Enrichment for instant Data Quality are good ones for you. I promise.


Obscure Date and Time Formats

Date and time can be represented in many ways.

Here are some of the peculiar ones:

Roman Numerals

The Romans had a numbering system where letters from the Latin alphabet signified values. Roman numerals are still used around the clock and often for expressing the year something was built, written or made.

This year being 2012 in Arabic numerals is MMXII in Roman numerals. Next year is MMXIII and the year after is of course MMXIIII. No wait, it is MMXIV.
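The subtractive notation that turns MMXIIII into MMXIV can be sketched in a few lines (the helper name is mine):

```python
def to_roman(year):
    """Convert a year to Roman numerals using subtractive notation,
    so 2014 becomes MMXIV rather than MMXIIII."""
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
              (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
              (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, numeral in values:
        # Greedily take the largest value that still fits
        while year >= value:
            result += numeral
            year -= value
    return result

print(to_roman(2012), to_roman(2013), to_roman(2014))  # MMXII MMXIII MMXIV
```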

The 12-Hour Clock

A day consists of 24 hours. So naturally 5 hours into the day will be 5:00 and 17 hours into the day will be 17:00. But no. Several countries around the world still stick to the 12-hour clock writing 5:00 AM and 5:00 PM. And in most countries verbal use of the 12-hour clock is common.
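The conversion looks trivial but has two classic traps, since hour 0 is 12 AM and hour 12 is 12 PM (a minimal sketch, helper name mine):

```python
def to_12_hour(hour_24):
    """Render an hour of the 24-hour clock in 12-hour AM/PM notation."""
    suffix = "AM" if hour_24 < 12 else "PM"
    hour_12 = hour_24 % 12 or 12  # 0 and 12 both render as 12
    return f"{hour_12}:00 {suffix}"

print(to_12_hour(5), to_12_hour(17))  # 5:00 AM 5:00 PM
```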

The American Date Format

A date consists of three elements: Day, Month and Year.

So to most of the world yesterday, the 1st of June 2012, will be: 01/06/2012

If you insist on using an ISO standard, you’ll do it backward: 2012-06-01

However, if you are from the United States, you’ll do it awkward: 06/01/2012

Even if you are a US data quality tool vendor selling to the whole world, you will still do it awkward:

Blog post published 1st June 2012. Flip that date! – as it will be 6th January to the rest of the world.

Best practice will be writing June 1st 2012 or in some other way avoiding ambiguity.
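The ambiguity (and its cure) is easy to demonstrate in Python, using the date string from the example above:

```python
from datetime import datetime

raw = "06/01/2012"

# The same string parses to two different dates depending on the convention
as_us = datetime.strptime(raw, "%m/%d/%Y")    # 1st of June to a US reader
as_rest = datetime.strptime(raw, "%d/%m/%Y")  # 6th of January to most others

print(as_us.date(), as_rest.date())           # 2012-06-01 2012-01-06

# Unambiguous renderings: ISO 8601 or a spelled-out month
print(as_us.strftime("%Y-%m-%d"))             # 2012-06-01
print(as_us.strftime("%d %B %Y"))             # 01 June 2012
```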
