Now We Have a Data Governance Tool Market

Do we need data governance tools? This was a question discussed recently here on the blog in the comments to the post called Data Governance Tools: The New Snake Oil?

As mentioned in a comment, one analyst firm, Bloor, has actually published a data governance market update with vendors positioned in its bullseye style of visualization. Both a data quality market update and the data governance market update can be fetched via Trillium Software here.

The data governance report states that regulations in particular have urged organizations to focus on data quality and thereby data governance. Furthermore, Bloor says: “Previously, compliance was typically process-focused: you had to prove the lineage of data, for example, but not its accuracy.”

The vendors positioned in the data governance market are pretty much the usual suspects known from the analyst reports on the data quality tool market. It is interesting to see, though, that Experian makes one of its not so frequent appearances in such a report. That must be about accuracy, since Experian is not so well known for process-focused tools but rather for tools using external reference data in order to improve accuracy.



Data governance tools: The new snake oil?

Traditionally, data governance has been about the people and process side of data management. However, we now see tools marketed as data governance tools, either as a pure play tool for data governance or as part of a wider data management suite, as told in the post Who needs a data governance tool?

The post refers to a report by Sunil Soares. In this report, data governance tools are seen as tools related to six areas within enterprise data management: data discovery, data quality, business glossary, metadata, information policy management and reference data management.

While IBM have tools for everything, according to the report it does not seem like a single tool cures it all – yet.

But will we go there? If we need tools at all, do we need an all-cure snake oil tool for data governance? Or will we be better off with different lubricants for data discovery, data quality, business glossary, metadata, information policy management and reference data management?


Data Entry by Employees

A recent infographic prepared by Trillium Software highlights a fact about data quality I personally have been preaching about a lot:

[Infographic: Trillium Software, 75 percent]

This number is (roughly) sourced from a study by Wayne W. Eckerson of The Data Warehouse Institute made in 2002:

[Chart: TDWI, 76 percent]

So, in the fight against bad data quality, a good place to start will be helping data entry personnel do it right the first time.

One way of achieving that is to cut down on the data being entered. This may be done by picking the data from sources already available out there instead of retyping things and making those annoying flaws.

If we look at the two most prominent master data domains, some ideas will be:

  • In the product domain I have seen my share of product descriptions and specifications being re-entered when flowing down the supply chain of manufacturers, distributors, re-sellers, retailers and end users. Better batch interfaces with data quality controls are one way of coping with that. Social collaboration is another, as told in the post Social PIM.
  • In the customer, or rather party, domain we have seen an uptake in the use of address validation. That is good. However, it is not good enough, as discussed in the post Beyond Address Validation.


Who needs a data governance tool?

Recently Sunil Soares released a research report: An In-Depth Review of Data Governance Software Tools. A link to the place to download the complimentary report is here.

The report examines what a data governance software tool should do and mentions a range of tools from vendors stretching from:

  • A pure play data governance tool vendor such as Collibra
  • A one-stop-shopping vendor within data management such as Informatica
  • A non-stop-shopping vendor within everything IT such as IBM

As touched on in the latest post on this blog, how far a tool should go in covering additional disciplines related to the core discipline is an ever-recurring question. Data governance should, for example, definitely be a part of a Master Data Management (MDM) programme, here using the British English way of spelling programme versus program to emphasise what MDM should be. As data governance is very much about people and processes and not so much about technology, do you need a tool at all? If you do, do you need a separate best-of-breed tool for the data governance part or will it be preferable to have it as an integrated part of the MDM solution?


When High Quality Data doesn’t Yield High Quality Service

Better data quality is a prerequisite of better quality of service but unfortunately high quality data doesn’t necessarily lead to high quality service when the data flow is broken. This happened to me last night.

When landing in London Heathrow Airport I usually, economical as I am, use the train to reach my doorstep. However, when I have to catch an early morning flight I order a cab, which actually has a very reasonable price. So yesterday I decided to book a cab in order to cut 30 to 40 minutes off the journey home at the expense of a minor amount of extra pounds.

Excellent data capture

Usually I just call the cab, but as I arrived by airplane and my local cab service is part of an online booking service, I used that service for the first time. The user interface is excellent. There is rapid addressing for entering the pick-up place, which quickly presented me with the possible terminals at Heathrow. The destination was just as smooth. As the pick-up is an airport, they prompted me for the flight number. Very nice, as that makes tracking delays possible for them and also lets you check that the airline and terminal are a correct match.

Also they have an app that I geekly downloaded to my phablet.

Going down

Landing times at Heathrow are difficult to predict as it often happens that your flight has a couple of circles over London before landing due to heavy traffic. Yesterday was good though as we came directly down and therefore were ahead of schedule.

So it was OK that my name wasn’t on the signs held by drivers already waiting at the passenger exit. Actually I was so early that I could have caught the not so frequent direct train home. But as I had already troubled the driver to go there, I of course waited while spending time on the app.

There was actually also driver tracking in the app. Marvelous. At first glance it seemed the driver was there. But then I noticed a message saying driver tracking wasn’t available and that the spot in the terminal 3 building would therefore be my own position or the requested pick-up place.

Going crazy

5 minutes after the requested time the driver called:

“Where are you Mr. Sorensen?”

“I’m at the passenger exit where all drivers are waiting.”

“OK. I’m just parking the car. Go to the front of the coffee shop and I’ll be there in a few minutes.”

I spotted a coffee shop in front of the lifts to the short stay parking and went over there.

10 minutes later the driver called:

“Where are you Mr. Sorensen?”

“I am in front of the coffee shop”

“Costa Coffee?”

“No. It has a different name…”. After some ping-pong I mentioned terminal 3.

“Terminal 3?” the driver responded. “I’m at terminal 5. I was told to go here. I’ll be with you in 5 minutes”.

Going by car in 5 minutes I wondered. That would indicate crossing the runways or using the train tunnel.

Well, while spending more happy time on the phablet the clock approached the point where I would be at my doorstep using the slow train.

40 minutes after the requested time the driver arrived. I was waiting for the mandatory sorry that Brits use even when they are not sorry at all.

Instead the driver greeted me with: “Did you order the cab yourself Mr. Sorensen?”

“Yes I did. On the internet.”

“Internet?” the driver replied.

“Your company has an excellent online booking system,” I remarked in a friendly tone.

“When I called you first I asked for confirmation about where you were”.

As I realized that he was trying to establish that everything was my fault I presented the confirmation on the app.

We continued (without the usual small talk) to the destination. Here the driver (instead of a discount) presented an upgraded version of the price on the booking confirmation.

At that point it was too difficult to keep calm and carry on…..


Trust in External Data is Like Trust in Analysts

The analyst industry is like any other industry. Analysts compete. Mostly they do so by presenting what are supposed to be more trustworthy reports than those of the others, including their special visualization method, be that a quadrant, landscape, bullseye or whatever approach. And sometimes they compete by bashing the other ones.

[Image: MDM market analysts meetup]

This week I had a blog post called A Little Bit of Truth vs A Big Load of Trust. The post cites a blog post from Andrew White of Gartner called From MDM to Big Data – From truth to trust. This post again cites an article on SearchDataManagement called Enterprise master data management and big data: A well-matched pair?

Andrew White’s post praises the views of fellow Gartner analyst Ted Friedman in the SearchDataManagement article and bashes the views of the other contributors being Evan Levy, Andy Hayler (Information Difference), Aaron Zornes of the MDM Institute and Kelly O’Neal by saying:

“… presumably since the thinking out there in the cited analyst community has not gotten very far yet.”

Indeed, you have to consider multiple opinions out there when it comes to Master Data Management (MDM), big data and other external data. In the same way there are, when it comes to the data, multiple versions of the truth out there and you have, in Andrew White’s words, to: “..manage and govern trust in someone else’s data”.


Anachronism and Data Quality

The term anachronism is used for something misplaced in time. An example is classical paintings where a biblical event is shown with people in clothes from the time when the painting was done.

In data quality lingo such a flaw will be categorized as lack of timeliness.

The most frequent example of lack of timeliness, or should we say example of anachronism, in data management today is having an old postal address attached to a party master data entity. A remedy for avoiding this kind of anachronism is explained in the post The Relocation Event.

In a recent blog post called 3-2-1 Start Measuring Data Quality by Janani Dumbleton of Experian QAS the timeliness dimension in data quality is examined along with five other important dimensions of data quality. As said herein an impact of anachronism could be:

“Not being aware of a change in address could result in confidential information being delivered to the wrong recipient.”

Hope you got it.


Everyday Year 2000 Problems

14 years ago these were busy times for computer professionals, including yours truly, because of the upcoming year 2000 apocalypse. The handling of the problem indeed had elements of hysteria, but all in all it was a joint effort by heaps of IT people in meeting a non-postponable deadline around fixing date fields that were too short.

Data entry and data storage fields that are too short, have an inadequate format or are missing are frequent data quality issues. Some everyday issues are:

Too short name fields

Names can be very long. But even a moderately lengthy name such as Henrik Liliendahl Sørensen can be a problem here and there. Not least, typing your name on Twitter, where the 20-character name field corresponds very well to the 140-character message length, forces many of us to shorten our name. I found a workaround from a fellow Sørensen in the post Getting around the real name length limit in Twitter. Not sure if I’m prepared to take the risk.
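The effect of a too short name field can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of reporting the lost characters are assumptions made for the example, and the 20-character limit is the one mentioned above.

```python
def fit_in_field(value, max_length):
    """Split a value into the part a fixed-length field can store and the part that is lost."""
    stored = value[:max_length]
    lost = value[max_length:]
    return stored, lost


name = "Henrik Liliendahl Sørensen"
stored, lost = fit_in_field(name, 20)

# A data entry form could warn the user instead of silently truncating.
if lost:
    print(f"Name is {len(name)} characters; {len(lost)} would be cut off.")
```

Flagging the overflow at entry time, rather than truncating quietly, gives the person entering the data a chance to choose a sensible short form.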

Too short and restricted postal code fields

When working with IT solutions in Denmark you see a lot of postal code fields defined as 4 digits. That works fine with Danish addresses but is a real show stopper when you deal with neighboring Swedish and German 5-digit postal codes, and not least postal codes with letters from the Netherlands and the United Kingdom and most other postal codes from around the world.
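A minimal sketch in Python of what a less restrictive check could look like, using country-specific patterns instead of one 4-digit rule. The patterns below are simplified assumptions for illustration only; real postal code rules have more exceptions, and the function names are made up for this example.

```python
import re

# Simplified, illustrative patterns per ISO country code (assumptions, not official rules).
POSTAL_CODE_PATTERNS = {
    "DK": re.compile(r"[0-9]{4}"),            # Denmark: 4 digits
    "SE": re.compile(r"[0-9]{3} ?[0-9]{2}"),  # Sweden: 5 digits, often written "123 45"
    "DE": re.compile(r"[0-9]{5}"),            # Germany: 5 digits
    "NL": re.compile(r"[0-9]{4} ?[A-Z]{2}"),  # Netherlands: 4 digits and 2 letters
    "GB": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"),  # UK: rough approximation
}


def is_plausible_postal_code(country_code, postal_code):
    """Check a postal code against the simplified pattern for its country."""
    pattern = POSTAL_CODE_PATTERNS.get(country_code.upper())
    if pattern is None:
        # Unknown country: accept any non-empty value rather than reject it.
        return bool(postal_code.strip())
    return pattern.fullmatch(postal_code.strip().upper()) is not None


def fits_four_digit_field(postal_code):
    """Mimic a legacy field that only accepts exactly 4 digits."""
    return re.fullmatch(r"[0-9]{4}", postal_code.strip()) is not None
```

With these rules a German code such as 10115 or a Dutch code such as 1012 AB passes the country-aware check but is rejected by the 4-digit field, which is exactly the show stopper described above.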

Missing placeholder for social identities

The rise of social media has been incredible during the last years. However, IT systems are lagging behind in support for this. Most systems don’t have a place where you can fill in a social handle. Recently James Taylor wrote the blog post Getting a handle on social MDM. Herein James describes a workaround in an IBM MDM solution. Indeed we need ways to link the old systems of record with the new systems of engagement.


The Future of Data Stewardship

Data Stewardship is performed by data stewards.

What is a Data Steward?

A steward may in a general sense be:

  • One employed in a large household or estate to manage domestic concerns – typically an old role.
  • An employee on a ship, airplane, bus, or train who attends to passengers’ needs – typically a new role.

My guess is that data stewardship will also tend to go from the first kind of role related to data to the latter kind of role related to data.

The current data steward role is predominately seen as the oversight of the house-holding related to the internal enterprise data assets. It’s about keeping everything there clean and tidy. It involves having routines and rules that ensure that things with data are done properly according to the traditions and culture in the enterprise.

Big Data Stewardship

In the future enterprises will rely much more on external data. Exploiting third party reference data and open government data, and digging into big data sources such as social data and sensor data, will shift the focus away from looking mostly into keeping the internal data fit for purpose.

As such you as a data steward will become more like the steward on a ship, airplane, bus or train. Data will come and go. After a nice welcoming smile you will have to carefully explain about the safety procedures. Some data will be fairly easy to handle – mostly just spending the time sleeping. Other data will be demanding asking for this and that and changing its mind shortly after. Some data will be a frequent traveler and some data will be there for the first time.

So, are you ready to attend the next batch of travelling data on board your enterprise?



Names, Addresses and National Identification Numbers

When working with customer, or rather party, master data management and related data quality improvement and issue prevention for traditional offline and some online purposes, you will most often deal with names, addresses and national identification numbers.

While this may be tough enough for domestic data, doing this for international data is a daunting task.

Names

In reality there should be no difference between dealing with domestic data and international data when it comes to names, as people in today’s globalized world move between countries and bring their names with them.

Traditionally the emphasis on data quality related to names has been on dealing with the most frequent issues, be that heaps of nicknames in the United States and other places, having a “van” in bulks of names in the Netherlands or having loads of surname-like middle names in Denmark.

With company names there are some differences to be considered like the inclusion of legal forms in company names as told in the post Legal Forms from Hell.

Addresses

Address formats vary between countries. That’s one thing.

The availability of public sources for address reference data varies too. These variations are related to for example:

  • Coverage: Is every part of the country included?
  • Depth: Is it street level, house number level or unit level?
  • Costs: Are reference data expensive or free of charge?

As told in the post Postal Code Musings the postal code system in a given country may be the key (or not) to how to deal with addresses and related data quality.

National Identification Numbers

The post called Business Entity Identifiers includes how countries have different implementations of either all-purpose national identification numbers or single-purpose national identification numbers for companies.

The same way there are different administrative practices for individuals, for example:

  • As I understand it, it is forbidden by the constitution down under to have all-purpose identification numbers for individuals.
  • The United States Social Security Number (SSN) is often mentioned in articles about party data management. It’s an example of a single-purpose number in fact used for several purposes.
  • In Scandinavian countries all-purpose national identification numbers are in place as explained in the post Citizen ID within seconds.

Dealing with diversity

Managing party master data in the light of the above-mentioned differences around the world isn’t simple. You need comprehensive data governance policies and business rules, you need elaborate data models and you need a quite well equipped toolbox for preventing data quality issues and exploiting external reference data.
