What Should be Driving Data Quality: Fear or Greed?

Today I attended a nice little event at the British Computer Society. The event was called “Data Surgery” and had sessions combining presentations and discussions around data management. Among the presenters were Julian Schwarzenbach with his beavers and squirrels from the data zoo and Martin “Johari” Doyle of DQ Global discussing data quality.

In the data quality session I attended, the good old subject of selling data quality was touched upon and, not surprisingly, the fear factor was mentioned as a way to go.

While I agree that fear of failure in the form of bad reputation and financial loss is a working concept, I have also seen that data quality initiatives based on fear don’t stick for long. Similar thoughts were expressed in the Data Quality Pro post called Taking The ‘Fear’ Factor Out Of Data Quality by Duane Smith. Herein Duane says:

“Selling your data quality initiative based on fear may have a short-term pay back, but I believe it will ultimately fail in the longer term.”

The opposite approach to relying on fear is counting on greed. That means making better profit by improving data quality. It’s a more sustainable way, I think, but predicting ROI from a data quality initiative is indeed very hard, as examined on the blog page called ROI.

So, most often we fear counting on greed and fall back on greeting the fear.


On Maps, Data Quality and MDM

Maps are great but sometimes you’ll have some trouble with data quality issues on maps as told in the post Troubled Bridge over Water.

When it comes to political borders on maps, things may get really nasty, as happened lately for Huawei when a congratulation to Pakistan on its independence day showed a map with borders not in line with the Pakistani version of the truth. The story is told here.

There are plenty of disputes about borders in the world, stretching from the serious situations in the Himalaya region to, for example, the close-to-comical case between Canada and Denmark/Greenland over Hans Island.

In these situations you can’t settle on a single version of the truth.

However, even if we don’t have disputes on what is right or wrong, we may have very different views on how to look at various entities, as examined in the post The Greenland Problem in MDM.

Bookmark and Share

Hear ye, hear ye, hear ye


A certain birth in London the other day was widely visualized by the announcement by a royal crier in front of St. Mary’s Hospital.

However, as reported by International Business Times here, the crier in fact just crashed the party, as he wasn’t invited by any Royal party. But the cries and included facts were true right enough.

So, this time everything was OK. But in general it’s amazing how we confuse great visualization with trustworthiness.


OK, so big data is about size (and veracity)

During the rise of the term “big data” there have been a lot of different definitions around, trying to express briefly what this very popular term really is about. A lot of these definitions have included the sentiment that big data is not (only) about size. The three V’s, being Volume, Variety and Velocity, have been very popular. A fourth V, Veracity, has been added, though this is hardly a definition of big data but rather a desirable capability of big (and any other) data.

But apparently big data is about size.

The Oxford English Dictionary has now included big data in its authoritative explanation of English words and terms, and big data is:

“Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges”.

It’s interesting that the challenges that make data big are not about analyzing the data. They are about data manipulation and data management. These are, by the way, things you do to achieve veracity.


The Internet of Things and the Fat-Finger Syndrome

When coining the term “the Internet of Things” Kevin Ashton said:

“The problem is, people have limited time, attention and accuracy—all of which means they are not very good at capturing data about things in the real world.”

Indeed, many, many data quality flaws are due to a human typing the wrong thing. We usually don’t do that intentionally. We do it because we are human.

Typographical errors, and the sometimes dramatic consequences, are often referred to as the “fat-finger syndrome”.

As reported in the post Killing Keystrokes, avoiding typing is a way forward, for example by sharing data instead of typing in the same data (a little bit differently) within every organization.

The Internet of Things, being common access to data provided by a huge number of well-defined devices, is another development in avoiding typos.

It’s not that data coming from these devices can’t be flawed. As debated in the post Social Data vs Sensor Data, there may be challenges in sensor data due to errors made by a human setting up the sensors.

Also, misunderstandings by humans in combining sensor data for analytics and predictions may cause consequences as bad as those based on the traditional fat-finger syndrome.

All in all, I guess we won’t see a decrease in the need to address data quality in the future. We will just need to use different approaches, methodologies and tools to fight bad data and information quality.

Are you interested in what all this will be about? Why not join the Big Data Quality group on LinkedIn?


How important is big data quality?

Along with the rise of big data, the question about the quality of big data, and the importance of taking data quality into consideration when analyzing big data, is raised again and again.

We had a poll in the LinkedIn Big Data Quality group. The results are as shown below:

[Image: Big Data Important]

So, some people consider data quality to be more important for big data than for small data (the data we have analyzed until the rise of big data), some people consider data quality to be less important with big data, but the majority of people who voted (including yours truly) consider the quality of big data to be as important as it has been with small data.

As expressed in some comments, voting “the same” is often an aggregate of some things that are more important and other things that are less important.

Also, some people have voted “mu” (wrong question) and explained in the comments that you really can’t compare small data with big data.

A repeated sentiment in the comments is that data quality for small data is going to be more important with the rise of big data as examined in the post Small Data with Big Impact.


Fuzzy Social Identities in the Data Quality Realm

In recent years social networks have emerged as a new source of external reference data for Master Data Management (MDM). But surely, there are challenges with the data quality related to this source.

Let’s look at a few examples from inside the data quality tool vendor space.

Who is head of Informatica in the social sphere?

There is a Twitter account owned by Sohaib Abbasi:

[Image: Sohaib Abbasi]

Informatica is one of the leading data quality tool vendors and the CEO there is Sohaib Abbasi.

So, is the real-world individual behind the Twitter handle @sabbasi the head of Informatica?

A social graph should indicate so: There’s a bunch of Informatica accounts and people following the handle (though that’s not worth the trouble as there are no tweets coming from there).

What about the one behind Data Ladder?

Data Ladder is another data quality tool provider, though with a fraction of the revenue compared to Informatica.

In a recent post I stumbled upon a strange situation around this company. In the social sphere the company has for the last seven years been represented by a guy called Simon, as seen here on LinkedIn:

[Image: Simon aka Nathan]

But I have reasons to believe that his real-world identity is Nathan, as explored in the comments to this post.

Hmmmm….

Data Quality tool vendors: It’s time to get real.


Happy New Year

Am I too late? Not at all. Today is the last day in the year of the dragon and tomorrow will be the first day in the year of the snake according to the Chinese calendar. It’s the Chinese New Year.

As globalization moves on, we are becoming more and more aware of celebrations from different cultures, and I guess we will end up having almost every day as a special day.

Next up, as far as I am aware, is the coming Thursday, Valentine’s Day, a day that has gained much in importance during the last decades in many European countries and other places, not least touted by retailers.

In Chinese symbology, snakes are regarded as intelligent, but with a tendency to be somewhat unscrupulous. So I guess Valentine’s day this year will be great (for retailers).

All of it is a good reminder of the diversity issues in data quality, which are a frequent subject on this blog.

Happy new year and for god’s sake don’t forget Valentine’s Day.
