Building an instant Data Quality Service for Quotes

Yesterday’s post, Introducing the Famous Person Quote Checker, touched on the issue of all the quotes floating around in social media about things apparently said by famous persons.

The bumblebee can’t fly faster than the speed of light – Albert Einstein

If you were to build a service that could prevent postings with disputable quotes, what would you need to consider? Well, I guess pretty much the same things as with any other service for preventing data quality issues.

Here are three things to consider:

Getting the reference data right

Finding the right sources for say reference data for world-wide postal addresses was discussed in the post A Universal Challenge.

In the same way, so to speak, it will be hard to find a single source of truth about what famous persons actually said. It will be a daunting task to make a registry of confirmed quotes.

Embracing diversity

Staying with postal addresses this blog has a post called Where the Streets have one Name but Two Spellings.

In the same way, so to speak again, quotes are translated and transliterated and have gone through transcription from the original language and writing system. So every quote may have many true versions.
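As a minimal sketch of what "many true versions" could mean for the reference data, the snippet below keeps several accepted wordings under one registry entry and compares them after a simple normalization. Everything here is an assumption made for illustration: the names `QuoteEntry`, `normalize`, `find_entry` and `REGISTRY` are hypothetical, and the second wording in the registry is just a punctuation variant invented for the example, not a documented translation.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lower-case, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = [c for c in decomposed if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).lower().split())


@dataclass
class QuoteEntry:
    """One confirmed quote with all the wordings accepted as true versions."""
    person: str
    canonical: str
    variants: list[str] = field(default_factory=list)

    def keys(self) -> set[str]:
        # Every accepted wording, reduced to its normalized form.
        return {normalize(v) for v in [self.canonical, *self.variants]}


# Hypothetical one-entry registry; a real one would need vetted sources.
REGISTRY = [
    QuoteEntry(
        person="Albert Einstein",
        canonical="Everything should be as simple as it can be, but not simpler",
        variants=["Everything should be as simple as it can be but not simpler."],
    )
]


def find_entry(registry: list[QuoteEntry], text: str) -> QuoteEntry | None:
    """Return the entry whose canonical text or variant matches, else None."""
    key = normalize(text)
    for entry in registry:
        if key in entry.keys():
            return entry
    return None
```

Exact matching after normalization only covers differences in case, accents and punctuation. Translated versions would have to be stored as explicit variants, which is part of why building the registry is such a daunting task.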

Where to put the check?

As examined in the post The Good, Better and Best Way of Avoiding Duplicates there are three options:

1)      A good and simple option could be to periodically scan through postings in social media and, when a disputable quote is found, send an eMail to the culprit who did the posting. However, it’s probably too late, as even if you for example delete your tweet, the 250 retweets will still be out there. But it’s a reasonable way to start marking up all the disputable quotes out there.

2)      A better option could be a real-time check. You type in a quote on a social media site and the service prompts you: “Hey Dude, that person didn’t say that”. The weak point is that you already did all the typing, and now you have to find a new quote. But it will work when people try to share disputable quotes.

3)      The best option would be that you start typing “If you can’t explain it simply…” and the service prompts a likely quote such as: “Everything should be as simple as it can be, but not simpler – Albert Einstein”.
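The second and third options can be sketched in a few lines. This is only an illustration under stated assumptions: `CONFIRMED` is a hypothetical one-entry registry holding the Einstein quote mentioned above, `is_confirmed` and `suggest` are made-up names, the 0.85 threshold is arbitrary, and the crude shared word-prefix ranking in `suggest` merely stands in for whatever matching a real service would need.

```python
import re
from difflib import SequenceMatcher

# Hypothetical registry of confirmed quotes: (quote text, person).
CONFIRMED = [
    ("Everything should be as simple as it can be, but not simpler",
     "Albert Einstein"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_confirmed(text: str, person: str, threshold: float = 0.85) -> bool:
    """Option 2: does the typed quote resemble a confirmed quote by person?"""
    return any(
        who == person and similarity(text, quote) >= threshold
        for quote, who in CONFIRMED
    )


def stems(text: str) -> set:
    """Crude stand-in for stemming: first five letters of longer words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:5] for w in words if len(w) >= 5}


def suggest(typed: str, limit: int = 3) -> list:
    """Option 3: rank confirmed quotes by word prefixes shared with the input."""
    typed_stems = stems(typed)
    scored = [
        (len(typed_stems & stems(quote)), quote, who)
        for quote, who in CONFIRMED
    ]
    ranked = sorted((s for s in scored if s[0] > 0), reverse=True)
    return [(quote, who) for _, quote, who in ranked[:limit]]


if __name__ == "__main__":
    typed = "If you can't explain it simply, you don't understand it well enough"
    if not is_confirmed(typed, "Albert Einstein"):
        print("Hey Dude, that person didn't say that")
    for quote, who in suggest("If you can't explain it simply"):
        print(f"Did you mean: {quote} - {who}")
```

With this tiny registry the suggestion only fires because "simply" and "simple" share a prefix, which shows how much real matching logic and reference data the best option would require.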


Introducing the Famous Person Quote Checker

As reported in the post Crap, Damned Crap, and Big Data there are data quality issues with big data.

The mentioned issue is about the use of quotes in social data: A famous person apparently said something apparently clever and the one who makes an update with the quote gets an unusually large number of likes, retweets, +1s and other forms of recognition.

But many quotes weren’t actually said by that famous person. Maybe they were said by someone else, and in many cases there is no evidence that the famous person said them. Some quotes, like the Einstein quote in the Crap post, actually contradict what the person apparently also said.

I have worked a lot with data entry functionality that checks data quality: whether a certain address actually exists, whether a typed-in phone number is valid, or whether an eMail address will bounce. So I think it’s time to make a quote checker to be plugged in on LinkedIn, Twitter, Facebook, Google Plus and other social networks.

So anyone else out there who wants to join the project – or has it already been said by someone else?


Time To Turn Your Customer Master Data Management Social?

A post on the Nimble blog asks this question in its title: Time To Turn Your Sales Team Social? The post has a lot of evidence on why sales teams that embrace social selling are doing better than teams that don’t.

We do see new applications supporting social selling, with Nimble as one example from the Customer Relationship Management (CRM) sphere, as explored in the post Sharing Social Master Data. Using social services and exploiting social data in sales-related business processes will over time affect the way we are doing customer master data management.

Apart from having frontend applications that are social aware, we also need social aware data integration services, and we do indeed need social aware Master Data Management (MDM) solutions for handling data quality issues and ensuring a Single Customer View (SCV) stretching from the old systems of record to the new systems of engagement.

One service capable of doing data integration between the old world and the new world is FlipTop, and some months ago I was interviewed on the FlipTop blog about the links to Social MDM here. Currently I’m working with a social aware Master Data Management solution, namely the iDQ™ MDM Edition.

What about you? Are your Customer Master Data Management and related data quality activities becoming social aware?


Social Score Credibility

A recent piece from Fliptop is called What’s the Score. It is a thorough walkthrough of what is usually called social scoring, as done on influence scoring platforms within social media, where Klout, Kred and PeerIndex are the best-known services of that kind.

The Fliptop piece has a section about faking, which was also the subject of a recent post on this blog. The post is called Fact Checking by Mashing Up, and is about how to link social network profiles with other known external sources in order to detect cheating. Linking social network profiles with other external sources and internal sources is what is known as Social MDM, a frequent subject on this blog for several years.

A social score must of course be seen in context, as it matters a lot what you are influential about when you want to use social scoring for business. As told in the post Klout Data Quality this was a challenge two years ago, and this is probably still the case. Also here I think linking with other (big) data sources and letting Social MDM be the hub will help.

Taken from Kred on my twitter handle.

PS: I have no idea why moron ended up there. Einstein is OK.


Fact Checking by Mashing Up

A recent blog post by Andrew Grill, CEO of Kred, is called Can you spot a social media faker? Fact checking on social media is now becoming even more important.

Besides methods within the social sphere for fact checking, as described in Andrew Grill’s post, I also believe that mashing up social network profiles and traditional external reference data is a great way of getting the full picture.

As explained in the post Sharing is the Future of MDM there are several available external options for checking the facts:

  • Public sector registries, which are becoming more and more open, whether for the address part only or for deeper details, with due respect for privacy considerations that may differ between business entities and individuals.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like Mydex
  • Social network profiles, including credibility (or influence) services

The challenge is of course that there are plenty of external reference data sources. Many sources are national, making up 255 or so variants of each data source, and there are also plenty of social networks and quite a few credibility (or influence) services.

Making that easy for you is exactly what we are working on with the instant Data Quality (iDQ™) concept.



Crap, Damned Crap, and Big Data

Jim Harris recently made a thought-provoking post on the Mike2 blog. The post is called A Contrarian’s View of Unstructured Data.

Herein Jim wrote:

“My contrarian’s view of unstructured data is that it is, in large part, gigabytes of gossip and yottabytes of yada yada digitized, rumors and hearsay amplified by the illusion-of-truth effect and succumbing to the perception-is-reality effect until the noise amplifies so much that its static solidifies into a signal.”

Indeed, the sound of social data may be like that. Yesterday I wrote a post called Keep It Real, Stupid. Herein I mentioned an apparently fake quote by Albert Einstein saying:

“If you can’t explain it simply, you don’t understand it well enough”.

Today I tried to see how the fake quote was doing on Twitter.

OMG: Going on more than one tweet per minute along with some mutations of the quote saying:

“If you can’t explain it to a six-year-old, you don’t understand it yourself”.

“You do not really understand something unless you can explain it to your grandmother”.

OK folks: Sense-making of social data is not going to be simple. Not even relatively simple.

Simply Einstein Tweets

Simply Einstein Tweet 2


Simply Einstein Tweet 3


Keep It Real, Stupid

One of my pet peeves is the KISS principle: Keep It Simple, Stupid.

Don’t get me wrong: It’s worth striving for simplicity wherever possible. But some problems are not simple and do not have simple solutions. Sometimes KISS is the shortcut to getting it all wrong.

Another take on simplicity is a quote floating around in social media these days:

Simply Einstein

Oh, so Einstein said that. So you can’t argue with that.

Well, he probably didn’t as Wikiquote reports:

Simply Not Einstein

So let’s stick to a real Einstein quote:

“Everything should be as simple as it can be, but not simpler”

A great quote related to data quality and master data management by the way.


New LinkedIn Group: Big Data Quality

Do we need a LinkedIn group for this and that? It’s always a question. There are already a lot of LinkedIn groups for Big Data and a lot of LinkedIn groups for Data Quality.

However, I think we do see targeted discussions and engagement in the niche groups on LinkedIn, so yesterday I created a new group about the intersection of Big Data and Data Quality. The group is called Big Data Quality.

It’s good to see a stampede of people joining (well, 39 within the first 24 hours) and to see discussions and comments starting.

So, if you haven’t joined already, please do so here.

And why not take part in the fun, maybe just by voting on the question: How important is data quality for big data compared to data quality for small data?


Social Data vs Sensor Data

The two predominant kinds of big data are:

  • Social data and
  • Sensor data

Social data are data born in the social media realm such as Facebook likes, LinkedIn updates, tweets and whatever else the data entry we as humans do in the social sphere is called.

Sensor data are data captured by devices of many kinds such as radars, sonars, GPS units, CCTV cameras, card readers and many more.

There’s a good expression, “same same but different”, and in my experience it describes the two kinds of big data very well: the social data coming directly from a human hand and the sensor data born by a machine.

Of course there are humans involved with sensor data as well. It is humans who set up the devices and sometimes a human makes a mistake when doing so. Raw sensor data are often manipulated, filtered and censored by humans.

There are indeed data quality issues associated with both kinds of big data, but in slightly different ways. And you surely need to apply master data management (MDM) in order to make some sense of both social data and sensor data as examined in the post Big Data and Multi-Domain Master Data Management.

What is your experience: Are social data and sensor data just big data regardless of source? Is it same same but different? Or are social data and sensor data two separate data worlds that just both happen to be big?


Fuzzy Social Identities in the Data Quality Realm

In the past years social networks have emerged as a new source of external reference data for Master Data Management (MDM). But surely, there are challenges with the data quality related to this source.

Let’s look at a few examples from inside the data quality tool vendor space.

Who is head of Informatica in the social sphere?

There is a twitter account owned by Sohaib Abbasi:

Sohaib Abbasi

Informatica is one of the leading data quality tool vendors and the CEO there is Sohaib Abbasi.

So, is the real world individual behind the twitter handle @sabbasi the head of Informatica?

A social graph should indicate so: There’s a bunch of Informatica accounts and people following the handle (though that’s not worth the trouble as there are no tweets coming from there).

What about the one behind Data Ladder?

Data Ladder is another data quality tool provider, though with a fraction of the revenue compared to Informatica.

In a recent post I stumbled upon a strange situation around this company. In the social sphere the company for the last seven years has been represented by a guy called Simon as seen here on LinkedIn:

Simon aka Nathan

But I have reasons to believe that his real world identity is Nathan as explored in the comments to this post.


Data Quality tool vendors: It’s time to get real.
