Fuzzy Social Identities in the Data Quality Realm

In recent years social networks have emerged as a new source of external reference data for Master Data Management (MDM). But there are certainly data quality challenges with this source.

Let’s look at a few examples from inside the data quality tool vendor space.

Who is head of Informatica in the social sphere?

There is a Twitter account owned by Sohaib Abbasi:

Sohaib Abbasi

Informatica is one of the leading data quality tool vendors and the CEO there is Sohaib Abbasi.

So, is the real-world individual behind the Twitter handle @sabbasi the head of Informatica?

A social graph should indicate so: there are a bunch of Informatica accounts and people following the handle (though that’s not worth the trouble, as there are no tweets coming from there).

What about the one behind Data Ladder?

Data Ladder is another data quality tool provider, though with a fraction of the revenue of Informatica.

In a recent post I stumbled upon a strange situation around this company. In the social sphere the company has for the last seven years been represented by a guy called Simon, as seen here on LinkedIn:

Simon aka Nathan

But I have reason to believe that his real-world identity is Nathan, as explored in the comments to this post.


Data Quality tool vendors: It’s time to get real.


Connecting CRM and MDM with Social Network Profiles

As told on DataQualityPro recently in an interview post about the Benefits of Social MDM, doing social MDM (Master Data Management) may still be off the radar of most MDM implementations. But there is plenty happening in connecting CRM (Customer Relationship Management) with social engagement.

While a lot of the talk is about the biggest social networks, such as Facebook and LinkedIn, there are also things going on with more local social networks, like the German alternative to LinkedIn called Xing.


Last week I followed a webinar by Dirk Steuernagel of MRM24. It was about connecting your Salesforce.com contact data with Xing.

As said in the MRM24 blog post called Social CRM – Integration von Business Netzwerken in Salesforce.com:

“Our business contacts are usually found in various internal and external systems and on non-synchronized platforms. It requires a lot of effort and nerves to maintain all of our business contacts at the different locations and keep the relevant information up to date.”

(Translated to English by Google and me).


We see a lot of connectors between CRM systems and social networks.

In due time we will also see a lot of connectors between MDM and social networks, which is a natural consequence of the spread of social CRM. This trend was also strongly emphasized in the Gartner (the analyst firm) tweet chat today:

GartnerMDM chat and social MDM


Olympic Darlings and Big Data Experts

The Olympic Games produce two kinds of darlings.

One kind is the big winners, such as Usain Bolt and Michael Phelps.

The other kind is the big losers. As reported in the post Olympic Moments, the 1988 Winter Games had the Brit “Eddie the Eagle” in ski jumping. The 2000 Sydney Summer Games had the swimmer Eric “The Eel” Moussambani. The 2012 London Summer Games now have Hamadou Djibo Issaka in rowing.

The ski jumper Eddie the Eagle came from a country that hates snow and comes to a full stop at the first sight of the white fluffy stuff from above. The rower Hamadou Djibo Issaka comes from Niger, a country almost only covered by desert.

Such bravery in competing way out of your comfort zone naturally brings me to the subject of big data experts.

A while ago I noticed a tweet by Neil Raden:

Oh yes. It’s amazing how many big data experts we have seen emerging in the short life of the big data buzz.


Fake, Snoopy, Kitty and Duplicate Social Media Profiles

As a data quality practitioner I have never been in doubt that when it is said that Facebook has 900 million profiles, that doesn’t mean that 900 million people have a Facebook profile.

Some people have more than one profile. Some people who had a profile are not among us anymore. As reported by the BBC in the article Facebook ‘likes’ and adverts’ value doubted, some profiles are fake, resulting in Facebook earning real money that should have been fake money.

Some profiles are not really fake but serve other purposes, like a snoopbook account created to reveal fraud.

And then some profiles belong to (the owners of) real cats, as reported by James Standen in a comment to my post called Out of Facebook.

On another social media platform, Twitter, I am guilty of having 5 profiles. Besides my real account hlsdk I have created hldsk, hsldk and hlsdq, so I have been able to thank people mentioning me with a misspelled handle. And then there is my female side: MissDqPiggy.
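The misspelled handles above are each one typing slip away from the real one: two swap adjacent letters and one replaces a single letter. As a rough sketch of how a matching routine might catch such variants without registering extra accounts, the following uses the optimal string alignment distance (edit distance that also counts an adjacent swap as one edit). The function names, the threshold of 1 and the list of known handles are assumptions made for illustration, not anything Twitter provides.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "sd" typed as "ds".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def resolve_handle(mentioned: str, known: List[str], max_distance: int = 1) -> List[str]:
    """Return the known handles within max_distance of the mentioned one.
    The threshold of 1 is a hypothetical default, not a recommendation."""
    m = mentioned.lstrip("@").lower()
    return [h for h in known if osa_distance(m, h.lower()) <= max_distance]


if __name__ == "__main__":
    for typo in ("hldsk", "hsldk", "hlsdq"):
        print(typo, osa_distance("hlsdk", typo), resolve_handle("@" + typo, ["hlsdk", "MissDqPiggy"]))
```

A tight threshold keeps false matches down, which matters with short handles where many unrelated names are only a couple of edits apart.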


The Secret Behind Good Data Quality

This post is inspired by a little tweet chat I had with Daragh O Brien this morning:

The data quality angle was that a simple data quality rule around age (or date of birth) for living persons would be a check creating a warning if age is above 122, because this would, if true, be a new entry in the book of records.

Jeanne Louise Calment of France had the longest confirmed human life span: 122 years.

Your data quality age check may even be refined as the record for a male is 115 years.

Christian Mortensen, born in Denmark and deceased in the United States, holds that record.
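The age rule described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the limits of 122 and 115 years are simply the records cited in this post, and the function names, the "M" gender code and the warning texts are assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Record ages as cited in the post, in completed years (assumed limits).
MAX_AGE_ANY = 122   # Jeanne Calment
MAX_AGE_MALE = 115  # Christian Mortensen


def age_in_years(date_of_birth: date, today: date) -> int:
    """Return the number of completed years between the two dates."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def age_warning(date_of_birth: date,
                gender: Optional[str] = None,
                today: Optional[date] = None) -> Optional[str]:
    """Return a warning text for a living person's record, or None if plausible.
    The gender code "M" is a hypothetical convention for this sketch."""
    today = today or date.today()
    age = age_in_years(date_of_birth, today)
    if age < 0:
        return "date of birth is in the future"
    limit = MAX_AGE_MALE if gender == "M" else MAX_AGE_ANY
    if age > limit:
        return f"age {age} is above the record of {limit} years"
    return None


if __name__ == "__main__":
    print(age_warning(date(1889, 1, 1), today=date(2012, 8, 1)))
    print(age_warning(date(1896, 1, 1), gender="M", today=date(2012, 8, 1)))
    print(age_warning(date(1970, 5, 17), today=date(2012, 8, 1)))
```

The check returns a warning rather than rejecting the record, in line with the idea that an age above the limit is not impossible, only a new entry in the book of records.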

Both Jeanne Calment and Christian Mortensen have shared their secret behind a long life.

Surprisingly both recipes include what is usually not considered good for your health.

Jeanne Calment recommended a diet of port wine and she ate nearly one kilogram of chocolate every week.

Christian Mortensen on the other hand recommended lots of good water and no alcohol – but then a good cigar.

Even though there are lots of recipes and examples out there for good health and a long life, there is probably no single way, and as told in the post Miracle Food for Thought:

“The facts about the latest dietary discoveries are rarely as simple as the headlines imply. Accurately testing how any one element of our diet may affect our health is fiendishly difficult. And this means scientists’ conclusions, and media reports of them, should routinely be taken with a pinch of salt.”

It’s about the same with data quality, isn’t it?

Accurately testing how any one element of our data may affect our business is fiendishly difficult. So predictions of return on investment (ROI) from data quality improvement are unfortunately routinely taken with a big spoon of salt.

Also, as discussed in the post Turning a Blind Eye to Data Quality, there are plenty of examples of business success despite poor data quality.

So, no, there is no single secret behind good data quality. But there is a wealth of good practices, tools and services to choose from out there.

For example I’m not sure I like instant oatmeal – but Instant Data Enrichment for instant Data Quality are good ones for you. I promise.


Sharing Social Master Data

If a company runs a Customer Relationship Management (CRM) system, all employees are supposed to enter their interactions with customers and prospects, including adding new accounts and contacts if it’s the first engagement.

With the rise of social networks, first engagements increasingly happen in those networks. Furthermore, new employees often bring old contacts from former jobs with them, thus utilizing an established relationship that is probably manifested in one or more existing social network connections.

As explained in the post Social Master Data Management, the term “Social CRM” has been around for a while. We now see CRM solutions where the account and contact master data are primarily built by extracting those data from social networks.

I have just tried out such a solution called Nimble.

If you are more than a one-man-band company, it is interesting to what degree you are willing (or forced) to share your connections as master data entities for the CRM solution.

In Nimble you can make a different choice for each network. I would probably freely choose a setup with Twitter and LinkedIn as shared with the team, but Facebook as private:

But that is just how I think based on my way of using social networks.

There is a fundamental data quality versus privacy issue around utilizing employees’ social network connections as master data for CRM and eventually enterprise-wide Master Data Management (MDM).

All things being equal, data quality will be best if everyone contributes within reason. Not least in sales, but also more or less in other functions, you are hired partly because of your relations.

What do you think?


Geocoding from 100 Feet Under

I stumbled upon this image posted by Ellie K. on Google+

The title is World map of Flickr and Twitter locations and the legend is that red dots are locations of Flickr pictures, blue dots are locations of Twitter tweets and white dots are locations that have been posted to both.

You may be able to see your city following this link.

For example Copenhagen looks like this:

Here you have Copenhagen in Denmark to the left and Malmoe in Sweden to the right.

The strip between is the fixed link known as the Øresund Bridge.

However, the connection isn’t entirely a bridge. If you look at a flyover picture you may think that there wasn’t enough money to finish the connection. Fortunately there was. The part closest to Copenhagen Airport is a 4 kilometer (2.5 miles) undersea tunnel.

So what puzzles me is the dots apparently representing Flickr uploads and tweets made from the tunnel. Are you able to upload to Flickr from down there? How are the tweets geocoded with that precision? My GPS never works when passing the tunnel.

(PS: I know you may geotag once you are back on the surface.)


Klout Data Quality

Today it was announced that yet another social media service has passed the 100 million mark, as now 100 Million People have Klout.

Klout is a service that measures your online influence based on your activity on Twitter, LinkedIn, Facebook and so on. The main measure is a score between 1 and 100.


Like many others, I have from time to time been tempted to have a narcissistic look at my profile. I haven’t recorded it, but it seems to me that some of the other attributes on Klout change a lot. Or maybe it’s just me who is moving around in the social media realm in all directions.

Today my Klout style is being a “broadcaster”. And that may be right, as I’m re-tweeting a lot of links. But I’m sure I was a “specialist” the last time I checked, and that is in the opposite corner of the style quadrant. Well, never mind, every description of the styles is positive.

Klout also has beliefs about which topics you are influential about. One of my top 10 topics is “magic”. I think I must be more careful about tweeting about “data quality magic”. Another topic of mine is “Tripoli”. That’s right too; I did make one tweet about Tripoli that ended up as an information quality trainwreck.

Unfortunately I’m not influential about data quality or MDM at all. I’ll have to work on that.


It’s Hard to Be a Data Geek

Sometimes I, along with other folks in my social network circles and groups, describe myself as a data geek.

Another non-anonymous data geek, Rich Murnane, recently started a series of excellent cartoons on his blog about DataGeek’s first days on a new job. Hard work indeed.

Then the data geeky corporate Twitter account of IBM Initiate made a twittpoll asking: Do you consider yourself a data geek or a management geek?

It’s a hard question, because you know that a lot of things about better data are about better management, and it’s much more admirable to be a management geek than a poor data geek.

Anyway, I stood firm and admitted that I am a data geek, because the world has always been crowded with management consultants paying little attention to the needs of the data. Someone has to take care of the data. It’s hard, but it’s worth it.


More Social Master Data Management

Yesterday my American cyberspace friend Jim Harris was so kind as to send an invitation for Google+, the new social network service you must hook into. Thanks Jim, now I had to fill in yet another profile, upload the same picture as always and start networking from scratch once again 🙂

Like many people, I have several profiles in different social network services, such as Twitter, Facebook and LinkedIn. As I’m also doing business with German-speaking countries, I also use XING as an alternative to LinkedIn, as told in the post LinkedIn and the other Thing.

In a comment to that post my Austria based French connection Olivier Mathurin noted: “Disconnected duplicated siloed professional profiles, mmm…”

In a post on this blog called Social Master Data Management, made one year ago, it is discussed how social CRM will add new sources from social networks to the external reference data sources we already know from old-time CRM.

With all the different faces everyone is wearing in the social media realm, this isn’t going to be easy, and one may wonder whether social master data management is the wrong path, given the individual nature and built-in privacy of social networking services.

Well, Gartner (the analyst firm) says that increasing links between MDM and social networks is one of the Three Trends That Will Shape the Master Data Management Market.

So, acknowledging that Gartner predictions are self-fulfilling, you had better get moving into LinkedIn, Xing, Viadeo, Twitter, Facebook, (forget MySpace), Google+ and what’s next.
