What does Twitter Know?

We all know the pain of receiving e-mails with offers that are totally beside what you need.

Now Twitter has joined this spamming habit, which is a bit surprising. With all the talk about big data and what it can do for prospect and customer insight, you would think that Twitter knows something about you.

Well, apparently not.

I operate two Twitter accounts: @hlsdk, used for my general interaction with the data management community, and @ProductDataLake, used for a start-up service called Product Data Lake.

For both accounts, I am flooded with e-mails from Twitter about increasing my Holiday sales by using their ad services.


Strange, because:

  • My business is not Business-to-Consumer (B2C), which is about selling stuff to consumers and where the coming season is a high peak in the Western World. My business is Business-to-Business (B2B), where the coming season is a standstill for sales in the Western World.
  • In my part of the Western World we don’t use the term Holidays for the coming season. We (still) call it Christmas, as told in the post Is the Holiday Season called Christmas Time or Yuletide?
  • In my home country, Denmark, you are not allowed to e-mail offers to businesses unless they have actually asked for them. Not sure if Twitter is on the right side of the law here.

Winning by Sharing Data

When I changed my laptop a few months ago, it was the easiest migration to a new computer ever.

Basically, I just had to connect to all the cloud services I had been using before. For many of them, the path was to connect to Google+, Twitter and Facebook and then reach many other services via these connections.

ShareThis was a personal win.

Most of the teams I am working with are sharing their data with me in the cloud. Unlike in the bad old days, I do not have to call and ask for progress on this and that. I can check the status myself and even get notifications on my phablet when a colleague completes a task.

ShareThis is a shared win.

Within my profession, which is data quality improvement and Master Data Management (MDM), sharing data is going to be a winning path too, as told in the post Sharing is the Future of MDM.

There are several ways of sharing master data like using commercial third party data, digging into open government data, having your own data locker and relying on social collaboration. These options are examined in the post Ways of Sharing Master Data.


Identity Resolution and Social Data


Identity resolution is a hot potato when we look into how we can exploit big data and, within that frame, not least social data.

Some of the most frequently mentioned use cases for big data analytics revolve around listening to social data streams and combining that with traditional sources within customer intelligence. In order to do that, we need to know who is talking out there, and that must be done by using identity resolution features encompassing social networks.

The first challenge is what we are able to do: how we can technically expand our data matching capabilities to use profile data and other clues from social media. This subject was discussed in a recent post on DataQualityPro called How to Exploit Big Data and Maintain Data Quality, interview with Dave Borean of InfoTrellis. In it, David mentioned the InfoTrellis “contextual entity resolution” approach.

The second challenge is what we are allowed to do. Social networks have a natural interest in protecting their members’ privacy, and they also have a commercial interest in doing so. The degree of privacy protection varies between social networks. Twitter is quite open, but on the other hand it holds very little usable stuff for identity resolution, and making sense of the streams is an issue. Networks such as Facebook and LinkedIn are, for good reasons, not so easy to exploit due to the (changing) game rules applied.
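As a rough illustration of what matching a customer record against social profile data could look like, here is a minimal Python sketch. It is not the InfoTrellis approach, which the post does not describe. The field names, the sample city and profiles, and the 0.8 / 0.2 weights are all made up for the example; only the standard library's difflib is used.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def match_score(customer, profile):
    """Score how likely a customer record and a social profile are the same party.

    The 0.8 / 0.2 weights are arbitrary, illustrative choices.
    """
    name_similarity = SequenceMatcher(
        None, normalize(customer["name"]), normalize(profile["display_name"])
    ).ratio()
    city = normalize(customer.get("city", ""))
    location = normalize(profile.get("location", ""))
    same_place = bool(city) and city == location
    return 0.8 * name_similarity + (0.2 if same_place else 0.0)


# Hypothetical sample data, for illustration only.
customer = {"name": "Henrik Liliendahl Sørensen", "city": "Copenhagen"}
profiles = [
    {"display_name": "Henrik L. Sørensen", "location": "Copenhagen"},
    {"display_name": "Jane Example", "location": "London"},
]

# Pick the profile with the highest score as the best candidate match.
best = max(profiles, key=lambda p: match_score(customer, p))
print(best["display_name"])
```

A real solution would need far more clues than a name and a location, and a threshold below which no match is accepted at all.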

As said in my interview on DataQualityPro called What are the Benefits of Social MDM: It is a kind of a goldmine in a minefield.


Please Retweet

Many moons ago I wondered how my social influence is measured as told in the post Klout Data Quality.

Since then my Klout has dropped a bit from 59 to 57. It does not ruin my day, but I wonder why. A thing that strikes me is from where I get my Klout. It seems Twitter is the place as it counts for 73 % of my Klout. LinkedIn is only 8 %. Personally, I would give them opposite importance.

Klout Network Breakdown

Recently I noticed I was included in a list called Top 200 Thought Leaders in Bigdata Analytics. Honorable maybe. However, I am afraid it merely is a count of how many #Bigdata tags I have used on Twitter relative to others.

What matters to me in social influence seems to be out of scope for Klout, namely the readers and comments on this blog.

What about you? Do you have the right Klout? Is it measured the right way?


Everyday Year 2000 Problems

Fourteen years ago these were busy times for computer professionals, including yours truly, because of the upcoming year 2000 apocalypse. The handling of the problem indeed had elements of hysteria, but all in all it was a joint effort by heaps of IT people to meet a non-postponable deadline for fixing date fields that were too short.

Data entry and data storage fields that are too short, have an inadequate format or are missing are frequent data quality issues. Some everyday issues are:

Too short name fields

Names can be very long. But even a moderately long name such as Henrik Liliendahl Sørensen can be a problem here and there. Not least when typing your name on Twitter, where the 20-character name field corresponds very well to the 140-character message length and forces many of us to shorten our name. I found a remedy from a fellow Sørensen, a workaround described in the post Getting around the real name length limit in Twitter. Not sure if I’m prepared to take the risk.

Too short and restricted postal code fields

When working with IT solutions in Denmark you see a lot of postal code fields defined as 4 digits. That works fine with Danish addresses but is a real show stopper when you deal with neighboring Swedish and German 5-digit postal codes, and not least with postal codes containing letters, as in the Netherlands and the United Kingdom, and most other postal codes from around the world.
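To make the point concrete, here is a small Python sketch of country-aware postal code checking next to a legacy 4-digit field. The patterns are deliberately simplified assumptions, not complete national rules, and the function names and sample values are made up for the example.

```python
import re

# Simplified, illustrative patterns; real postal code rules have more exceptions.
POSTAL_CODE_PATTERNS = {
    "DK": re.compile(r"\d{4}"),                            # Denmark: 4 digits
    "SE": re.compile(r"\d{3} ?\d{2}"),                     # Sweden: 5 digits
    "DE": re.compile(r"\d{5}"),                            # Germany: 5 digits
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),                  # Netherlands: digits + letters
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),  # UK: rough approximation
}


def is_valid_postal_code(country, code):
    """Check a postal code against the simplified pattern for its country."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # Unknown country: accept any non-empty value rather than reject it.
        return bool(code.strip())
    return pattern.fullmatch(code.strip().upper()) is not None


def fits_four_digit_field(code):
    """Mimic a legacy field that only accepts exactly 4 digits."""
    return re.fullmatch(r"\d{4}", code.strip()) is not None


# Sample values in the right shape for each country, for illustration only.
for country, code in [("DK", "2100"), ("DE", "10115"), ("NL", "1012 AB"), ("GB", "SW1A 1AA")]:
    print(country, code, is_valid_postal_code(country, code), fits_four_digit_field(code))
```

Only the Danish value fits the 4-digit field, although all four pass the country-aware check.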

Missing placeholder for social identities

The rise of social media has been incredible during the last few years. However, IT systems are lagging behind in support for this. Most systems don’t have a place where you can fill in a social handle. Recently James Taylor wrote the blog post Getting a handle on social MDM. Herein James describes a workaround in an IBM MDM solution. Indeed, we need ways to link the old systems of record with the new systems of engagement.
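A placeholder for social identities does not have to be complicated. The following Python sketch shows one possible shape; the PartyRecord class and its fields are hypothetical and have nothing to do with the IBM MDM workaround mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartyRecord:
    """A hypothetical master data record with room for social handles."""

    party_id: str
    name: str
    # Maps a network name (lower-cased) to the handle without a leading "@".
    social_handles: Dict[str, str] = field(default_factory=dict)

    def add_handle(self, network, handle):
        """Store one handle per network in a normalized form."""
        self.social_handles[network.lower()] = handle.lstrip("@")


record = PartyRecord(party_id="P-1", name="Henrik Liliendahl Sørensen")
record.add_handle("Twitter", "@hlsdk")
print(record.social_handles)
```

Keeping the handles in a map keyed by network means a new network does not require a new field.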


Introducing the Famous Person Quote Checker

As reported in the post Crap, Damned Crap, and Big Data there are data quality issues with big data.

The mentioned issue is about the use of quotes in social data: A famous person apparently said something apparently clever, and the one who makes an update with the quote gets an unusually large amount of likes, retweets, +1s and other forms of recognition.

But many quotes weren’t actually said by that famous person. Maybe it was said by someone else, and in many cases there is no evidence that the famous person said it. Some quotes, like the Einstein quote in the Crap post, actually contradict what the person apparently also said.

I have worked a lot with data entry functionality that checks for data quality: whether a certain address actually exists, whether a typed-in phone number is valid or whether an e-mail address will bounce. So I think it’s time to make a quote checker to be plugged in on LinkedIn, Twitter, Facebook, Google Plus and other social networks.

So anyone else out there who wants to join the project – or has it already been said by someone else?


On Washing Rental Cars and Shared Data

Recently a tweet from Doug Laney of Gartner has been retweeted a lot:

Rented Car

Like most analogies, it may or may not fit depending on the perspective. Actually, rental cars are probably some of the most washed cars, as the rental company washes and cleans the car between every rental.

In the same way as rental cars usually are quite clean, I have also found that sharing data is a powerful way to have clean data, as told on the page about Data Quality 3.0. This is also the grounding concept behind the instant Data Quality solution I’m working with, where we have just released our iDQ™ MDM Edition.


Social Score Credibility

A recent piece from Fliptop is called What’s the Score. It is a thorough walk-through of what is usually called social scoring, done on influence scoring platforms within social media, where Klout, Kred and PeerIndex are the best known services of that kind.

The Fliptop piece has a section on faking, which was also the subject of a recent post on this blog. The post is called Fact Checking by Mashing Up, and is about how to link social network profiles with other known external sources in order to detect cheating. Linking social network profiles with other external sources and internal sources is what is known as Social MDM, a frequent subject on this blog for several years.

A social score must of course be seen in context, as it matters a lot what you are influential about when you want to use social scoring for business. As told in the post Klout Data Quality this was a challenge two years ago, and this is probably still the case. Also here I think linking with other (big) data sources and letting Social MDM be the hub will help.

Taken from Kred on my twitter handle.

PS: I have no idea why moron ended up there. Einstein is OK.


Crap, Damned Crap, and Big Data

Lately Jim Harris made a thought provoking post on the Mike2 blog. The post is called A Contrarian’s View of Unstructured Data.

Herein Jim wrote:

“My contrarian’s view of unstructured data is that it is, in large part, gigabytes of gossip and yottabytes of yada yada digitized, rumors and hearsay amplified by the illusion-of-truth effect and succumbing to the perception-is-reality effect until the noise amplifies so much that its static solidifies into a signal.”

Indeed, the sound of social data may be like that. Yesterday I wrote a post called Keep It Real, Stupid. Herein I mentioned an apparently fake quote by Albert Einstein saying:

“If you can’t explain it simply, you don’t understand it well enough”.

Today I tried to see how the fake quote was doing on Twitter.

OMG: Going on more than one tweet per minute along with some mutations of the quote saying:

“If you can’t explain it to a six-year-old, you don’t understand it yourself”.

“You do not really understand something unless you can explain it to your grandmother”.

OK folks: Sense-making of social data is not going to be simple. Not even relatively simple.

Simply Einstein Tweets

Simply Einstein Tweet 2


Simply Einstein Tweet 3


Social Data vs Sensor Data

The two predominant kinds of big data are:

  • Social data and
  • Sensor data

Social data are data born in the social media realm, such as Facebook likes, LinkedIn updates, tweets and whatever else the data entry we as humans do in the social sphere is called.

Sensor data are data captured by devices of many kinds, such as radars, sonars, GPS units, CCTV cameras, card readers and many more.

There’s a good term called “same same but different” and this term does also in my experience very well describe the two kinds of big data: The social data coming directly from a human hand and the sensor data born by a machine.

Of course there are humans involved with sensor data as well. It is humans who set up the devices and sometimes a human makes a mistake when doing so. Raw sensor data are often manipulated, filtered and censored by humans.

There are indeed data quality issues associated with both kinds of big data, but in slightly different ways. And you surely need to apply master data management (MDM) in order to make some sense of both social data and sensor data, as examined in the post Big Data and Multi-Domain Master Data Management.

What is your experience: Are social data and sensor data just big data regardless of source? Is it same same but different? Or are social data and sensor data two separate data worlds that just both happen to be big?
