Fake, Snoopy, Kitty and Duplicate Social Media Profiles

As a data quality practitioner I have never doubted that when Facebook is said to have 900 million profiles, it doesn’t mean that 900 million people have a Facebook profile.

Some people have more than one profile. Some people who had a profile are no longer among us. As reported by the BBC in the article Facebook ‘likes’ and adverts’ value doubted, some profiles are fake, resulting in Facebook earning real money that should have been fake money.

Some profiles are not really fake but serve other purposes, like a snoopbook account created to reveal fraud.

And then some profiles belong to (the owners of) real cats, as reported by James Standen in a comment to my post called Out of Facebook.

On another social media platform, Twitter, I am guilty of having 5 profiles. Besides my real account hlsdk I have created hldsk, hsldk and hlsdq, so that I am able to thank people who mention me with a misspelled handle. And then there is my female side: MissDqPiggy.
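The three misspelled handles are each a single typing slip away from the real one. A minimal Python sketch of how such near-duplicate handles could be flagged follows. It assumes the optimal string alignment variant of edit distance, which counts a swap of two adjacent characters as one edit. That choice, and the function name `osa_distance`, are illustrative only and not something Twitter is known to use.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings.

    Insertions, deletions, substitutions and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "sd" typed as "ds".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


REAL_HANDLE = "hlsdk"
for variant in ["hldsk", "hsldk", "hlsdq"]:
    print(variant, osa_distance(REAL_HANDLE, variant))
```

All three variants come out at distance 1 from the real handle, so a threshold of 1 would catch them while leaving an unrelated handle like MissDqPiggy alone.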


Social MDM, Privacy and Data Quality

The term “Social MDM” has been promoted quite well this week, not least as part of the social media information stream from the ongoing user conference of the tool vendor Informatica.

In a blog post called Informatica 9.5 for Big Data Challenge #2: Social, Judy Ko of Informatica introduces the opportunities and challenges.

In the closing remarks Judy says: “There’s still a long way to go to bring social data into the mainstream enterprise, in part due to concerns over privacy and the potential “creepiness” factor of mining social data.”

As I understand it the spearhead Social MDM part of the tool release is a Facebook App that provides connectivity between Facebook and the MDM solution.

Industry analyst R “Ray” Wang examines this in the blog post News Analysis: Informatica Launches MDM 9.5. The analysis states that it is now time to “drive data out of Facebook and not into Facebook”.

The opportunities and challenges of driving data out of Facebook were discussed here on the blog some years ago, in a post called precisely Out of Facebook.

Balancing privacy with data hoarding is surely still a subject that is in no way settled and probably never will be.

Connecting systems of record in traditional MDM solutions with social network profiles is no walkover either. The classic data quality challenges with uniqueness of records and completeness of data only get more difficult, but there are also great opportunities for getting a better picture of your customers and other business partners.

If you are interested in Social MDM and the related challenges and opportunities there is a LinkedIn group for Social MDM.

The group is new, less than a month old at the present time, but there is already a lot of content to dip into, including:


Social MDM and Systems of Engagement

Social Master Data Management has been an interest of mine for the last couple of years, and last week I tried to reach out to others in exploring this new era of Master Data Management by creating a group on LinkedIn called Social MDM.

When reading a nice blog with the slogan ”Welcome to the Real (IT) World!” by Max J. Pucher I came across a good illustration by John Mancini showing the history of IT and how the term “Systems of Record” is being replaced (or at least supplemented) by the term “Systems of Engagement”:

Master Data Management (MDM) includes having a System of Record (SOR) describing the core entities that take part in the transactional systems of record that support the daily business in every organization. For example, a golden MDM record describes the party that acts as a customer on an order record, while the products in the underlying order lines are described in golden MDM records for the things dealt with within the organization.

Social Master Data Management (Social MDM) will be about supplementing that System of Record so we are able to further describe the parties taking part in the new Systems of Engagement and link them with the old Systems of Record. These parties are reflected as social network profiles that are owned by the same human beings who are our (prospective) customers, are part of the same household, or are a contact for a company being a (prospective) customer or any other business partner.
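The relationship between transactions, golden records and social profiles can be sketched as a small data model. This is only an illustration under my own assumptions: all class names, field names and sample values below are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SocialProfile:
    network: str  # e.g. "Twitter" or "LinkedIn"
    handle: str


@dataclass
class PartyGoldenRecord:
    party_id: str
    name: str
    # The Social MDM supplement: profiles linked to the same real-world party.
    social_profiles: List[SocialProfile] = field(default_factory=list)


@dataclass
class ProductGoldenRecord:
    product_id: str
    description: str


@dataclass
class OrderLine:
    product_id: str  # key referencing a product golden record
    quantity: int


@dataclass
class Order:
    order_id: str
    customer_party_id: str  # key referencing a party golden record
    lines: List[OrderLine]


def describe(order: Order,
             parties: Dict[str, PartyGoldenRecord],
             products: Dict[str, ProductGoldenRecord]) -> str:
    """Resolve the keys on a transaction against the golden records."""
    customer = parties[order.customer_party_id]
    items = ", ".join(
        f"{line.quantity} x {products[line.product_id].description}"
        for line in order.lines
    )
    networks = ", ".join(p.network for p in customer.social_profiles) or "none"
    return f"Order {order.order_id}: {customer.name} bought {items} (profiles: {networks})"


# Hypothetical sample data.
parties = {"P1": PartyGoldenRecord("P1", "Jane Example")}
products = {"B1": ProductGoldenRecord("B1", "Book on data quality")}
parties["P1"].social_profiles.append(SocialProfile("Twitter", "jane_example"))
order = Order("O1", "P1", [OrderLine("B1", 2)])

print(describe(order, parties, products))
```

The order itself only carries keys. Everything descriptive, including the social profiles, lives on the golden records, which is why linking profiles to the right party matters so much.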

For a guy like me who started in IT in the mainframe era (just after it had ended according to the above illustration) and went on with mini computers, PCs and the internet, it’s very exciting to be moving on into the social and cloud era.

It will be good to be joined by even more data quality and MDM practitioners and anyone else in the LinkedIn Social MDM group.


The Big Search Opportunity

The other day Bloomberg Businessweek ran an article reporting that Facebook Delves Deeper Into Search.

I have always advocated better search functionality in order to get more business value from your data. That certainly also applies to big data.

In a recent post called Big Reference Data Musings here on the blog, the challenge of utilizing large external data sources for getting better master data quality was discussed. In a comment Greg Leman pointed out that there often isn’t a single source of the truth, as you might expect from, say, a huge reference data source such as the Dun & Bradstreet WorldBase holding information about business entities from all over the world.

Indeed, our search capabilities must ideally span several sources. In the business directory search realm you may include several sources at a time, like supplementing the D&B WorldBase with, for example, EuroContactPool if you do business in Europe, or the source called Wiki-Data (being renamed to AvoxData) if you are in financial services and want to utilize the new Legal Entity Identifier (LEI) for counterparty uniqueness in conjunction with other more complete sources.

As examined in the post Search and if you are lucky you will find, combining search on external reference data sources and internal master data sources is a big opportunity too. In doing that you must, as described in the follow-up piece named Wildcard Search versus Fuzzy Search, get the search technology right.
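The difference between the two search styles can be shown with a small sketch. It uses only the Python standard library (`fnmatch` for wildcards, `difflib` for a simple similarity ratio). The directory entries are made-up sample spellings, and a real business directory search would use far more capable matching than this.

```python
import difflib
import fnmatch

# Made-up sample entries, as they might appear across several sources.
names = [
    "Dun & Bradstreet",
    "Dunn and Bradstreet",
    "Bradstreet Holdings",
    "Informatica",
]


def wildcard_search(pattern, candidates):
    """Case-insensitive wildcard match; '*' stands for any run of characters."""
    return [c for c in candidates
            if fnmatch.fnmatchcase(c.lower(), pattern.lower())]


def fuzzy_search(query, candidates, cutoff=0.6):
    """Return candidates whose similarity ratio to the query is at least cutoff."""
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(query.lower(), list(lowered),
                                     n=5, cutoff=cutoff)
    return [lowered[h] for h in hits]


# A wildcard works when the start of the name is known and spelled right.
print(wildcard_search("Dun*", names))

# The same wildcard approach finds nothing once the query has a typo ...
print(wildcard_search("Dun and Bradstret*", names))

# ... while a fuzzy search still finds the likely candidates.
print(fuzzy_search("Dun and Bradstret", names))
```

Wildcard search puts the burden on the user to know which part of the name is certain. Fuzzy search tolerates the misspelling but needs a sensible cutoff to avoid flooding the result with false positives.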

I see in the Bloomberg article that Facebook doesn’t intend to completely reinvent the wheel for searching big data, as they have hired a Google veteran, the Danish computer scientist Lars Rasmussen, for the job.


Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article in the Danish Computerworld about a ferry line that is now dropping a solution for checking whether the passenger using an access card is in fact the paying customer, by means of a lightweight fingerprint stored on the card. The reason for dropping it was, by the way, the cost of upgrading the solution compared to future business value, and not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well, as described in the post Real World Alignment. There the question was about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity in more and more places. Doing that in the fight against terrorism has been going on for a long time. Regulatory compliance will add to that trend, as described in the post Know Your Foreign Customer, which mentions the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. These are things that reflect the real world, but aren’t the real world.

We will however probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprinting is maybe a bit old-fashioned, but as said, more and more biometric passports are issued, and the technology for iris and retinal scanning is used in many places for access control, even on mobile devices.

In the story starting this post the business value of reinvesting in a biometric solution wasn’t deemed positive. But looking from the print on my fingers down to my hand lines, I foresee some more identity resolution going beyond name and address strings into things closer to the real world, such as facial recognition and biometrics.


Sharing Social Master Data

If a company runs a Customer Relationship Management (CRM) system, all employees are supposed to enter their interactions with customers and prospects, including adding new accounts and contacts if it’s the first engagement.

With the rise of social networks, first engagements are increasingly made in those networks. Furthermore, new employees often bring old contacts from former employments with them, thus utilizing an established relationship that is probably manifested in one or more already existing social network connections.

As explained in the post Social Master Data Management, the term ”Social CRM” has been around for a while. We now see CRM solutions where the account and contact master data is primarily built by extracting those data from social networks.

I have just tried out such a solution called Nimble.

If you are more than a one-man-band company, it’s interesting to what degree you are willing (or forced) to share your connections as master data entities for the CRM solution.

In Nimble you can choose differently for each network. I would probably freely choose a setup with Twitter and LinkedIn as shared with the team, but Facebook as private:

But that is just how I think based on my way of using social networks.

There is a fundamental data quality versus privacy issue around utilizing employees’ social network connections as master data for CRM and eventually enterprise-wide Master Data Management (MDM).

All things being equal, data quality will be best if everyone contributes within reason. Not least in sales, but also more or less in other functions, you are hired partly because of your relations.

What do you think?


Informatics for adding value to information

Recently the Global Agenda Council on Emerging Technologies within the World Economic Forum has made a list of the top 10 emerging technologies for 2012. According to this list the technology with the greatest potential to provide solutions to global challenges is informatics for adding value to information.

As said in the summary: “The quantity of information now available to individuals and organizations is unprecedented in human history, and the rate of information generation continues to grow exponentially. Yet, the sheer volume of information is in danger of creating more noise than value, and as a result limiting its effective use. Innovations in how information is organized, mined and processed hold the key to filtering out the noise and using the growing wealth of global information to address emerging challenges.”

Big data all over

Surely “big data” is the buzzword within data management these days and looking for extreme data quality will be paramount.

Filtering out the noise and using the growing wealth of global information will help a lot in our endeavour to make a better world and to do better business.

In my focus area, master data management, we also have to filter out the noise and exploit the growing wealth of information related to what we may call Big Master Data.

Big external reference data

The growth of master data collections is also seen in collections of external reference data.

For example, the Dun & Bradstreet WorldBase holding business entities from around the world has lately grown quickly from 100 million entities to over 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also, one of the BRIC countries, India, is on the way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion. The project is called Aadhaar.

When we extend such external registries to social networking services as well by doing Social MDM, we are dealing with a very fast-growing number of profiles in Facebook, LinkedIn and other services.

Surely we need informatics for adding the value of big external reference data into our daily master data collections.


Klout Data Quality

Today it was announced that yet another social media service has passed the 100 million mark, as now 100 Million People have Klout.

Klout is a service that measures your online influence based on your activity on Twitter, LinkedIn, Facebook and so on. The main measure is a score between 1 and 100.


Like many others I have from time to time been tempted to have a narcissistic look at my profile. I haven’t recorded it, but it seems to me that some of the other attributes on Klout change a lot. Or maybe it’s just me who is moving around in the social media realm in all directions.

Today my Klout style is being a “broadcaster”. And that may be right, as I’m re-tweeting a lot of links. But I’m sure I was a “specialist” the last time I checked, and that is in the opposite corner of the style quadrant. Well, never mind, every description of the styles is positive.

Klout also has beliefs about which topics you are influential about. One of my top 10 topics is “magic”. I think I must be more careful about tweeting about “data quality magic”. Another topic of mine is “Tripoli”. That’s right too; I did make one tweet about Tripoli that ended up as an information quality trainwreck.

Unfortunately I’m not influential about data quality or MDM at all. I’ll have to work on that.


Big Master Data

Right now I am overseeing the processing of yet another master data file with millions of records. In this case it is product master data that also has customer master data kinds of attributes, as we are working with a big pile of author names and related book titles.

The Big Buzz

Having such high numbers of master data records isn’t new at all and compared to the size of data collections we usually are talking about when using the trendy buzzword BigData, it’s nothing.

Data collections that qualify as big will usually be files with transactions.

However master data collections are increasing in volume and most transactions have keys referencing descriptions of the master entities involved in the transactions.

The growth of master data collections is also seen in collections of external reference data.

For example, the Dun & Bradstreet WorldBase holding business entities from around the world has lately grown quickly from 100 million entities to nearly 200 million entities. Most of the growth has been due to better coverage outside North America and Western Europe, with the BRIC countries coming in fast. A smaller world resulting in bigger data.

Also, one of the BRIC countries, India, is on the way with a huge project for uniquely identifying and holding information about every citizen – that’s over a billion. The project is called Aadhaar.

When we extend such external registries to social networking services as well by doing Social MDM, we are dealing with a very fast-growing number of profiles in Facebook, LinkedIn and other services.

Extreme Master Data

Gartner, the analyst firm, has a concept called “extreme data” that rightly points out that this “big data” thing is not only about volume; it is also about velocity and variety.

This is certainly true also for master data management (MDM) challenges.

Master data are exchanged between organizations more and more often and in higher and higher volumes. Data quality focus and maturity are probably not the same within the exchanging parties. The velocity and volume make it hard to rely on people-centric solutions in these situations.

Add to that increasing variety in master data. The variety may be international variety as the world gets smaller and we have collections of master data embracing many languages and cultures. We also add more and more attributes each day as for example governments are releasing more data along with the open data trend and we generally include more and more attributes in order to make better and more informed decisions.

Variety is also an aspect of Multi-Domain MDM, a subject that according to Gartner (the analyst firm once again) is one of the Three Trends That Will Shape the Master Data Management Market.


More Social Master Data Management

Yesterday my American cyberspace friend Jim Harris was kind enough to send an invitation for Google+ – the new social network service you must hook into. Thanks Jim, now I had to fill in yet another profile, upload the same picture as always and start networking from scratch once again 🙂

Like many people I have several profiles in different social network services such as Twitter, Facebook and LinkedIn. As I also do business with German-speaking countries, I use XING as an alternative to LinkedIn, as told in the post LinkedIn and the other Thing.

In a comment to that post my Austria based French connection Olivier Mathurin noted: “Disconnected duplicated siloed professional profiles, mmm…”

In a post on this blog called Social Master Data Management, written a year ago, it was discussed how social CRM will add new sources from social networks to the external reference data sources we already know from old-time CRM.

With all the different faces everyone is wearing in the social media realm this isn’t going to be easy, and one may consider whether social master data management is the wrong path given the individual nature and built-in privacy of social networking services.

Well, Gartner (the analyst firm) says that increasing links between MDM and social networks is one of the Three Trends That Will Shape the Master Data Management Market.

So, acknowledging that Gartner predictions are self-fulfilling, you had better get moving into LinkedIn, Xing, Viadeo, Twitter, Facebook, (forget MySpace), Google+ and whatever comes next.
