Identity Resolution and Social Data


Identity resolution is a hot potato when we look into how we can exploit big data and, within that frame, not least social data.

Some of the most frequently mentioned use cases for big data analytics revolve around listening to social data streams and combining that with traditional sources within customer intelligence. In order to do that, we need to know who is talking out there, and that must be done by using identity resolution features encompassing social networks.

The first challenge is what we are able to do: how we can technically expand our data matching capabilities to use profile data and other clues from social media. This subject was discussed in a recent post on DataQualityPro called How to Exploit Big Data and Maintain Data Quality, an interview with Dave Borean of InfoTrellis. In it, David mentioned the InfoTrellis “contextual entity resolution” approach.
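To make the idea concrete, here is a minimal sketch of matching a social profile against customer master data. All the names, records and the scoring boost are hypothetical, and a real entity resolution engine would use far more sophisticated matching than stdlib string similarity:

```python
from difflib import SequenceMatcher

# Hypothetical customer master data records
customers = [
    {"id": 1, "name": "John A. Smith", "city": "London"},
    {"id": 2, "name": "Joan Smythe", "city": "Leeds"},
]

def name_similarity(a, b):
    """Crude similarity score between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(profile, threshold=0.8):
    """Return the best-matching customer for a social profile, or None.

    Other clues from the profile, such as a stated location, can
    strengthen an otherwise borderline name match (an assumed heuristic).
    """
    best, best_score = None, 0.0
    for c in customers:
        score = name_similarity(profile["display_name"], c["name"])
        if profile.get("location", "").lower() == c["city"].lower():
            score = min(1.0, score + 0.1)  # assumed location boost
        if score > best_score:
            best, best_score = c, score
    return best if best_score >= threshold else None

match = resolve({"display_name": "john smith", "location": "London"})
```

The threshold is the interesting design choice: set it too low and you link the wrong people; set it too high and you miss the matches that make social data valuable.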

The second challenge is what we are allowed to do. Social networks have a natural interest in protecting members’ privacy, and besides, they also have a commercial interest in doing so. The degree of privacy protection varies between social networks. Twitter is quite open but, on the other hand, holds very little usable material for identity resolution, and making sense of the streams is an issue. Networks such as Facebook and LinkedIn are, for good reasons, not so easy to exploit due to the (changing) game rules applied.

As said in my interview on DataQualityPro called What are the Benefits of Social MDM: It is kind of a goldmine in a minefield.


The Intersections of Big Data, Data Quality and Master Data Management

This blog has since 2009 been very much about the intersection between Master Data Management (MDM) and data quality. These two disciplines are closely related, as the vast majority of data quality improvement work going on is related to master data, taking slightly different forms depending on whether we are fighting with party master data, product master data, location master data or other master data domains.

In mid-2011 the term big data became more popular than data quality, as reported in the post Data Quality vs Big Data. After the initial euphoria about big data and the focus on the analytical side of big data, the question about big data quality has fortunately gained traction. Apart from the quality of the algorithms used in big data analytics, the quality of the big data itself is definitely a factor to be taken very seriously when deciding to act on the outcomes of big data analytics.

There are questions about the quality of the big data itself, as told for example in the post Crap, Damned Crap, and Big Data. This story is about social data and how crappy these data streams may be. Another prominent flavor of big data is sensor data, where there may also be issues of data quality, as in the example mentioned in the post Going in the Wrong Direction.

As examined in the latter example, the quality of big data will in many cases have to be measured by how well the big data relates to internal master data and external reference data. You may find more examples of that in the post Big Data and Multi-Domain Master Data Management.
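One simple way to operationalize that measurement is a conformance check: what share of the incoming big data records carry values that actually exist in the reference data? The records and the excerpt of reference codes below are made up for illustration:

```python
# Hypothetical sensor records streaming into an analytics store
records = [
    {"device": "A1", "country": "DK"},
    {"device": "A2", "country": "UK"},   # common entry, but not the ISO code (GB)
    {"device": "A3", "country": "DE"},
]

# External reference data: a tiny excerpt of ISO 3166-1 alpha-2 codes
iso_countries = {"DK", "DE", "GB", "SE", "NO"}

def reference_conformance(recs, field, reference):
    """Share of records whose value for `field` exists in the reference set."""
    if not recs:
        return 1.0
    hits = sum(1 for r in recs if r.get(field) in reference)
    return hits / len(recs)

score = reference_conformance(records, "country", iso_countries)
# 2 of 3 records conform, giving a score of 2/3
```

A low score does not tell you which side is wrong, but it does tell you where to look before acting on the analytics.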


Data is the new petroleum

”Data is the new oil” is a well-known saying today, used to emphasize the fact that data, and your ability to exploit data, can make you rich.

The rise of big data has indeed added more fire to this burning issue, with the variant saying “Big data is the new oil”.

Now, as oil is many things, data is many things too. Just as few of us actually use crude oil, also called petroleum, few of us use raw data to get rich. We use information distilled from raw data for specific purposes. One example is examined in the post Mashing Up Big Reference Data and Internal Master Data.

This brings me to the point that we have the question of the quality of oil just as we have the question of the quality of data, as explained nicely by Ken O’Connor in the post Data is the new oil – what grade is yours?


Four Flavors of Big Reference Data

In the post Five Flavors of Big Data the last flavor mentioned is “big reference data”.

The typical example of a reference data set is a country table. This is of course a very small data set, with around 250 entities. But even that can be complicated, as told in the post The Country List.
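Part of the complication is that the same country shows up under many names in real-world data. A minimal sketch of normalizing free-text values against an assumed alias table (the aliases below are illustrative, not a complete mapping):

```python
# Assumed alias table mapping common free-text variants to ISO 3166-1
# alpha-2 codes; a real country list needs many more entries and rules.
COUNTRY_ALIASES = {
    "denmark": "DK",
    "united kingdom": "GB",
    "uk": "GB",
    "great britain": "GB",
    "england": "GB",  # often entered, though not an ISO country of its own
}

def to_country_code(raw):
    """Normalize a free-text country value to an ISO 3166-1 alpha-2 code,
    or None when the value is unknown."""
    return COUNTRY_ALIASES.get(raw.strip().lower())

print(to_country_code("  United Kingdom "))  # GB
```

Even this tiny table forces a policy decision, such as what to do with “England”, which is exactly the kind of complication a 250-row reference data set can hide.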

Reference data can be much bigger. Some flavors of big reference data are:

  • Third-party data sources
  • Open government data
  • Crowd sourced open reference data
  • Social networks

Third-party data sources

The use of third-party data within Master Data Management is discussed in the post Third-Party Data and MDM. These data may also have a wider use within the enterprise, not least within business intelligence.

Examples of such data sets are business directories, where the Dun & Bradstreet World Base, probably the best known one today, counts over 200 million business entities from all over the world. Another example is address and property directories.

Open government data

The above-mentioned directories are often built on top of public sector data, which are becoming more and more open around the world. So an alternative is digging directly into the government data.

Crowd sourced open reference data

There are plenty of initiatives around where directories similar to the commercial and government directories are collected by crowd-sourcing and shared openly.

Social networks

In social networks, profile data are maintained by the entities in question themselves, which is a great advantage in terms of the timeliness of data.

London Big Data Meet-up

If you are in London, please join the complimentary TDWI UK and IRM UK London meet-up on big data on 19 February 2014, where I will elaborate on the four flavors of big reference data.


How Mature are Big Data Maturity Models?

The rise of big data creates a lot of well-known side effects. One of them is maturity models.

Here’s a Big Data Maturity Model from 2012. The Data Warehousing Institute has introduced their TDWI Big Data Maturity Model and Assessment Tool. And yesterday, over at John Radcliffe’s blog, there is an introduction to another Big Data Maturity Model.

Big Data Maturity Model Radcliffe
Click on image to go to John’s blog for more maturity.

The concept of a maturity model has been well established since the Capability Maturity Model (CMM) for software development was born, and probably we will also see a big data immaturity model one day.

As organizations climb the steps of the big data maturity models, we will learn more about what’s up there, and indeed we already know something, because the use of big data started long before the use of the term big data.

What do you think: How mature are big data maturity models?


New Oxford Dictionary Entries in 2013

Well, selfie was selected as the word of the year in the Oxford English Dictionary, and indeed that choice was celebrated with the buzzworthy selfie taken at the memorial service for Nelson Mandela this week.


Big data also made it to the list of well-explained terms, as told in this post: OK, so big data is about size (and veracity).

And finally, after a little social sharing of this post on my phablet, I srsly think I will have a digital detox.


Do You Like the Lake?

Today Capgemini, as a result of a co-innovation partnership with Pivotal, released their take on information management in the big data era in a piece called The Principles of the Business Data Lake.

The business data lake concept is a new attempt at getting rid of all the Excel spreadsheets business people operate because of limitations in today’s enterprise data warehouses and the business intelligence solutions sitting on top of those extracted, transformed and loaded data.

In the business data lake you load raw data, including unstructured data sources. A single view and the related governance are restricted to master and reference data.

It’s not that you are going to load all the data in the world into your business data lake. You will link internal and external data where and when needed.
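A minimal sketch of that linking, using made-up lake events and a made-up customer master: events are enriched from the governed master data on demand, and the ones that cannot be linked are flagged rather than cleansed away.

```python
# Hypothetical raw events landed in the lake, keyed by whatever the source provides
lake_events = [
    {"customer_ref": "C-100", "clicks": 12},
    {"customer_ref": "C-200", "clicks": 3},
    {"customer_ref": "C-999", "clicks": 7},   # no matching master record
]

# Governed master data kept as the single view
customer_master = {
    "C-100": {"name": "Acme Ltd"},
    "C-200": {"name": "Globex"},
}

def link_to_master(events, master):
    """Enrich lake events with master data where a match exists; keep
    unlinked events but mark them so the pollution stays visible."""
    linked = []
    for e in events:
        m = master.get(e["customer_ref"])
        linked.append({**e, "customer": m, "linked": m is not None})
    return linked

result = link_to_master(lake_events, customer_master)
```

Keeping the unlinked events, instead of dropping them at the shore, is what distinguishes the lake approach from the classic extract, transform and load pipeline.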

Thomas Redman has made a famous metaphor in the data quality realm about a polluted lake, where the best option is to prevent polluted water from streaming into the lake. I guess the rise of big data challenges that take, as told some years ago in the post Extreme Data Quality.

In the business data lake we will have polluted data. In that view, I think it’s a good thing that master and reference data have a special place in the lake.

What do you think? Do you like the lake – the old and/or the new one?


Trust in External Data is Like Trust in Analysts

The analyst industry is like any other industry. Analysts compete. Mostly they do it by presenting what are supposed to be more trustworthy reports than the other ones produce, including their special visualization method, be that a quadrant, landscape, bulls-eye or whatever approach. And sometimes they compete by bashing the other ones.

MDM market analysts meetup

This week I had a blog post called A Little Bit of Truth vs A Big Load of Trust. The post cites a blog post from Andrew White of Gartner called From MDM to Big Data – From truth to trust, which in turn cites an article on SearchDataManagement called Enterprise master data management and big data: A well-matched pair?

Andrew White’s post praises the views of fellow Gartner analyst Ted Friedman in the SearchDataManagement article and bashes the views of the other contributors, being Evan Levy, Andy Hayler (Information Difference), Aaron Zornes of the MDM Institute and Kelly O’Neal, by saying:

“… presumably since the thinking out there in the cited analyst community has not gotten very far yet.”

Indeed, you have to consider multiple opinions out there when it comes to Master Data Management (MDM), big data and other external data. In the same way there are, when it comes to the data, multiple versions of the truth out there, and you have, in Andrew White’s words, to “manage and govern trust in someone else’s data”.


About Big Data and Doing It

The saying below has become a popular share in social media:

“Big data is like teenage sex. Everybody talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”

Indeed, there is quite a lot of hype around big data as for example told in The Big MDM Trend.

The teenage sex joke isn’t new at all. It has been used about a lot of new trends. I remember when the e-Business hype started; the joke was used then as well, as you can still find some evidence of if you google the saying, getting this and that.

Today e-Business has matured, and maybe a few brick and mortar bookstores have stopped laughing about the e-Business and teenage sex joke by now.

Also, maybe the joke says more about parents’ knowledge of teenage sex.


So You Think You Can Handle Big Data?

It has often been put forward that it’s strange that everyone thinks they can make sense out of big data while even the supposedly best ones can’t get small data right.

A good reminder of that is reported by Gary Allemann in the post Data quality error embarrasses US. The post tells the story of, and the learning from, a recent incident where a former South African anti-apartheid fighter was detained in the United States because he was still on a terrorist list many years after the world finally changed its view about the bad guys and good guys in that struggle.

So, while we have no doubt that the United States security agencies are able to collect and store big data about almost every person (friends and enemies all together), we may have our doubts whether these guys are able to make any sense of it if they don’t know who is naughty and who is nice at a given time.
