The Big Data Secret of SPECTRE

I’m sorry if this blog is turning into a travel blog. But here’s a third Paris story.

Boulevard Haussmann is one of the city’s great thoroughfares (to use the right meta-data term) and is known to be where we can find the headquarters of SPECTRE.

While visiting SPECTRE today I learned a lot about how SPECTRE is exploiting big data as an important way of keeping up with the tough competition in its industry sector. But all that is of course a secret.

When I asked if they still have trouble with Bond, the answer was:

Jimmy Bond when he was a field agent

“Bond? – Jimmy Bond? – The sexy data scientist who is working for NSA?”

“Oh no,” I replied. “James Bond.”

“Oh, yes,” the SPECTRE chief data manipulator replied. “He was with British Intelligence. But he has been moved to the EU Data Protection Service. He just got his license to fine. Now 2% and soon 5% of our global turnover each time. Very dangerous man. Very dangerous.”


Sharing is the Future of MDM

Over at the DataRoundtable blog Dylan Jones recently posted an excellent piece called The Future of MDM?

Herein Dylan examines how a lot of people in different organizations spend a lot of time on trying to get complete, timely and unique data about customers and other business partners.

A better future for MDM (Master Data Management) could certainly be one where every organization doesn’t have to do the same work over and over again. While self registration by customers is a way of easing the burden on private enterprises and public sector bodies, we may do even better by not making the customer the data entry clerk who types in the same information over and over again.

Today there are several available options for customer and other business partner reference data:

  • Public sector registries, which are becoming more and more open, for example for the address part or even deeper, with due respect for privacy considerations that may differ between business entities and individuals.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like the Mydex service mentioned by Dylan.
  • Social network profiles.

My guess is that the future of MDM is going to be a mashup of the above options.

Oh, and as representatives of such a mashup service, we at iDQ recently made sure we had accurate, complete and timely information filled in on our LinkedIn Company profile.


Doctor Livingstone, I Presume?

The title of this blog post is a famous quote from history (which, like most quotes, is disputed) said by Henry Morton Stanley (who actually was born John Rowlands) when he found Doctor Livingstone (David Livingstone) deep in the African jungle in 1871 after a six-month expedition with 200 men through unknown territory.

Today it’s much easier to find people. Mobile phone use, credit card transactions and tweet positions lead the way, unless of course you really, really don’t want to be found, as was the case with Osama bin Mohammed bin Awad bin Laden.

One of the biggest issues in data quality is real world alignment of the data registered about persons. As told in the post Out of Africa there are some issues in the way we handle such data, such as:

  • Cultural diversity: Names, addresses, national IDs and other basic attributes are formatted differently country by country and to some degree within countries. Most data models with a person entity are built on the format(s) of the country where they are designed.
  • Intended purpose of use: Person master data are often stored in tables made for specific purposes like a customer table, a subscriber table, a contact table and so on. Therefore the data identifying the individual is directly linked with attributes describing a specific role of that individual.
  • “Impersonal” use: Person data is often stored in the same table as other party master types such as business entities, projects, households et cetera.
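One way to picture a model that avoids the second and third issue is to keep the identity of a party apart from the roles it plays. The following is only a minimal sketch under my own assumptions: the class names `Party` and `PartyRole`, the field names and the sample values are hypothetical and not taken from any particular MDM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Party:
    """Who the party is, independent of any business purpose (hypothetical model)."""
    party_id: int
    party_type: str               # "person", "business", "household", ...
    name_parts: Dict[str, str]    # free-form so name formats can vary by country
    country_code: str             # assumed to be an ISO 3166-1 alpha-2 code
    national_id: Optional[str] = None


@dataclass
class PartyRole:
    """What the party is to us: customer, subscriber, contact and so on."""
    party_id: int
    role: str
    attributes: Dict[str, str] = field(default_factory=dict)


def roles_of(party: Party, roles: List[PartyRole]) -> List[str]:
    """Return every role held by one real world party, sorted by name."""
    return sorted(r.role for r in roles if r.party_id == party.party_id)


person = Party(1, "person", {"given": "David", "family": "Livingstone"}, "GB")
roles = [
    PartyRole(1, "subscriber", {"plan": "monthly"}),
    PartyRole(1, "customer", {"segment": "retail"}),
    PartyRole(2, "contact"),
]
print(roles_of(person, roles))  # ['customer', 'subscriber']
```

With such a split, the same individual is stored once, and role specific attributes live with the role rather than with the person.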

Besides that I have found that many organizations don’t use the sources available today in getting data quality right when it comes to contact data.

It’s not that I suggest actually hacking into mobile phone use logs and the like. There are a lot of sources that do not compromise privacy and let you exploit external reference data, as explained in the post Beyond Address Validation.


Hierarchy Management in Social MDM

Hierarchy management is a core feature in master data management (MDM). When it comes to integrating social data and social network profiles into MDM, hierarchy management will be very important too.

Aggregated Level of Social MDM in B2C

The primarily privacy related challenges of social MDM, not least within business-to-consumer (B2C), have been a topic of a lot of blogging lately. Examples are:

One way of overcoming the privacy considerations is linking to social data and social network profiles at an aggregate level.

Using aggregate level linking is already well known in direct marketing with the use of demographic stereotypes. These stereotypes are based on groups of consumers often defined by their address and/or their age. Combining this knowledge with product master data was examined in the post Customer Product Matrix Management.

Social MDM will add new dimensions to this way of using hierarchies in master data and linking the data across multiple channels without the need to uniquely identify a real world person in every aspect.
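As a rough illustration of aggregate level linking, the sketch below maps a consumer to a coarse segment built from the first digits of the postal code and a ten-year age band, and then looks up social signals stored per segment. The function names, the segment definition and the sample counts are all assumptions made up for this example.

```python
from collections import Counter
from typing import Optional, Tuple


def segment_key(postal_code: str, age: int) -> Tuple[str, str]:
    """Map a consumer to a coarse segment: postal area plus 10-year age band."""
    area = postal_code.strip()[:2]
    band_start = (age // 10) * 10
    return (area, f"{band_start}-{band_start + 9}")


# Hypothetical social signals, already aggregated per segment.
# No individual profile is stored or linked here.
social_interest_by_segment = {
    ("75", "30-39"): Counter({"travel": 120, "cycling": 45}),
}


def top_interest(postal_code: str, age: int) -> Optional[str]:
    """Return the most frequent interest for the consumer's segment, if known."""
    counts = social_interest_by_segment.get(segment_key(postal_code, age))
    return counts.most_common(1)[0][0] if counts else None


print(top_interest("75009", 34))  # travel
print(top_interest("1000", 52))   # None
```

The point of the design is that the link goes from customer to segment, never from customer to a named social network profile.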

Contact Level Social MDM in B2B

As discussed in the post Business Contact Reference Data, social network profiles have a lot to offer for mastering business-to-business (B2B) contact data.

While access to external reference data at the account level has been around for many years through public and commercial (and even open) business directories, the problem of identifying and maintaining correct and timely data about the contacts at these accounts has been huge.

Integrating with social networks can help here, and social networks are actually also integrating more and more with the traditional business directories. LinkedIn has business directory links for larger companies today, and lately I noticed a new professional social network called CompanyBook that is based on linking your profile to a (complete) business directory. By the way: The business directory data available in CompanyBook is surprisingly deep; for example, revenue data is free for you to grab.

When it comes to contact data they are basically maintained out there by you. A service like LinkedIn is often described as a recruitment service. In my eyes it is a lot more than that. It is, along with similar services, a goldmine (within a minefield) for getting MDM within B2B done much better.


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as “data driven” and how the changing world of data must change our approach to data quality.

As said in the article the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement especially in the party master data domain has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the Wikipedia article about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry specific directories is another example.

In the post Business Contact Reference Data I have a prediction saying that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one also haunted by complex privacy considerations.

Near-Real-Time Data Enrichment

Besides updating changes of basic master data from business directories, these directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment used a lot in direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files for uploading back from the service provider.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud. They make it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data, and to execute the request as close to real-time as the volume allows.

Such services also include the good old duplicate check, now much better informed by including third-party reference data.
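A minimal sketch of a duplicate check informed by reference data could look like the following. It assumes a tiny, made-up directory extract that maps known name variants to a directory ID (the `REF-001` value is hypothetical), and it falls back to plain string similarity from Python's standard `difflib` module when the directory cannot resolve both names. The 0.85 threshold is an arbitrary assumption.

```python
import difflib
from typing import Optional

# Hypothetical extract of a business directory: known name variant -> directory ID.
REFERENCE = {
    "international business machines corporation": "REF-001",
    "ibm": "REF-001",
    "ibm corp": "REF-001",
}


def normalize(name: str) -> str:
    """Lower-case, drop dots and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def reference_id(name: str) -> Optional[str]:
    """Look up the directory ID for a name, or None if the name is unknown."""
    return REFERENCE.get(normalize(name))


def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Prefer the reference directory; fall back to string similarity."""
    id_a, id_b = reference_id(a), reference_id(b)
    if id_a and id_b:
        return id_a == id_b
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Very different strings, but the same entity according to the directory.
print(is_duplicate("IBM Corp.", "International Business Machines Corporation"))  # True
# Not in the directory, so string similarity decides.
print(is_duplicate("Acme Ltd", "Acme Ltd."))  # True
```

The first pair would never pass a pure string comparison, which is where the reference data earns its keep.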

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent in data entry work.

Not least in a globalized world where names of people reflect the diversity of almost any nation today, where business names become more and more creative and data entry is done at shared service centers manned with people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

When mashing up advanced search in third-party data and internal master data during data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.


Business Contact Reference Data

When selling data quality software tools and services I have often used external sources for business contact data. Not least when working with data matching and party master data management implementations in business-to-business (B2B) environments, I have also seen these data uploaded into CRM sources.

A typical external source for B2B contact data will look like this:

Some of the issues with such data are:

  • Some of the contact data names may be the same real world individual, as told in the post Echoes in the Database.
  • People change jobs all the time. The external lists will typically have entries verified some time ago, and when you upload to your own databases, data will quickly become useless due to data decay.
  • When working with large companies in customer and other business partner roles you often won’t interact with the top level people, but with people at lower levels not reflected in such external sources.
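The data decay issue above can be made visible with a simple staleness flag. This is only a sketch: the contact names and dates are invented, and the one-year limit is an assumption, not a recommended value.

```python
from datetime import date


def is_stale(last_verified: date, today: date, max_age_days: int = 365) -> bool:
    """Flag a contact whose last verification is older than the allowed age."""
    return (today - last_verified).days > max_age_days


# Invented sample contacts as they might arrive from an external list.
contacts = [
    {"name": "A. Example", "last_verified": date(2011, 1, 15)},
    {"name": "B. Sample", "last_verified": date(2012, 5, 1)},
]

today = date(2012, 6, 1)
stale = [c["name"] for c in contacts if is_stale(c["last_verified"], today)]
print(stale)  # ['A. Example']
```

Flagged contacts are candidates for re-verification before anyone relies on them.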

The rise of social networks has presented new opportunities for overcoming these challenges as examined in a post (written some years ago) called Who is working where doing what?

However, I haven’t seen many attempts yet to automate and include working with social network profiles in business processes. Surely there are technical issues and not least privacy considerations in doing so, as discussed in the post Sharing Social Master Data.

Right now we have a discussion going on in the LinkedIn Social MDM group about examples of connecting social network profiles and master data management. Please add your experiences in the group here – and join if you aren’t already a member.


Social MDM, Privacy and Data Quality

The term “Social MDM” has been promoted quite well this week, not least as part of the social media information stream from the ongoing user conference of the tool vendor Informatica.

In a blog post called Informatica 9.5 for Big Data Challenge #2: Social, Judy Ko of Informatica introduces the opportunities and challenges.

In the closing remarks Judy says: “There’s still a long way to go to bring social data into the mainstream enterprise, in part due to concerns over privacy and the potential “creepiness” factor of mining social data.”

As I understand it the spearhead Social MDM part of the tool release is a Facebook App that provides connectivity between Facebook and the MDM solution.

Industry analyst R “Ray” Wang examines this in the blog post News Analysis: Informatica Launches MDM 9.5. The analysis states that it now is time to “drive data out of Facebook and not into Facebook”.

The opportunities and challenges of driving data out of Facebook were discussed in a post called exactly Out of Facebook here on the blog some years ago.

Balancing privacy with data hoarding is for sure still a subject that is in no way settled and probably never will be.

Connecting systems of record in traditional MDM solutions with social network profiles is no walkover either. The classic data quality challenges with uniqueness of records and completeness of data only get more difficult, but there are also great opportunities for getting a better picture of your customers and other business partners.

If you are interested in Social MDM and the related challenges and opportunities there is a LinkedIn group for Social MDM.

The group is new, less than a month old at the present time, but there is already a lot of content to dip into, including:


255 Reasons for Data Quality Diversity

255 is one source of truth about how many countries we have on this planet. Even with this modest list of reference data there are several sources of the truth. Another list may have 262 entries and a third list 240 entries.

As I made a blog post some years ago called 55 reasons to improve data quality, I think 255 fits nicely in the title of this post.

The 55 reasons to improve data quality in the former post revolve around name and address uniqueness. In the quest for uniqueness, and for fulfilling other data quality dimensions such as completeness and timeliness, I have often advocated using deep (or big) reference data sources such as address directories, business directories and consumer/citizen directories.

Doing so in the best of breed way involves dealing with a huge number of reference data sources. Services claimed to have worldwide coverage often fall a bit short compared to local services using local reference sources.

For example when I lived in Denmark, a tiny place in one corner of the world, I was often amazed how address correction services from abroad only had (sometimes outdated) street level coverage, while local reference data sources provided building number and even suite level validation.

Another example was discussed in the post The Art in Data Matching, where the multi-lingual capacities needed to do well in Belgium were stressed in the comments.

Every country has its own special requirements for getting name and address data quality right, the data quality dimensions for reference data are different and governments have found 255 (or so) different solutions to balancing privacy and administrative effectiveness.

Right now I’m working on internationalization and internationalisation of a data and software service called instant Data Quality. This service makes big reference data from all over the world available in a single mashup. For that we need at least 255 partners.


Real World Identity

How far do you have to go when checking your customer’s identity?

This morning I read an article in the Danish Computerworld about a ferry line now dropping a solution that checked whether the passenger using an access card is in fact the paying customer by using a lightweight fingerprint stored on the card. The reason for dropping it was, by the way, the cost of upgrading the solution compared to future business value, not any renewed privacy concerns.

I have been involved in some balancing of real world alignment versus fitness for use and privacy in public transport as well, as described in the post Real World Alignment. There the question was about using a national identification number when registering customers in public transportation.

As citizens of the world we are today used to sometimes having our iris scanned when flying as our passport holds our unique identification that way. Some of the considerations around using biometrics in general public registration were discussed in the post Citizen ID and Biometrics.

In my eyes, or should we say iris, there is no doubt that we will meet an increasing demand for confirming and registering our identity. Doing that in the fight against terrorism has been going on for a long time. Regulatory compliance will add to that trend, as told in the post Know Your Foreign Customer, mentioning the consequences of the FATCA regulation and other regulations.

When talking about identity resolution in the data quality realm we usually deal with strings of text such as names, addresses, phone numbers and national identification numbers. Things that reflect the real world, but aren’t the real world.

We will however probably adopt more facial recognition, as examined in the post The New Face of Data Matching. We do have access to pictures in the cloud, as you may find your B2C customer’s picture on Facebook and your B2B customer contact’s picture on LinkedIn or other similar services. It’s still not the real world itself, but a bit closer than a text string. And of course the picture could be false or outdated and thus more suitable for traction on a dating site.

Fingerprint is maybe a bit old fashioned, but as said, more and more biometric passports are issued and the technology for iris and retinal scanning is used for access control, even on mobile devices.

In the story starting this post the business value of reinvesting in a biometric solution wasn’t deemed positive. But looking from the print on my fingers down to my hand lines, I foresee some more identity resolution going beyond name and address strings into things closer to the real world, such as facial recognition and biometrics.


Sharing Social Master Data

If a company runs a Customer Relationship Management (CRM) system all employees are supposed to enter their interactions with customers and prospects including adding new accounts and contacts if it’s the first engagement.

With the rise of social networks first engagements are increasingly done in those networks. Furthermore new employees often bring old contacts from former employments with them thus utilizing an established relationship that probably is manifested in one or more already existing social network connections.

As explained in the post Social Master Data Management, the term “Social CRM” has been around for a while. We now see CRM solutions where the account and contact master data is primarily built on extracting those data from social networks.

I have just tried out such a solution called Nimble.

If you are more than a one-man-band company, it’s interesting to what degree you are willing (or forced) to share your connections as master data entities for the CRM solution.

In Nimble you have the choice of differentiating for each network. I would probably freely choose a setup with Twitter and LinkedIn as shared with the team, but Facebook as private:

But that is just how I think based on my way of using social networks.

There is a fundamental data quality versus privacy issue around utilizing employees’ social network connections as master data for CRM and eventually enterprise wide Master Data Management (MDM).

All things equal, data quality will be best if everyone contributes within reason. Not least in sales, but also more or less in other functions, you are hired also because of your relations.

What do you think?
