Fact Checking by Mashing Up

A recent blog post by Andrew Grill, CEO of Kred, is called Can you spot a social media faker? Fact checking on social media is now becoming even more important.

Besides methods within the social sphere for fact checking, as described in Andrew Grill’s post, I also believe that mashing up social network profiles and traditional external reference data is a great way of getting the full picture.

As explained in the post Sharing is the Future of MDM there are several available external options for checking the facts:

  • Public sector registries, which are becoming more and more open, for example for the address part or even deeper, with due respect for privacy considerations, which may differ between business entities and individual entities.
  • Commercial directories, often built on top of public registries.
  • Personal data lockers like Mydex
  • Social network profiles, including credibility (or influence) services

The challenge is of course that there are plenty of external reference data sources: many are national, making up 255 or so variants of each data source. There are also plenty of social networks, and several credibility (or influence) services for that matter.

Making that easy for you is exactly what we are working on in the instant Data Quality (iDQ™) concept.



255 Reasons for Data Quality Diversity

255 is one source's truth about how many countries we have on this planet. Even with this modest list of reference data there are several sources of truth: another list may have 262 entries and a third list 240 entries.

As I made a blog post some years ago called 55 reasons to improve data quality, I think 255 fits nicely in the title of this post.

The 55 reasons to improve data quality in the former post revolve around name and address uniqueness. In the quest for uniqueness, and for fulfilling other data quality dimensions such as completeness and timeliness, I have often advocated using deep (or big) reference data sources such as address directories, business directories and consumer/citizen directories.

Doing so in a best-of-breed way involves dealing with a huge number of reference data sources. Services claiming worldwide coverage often fall a bit short compared to local services using local reference sources.

For example, when I lived in Denmark, a tiny place in one corner of the world, I was often amazed that address correction services from abroad only had (sometimes outdated) street level coverage, while local reference data sources provided building number and even suite level validation.

Another example was discussed in the post The Art in Data Matching, where the multilingual capabilities needed to do well in Belgium were stressed in the comments.

Every country has its own special requirements for getting name and address data quality right, the data quality dimensions for reference data differ, and governments have found 255 (or so) different solutions to balancing privacy and administrative effectiveness.

Right now I’m working on internationalization and internationalisation of a data and software service called instant Data Quality. This service makes big reference data from all over the world available in a single mashup. For that we need at least 255 partners.


Big Reference Data Musings

The term “big data” is huge these days. As Steve Sarsfield suggests in a blog post yesterday called Big Data Hype is an Opportunity for Data Management Pros, well, let’s ride on the wave (or is it tsunami?).

The definition of “big data” is, as with many buzzwords, not crystal clear, as examined in a post called It’s time for a new definition of big data on Mike2.0 by Robert Hillard. The post suggests that big may be about volume, but is actually more about big complexity.

As I have worked intensively with large amounts of rich reference data, I have a homemade term called “big reference data”.

Big Reference Data Sets

Reference Data is a term often used either instead of Master Data or as related to Master Data. Reference data are data defined and (initially) maintained outside a single organization. Examples from the party master data realm are a country list, a list of states in a given country, or postal code tables for countries around the world.

The trend is that organizations seek to benefit from having reference data in more depth than the often modestly populated lists mentioned above.

An example of a big reference data set is the Dun & Bradstreet WorldBase. This reference data set holds around 300 different attributes describing over 200 million business entities from all over the world.

This data set is at first glance well structured, with a single (flat) data model for all countries. However, when you work with it you learn that the actual data differ a lot depending on the original sources for each country. For example, addresses from some countries are standardized, while this isn’t the case for other countries. Completeness and other data quality dimensions vary a lot too.
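This kind of per-country variation can be made visible with a simple profiling pass over a flat data model. The sketch below is illustrative only: the records, field names and country codes are made up and do not reflect the actual WorldBase schema.

```python
# Sketch: profiling completeness per country in a flat reference data set.
# Field names and records are illustrative, not the actual WorldBase layout.

def completeness_by_country(records, fields):
    """Return, per country, the share of non-empty values across the given fields."""
    stats = {}
    for rec in records:
        country = rec.get("country_code", "??")
        filled, total = stats.setdefault(country, [0, 0])
        for field in fields:
            total += 1
            if rec.get(field):
                filled += 1
        stats[country] = [filled, total]
    return {c: filled / total for c, (filled, total) in stats.items()}

sample = [
    {"country_code": "DK", "name": "Example A/S", "street": "Hovedgaden 1", "postal_code": "2800"},
    {"country_code": "DK", "name": "Another ApS", "street": "", "postal_code": "8000"},
    {"country_code": "XX", "name": "Some Ltd", "street": "", "postal_code": ""},
]

print(completeness_by_country(sample, ["name", "street", "postal_code"]))
```

A real profiling run would of course also look at standardization and validity per country, not just fill rates.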

Another example of a large reference data set is the United Kingdom electoral roll, which is mentioned in the post Inaccurately Accurate. As told in that post, there are fit-for-purpose data quality issues. The data set is pretty big, not least if you span several years, as there is a distinct roll for every year.

Big Reference Data Mashup

Complexity, and opportunity, also arises when you relate several big reference data sets.

Lately DataQualityPro had an interview called What is AddressBase® and how will it improve address data quality? Here Paul Malyon of Experian QAS explains a new combined address reference source for the United Kingdom.

Now, let’s mash up the AddressBase, the WorldBase, the Electoral Rolls and all their like.

Image called Castle in the Sky found on photobotos.


Search and if you are lucky you will find

This morning I was following the tweet stream from the ongoing Gartner Master Data Management (MDM) conference here in London, when another tweet caught my eye:

This reminded me that (error tolerant) search is The Overlooked MDM Feature.

Good search functionality is essential for making the most out of your well managed master data.

Search functionality may be implemented in these main scenarios:

Inside Search

You should be able to quickly find what is inside your master data hub.

The business benefits from having fast, error tolerant search as a capability inside your master data management solution are plenty, including:

  • Better data quality by upstream prevention against duplicate entries as explained in this post.
  • More efficiency by bringing down the time users spend searching for information about entities in the master data hub.
  • Higher employee satisfaction by eliminating the frustration that otherwise comes from not finding what you know must already be inside the hub.

MDM inside search capabilities apply to multiple domains: party, product and location master data.
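To give a feel for what error tolerant search means in practice, here is a minimal sketch over a small in-memory set of party names, using the Python standard library's difflib as a stand-in for a proper fuzzy matching engine. The names and threshold are made up for illustration.

```python
# Sketch: error tolerant search inside a master data hub.
# difflib is a simple stand-in for a real fuzzy matching engine.
from difflib import SequenceMatcher

def fuzzy_search(query, names, threshold=0.75):
    """Return (name, score) pairs whose similarity to the query meets the threshold."""
    query = query.lower()
    hits = []
    for name in names:
        score = SequenceMatcher(None, query, name.lower()).ratio()
        if score >= threshold:
            hits.append((name, score))
    # Best matches first, so the data entry clerk sees the likeliest duplicate on top.
    return sorted(hits, key=lambda h: h[1], reverse=True)

hub = ["Henrik Liliendahl", "Henrick Lilliendal", "Jane Doe"]
print(fuzzy_search("Henrik Lilliendahl", hub))
```

A production solution would typically combine phonetic keys, edit distance and name frequency rather than a single similarity ratio, but the principle of surfacing near matches despite misspellings is the same.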

Search the outside

You should be able to quickly find what you need to bring inside your master data hub.

Data entry may improve a lot by having fast, error tolerant search that explores the cloud for relevant data related to the entry being made. Doing that has two main purposes:

  • Data entry becomes more effective with less cumbersome investigation and fewer keystrokes.
  • Data quality is safeguarded by better real world alignment.

Preferably the inside and the outside search should be the same mash-up.

Searching the outside applies especially to location and party master data.

Search from the outside

Website search applies especially to product master data and in some cases also to related location master data as described in the post Product Placement.

Your website users should be able to quickly find what you publish from your master data hub be that description of physical products, services or research documents as in the case of Gartner, which is an analyst firm.

As said in the tweet at the top of this post, (good) search makes the life of your coming and current customers much easier. Do I need to emphasize the importance of good customer experience?


Reference Data at Work in the Cloud

One of the product development programs I’m involved in is about exploiting rich external reference data in order to get data quality right the first time and to maintain optimal data quality over time.

The product is called instant Data Quality (abbreviated iDQ™). I have briefly described the concept in an earlier post called instant Data Quality.

iDQ™ combines two concepts:

  • Software as a Service
  • Data as a Service

While most similar solutions are bundled with one specific data provider, the iDQ™ concept embraces a range of data sources. The current scope is around customer master data, where iDQ™ may include Business-to-Business (B2B) directories, Business-to-Consumer (B2C) directories, real estate directories, Postal Address Files and even social network data from external sources, as well as internal master data, all presented at the same time in a compact mash-up.

The product has already gained substantial success in my home country Denmark, leading to the formation of a company working solely with development and sales of iDQ™.

The results iDQ™ customers gain may seem simple, but they are the core advantages of better data quality most enterprises are looking for, as said by one of Denmark’s largest companies:

“For DONG Energy iDQ™ is a simple and easy solution when searching for master data on individual customers. We have 1,000,000 individual customers. They typically relocate a few times during the time they are customers with us. We use iDQ™ to find these customers so we can send the final accounts to the new address. iDQ™ also provides better master data because we have an opportunity to get names and addresses correctly spelled.

iDQ™ saves time because we can search many databases at the same time. Earlier we had to search several different databases before we found the right master data on the customer.”

Please find more testimonials here.

I hope to be able to link to testimonials in more languages in the future.


Social Master Data Management

The term ”Social CRM” has been around for a while. Just as traditional CRM (Customer Relationship Management) is heavily dependent on proper MDM (Master Data Management), enterprise wide social CRM will be dependent on a proper social MDM element in order to be a success.

The challenge in social MDM will be that we are not going to replace existing data sources for MDM; we are actually going to add more sources and handle the integration of these sources with the sources for traditional CRM and MDM, along with other new sources coming from the cloud.

Customer Master Data sources will expand to embrace:

  • Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
  • Data feed and data integration with external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
  • Self registration by prospects and customers via webforms.
  • Social media master data captured during social CRM and probably harvested in more and more structured ways.

Social media master data are found as profiles in services such as Facebook, mainly for business-to-consumer activities, LinkedIn, mainly for business-to-business activities, and Twitter, somewhere in between. These are only some prominent examples of such services. While LinkedIn may be dominant for professional use in English speaking countries and countries where English is widely spoken, such as Scandinavia and the Netherlands, other regions are far less penetrated by LinkedIn. In German speaking countries, for example, the similar network service Xing is much more crowded. So, when embracing global business you will have to acknowledge the diversity found in social network services.

A good way to integrate all these sources in business processes is using mashups. An example is a mashup for entering customer data. If you are entering a business entity you may want to know:

  • What is already known in internal databases about that entity – either via a centralized MDM hub or throughout disparate databases?
  • Is the visit address correct according to public sector data?
  • How is the business account related to other business entities learned from a business directory?
  • Do we recognize the business contact in social networks – maybe we have had contact before in another context?

If you are entering a consumer entity you may want to know:

  • Does that person already exist in our internal databases – as an individual and as a household?
  • What do we know about the residence address from public sector data?
  • Can we obtain additional data from phone book directories, nixie lists and whatever else is available, affordable and legal in the country in question?
  • How do we connect in social media?
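The questions above can be answered in one compact view by letting the mashup fan out to each source and gather the answers. The sketch below shows the shape of such a mashup for a business entity; the three lookup functions are hypothetical stand-ins for real services, and the names and addresses are made up.

```python
# Sketch: a data entry mashup gathering answers from several sources at once.
# All three lookup functions are hypothetical stand-ins for real services.

def lookup_internal(name):
    # Would query the MDM hub or disparate internal databases.
    return {"known": name in {"Example A/S"}}

def lookup_address_registry(address):
    # Would validate the visit address against public sector address data.
    return {"valid": address.endswith("Copenhagen")}

def lookup_business_directory(name):
    # Would fetch related entities from a business directory.
    return {"parent": "Example Group" if name == "Example A/S" else None}

def entry_mashup(name, address):
    """Combine all lookups into one view for the person entering the data."""
    return {
        "internal": lookup_internal(name),
        "address": lookup_address_registry(address),
        "directory": lookup_business_directory(name),
    }

print(entry_mashup("Example A/S", "Hovedgaden 1, Copenhagen"))
```

In a real implementation the sources would be called in parallel, and a social network lookup would be added to the same view where it is legal and relevant.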

Of course privacy is a big issue. Norms vary between countries, and so do the legal rules. Norms vary between individuals, and for the same individual as a private person versus a business contact. Norms vary between industries and from company to company.

If aligning people, processes and technology didn’t matter before, it will when dealing with social master data management.


Enterprise Data Mashup and Data Matching

A mashup is a web page or application that uses or combines data or functionality from two or more external sources to create a new service. Mashups can be considered to have an active role in the evolution of social software and Web 2.0. Enterprise Mashups are secure, visually rich web applications that expose actionable information from diverse internal and external information sources. So says Wikipedia.

I think that Enterprise Mashups will need data matching – and data matching will improve from data mashups.

The joys and challenges of Enterprise Mashups were recently touched on in the post “MDM Mashups: All the Taste with None of the Calories” by Amar Ramakrishnan of Initiate. Data needs to be cleansed and matched before being exposed in an Enterprise Mashup. An Enterprise Mashup is then a fast way to deliver Master Data Management results to the organization.

Party Data Matching has typically been done in these two often separated contexts:

  • Matching internal data like deduplicating and consolidating
  • Matching internal data against an external source like address correction and business directory matching

Increased utilization of multiple functions and multiple sources – like a mashup – will help make better matching. Some examples I have tried include:

  • If you know whether an address is unique or not, this information can be used to settle the confidence of an individual or household duplicate.
  • If you know whether an address is a single residence or a multiple residence (like a nursing home or campus), this information can be used to settle the confidence of an individual or household duplicate.
  • If you know the frequency of a name (in a given country), this information can be used to settle the confidence of a private, household or contact duplicate.
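The examples above boil down to adjusting a match score with reference data context. The sketch below shows the idea; the base scores, frequency cut-off and adjustment factors are illustrative values, not tuned production weights.

```python
# Sketch: adjusting duplicate match confidence with reference data context.
# The thresholds and adjustment factors are illustrative, not tuned values.

def adjust_confidence(base_score, name_frequency, shared_residence):
    """Lower the match confidence for common names and multi-occupancy addresses."""
    score = base_score
    if name_frequency > 1000:   # a very common name in the country in question
        score -= 0.15
    if shared_residence:        # e.g. a nursing home or campus address
        score -= 0.10
    # Clamp to [0, 1] and round for readability.
    return round(max(0.0, min(1.0, score)), 2)

# Same name at the same address, but the name is common and the address is a campus:
print(adjust_confidence(0.90, name_frequency=25000, shared_residence=True))   # 0.65

# A rare name at a single residence keeps its confidence:
print(adjust_confidence(0.90, name_frequency=3, shared_residence=False))      # 0.9
```

In practice such factors would be learned or calibrated per country, since name frequencies and address structures differ between the 255 (or so) variants discussed earlier.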

As many data quality flaws (not surprisingly) are introduced at data entry, mashups may help during data entry, like:

  • An address may be suggested from an external source.
  • A business entity may be picked from an external business directory.
  • Various rules exist in different countries for using consumer/citizen directories – so why not use the best available where you do business?

Also, the rise of social media adds new possibilities for mashup content during data entry, data maintenance and other uses of MDM / Enterprise Mashups. Like it or not, your data on Facebook, Twitter and not least LinkedIn are going to be matched and mashed up.
