Finding the Truth in Social Business Directories

LinkedIn has a section called Companies. When browsing around on LinkedIn you are sometimes prompted to follow a company that LinkedIn thinks will be of interest to you.

The other day my hint included two identical logos for the old Master Data Management (MDM) vendor called Siperian. Curious and data quality geeky as I am, I checked, and there actually are two Siperians on LinkedIn companies:

Both have an identical headquarters address in California, USA.

So, even MDM vendors have created duplicates.

Also, Siperian was acquired by the Data Integration giant Informatica some years ago, so you would expect the Siperians to have been emptied. But that is not the case. Some Siperian folks still claim to work for one of the Siperian duplicates (though many also claim to work for Informatica at the same time).

Now, I was not sure about the legal status of the old Siperian company. So I went to another social network called Companybook. On that site the company registry is based on an external business directory.

Here it seems that the Siperian company in Toronto, Canada actually still exists, though marked as owned by Informatica.

So, I’m still looking for that single source of the truth out there. Until then I will mash up the external sources out there with my internal MDM vendor knowledge, as told in yesterday's post called Mashing Up Big Reference Data with Internal Master Data.


Data Driven Data Quality

In a recent article Loraine Lawson examines how a vast majority of executives describe their business as “data driven” and how the changing world of data must change our approach to data quality.

As said in the article, the world has changed since many data quality tools were created. One aspect is that “there’s a growing business hunger for external, third-party data, which can be used to improve data quality”.

Embedding third-party data into data quality improvement, especially in the party master data domain, has been a big part of my data quality work for many years.

Some of the interesting new scenarios are:

Ongoing Data Maintenance from Many Sources

As explained in the Wikipedia article about data quality, services such as the US National Change of Address (NCOA) service and similar services around the world have been around for many years as a basic use of external data for data quality improvement.

Using updates from business directories like the Dun & Bradstreet WorldBase and other national or industry specific directories is another example.

In the post Business Contact Reference Data I have a prediction saying that professional social networks may be a new source of ongoing data maintenance in the business-to-business (B2B) realm.

Using social data in business-to-consumer (B2C) activities is another option, though one haunted by complex privacy considerations.

Near-Real-Time Data Enrichment

Besides updates to basic master data, business directories typically also contain a lot of other data of value for business processes and analytics.

Address directories may also hold further information like demographic stereotype profiles, geo codes and property data elements.

Appending phone numbers from phone books and checking national suppression lists for mailing and phoning preferences are other forms of data enrichment widely used in direct marketing.

Traditionally these services have been implemented by sending database extracts to a service provider and receiving enriched files for uploading back from the service provider.

Lately I have worked with a new breed of self-service data enrichment tools placed in the cloud. They make it possible for end users to easily configure what to enrich from a palette of address, business entity and consumer/citizen related third-party data, and to execute the request as close to real-time as the volume allows.

Such services also include the good old duplicate check now much better informed by including third-party reference data.

Instant Data Quality in Data Entry

As discussed in the post Avoiding Contact Data Entry Flaws, third-party reference data such as address directories, business directories and consumer/citizen directories placed in the cloud may be used very efficiently in data entry functionality in order to get data quality right the first time and at the same time reduce the time spent on data entry work.

Not least in a globalized world where names of people reflect the diversity of almost any nation today, where business names become more and more creative and data entry is done at shared service centers manned with people from cultures with other address formatting rules, there is an increased need for data entry assistance based on external reference data.

When mashing up advanced search in third-party data and internal master data during data entry, you will solve most of the common data quality issues around avoiding duplicates and getting data as complete and timely as needed from day one.


Business Contact Reference Data

When selling data quality software tools and services I have often used external sources for business contact data. Not least when working with data matching and party master data management implementations in business-to-business (B2B) environments, I have seen these data uploaded into CRM sources.

A typical external source for B2B contact data will look like this:

Some of the issues with such data are:

  • Some of the contact names may refer to the same real world individual, as told in the post Echoes in the Database
  • People change jobs all the time. The external lists will typically have entries verified some time ago, and when you upload them to your own databases, the data will quickly become useless due to data decay.
  • When working with large companies in customer and other business partner roles you often won’t interact with the top level people, but people in lower levels not reflected in such external sources.

The rise of social networks has presented new opportunities for overcoming these challenges as examined in a post (written some years ago) called Who is working where doing what?

However, I haven’t yet seen many attempts to automate and include working with social network profiles in business processes. Surely there are technical issues, and not least privacy considerations, in doing so, as discussed in the post Sharing Social Master Data.

Right now we have a discussion going on in the LinkedIn Social MDM group about examples of connecting social network profiles and master data management. Please add your experiences in the group here – and join if you aren’t already a member.


Instant Data Enrichment

Data enrichment is one of the core activities within data quality improvement. Data enrichment is about updating your data in order to be more real world aligned by correcting and completing with data from external reference data sources.

Traditionally data enrichment has been a follow-up activity to data matching, and doing data matching as a prerequisite for data enrichment has been a good part of my data quality endeavor during the last 15 years, as reported in the post The GlobalMatchBox.

During the last couple of years I have tried to be part of the quest for doing something about poor data quality by moving the activities upstream. Upstream data quality prevention is better than downstream data cleansing wherever applicable. Doing the data enrichment at data capture is the fast track to improve data quality for example by avoiding contact data entry flaws.

It’s not that you have to enrich with all the possible data available from external sources at once. The most important thing is that you are able to link back to external sources without having to do (too much) fuzzy data matching later. Some examples:

  • Getting a standardized address at contact data entry makes it possible for you to easily link to sources with geo codes, property information and other location data at a later point.
  • Obtaining a company registration number or other legal entity identifier (LEI) at data entry makes it possible to enrich with a wealth of available data held in public and commercial sources.
  • Having a person’s name spelled according to available sources for the country in question helps a lot when you later have to match with other sources.

In that way your data will be fit for current and future multiple purposes.
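A minimal sketch of why a captured identifier matters: when each record already carries a key from the external source, later enrichment becomes a plain lookup instead of a fuzzy match. The field names (`registration_no`, `latitude`, `longitude`) and the sample values are invented for illustration and do not come from any particular directory.

```python
def enrich(records, source, key="registration_no"):
    """Return copies of the records completed with attributes from source.

    source is assumed to be a dict keyed by the external identifier.
    """
    enriched = []
    for record in records:
        extra = source.get(record.get(key), {})
        # Internal values win on conflict; external data only adds fields.
        enriched.append({**extra, **record})
    return enriched


# Hypothetical internal records; the second one has no external key.
customers = [
    {"name": "Example Trading", "registration_no": "REG-1"},
    {"name": "Walk-in Customer", "registration_no": None},
]

# Hypothetical extract from an external source, keyed by the same identifier.
directory_extract = {
    "REG-1": {"name": "EXAMPLE TRADING LTD", "latitude": 55.0, "longitude": 10.0},
}

result = enrich(customers, directory_extract)
```

The record without an identifier passes through unchanged, which is exactly the case where fuzzy matching would still be needed later.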


Avoiding Contact Data Entry Flaws

Contact data is the data domain most often mentioned when talking about data quality. Names and addresses and other identification data are constantly spelled wrong, or just differently, by the employees responsible for entering party master data.

Cleansing data a long time after it has been captured is a common way of dealing with this huge problem. However, preventing typos, mishearings and multi-cultural misunderstandings at data entry is a much better option wherever applicable.

I have worked with two different approaches to ensure the best data quality for contact data entered by employees. These approaches are:

  • Correction and
  • Assistance


With correction the data entry clerk, sales representative, customer service professional or whoever is entering the data will enter the name, address and other data into a form.

After submitting the form, or in some cases leaving each field on the form, the application will check the content against business rules and available reference data and return a warning or error message and perhaps a correction to the entered data.

As duplicated data is a very common data quality issue in contact data, a frequent example of such a prompt is a warning that a similar contact record already exists in the system.


With assistance we try to minimize the needed number of key strokes and interactively help with searching in available reference data.

For example, when entering address data, assistance based data entry will start with the highest geographical level:

  • If we are dealing with international data, the country sets the context and determines whether a state or province is needed.
  • Where postal codes (like ZIP) exist, this is the fast path to the city.
  • In some countries the postal code only covers one street (thoroughfare), so that’s settled by the postal code. In other situations we will usually have a limited number of streets that can be picked from a list or settled with the first characters.

(I guess many people know this approach from navigation devices for cars.)
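The drill-down can be sketched roughly as follows. This is only an illustration under stated assumptions: the `REFERENCE` structure, the function names, the postal code "0000" and the city "Example City" are all invented, and a real service would query an address directory rather than an in-memory dictionary.

```python
# Hypothetical reference data: country -> postal code -> city and streets.
REFERENCE = {
    "DK": {
        "needs_state": False,
        "postal_codes": {
            "0000": {
                "city": "Example City",
                "streets": ["Havnevej", "Hovedgaden", "Skolevej"],
            },
        },
    },
    "US": {"needs_state": True, "postal_codes": {}},
}


def needs_state(country):
    """The country sets the context: is a state or province required?"""
    return REFERENCE[country]["needs_state"]


def city_for(country, postal_code):
    """The postal code is the fast path to the city (None if unknown)."""
    area = REFERENCE[country]["postal_codes"].get(postal_code)
    return area["city"] if area else None


def suggest_streets(country, postal_code, typed=""):
    """Narrow the street list by the first characters typed so far."""
    area = REFERENCE[country]["postal_codes"].get(postal_code)
    if area is None:
        return []
    prefix = typed.lower()
    return [s for s in area["streets"] if s.lower().startswith(prefix)]
```

With this sample data, typing "ho" after choosing country "DK" and postal code "0000" leaves a single street to pick.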

When the valid address is known you may catch companies from business directories located at that address and, depending on the country in question, you may know citizens living there from phone directories and other sources, and of course from the internal party master data, thus avoiding entering what is already known about names and other data.

When catching business entities, a search for a name in a business directory often makes it possible to pick a range of identification data and other valuable data, and not least a reference key for future data updates.

Lately I have worked intensively with an assistance based cloud service for business processes embracing contact data entry. We have some great testimonials about the advantages of such an approach here: instant Data Quality Testimonials.


Updating a Social Business Directory

Business directories have been around for ages. In the old days they were paper based, as in the yellow pages of a phone book. The yellow pages have since become searchable online. We also know commercial business directories such as the Dun & Bradstreet WorldBase, as well as government operated nationwide directories of companies and industry specific business directories.

Such business directories often take a crucial role in master data quality work as sources for data enrichment in the quest for getting as close as possible to a single version of the truth when dealing with B2B customer master data, supplier master data and other business partner master data.

A classic core data model for Master Data in CRM systems, SCM solutions and Master Data hubs when doing B2B is that you have:

  • Accounts being the BUSINESS entities who are your customers, suppliers, prospects and all kinds of other business partners
  • Contacts being the EMPLOYEEs working there and acting in the roles as decision makers, influencers, gate keepers, users and so on
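
As a rough illustration, the two entities could be modeled like this. The class and field names are my own shorthand rather than those of any particular CRM system, and `directory_id` is a hypothetical slot for a key from an external business directory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    """An EMPLOYEE working at an account."""
    name: str
    role: str  # e.g. "decision maker", "influencer", "gate keeper", "user"


@dataclass
class Account:
    """A BUSINESS entity: customer, supplier, prospect or other partner."""
    name: str
    directory_id: Optional[str] = None  # hypothetical external directory key
    contacts: List[Contact] = field(default_factory=list)


# Invented sample data.
account = Account("Example Manufacturing", directory_id="DIR-42")
account.contacts.append(Contact("Jane Doe", "decision maker"))
```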

Today we also have to think about social master data management, that is, exploiting reference data in social media as a supplementary source of external data.

As with all social activity, this exercise goes two ways:

  • Finding and monitoring your existing and wanted business partners in the social networks
  • Updating your own data

Most business entities in this world are actually one-man-bands. So is mine. Therefore I went to the LinkedIn company pages this morning and updated data about my company Liliendahl Limited: Unlimited Data Quality and Master Data Management consultancy for tool and service vendors.


Some Deduplication Tactics

When doing the data quality kind of deduplication you will often have two kinds of data matching involved:

  • Data matching in order to find duplicates internally in your master data, most often your customer database
  • Data matching in order to align your master data with an external registry

As the latter activity also helps with finding the internal duplicates, a good question is in which order to do these two activities.

External identifiers

If we for example look at business-to-business (B2B) customer master data it is possible to match against a business directory. Some choices are:

  • If you have mostly domestic data in a country with a public company registration you can obtain a national ID from matching with a business directory based on such a registry. An example will be the French SIREN/SIRET identifiers as mentioned in the post Single Company View.
  • Some registries cover a range of countries. An example is the EuroContactPool where each business entity is identified with a Site ID.
  • The Dun & Bradstreet WorldBase covers the whole world by identifying approximately 200 million active and dissolved business entities with a DUNS-number. The DUNS-number also serves as a privatized national ID for companies in the United States.

If you start with matching your B2B customers against such a registry, you will get a unique identifier that can be attached to your internal customer master data records which will make a succeeding internal deduplication a no-brainer.

Common matching issues

A problem, however, is that you seldom get a 100% hit rate in a business directory matching, often not even close, as examined in the post 3 out of 10.

Another issue is the commercial implications. Business directory matching is often performed as an external service priced per record. Therefore you may save money by merging the duplicates before passing on to external matching. And even if everything is done internally, removing the duplicates before directory matching will save process load.

However, a common pitfall is that an internal deduplication may merge two similar records that actually are represented by two different entities in the business directory (and the real world).

So, as with many things in data matching, the answer to the sequence question is often: Both.

A good process sequence may be this one:

  1. An internal deduplication with very tight settings
  2. A match against an external registry
  3. An internal deduplication exploiting external identifiers and having more loose settings for similarities not involving an external identifier
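
The three steps above can be sketched as follows. This is a simplified illustration, not a production matching engine: it assumes name similarity from Python's `difflib.SequenceMatcher` as the only fuzzy criterion, a greedy clustering, an exact-name directory lookup standing in for the external matching service, and invented company names, identifiers and thresholds (0.98 for tight, 0.80 for loose).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(records, threshold):
    """Greedy clustering: a record joins the first cluster it matches.

    If both records carry an external ID, only the IDs decide.
    Otherwise the name similarity must reach the threshold.
    """
    clusters = []
    for rec in records:
        for cluster in clusters:
            head = cluster[0]
            if rec.get("ext_id") and head.get("ext_id"):
                matched = rec["ext_id"] == head["ext_id"]
            else:
                matched = similarity(rec["name"], head["name"]) >= threshold
            if matched:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def match_directory(clusters, directory):
    """Keep one survivor per cluster and attach an external ID if found."""
    survivors = []
    for cluster in clusters:
        survivor = dict(cluster[0])
        survivor["ext_id"] = directory.get(survivor["name"].lower())
        survivors.append(survivor)
    return survivors


# Invented sample data: two different companies with very similar names.
customers = [
    {"name": "Acme Trading Ltd"},
    {"name": "ACME TRADING LTD"},
    {"name": "Acme Trading Limited"},
    {"name": "Acme Training Ltd"},
]
directory = {"acme trading ltd": "ID-001", "acme training ltd": "ID-002"}

step1 = dedupe(customers, threshold=0.98)   # 1. tight internal deduplication
step2 = match_directory(step1, directory)   # 2. match against the registry
step3 = dedupe(step2, threshold=0.80)       # 3. looser, ID-aware deduplication
```

In this sample the tight pass only merges the two spellings that differ in case, the registry match gives the Trading and Training companies different identifiers, and the final loose pass folds the "Limited" variant into the Trading company while the identifiers keep the two real entities apart. Running the loose setting first would have merged all four records into one.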


Magic Quadrant Diversity

The Magic Quadrants from Gartner Inc. rank the tool vendors within a lot of different IT disciplines. Related to my work, the quadrants for data quality tools and master data management are the most interesting ones.

However, the quadrants examine the vendors in a global scope. But, how are the vendors doing in my country?

I tried to look up a few of the vendors in a local business directory for Denmark provided (free to use on the web) by the local Experian branch.


First up is DataFlux, the (according to Gartner) leading data quality tool vendor.

Result: No hits.

Knowing that DataFlux is owned by SAS Institute will however, with a bit of patience, finally bring you to information about the DataFlux product deep down on the SAS local website.

PS: Though SAS is better known here as the main airline (Scandinavian Airlines System), SAS Institute is actually very successful in Denmark, having a much larger part of the Business Intelligence market here than in most other places.


Next up is Informatica, a well positioned company in both the quadrant for data quality tools and customer master data management.

Result: No hits.

Here you have to know that Informatica is represented in the Nordic area by a company called Affecto. You will find information about the Informatica products deep down on the Affecto website – along with the competing product FirstLogic owned by Business Objects (owned by SAP) also historically represented by Affecto.

Stibo Systems

Stibo Systems may not be as well known as the two above, but is tailing the mega vendors in the quadrant for Product Master Data Management, as mentioned recently in a blog post by Dan Power.

Result: Hit:

They are here with over 500 employees, at least in the legal entity called Stibo, where Stibo Systems is an alternate name and brand. And no kidding: I visited them last month at the impressive headquarters near Århus (the second largest city in Denmark).


Business Directory Match: Global versus Local

When doing data quality improvement in business-to-business party master data an often used shortcut is matching your portfolio of business customers with a business directory and preferably picking new customers from the directory in the future.

If you are doing business in more than one country you will have to consider which business directory to use: either engaging with a local business directory for each country or engaging with a single business directory covering all countries in question.

There are pros and cons.

One subject is conformity. I have met this issue a couple of times. A business directory covering many countries will have a standardized way of formatting the different elements like a postal address, whereas a local (national) business directory will use best practice for the particular country.

An example from my home country Denmark:

The Dun & Bradstreet WorldBase is a business directory holding 170 million business entities from all over the world. A Danish street address is formatted like this:

Address Line 1 = Hovedgaden 12 A, 4. th

Observe that Denmark belongs to that half of the earth where house numbers are written after the street name.

In a local business directory (based on the public registry) you will be able to get this format:

Street name = Hovedgaden
Street code = 202 4321
House number = 012A
Floor = 04
Side/door = TH

Here you get an atomized address with metadata for the atomized elements and the unique address coding used in Denmark.
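
To make the difference concrete, here is a small sketch that splits the single address line from the example into the atomized elements. The regular expression is my own simplification and only covers lines shaped like the example. The street code cannot be derived from the text at all; it has to come from a lookup in the public registry, so it is left out.

```python
import re

# Simplified pattern for lines like "Hovedgaden 12 A, 4. th".
ADDRESS_LINE = re.compile(
    r"^(?P<street>.+?)\s+"         # street name (may contain spaces)
    r"(?P<number>\d+)\s*"          # house number digits
    r"(?P<letter>[A-Za-z])?"       # optional house letter
    r"(?:,\s*(?P<floor>\d+)\.\s*"  # optional floor, written like "4."
    r"(?P<side>\w+))?$"            # optional side/door, like "th"
)


def atomize(line):
    """Split a one-line Danish street address into atomized elements."""
    m = ADDRESS_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"Unrecognized address line: {line!r}")
    letter = (m["letter"] or "").upper()
    result = {
        "street_name": m["street"],
        "house_number": f"{int(m['number']):03d}{letter}",
    }
    if m["floor"]:
        result["floor"] = f"{int(m['floor']):02d}"
        result["side_door"] = m["side"].upper()
    return result


parts = atomize("Hovedgaden 12 A, 4. th")
```

Going from the atomized form to the single line is trivial; going the other way, as here, is where the guessing starts, which is one argument for the local format.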


3 out of 10

Just before I left for summer vacation I noticed a tweet by MDM guru Aaron Zornes saying:

This is a subject very close to me as I have worked a lot with business directory matching during the last 15 years, not least matching with the D&B WorldBase.

The problem is that if you match your B2B customers, suppliers and other business partners with a business directory like the D&B WorldBase you could naively expect a 100% match.

If your result is only a 30% hit rate the question is: How many among the remaining 70% are false negatives and how many are true negatives?

True negatives

There may be a lot of reasons for true negatives, namely:

  • Your business entity isn’t listed in the business directory. Some countries, like those of the old Czechoslovakia, some English speaking countries in the Pacific, the Nordic countries and others, have a tight public registration of companies, while registration is less tight in North America, other European countries and the rest of the world.
  • Your supposed business entity isn’t a business entity. Many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.
  • Uniqueness may be defined differently in the business directory and in your table to be matched. This includes the perception of hierarchies of legal entities and branches; not least, governmental and local authority bodies are a fuzzy crowd. Also the different roles, such as those of small business owners, are a challenge. The same is true of roles such as franchise takers and the use of trading styles.

False negatives

In business directory matching the false negatives are those records that should have been matched by an automated function, but weren’t.

The number of false negatives is a measure of the effectiveness of the automated matching tool(s) and rules applied. Big companies often use the magic quadrant leaders in data quality tools, but these aren’t necessarily the best tools for business directory matching.

Personally I have found that you need a very complex mix of tools and rules to get a decent match rate in business directory matching, including combining both deterministic and probabilistic matching. Some different techniques are explained in more detail here.
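
A heavily simplified sketch of such a combination: an exact pass on a normalized key first, then a scored pass for what is left. Note the assumptions: the scored pass uses a plain string similarity from `difflib` as a stand-in for a real probabilistic model, and the list of legal forms, the 0.85 threshold, the field names and the sample directory are all invented for illustration.

```python
from difflib import SequenceMatcher

# Assumed, incomplete list of legal forms to ignore when comparing names.
LEGAL_FORMS = {"ltd", "limited", "inc", "a/s", "aps", "gmbh"}


def normalize(name):
    """Lower-case, drop punctuation and strip legal form tokens."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def match_record(record, directory, threshold=0.85):
    """Return (directory id or None, how the decision was reached)."""
    key = (normalize(record["name"]), record["postal_code"])

    # Pass 1, deterministic: normalized name and postal code must be equal.
    for entry in directory:
        if (normalize(entry["name"]), entry["postal_code"]) == key:
            return entry["id"], "deterministic"

    # Pass 2, scored: best name similarity within the same postal code.
    best_id, best_score = None, 0.0
    for entry in directory:
        if entry["postal_code"] != record["postal_code"]:
            continue
        score = SequenceMatcher(None, key[0], normalize(entry["name"])).ratio()
        if score > best_score:
            best_id, best_score = entry["id"], score
    if best_score >= threshold:
        return best_id, "scored"
    return None, "no match"


# Invented sample directory.
directory = [
    {"id": "D1", "name": "Nordic Widgets A/S", "postal_code": "8000"},
    {"id": "D2", "name": "Baltic Shipping ApS", "postal_code": "2100"},
]

exact = match_record({"name": "NORDIC WIDGETS", "postal_code": "8000"}, directory)
fuzzy = match_record({"name": "Nordic Widgts A/S", "postal_code": "8000"}, directory)
none = match_record({"name": "Some Local Club", "postal_code": "8000"}, directory)
```

Without the second pass, the misspelled record would have ended up as a false negative. The last record is a true negative: no amount of tuning should match it.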
