Legal Forms from Hell

When doing data matching with company names, a basic challenge is that a proper company name, in most cultures and in most cases, has two elements:

  • The actual company name
  • The legal form

Some worldwide examples:

  • Informatica Corporation
  • Talend SA
  • SAP Deutschland AG & Co. KG
  • Sony Kabushiki Kaisha
  • LEGO A/S

There are hundreds of different legal forms, each with full and abbreviated variants. Wikipedia has a list here (under the title "types of business entity").

However, when company names are typed into databases, the legal form is often omitted. And even where legal forms are present, they may be represented in full or abbreviated form, with varying spelling, punctuation and so on. As the actual company names suffer from the same fuzziness, the complexity is overwhelming.

A common way of handling this issue in data matching is to separate out the legal form and then focus on comparing the remaining part, the actual company name. This has to be done country by country, or else you may remove the entire name of a company, as with an Italian company called Société Anonyme, which is a French legal form.
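A minimal sketch of such a country-specific separation, in Python. The `LEGAL_FORMS` table and the `split_legal_form` function are hypothetical and hold only a handful of the hundreds of variants. The name is lowercased before comparing, and a legal form is only stripped if some name is left over afterwards.

```python
import re
from typing import Optional, Tuple

# Hypothetical, deliberately tiny table of legal forms per country code.
# A production list would hold hundreds of full and abbreviated variants.
LEGAL_FORMS = {
    "DK": ["aktieselskab", "a/s", "aps"],
    "DE": ["aktiengesellschaft", "gmbh", "ag", "kg", "ag & co. kg"],
    "FR": ["société anonyme", "sarl", "sa"],
    "IT": ["società per azioni", "s.p.a.", "spa", "s.r.l.", "srl"],
    "US": ["incorporated", "corporation", "corp", "inc", "llc"],
}


def split_legal_form(name: str, country: str) -> Tuple[str, Optional[str]]:
    """Split a company name into (core name, legal form) for one country."""
    # Lowercase, drop commas and collapse whitespace before comparing.
    normalized = " ".join(name.lower().replace(",", " ").split())
    # Longest variants first, so "ag & co. kg" wins over a bare "kg".
    for form in sorted(LEGAL_FORMS.get(country, []), key=len, reverse=True):
        # The form must be a trailing word sequence, optionally followed by ".".
        match = re.search(r"(?:^|\s)" + re.escape(form) + r"\.?$", normalized)
        if match:
            core = normalized[: match.start()].strip()
            if core:  # never strip the entire name away
                return core, form
    return normalized, None


print(split_legal_form("SAP Deutschland AG & Co. KG", "DE"))  # ('sap deutschland', 'ag & co. kg')
print(split_legal_form("Société Anonyme", "IT"))              # ('société anonyme', None)
```

In the Italian case the name comes back untouched because the French legal form is simply not in the Italian list. The `if core` guard is a second safety net for the case where the country is unknown or wrong.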

While the practice of including legal forms in company names may serve the original purpose of knowing the risk of doing business with that entity well, it certainly does not help in solving the uniqueness dimension of data quality.

One would think it is time to change the bad (legally mandated) practice of mixing legal forms with company names, and to serve the original purpose in another, more data quality friendly way.


Business Directory Match: Global versus Local

When improving data quality in business-to-business party master data, an often used shortcut is to match your portfolio of business customers with a business directory and, preferably, to pick new customers from the directory in the future.

If you do business in more than one country, you will have to consider which business directory to use: a local business directory for each country, or a single business directory covering all the countries in question.

There are pros and cons.

One consideration is conformity, an issue I have encountered a couple of times. A business directory covering many countries will have a standardized way of formatting the different elements, like a postal address, whereas a local (national) business directory will use best practice for the particular country.

An example from my home country Denmark:

The Dun & Bradstreet WorldBase is a business directory holding 170 million business entities from all over the world. A Danish street address is formatted like this:

Address Line 1 = Hovedgaden 12 A, 4. th

Observe that Denmark belongs to that half of the earth where house numbers are written after the street name.

In a local business directory (based on the public registry) you will be able to get this format:

Street name = Hovedgaden
Street code = 202 4321
House number = 012A
Floor = 04
Side/door = TH

Here you get an atomized address with metadata for the atomized elements and the unique address coding used in Denmark.
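To illustrate the difference, here is a rough Python sketch that atomizes the WorldBase-style address line from the example into the local fields. The regular expression and the name `atomize_danish_address` are assumptions, and they cover only the simple pattern shown above: street, number, optional letter, floor and side. The street code cannot be parsed from the text at all. It would have to be looked up in the public address registry, which this sketch leaves out.

```python
import re
from typing import Dict, Optional

# Assumed layout: "<street> <number>[ <letter>][, <floor>. <side>]".
# "st" (stuen, ground floor) and "kl" (kælder, basement) are accepted as floors.
ADDRESS_LINE = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+)\s*(?P<letter>[a-z])?"
    r"(?:\s*,\s*(?P<floor>\d+|st|kl)\.?\s*(?P<side>[a-z]+)?)?\s*$",
    re.IGNORECASE,
)


def atomize_danish_address(line: str) -> Optional[Dict[str, str]]:
    """Split a single Danish address line into atomized elements."""
    m = ADDRESS_LINE.match(line.strip())
    if m is None:
        return None  # leave anything unusual for manual handling
    parts = {
        "street_name": m.group("street"),
        # Zero-padded house number plus letter, e.g. "012A".
        "house_number": m.group("number").zfill(3) + (m.group("letter") or "").upper(),
    }
    if m.group("floor"):
        floor = m.group("floor")
        # Numeric floors are zero-padded ("04"); "st"/"kl" are kept as codes.
        parts["floor"] = floor.zfill(2) if floor.isdigit() else floor.upper()
    if m.group("side"):
        parts["side_door"] = m.group("side").upper()
    return parts


print(atomize_danish_address("Hovedgaden 12 A, 4. th"))
# {'street_name': 'Hovedgaden', 'house_number': '012A', 'floor': '04', 'side_door': 'TH'}
```

Parsing like this only gets you back to what the local directory delivers directly. That is exactly why the atomized local format is attractive.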


Out of Facebook

A while ago it was announced that Facebook had signed up member number 500,000,000.

If you are working with customer data management you will know that this doesn’t mean that 500,000,000 distinct individuals are using Facebook. Like any customer table the Facebook member table will suffer from a number of different data quality issues like:

  • Some individuals are signed up more than once using different profiles.
  • Some profiles are not an individual person, but a company or other form of establishment.
  • Some individuals who created a profile are not among us anymore.

Nevertheless, the Facebook member table is a formidable collection of external reference data representing the real world objects that many companies are trying to master when doing business-2-consumer activities.

For companies doing business-2-business activities, a similar representation of real world objects is the more than 70,000,000 profiles on LinkedIn, plus profiles in other social business networks around the world, which may act as external reference data for the business contacts in master data hubs, CRM systems and so on.

Customer Master Data sources will expand to embrace:

  • Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
  • Data feed and data integration with traditional external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
  • Self registration by prospects and customers via webforms.
  • Social media master data captured during social CRM and probably harvested in more and more structured ways as a new wave of exploiting external reference data.

Doing “Social Master Data Management” will become an integrated part of customer master data management, offering both opportunities for approaching a “single version of the truth” and some challenges in doing so.

Of course privacy is a big issue. Norms vary between countries, and so do the legal rules. Norms vary between individuals, and for the same individual depending on whether they act as a private person or as a business contact. Norms vary between industries and from company to company.

But the fact that 500,000,000 profiles have been created on Facebook in very few years by people from all over the world shows that people are willing to share, and that a lot of information can be collected in the cloud. However, no one wants to be spammed as a result of sharing, and indeed there have been some controversies around how data in Facebook is handled.

Anyway, I have no doubt that we will see fewer data entry clerks typing the same information into each company’s separate customer tables, and that we will increasingly share our own master data attributes in the cloud.


3 out of 10

Just before I left for summer vacation I noticed a tweet by MDM guru Aaron Zornes saying:

This is a subject very close to me, as I have worked a lot with business directory matching during the last 15 years, not least matching with the D&B WorldBase.

The problem is that if you match your B2B customers, suppliers and other business partners with a business directory like the D&B WorldBase you could naively expect a 100% match.

If your result is only a 30% hit rate, the question is: how many of the remaining 70% are false negatives, and how many are true negatives?
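One way to start answering that question, sketched in Python under stated assumptions: review a random sample of the no-hit records by hand, count how many a person can in fact find in the directory, and extrapolate to all no-hits. The function name and the figures in the usage lines are made up for illustration.

```python
def split_no_hits(total: int, hits: int, sample_size: int, sample_found: int):
    """Estimate (false negatives, true negatives) among records without a hit.

    Assumes `sample_size` no-hit records were drawn at random and reviewed
    manually, and that a correct directory entry was found for
    `sample_found` of them (those are false negatives of the automated run).
    """
    no_hits = total - hits
    false_negatives = round(no_hits * sample_found / sample_size)
    return false_negatives, no_hits - false_negatives


# Illustrative numbers only: a 30% hit rate on 10,000 records, and 50 of
# 200 sampled no-hits turned out to be in the directory after all.
fn, tn = split_no_hits(total=10_000, hits=3_000, sample_size=200, sample_found=50)
print(fn, tn)  # 1750 5250
```

The estimate is only as good as the sample and the reviewers. A small sample gives a wide margin of error.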

True negatives

There are many possible reasons for true negatives, including:

  • Your business entity isn’t listed in the business directory. Some countries, like those of the former Czechoslovakia, some English-speaking countries in the Pacific, the Nordic countries and others, have tight public registration of companies, while registration is less tight in North America, other European countries and the rest of the world.
  • Your supposed business entity isn’t a business entity. Many B2B customer/prospect tables hold a lot of entities that are not formal business entities but other types of party master data.
  • Uniqueness may be defined differently in the business directory and in the table to be matched. This includes the perception of hierarchies of legal entities and branches; governmental and local authority bodies in particular are a fuzzy crowd. The different roles of small business owners are also a challenge, and the same is true of roles such as franchisees and the use of trading styles.

False negatives

In business directory matching, the false negatives are those records that should have been matched by an automated function, but weren’t.

The number of false negatives is a measure of the effectiveness of the automated matching tool(s) and rules applied. Big companies often use the magic quadrant leaders in data quality tools, but these aren’t necessarily the best tools for business directory matching.

Personally, I have found that you need a very complex mix of tools and rules to get a decent match rate in business directory matching, including a combination of deterministic and probabilistic matching. Some of these techniques are explained in more detail here.
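Here is a bare-bones Python sketch of combining the two kinds of matching. It first makes a deterministic pass on a registration number when both sides carry one, and then a fuzzy pass on the name within the same postal code. The field names and the 0.85 threshold are assumptions. The fuzzy step uses `difflib.SequenceMatcher` from the standard library as a simple stand-in for a real probabilistic model, which would weigh agreement across several fields.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def match_record(record: Dict[str, str], directory: List[Dict[str, str]],
                 threshold: float = 0.85) -> Optional[Dict[str, str]]:
    """Return the best directory entry for `record`, or None if nothing qualifies."""
    # Deterministic pass: an exact registration number settles the match.
    reg_no = record.get("reg_no")
    if reg_no:
        for entry in directory:
            if entry.get("reg_no") == reg_no:
                return entry
    # Fuzzy pass: compare names only within the same postal code (blocking).
    best, best_score = None, 0.0
    name = record.get("name", "").lower()
    for entry in directory:
        if entry.get("postal_code") != record.get("postal_code"):
            continue
        score = SequenceMatcher(None, name, entry.get("name", "").lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None
```

Blocking on postal code keeps the number of comparisons down, but it will miss companies that have moved. That is a trade-off to tune, not a recommendation.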


Business Directory Musings

This coming Sunday I will have worked professionally within Information Technology for 30 years. As I will be on a (well deserved!) vacation in Andalusia on Sunday, I’d better post my thoughts today.

I have had a lot of different positions and worked in a lot of different domains. The single subject I have worked with the most is business directories.

My first job was at the Danish Tax Authorities, and one of my assignments was serving as secretary to the committee working towards a joint registration of companies in Denmark. Besides learning a lot about working in politically driven organizations and about aligning business and technology, I feel good about having been part of the start of building a public sector master data directory. Such directories are essential for an effective public administration, and they can also be used as external reference data in private enterprises as a valuable means to improve the data quality of business partner master data.

Later I have worked a lot on improving data quality through matching solutions built around business directories. These range from the Dun & Bradstreet WorldBase, holding nearly 170 million business entities from all over the world, over databases like the EuroContactPool, to national databases holding either all (available) businesses in a single country or given industry segments.

I guess I will also spend some years to come integrating business directory information into business processes as smoothly as possible, preferably along with a range of other kinds of external reference data.

One of the new sources building up in the cloud in the realm of business directories is master data references in social networks. The LinkedIn Companies feature is a prominent example. Of course such directories have some data quality issues. This can be seen by looking at the companies where I currently work:

  • DM Partner A/S seems OK
  • Omikron Data Quality has 90 employees according to the company profile (filled out by yours truly). So it may seem strange that there are only 25 profiles in the network. But that’s because most employees are in Germany, where the competing network Xing is stronger.
  • Trapeze Group Europe has not been updated with a recent merger, and not all members have updated their profiles accordingly yet. But I’m sure that will be done as time goes by.

I have no doubt though that including information from social networks will become a part of integrating business partner master data in my future.


Social Master Data Management

The term “Social CRM” has been around for a while. Just as traditional CRM (Customer Relationship Management) is heavily dependent on proper MDM (Master Data Management), we will also see that enterprise wide social CRM depends on a proper social MDM element in order to succeed.

The challenge in social MDM is that we are not going to replace any data sources for MDM. We are actually going to add more sources, and we will have to integrate them with the sources for traditional CRM and MDM and with other new sources coming from the cloud.

Customer Master Data sources will expand to embrace:

  • Traditional data entry from field work like a sales representative entering prospect and customer master data as part of Sales Force Automation.
  • Data feed and data integration with external reference data like using a business directory. Such integration will increasingly take place in the cloud and the trend of governments releasing public sector data will add tremendously to this activity.
  • Self registration by prospects and customers via webforms.
  • Social media master data captured during social CRM and probably harvested in more and more structured ways.

Social media master data are found as profiles in services such as Facebook (mainly for business-to-consumer activities), LinkedIn (mainly for business-to-business activities) and Twitter (somewhere in between). These are only some prominent examples of such services. While LinkedIn may be dominant for professional use in English-speaking countries and in countries where English is widely spoken, such as Scandinavia and the Netherlands, other regions are far less penetrated by LinkedIn. For example, in German-speaking countries the similar network service Xing is much more crowded. So, when embracing global business, you will have to acknowledge the diversity found in social network services.

A good way to integrate all these sources in business processes is to use mashups. An example would be a mashup for entering customer data. If you are entering a business entity, you may want to know:

  • What is already known in internal databases about that entity – either via a centralized MDM hub or throughout disparate databases?
  • Is the visit address correct according to public sector data?
  • How is the business account related to other business entities learned from a business directory?
  • Do we recognize the business contact in social networks – maybe we did have contact before in another relation?

If you are entering a consumer entity you may want to know:

  • Does that person already exist in our internal databases – as an individual and as a household?
  • What do we know about the residence address from public sector data?
  • Can we obtain additional data from phone book directories, nixie lists and whatever else is available, affordable and legal in the country in question?
  • How do we connect in social media?
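Behind such a data entry mashup there is typically a fan-out to several services. A minimal Python sketch of that idea: each source is a hypothetical callable standing in for a real service, such as the MDM hub, the public address registry, a business directory or a social network. The sources are queried in parallel, and a failing source is reported instead of aborting the whole lookup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

Lookup = Callable[[Dict[str, str]], Any]


def mashup(entry: Dict[str, str], sources: Dict[str, Lookup]) -> Dict[str, Any]:
    """Query every source in parallel and collect the answers by source name."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, entry) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # one broken source must not sink the form
                results[name] = f"unavailable: {exc.__class__.__name__}"
    return results


# Hypothetical stand-ins for real services.
def internal_lookup(entry: Dict[str, str]) -> Dict[str, Any]:
    return {"existing_ids": []}


def address_check(entry: Dict[str, str]) -> Dict[str, Any]:
    raise ConnectionError("registry down")


print(mashup({"name": "DM Partner A/S"}, {"internal": internal_lookup, "address": address_check}))
# {'internal': {'existing_ids': []}, 'address': 'unavailable: ConnectionError'}
```

Which sources may actually be queried for a given country and entity type is a privacy and legal question as much as a technical one.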

Of course privacy is a big issue. Norms vary between countries, and so do the legal rules. Norms vary between individuals, and for the same individual depending on whether they act as a private person or as a business contact. Norms vary between industries and from company to company.

If aligning people, processes and technology didn’t matter before, it will when dealing with social master data management.


Merging Customer Master Data

One of the most frequent assignments I have had within data matching is merging customer databases after two companies have been merged.

This is one of the occasions where it doesn’t help to recite the usual data quality mantras, such as:

  • Prevention and root cause analysis is a better option
  • Change management is a critical factor in ensuring long-term data quality success
  • Tools are not important

It is often essential for the newly merged company to have a 360 degree view of business partners as soon as possible in order to maximize synergies from the merger. If the volumes are above just a few thousand entities, it is not possible to obtain that using human resources alone. Automated matching is the only realistic option.

The types of entities to be matched may be:

  • Private customers – individuals and households (B2C)
  • Business customers (B2B) on account level, enterprises, legal entities and branches
  • Contacts for these accounts

I have developed a slightly extended version of this typification here.

One of the most common challenges in merging customer databases is that hierarchy management may have been done very differently in the past within the merging companies. When aligning the different perceptions, I have found that a real world approach often satisfies the different lines of reasoning.

The fuzziness needed for the matching basically depends on the common unique keys available in the two databases. These are keys such as citizen IDs (whatever they are called around the world) and public company IDs (likewise). Matching both databases with an external source (per entity type) is another option; “Duns Numbering” is probably the best known example of such an approach. Maintaining a solution for assigning Duns Numbers to customer files from the D&B WorldBase is, by the way, one of my other assignments, as described here.
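When both customer files have first been matched against the same external source, merging them largely becomes a grouping on the assigned number. A minimal Python sketch, assuming each record may carry that number in a hypothetical `duns` field. Records without it are returned separately so they can go through fuzzy matching.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_by_external_key(db_a: List[Record], db_b: List[Record], key: str = "duns"
                          ) -> Tuple[Dict[str, List[Record]], List[Record]]:
    """Group records from two customer files on a shared external key.

    Returns (groups keyed by external ID, records without the key); the
    latter still need fuzzy matching.
    """
    groups: Dict[str, List[Record]] = defaultdict(list)
    leftovers: List[Record] = []
    for source, table in (("A", db_a), ("B", db_b)):
        for rec in table:
            tagged = {**rec, "source": source}  # remember which company it came from
            ext_id = rec.get(key)
            if ext_id:
                groups[ext_id].append(tagged)
            else:
                leftovers.append(tagged)
    return dict(groups), leftovers
```

Any group holding more than one record is a candidate duplicate, either across the two companies or within one of them. A group with records from both sources is where the merger synergy shows up.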

The automated matching process may be divided into these three steps:

During my many years of practice in doing this, I have found that the result of the automated process may vary considerably in quality and speed depending on the tools used.


What is a best-in-class match engine?

Most recently, in connection with TIBCO acquiring data matching vendor Netrics, the term best-in-class match engine has been attached to the Netrics product.

First: I have no doubt that the Netrics product is a capable match engine – I know that from discussions in the LinkedIn Data Matching group and here on this blog.

Next: I don’t think anyone knows which product is the best match engine, because I don’t think all match engines have been benchmarked with a representative set of data.

On top of that, there are of course the matching capabilities for different entity types to consider. Party master data (like customer data) are covered by most products, whereas capabilities for other entity types (whether or not matching them is considered the same discipline) are far less exposed.

As match engine products are acquired and integrated into suites, the core matching capabilities become mixed up with a lot of other capabilities, making it hard to compare the match engines alone.

Some independent match engines work stand-alone, and some may be embedded into other applications.

These may then be the classes to be best in:

  • Match engines in suites
  • Embedded match engines (for say SAP, MS CRM and so on)
  • Stand alone match engines

Many match engines I have seen are tuned to deal with data from the country (culture) where they were born and had their first triumphs. As the US market is still by far the largest for match engines, naming the best match engine resembles a team becoming World Champions in American Football. International/multi-cultural capabilities will become more and more important in data matching. But we may indeed define a class for each country (culture).

In the old days I heard that one match engine was best for marketing data and another was best for credit risk management. I think those days are over too. With Master Data Management you have to embrace all data purposes.

Some match engines are more successful in particular industries. The biggest differentiator in match effectiveness is whether they handle B2C and/or B2B data. B2C is the easiest, B2B is more complex, and embracing both is in my eyes a must for being considered best-in-class, unless we define separate classes for B2C, B2B and both.

As some matching techniques are deterministic and some are probabilistic, an evaluation of the latter will depend on the data already processed in a given instance, since the matching gets better and better as the self-learning element warms up.

So, yes, I have reopened an endless, almost religious discussion here.


Dealing with annoying customers

No, this is not a blog post about how to handle customers who unjustly complain about everything.

This is a blog post about how to maintain high quality data in customer databases.

When doing that, there are some types of party entities that are more difficult to handle than others. In general B2B (business) entities are more complex than B2C (consumer/citizen) entities. Some of the B2B types I have spent more time with than others are the following:

Restaurants are some of the more demanding guests in our databases:

  • They change owners more often than most other business entities, making them a new legal entity each time, which is important in some business contexts like credit risk.
  • On the other hand, it’s the same address despite a new owner, which makes it the same entity in the eyes of other business contexts, like logistics.
  • In many cases you may have a name (trade style) of the restaurant and another official name of the business – a variant of this is when the restaurant is franchised.
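The restaurant case shows that whether two records are "the same" depends on the business context. Here is a small Python sketch of that idea. The context names, field names and sample records are all hypothetical: credit risk groups on the legal entity, and logistics groups on the address.

```python
from typing import Dict, List

# Hypothetical mapping from business context to the fields that define identity.
IDENTITY_KEYS = {
    "credit_risk": ("legal_entity_id",),     # new owner, new entity
    "logistics": ("street", "postal_code"),  # same premises, same entity
}


def group_by_context(records: List[Dict[str, str]], context: str) -> Dict[str, List[str]]:
    """Group record ids by the identity key used in the given context."""
    fields = IDENTITY_KEYS[context]
    groups: Dict[str, List[str]] = {}
    for rec in records:
        key = "|".join(rec[f] for f in fields)
        groups.setdefault(key, []).append(rec["id"])
    return groups


# Made-up records: one restaurant address under two successive owners.
restaurants = [
    {"id": "r1", "trade_style": "Cafe Hygge", "legal_entity_id": "LE-1",
     "street": "Hovedgaden 12", "postal_code": "4000"},
    {"id": "r2", "trade_style": "Cafe Hygge", "legal_entity_id": "LE-2",
     "street": "Hovedgaden 12", "postal_code": "4000"},
]
print(len(group_by_context(restaurants, "credit_risk")))  # 2
print(len(group_by_context(restaurants, "logistics")))    # 1
```

Keeping both records and choosing the identity key per context avoids forcing one definition of uniqueness on every business process.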

Public sector bodies can’t be sliced the same way as private entities:

  • It is often hard to tell whether a business partner belongs to a narrowly defined or a more broadly defined unit within a governmental or local authority.
  • Public sector bodies tend to have long names that may be used with different inclusion of words, sequence of words and abbreviations of words.
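For long public sector names, comparing sets of words is one simple way to tolerate different word order and abbreviations. In this Python sketch the abbreviation and stopword tables are tiny hypothetical examples. The score is the Jaccard similarity of the normalized word sets; that choice is an assumption, not a standard for public sector bodies.

```python
import re
from typing import FrozenSet

# Hypothetical normalization tables; real ones would be per language and country.
ABBREVIATIONS = {"dept": "department", "min": "ministry", "govt": "government"}
STOPWORDS = {"of", "the", "for", "and"}


def name_tokens(name: str) -> FrozenSet[str]:
    """Lowercase, split into runs of letters, expand abbreviations, drop stopwords."""
    words = re.findall(r"[^\W\d_]+", name.lower())  # Unicode-aware letter runs
    return frozenset(ABBREVIATIONS.get(w, w) for w in words if w not in STOPWORDS)


def name_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets; word order does not matter."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(name_similarity("Min. of Transport", "Transport Ministry"))             # 1.0
print(name_similarity("Dept. of Health", "Department of Health and Care"))  # 0.6666666666666666
```

Jaccard only partly tolerates extra words (two of three shared words gives 0.67). An overlap measure would forgive extra words completely, but it would also let a short name like "Ministry" match almost any ministry.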

Global enterprises may be seen as one or as thousands of customers:

  • The need for hierarchy management is obvious when it comes to handling data about business partners that belong to a global enterprise: risk management, 1-1 marketing, sales force automation and so on will use the same data in many different ways.
  • Company family trees are useful but treacherous. A mother and a daughter company may be closely connected with lots of shared services, or the relationship may be strictly a matter of ownership with no operational ties at all.
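Company family trees are often stored as parent links. Here is a minimal Python sketch, with hypothetical entity names and figures, that walks up to the top of a tree and rolls account amounts up to it. Given that a family link may be pure ownership, such a roll-up is better offered as one view of the data than used to merge the accounts.

```python
from typing import Dict, Optional


def ultimate_parent(entity: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow parent links until an entity without a parent is reached."""
    seen = set()
    current = entity
    while parent_of.get(current) is not None:
        if current in seen:  # guard against bad data forming a loop
            raise ValueError(f"cycle in family tree at {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current


def roll_up(amounts: Dict[str, float], parent_of: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Sum per-entity amounts at the level of each entity's ultimate parent."""
    totals: Dict[str, float] = {}
    for entity, amount in amounts.items():
        top = ultimate_parent(entity, parent_of)
        totals[top] = totals.get(top, 0.0) + amount
    return totals


# Hypothetical family tree: a holding company with a European arm and a Danish unit.
tree = {"Acme Denmark": "Acme Europe", "Acme Europe": "Acme Holding", "Acme Holding": None}
print(roll_up({"Acme Denmark": 100.0, "Acme Europe": 50.0, "Other Co": 20.0}, tree))
# {'Acme Holding': 150.0, 'Other Co': 20.0}
```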

These are some of the facts of life that make it fun and not trivial when you are conducting data matching and other activities in order to achieve and maintain high quality of customer master data.
