Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment, is to on-board new entries using an external defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions as:

  • Uniqueness, by checking if a business with that identifier already exist in your internal master data. This approach is superior to using data matching as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory and thus avoiding those spelling mistakes that usually are all over in party master data.
  • Conformity, by inheriting additional data as line-of-business codes and descriptions from a business directory.

Having an external business identifier stored with your party master data helps a lot with maintaining data quality as pondered in the post Ongoing Data Maintenance.

Busienss Entity IdentifiersWhen selecting an identifier there are different options as national IDs, LEI, DUNS Number and others as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up will consider much later if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later with guarantied increased costs.

For the identifier to use we have chosen the DUNS Number from Dun & Bradstreet. The reason is that this currently is the only worldwide covered business identifier. Also, Dun & Bradstreet offers some additional data that fits our business model. This includes consistent line-of-business information and worldwide company family trees.

Bookmark and Share

The World of Reference Data

Google EarthReference Data Management (RDM) is an evolving discipline within data management. When organizations mature in the reference data management realm we often see a shift from relying on internally defined reference data to relying on externally defined reference data. This is based on the good old saying of not to reinvent the wheel and also that externally defined reference data usually are better in fulfilling multiple purposes of use, where internally defined reference data tend to only cater for the most important purpose of use within your organization.

Then, what standard to use tend to be a matter of where in the world you are. Let’s look at three examples from the location domain, the party domain and the product domain.

Location reference data

If you read articles in English about reference data and ensuring accuracy and other data quality dimensions for location data you often meet remarks as “be sure to check validity against US Postal Services” or “make sure to check against the Royal Mail PAF File”. This is all great if all your addresses are in the United States or the United Kingdom. If all your addresses are in another country, there will in many cases be similar services for the given country. If your address are spread around the world, you have to look further.

There are some Data-as-a-Service offerings for international addresses out there. When it comes to have your own copy of location reference data the Universal Postal Union has an offering called the Universal POST*CODE® DataBase. You may also look into open data solutions as GeoNames.

Party reference data

Within party master data management for Business-to-Business (B2B) activities you want to classify your customers, prospects, suppliers and other business partners according to what they do, For that there are some frequently used coding systems in areas where I have been:

  • Standard Industrial Classification (SIC) codes, the four-digit numerical codes assigned by the U.S. government to business establishments.
  • The North American Industry Classification System (NAICS).
  • NACE (Nomenclature of Economic Activities), the European statistical classification of economic activities.

As important economic activities change over time, these systems change to reflect the real world. As an example, my Danish company registration has changed NACE code three times since 1998 while I have been doing the same thing.

This doesn’t make conversion services between these systems more easy.

Product reference data

There are also a good choice of standardized and standardised classification systems for product data out there. To name a few:

  • TheUnited Nations Standard Products and Services Code® (UNSPSC®), managed by GS1 US™ for the UN Development Programme (UNDP).
  • eCl@ss, who presents themselves as: “THE cross-industry product data standard for classification and clear description of products and services that has established itself as the only ISO/IEC compliant industry standard nationally and internationally”. eCl@ss has its main support in Germany (the home of the Mercedes E-Class).

In addition to cross-industry standards there are heaps of industry specific international, regional and national standards for product classification.

Bookmark and Share

Making a Firmographic Analysis

What demographics are to people, firmographics are to organizations.

I am currently working with starting up a Business-to-Business (B2B) service. In order to assess the market I had to know something about how many companies there are out there who possibly could be in need of such a service.

The service will work word-wide, but adhering to the sayings about thinking globally/big and starting locally/small I have started with assessing the Danish market. Also there are easy and none expensive access to business directories for Denmark.

My first filter was selecting companies with at least 50 employees.

As the service is suitable for companies within ecosystems of manufacturers, distributors and retailers, I selected the equivalent range of industry codes. In this case it was NACE codes which resembles SIC codes and other classifications of Line-Of-Business used in other geographies.

There were circa 2,500 companies in my selection. However, some belong to the same company family tree. By doing a merge/purge with the largest company in a company family tree as the survivor, the list was down to circa 2,000 companies.

For this particular service, there are some other possibly competing approaches that are stronger for some kinds of goods than other kinds of goods. For that purpose, I made a bespoke categorization being:

  • Priority A: Building materials, furniture, houseware, machinery and vehicles.
  • Priority B: Electronics, books and clothes.
  • Priority C: Pharmaceuticals, food, beverage and tobacco.

Retailers that span several priorities were placed in priority B. Else, for this high level analysis, I only used the primary Line-Of-Business.

The result was as shown below:

Firmographic

So, from my firmographic analysis I know the rough size of the target market in one locality. I can assume, that other markets look more or less the same or I can do specific firmographics on other geographies. Also, I can apply first results of dialogues with entities in the breakdown model and see if the model needs a modification.

Bookmark and Share

instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud based solution I’m working with right now does this by:

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting really new entities or updating current entities by picking  as much data as possible from external sources

instant Single Customer View

Some essential capabilities in doing this are:

  • Searching is error tolerant so you will find entities even if the spelling is different
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives as explained in the post called A Place in Time
    • You may have multiple means of contact attached like many phones, email addresses and social identities

How do you achieve a Single Customer View?

Bookmark and Share

While we are waiting for the LEI

As told in the post Business Entity Identifiers there has been a new global numbering system for business entities on the way for some time. The wonder is called LEI (Legal Entity Identifier).

fsb-leiThe implementation work has been adapted by the Financial Stability Board. The latest developments are reported in a publication called Fifth progress note on the Global LEI Initiative.

Surely, while the implementations may be in good hands, the set up doesn’t give hope for a speedy process where every legal entity in the world in a short time will have a LEI.

And then the next question will be how long it will take before organizations will have enriched existing databases with that LEI and implemented on-boarding processes where a LEI is captured with every new insertion of party master data describing a legal entity.

A good way to start to be prepared will be to implement features in on-boarding business processes where available external reference data are captured when new party entities are added to your databases. Having best available information about names, addresses and business entity identifiers available today and a culture of capturing such information will be a great starting point.

And oh, the instant Data Quality concept is precisely all about doing that.

Bookmark and Share

Some Kinds of Reference Data

The term ”reference data” and related Reference Data Management (RDM) is used commonly in the data quality and Master Data Management (MDM) realm.

As with most terms it may be used with slightly different meanings. Usually, but not necessarily always, reference data are core data entities defined outside a given organization.

I have come across the below discussed kinds of reference data:

Reference Data in Investment Banking

The term “reference data” is well established in investment banking. Reference data are core master data entities as counterparties, securities and currencies. These are the things you deal with in investment banking. They are not made up for a given bank or other single financial institution but are shared across the whole market and should optimally be the same to every institution at exactly the same point of time.

RDMSmall Reference Data

In Master Data Management in general we usually see reference data as value lists helping describing and standardizing internal master data.

One example will be a country list. A list of countries should be the same for every organization in the world. However available lists does differ though most variations usually don’t have any business impact as the academic question about if Antarctica should be in the list or not.

A list of codes describing to which industry a given company belongs is another example of reference data. As examined in the post What are they doing? you may choose to standardize on SIC codes or standardise on NACE codes or develop your own set of codes for that purpose.

Big Reference Data

In geography a country list is in the top levels of defining locations. Further deep we may have postal code systems within each country as ZIP codes in the United States and PLZ codes in Germany. Yet further deep we have every single valid postal address eventually all over the world. This is what I call big reference data.

A way of sourcing industry codes for your customers, suppliers and other business partners will be picking from or enriching from a business directory like for example the D&B WorldBase or any other of the many business directories around. Such directories may also be seen as big reference data.

The dramatic increase in the use of social media and related social network profiles has emerged as a new kind of big reference data serving as links to our internal master data.

Bookmark and Share

Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner being a company, government body or other form of organization is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some counties, like in Scandinavia, the public sector assigns a unique number to every company to be used in every relation to the public sector and open to be used by the private sector as well for identification purposes.

As reported in the post Single Company View I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number and very embarrassing for a data quality and master data geek these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS-number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible the use of DUNS-numbers and similar business directory id’s is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately around implementing yet a new numbering system for legal entities with an identifier usually abbreviated as LEI. Wikipedia has the details about a Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.

Bookmark and Share