Are These Familiar Hierarchies in Your MDM / DQM / PIM Solution?

The term family is used in different contexts within Master Data Management (MDM), Data Quality Management (DQM) and Product Information Management (PIM) when working with hierarchy management and entity resolution.

Here are three frequent examples:

Consumer / citizen family

When handling party master data about consumers / citizens, we can deal with the basic definition of a family: a group consisting of two parents and their children living together as a unit.

This is used when the business scenario targets not only each individual person but also a household with a shared economy. When identifying a household, a common parameter is that the persons live at the same postal address (at the same time), while observing constellations such as:

  • Nuclear families consisting of a female and a male adult (and their children)
  • Rainbow families where the gender is not an issue
  • Extended families consisting of more than two generations
  • Persons who happen to live at the same postal address
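A minimal sketch of address-based household grouping, assuming persons arrive as simple records with a free-text postal address. The names `Person`, `normalise_address` and `group_households` are hypothetical, the normalisation is deliberately crude, and the "at the same time" condition is ignored here (a fuller version would compare move-in and move-out dates).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    address: str  # postal address as captured; normalised before grouping


def normalise_address(address: str) -> str:
    """Crude normalisation: lower-case, drop commas, collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def group_households(persons: list[Person]) -> dict[str, list[Person]]:
    """Group persons sharing a (normalised) postal address.

    The result is only a set of candidate households: it cannot tell a
    nuclear, rainbow or extended family from unrelated persons who happen
    to live at the same address.
    """
    households: dict[str, list[Person]] = defaultdict(list)
    for person in persons:
        households[normalise_address(person.address)].append(person)
    return dict(households)
```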

There are multicultural aspects of these constellations, including the different family name constructions around the world, the varying frequency and acceptance of rainbow families, and the varying frequency of extended families.

Company family tree

When handling party master data about companies / organizations, valuable information is how the companies / organizations are related, most commonly pictured as a company family tree with mothers and sisters. In theory, such a tree can have any number of levels. The basic levels are:

  • A global ultimate mother being the company that ultimately owns (fully or partly) a range of companies in several countries.
  • A national ultimate mother being the company that owns (fully or partly) a range of companies in a given country.
  • A legal entity being the basic registered company within a country having some form of a business entity identifier.
  • A branch owned by a legal entity and operating from a given postal / visiting address.
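The levels above can be sketched as a tree where each company points to its owner. In the sketch below, with hypothetical names (`Company`, `ultimate_mothers`), the parent links are walked upward: the last company reached is taken as the global ultimate mother, and the top-most owner reached before the chain leaves the starting company's country is taken as the national ultimate mother. It assumes one owner per company and ignores partial ownership; branches could be added as further nodes pointing to their legal entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    company_id: str           # e.g. a business entity identifier
    name: str
    country: str              # ISO 3166-1 alpha-2 code
    parent_id: Optional[str]  # owning company, None at the top of the tree


def ultimate_mothers(company_id: str, companies: dict[str, Company]) -> tuple[str, str]:
    """Return (national ultimate mother id, global ultimate mother id)."""
    start = companies[company_id]
    national = current = start
    in_country = True  # still inside the starting company's country?
    seen = {start.company_id}
    while current.parent_id is not None:
        current = companies[current.parent_id]
        if current.company_id in seen:
            raise ValueError("cycle in ownership data")
        seen.add(current.company_id)
        if in_country and current.country == start.country:
            national = current
        else:
            in_country = False  # chain has left the country
    return national.company_id, current.company_id
```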

You can build your own company tree describing your customers, suppliers and other business partners. Alternatively, or as a supplement, you can rely on third party business directories. It is worth noting here that a national source will only go up to the national ultimate mother level, while a global source can include the global ultimate mother and thus form larger families.

Having a company family view in your master data repository is a valuable information asset within credit risk, supply risk, discount opportunities, cross-selling and more.

Product family

The term “product family” is often used to define a level in a homegrown product classification / product grouping scheme. It is used to define a level that can have levels above and levels below it, with other terms such as “product line”, “product category”, “product class”, “product group”, “product type” and more.

Sometimes it is also used to define a product with a family of variants below it, where variants are the same product produced and kept in stock in different colours, sizes and more.
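A product family with variants can be sketched as a family record holding its variant dimensions, with one variant (and potentially one SKU) per combination of values. `ProductFamily` and the SKU format used here are hypothetical illustrations, not a standard scheme.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ProductFamily:
    family_id: str
    name: str
    # Variant dimensions, e.g. {"colour": ["RED", "BLUE"], "size": ["S", "M"]}
    dimensions: dict[str, list[str]] = field(default_factory=dict)

    def variants(self) -> list[dict[str, str]]:
        """One dict per combination of dimension values."""
        keys = list(self.dimensions)
        return [dict(zip(keys, combo))
                for combo in product(*(self.dimensions[k] for k in keys))]

    def sku(self, variant: dict[str, str]) -> str:
        """Hypothetical SKU scheme: family id followed by the variant values."""
        return "-".join([self.family_id, *(variant[k] for k in self.dimensions)])
```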

Read more about Stock Keeping Units (SKUs), product variants, product identification and product classification in the post Five Product Information Management Core Aspects.

Three Flavors of Data Monetization

The term data monetization is trending in the data management world.

Data monetization is about harvesting direct financial results from having access to data that is stored, maintained, categorized and made accessible in an optimal manner. Traditionally, data management & analytics have contributed indirectly to financial outcomes by aiming at keeping data fit for purpose in the various business processes that produce value to the business. Today the best performers are using data much more directly to create new services and business models.

In my view there are three flavors of data monetization:

  • Selling data: This is something that has been known to the data management world for years. Notable examples are the likes of Dun & Bradstreet, which sells business directory data as touched on in the post What is a Business Directory? Another example is postal services around the world selling their address directories. This is the kind of data we know as third party data.
  • Wrapping data around products: If you have a product – or a service – you can add tremendous value to these products and services and make them more sellable by wrapping data, potentially including third party data, around those products and services. These data will thus become second party data, as touched on in the post Infonomics and Second Party Data.
  • Advanced analytics and decision making: You can combine third party data, second party data and first party data (your own data) to perform advanced analytics and fast operational decision making in order to sell more, reduce costs and mitigate risks.

Please learn more about data monetization by downloading a recent webinar hosted by Information Builders, their expert Rado Kotorov and yours truly here.


Using a Business Entity Identifier from Day One

One of the ways to ensure data quality for customer – or rather party – master data when operating in a business-to-business (B2B) environment is to on-board new entries using an externally defined business entity identifier.

By doing that, you tackle some of the most challenging data quality dimensions, such as:

  • Uniqueness, by checking if a business with that identifier already exists in your internal master data. This approach is superior to using data matching, as explained in the post The Good, Better and Best Way of Avoiding Duplicates.
  • Accuracy, by having names, addresses and other information defaulted from a business directory, thus avoiding the spelling mistakes that are usually all over party master data.
  • Conformity, by inheriting additional data, such as line-of-business codes and descriptions, from a business directory.
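The on-boarding steps above can be sketched as follows, assuming the business directory is available as a lookup by identifier. Here both the internal master data and the directory are plain dictionaries, and `Party`, `onboard` and `DuplicatePartyError` are hypothetical names; a real directory would be reached through the provider's own service, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Party:
    entity_id: str      # externally defined business entity identifier
    name: str
    address: str
    industry_code: str  # line-of-business code inherited from the directory


class DuplicatePartyError(Exception):
    """Raised when the identifier is already present in the master data."""


def onboard(entity_id: str, master: dict[str, Party], directory: dict[str, Party]) -> Party:
    # Uniqueness: an exact identifier check instead of fuzzy data matching.
    if entity_id in master:
        raise DuplicatePartyError(f"{entity_id} already exists as {master[entity_id].name}")
    try:
        record = directory[entity_id]
    except KeyError:
        raise LookupError(f"{entity_id} not found in business directory") from None
    # Accuracy and conformity: default name, address and industry code
    # from the directory record rather than typing them in.
    master[entity_id] = Party(record.entity_id, record.name, record.address, record.industry_code)
    return master[entity_id]
```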

Having an external business identifier stored with your party master data helps a lot with maintaining data quality as pondered in the post Ongoing Data Maintenance.

When selecting an identifier, there are different options, such as national IDs, LEI, DUNS Number and others, as explained in the post Business Entity Identifiers.

At the Product Data Lake service I am working on right now, we have decided to use an external business identifier from day one. I know this may be something a typical start-up will consider much later, if and when the party master data population has grown. But, besides being optimistic about our service, I think it will be a win not to have to fight data quality issues later at a guaranteed higher cost.

For the identifier, we have chosen the DUNS Number from Dun & Bradstreet. The reason is that it is currently the only business identifier with worldwide coverage. Also, Dun & Bradstreet offers some additional data that fits our business model, including consistent line-of-business information and worldwide company family trees.


The World of Reference Data

Reference Data Management (RDM) is an evolving discipline within data management. When organizations mature in the reference data management realm, we often see a shift from relying on internally defined reference data to relying on externally defined reference data. This is based on the good old saying of not reinventing the wheel, and also on the fact that externally defined reference data are usually better at fulfilling multiple purposes of use, whereas internally defined reference data tend to cater only for the most important purpose of use within your organization.

Which standard to use, then, tends to depend on where in the world you are. Let’s look at three examples from the location domain, the party domain and the product domain.

Location reference data

If you read articles in English about reference data and ensuring accuracy and other data quality dimensions for location data, you often meet remarks such as “be sure to check validity against US Postal Services” or “make sure to check against the Royal Mail PAF File”. This is all great if all your addresses are in the United States or the United Kingdom. If all your addresses are in another country, there will in many cases be similar services for that country. If your addresses are spread around the world, you have to look further.

There are some Data-as-a-Service offerings for international addresses out there. When it comes to having your own copy of location reference data, the Universal Postal Union has an offering called the Universal POST*CODE® DataBase. You may also look into open data solutions such as GeoNames.

Party reference data

Within party master data management for Business-to-Business (B2B) activities, you want to classify your customers, prospects, suppliers and other business partners according to what they do. For that, there are some frequently used coding systems in the areas where I have been:

  • Standard Industrial Classification (SIC) codes, the four-digit numerical codes assigned by the U.S. government to business establishments.
  • The North American Industry Classification System (NAICS).
  • NACE (Nomenclature of Economic Activities), the European statistical classification of economic activities.

As important economic activities change over time, these systems change to reflect the real world. As an example, the NACE code of my Danish company registration has changed three times since 1998, although I have been doing the same thing all along.

This doesn’t make conversion services between these systems any easier.
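To illustrate why conversion is hard: correspondence tables between industry classification systems are generally many-to-many, so a code in one system may map to several codes in another. The sketch below uses made-up placeholder codes (`NACE-A`, `SIC-1` and so on are not real codes) and, as an assumed policy, only converts when the mapping is unambiguous.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical crosswalk rows: (source code, target code).
CROSSWALK = [
    ("NACE-A", "SIC-1"),
    ("NACE-A", "SIC-2"),  # one source code, two target codes
    ("NACE-B", "SIC-2"),  # two source codes share a target code
    ("NACE-C", "SIC-3"),
]


def build_index(rows: list[tuple[str, str]]) -> dict[str, set[str]]:
    index: dict[str, set[str]] = defaultdict(set)
    for source, target in rows:
        index[source].add(target)
    return index


def convert(code: str, index: dict[str, set[str]]) -> Optional[str]:
    """Return the single target code, or None if missing or ambiguous."""
    targets = index.get(code, set())
    if len(targets) == 1:
        return next(iter(targets))
    return None  # needs a human decision or more data about the company
```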

Product reference data

There is also a good choice of standardized and standardised classification systems for product data out there. To name a few:

  • The United Nations Standard Products and Services Code® (UNSPSC®), managed by GS1 US™ for the UN Development Programme (UNDP).
  • eCl@ss, who presents themselves as: “THE cross-industry product data standard for classification and clear description of products and services that has established itself as the only ISO/IEC compliant industry standard nationally and internationally”. eCl@ss has its main support in Germany (the home of the Mercedes E-Class).

In addition to cross-industry standards there are heaps of industry specific international, regional and national standards for product classification.


Making a Firmographic Analysis

What demographics are to people, firmographics are to organizations.

I am currently working on starting up a Business-to-Business (B2B) service. In order to assess the market, I had to know something about how many companies there are out there that could possibly need such a service.

The service will work worldwide, but adhering to the sayings about thinking globally/big and starting locally/small, I have started by assessing the Danish market. Also, there is easy and inexpensive access to business directories for Denmark.

My first filter was selecting companies with at least 50 employees.

As the service is suitable for companies within ecosystems of manufacturers, distributors and retailers, I selected the equivalent range of industry codes. In this case these were NACE codes, which resemble SIC codes and other Line-Of-Business classifications used in other geographies.

There were circa 2,500 companies in my selection. However, some belong to the same company family tree. By doing a merge/purge with the largest company in a company family tree as the survivor, the list was down to circa 2,000 companies.
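The selection and merge/purge steps can be sketched like this, assuming each directory record carries an employee count, a primary industry code and an identifier for its company family tree. "Largest" is taken to mean most employees, which is an assumption; `Firm` and `select_target_firms` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    firm_id: str
    name: str
    employees: int
    industry_code: str  # primary line-of-business code, e.g. NACE
    family_id: str      # company family tree id, e.g. the ultimate mother's id


def select_target_firms(firms: list[Firm], industry_codes: set[str],
                        min_employees: int = 50) -> list[Firm]:
    # Filter on size and on the chosen range of industry codes.
    selected = [f for f in firms
                if f.employees >= min_employees and f.industry_code in industry_codes]
    # Merge/purge: keep one firm per family, the one with most employees.
    survivors: dict[str, Firm] = {}
    for firm in selected:
        current = survivors.get(firm.family_id)
        if current is None or firm.employees > current.employees:
            survivors[firm.family_id] = firm
    return list(survivors.values())
```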

For this particular service, there are some other possibly competing approaches that are stronger for some kinds of goods than other kinds of goods. For that purpose, I made a bespoke categorization being:

  • Priority A: Building materials, furniture, houseware, machinery and vehicles.
  • Priority B: Electronics, books and clothes.
  • Priority C: Pharmaceuticals, food, beverage and tobacco.

Retailers that span several priorities were placed in priority B. Otherwise, for this high-level analysis, I only used the primary Line-Of-Business.
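The bespoke categorization, including the rule for retailers spanning several priorities, might look like this. Mapping real industry codes to the categories would have to be done by hand; here the categories are given directly as strings, and `PRIORITY_BY_CATEGORY` and `priority` are hypothetical names.

```python
# Category -> priority, following the A/B/C breakdown above.
PRIORITY_BY_CATEGORY = {
    "building materials": "A", "furniture": "A", "houseware": "A",
    "machinery": "A", "vehicles": "A",
    "electronics": "B", "books": "B", "clothes": "B",
    "pharmaceuticals": "C", "food": "C", "beverage": "C", "tobacco": "C",
}


def priority(categories: list[str], is_retailer: bool) -> str:
    """Priority for a firm; categories[0] is its primary line of business."""
    priorities = {PRIORITY_BY_CATEGORY[c] for c in categories}
    # Retailers spanning several priorities go to B.
    if is_retailer and len(priorities) > 1:
        return "B"
    # Otherwise only the primary line of business counts.
    return PRIORITY_BY_CATEGORY[categories[0]]
```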

The result was as shown below:


So, from my firmographic analysis I know the rough size of the target market in one locality. I can assume that other markets look more or less the same, or I can do specific firmographics on other geographies. Also, I can apply the first results of dialogues with entities in the breakdown model and see if the model needs a modification.


instant Single Customer View

Achieving a Single Customer View (SCV) is a core driver for many data quality improvement and Master Data Management (MDM) implementations.

As most data quality practitioners will agree, the best way of securing data quality is getting it right the first time. The same is true about achieving a Single Customer View. Get it right the first time. Have an instant Single Customer View.

The cloud based solution I’m working with right now does this by:

  • Searching external big reference data sources with information about individuals, companies, locations and properties as well as social networks
  • Searching internal master data with information already known inside the enterprise
  • Inserting really new entities or updating current entities by picking as much data as possible from external sources
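The three steps can be sketched as below. Error-tolerant search is approximated with the standard library's `difflib.SequenceMatcher`, a stand-in chosen for this sketch and not the matching used by the solution described; external and internal sources are plain lists of dictionaries, the 0.8 threshold is an arbitrary assumption, and `capture` and `best_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name: str, candidates: list[dict], threshold: float = 0.8) -> Optional[dict]:
    scored = [(similarity(name, c["name"]), c) for c in candidates]
    scored = [s for s in scored if s[0] >= threshold]
    return max(scored, key=lambda s: s[0])[1] if scored else None


def capture(typed_name: str, external: list[dict], internal: list[dict]) -> dict:
    # 1. Search external reference data; fall back to what was typed.
    source = best_match(typed_name, external) or {"name": typed_name}
    # 2. Search internal master data.
    existing = best_match(source["name"], internal)
    if existing is not None:
        # 3a. Update the current entity with non-empty external values.
        existing.update({k: v for k, v in source.items() if v})
        return existing
    # 3b. Insert a really new entity.
    new_entity = dict(source)
    internal.append(new_entity)
    return new_entity
```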


Some essential capabilities in doing this are:

  • Searching is error-tolerant, so you will find entities even if the spelling is different
  • The receiving data model is real world aligned. This includes:
    • Party information and location information have separate lives as explained in the post called A Place in Time
    • You may have multiple means of contact attached like many phones, email addresses and social identities
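A real world aligned data model along these lines could be sketched as follows: locations exist on their own, a party is linked to locations for periods of time, and a party can hold any number of phones, email addresses and social identities. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Location:
    location_id: str
    address: str


@dataclass
class PartyLocation:
    """Links a party to a location for a period; the location outlives the link."""
    location: Location
    valid_from: date
    valid_to: Optional[date] = None  # None means still current


@dataclass
class Party:
    party_id: str
    name: str
    locations: list[PartyLocation] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)
    emails: list[str] = field(default_factory=list)
    social_ids: list[str] = field(default_factory=list)

    def location_on(self, day: date) -> Optional[Location]:
        """Location linked to the party on a given day, if any."""
        for link in self.locations:
            if link.valid_from <= day and (link.valid_to is None or day < link.valid_to):
                return link.location
        return None
```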

How do you achieve a Single Customer View?


While we are waiting for the LEI

As told in the post Business Entity Identifiers there has been a new global numbering system for business entities on the way for some time. The wonder is called LEI (Legal Entity Identifier).

The implementation work has been adopted by the Financial Stability Board (FSB). The latest developments are reported in a publication called Fifth progress note on the Global LEI Initiative.

Surely, while the implementation may be in good hands, the setup doesn’t give hope for a speedy process where every legal entity in the world will have an LEI within a short time.

And then the next question will be how long it will take before organizations have enriched existing databases with the LEI and implemented on-boarding processes where an LEI is captured with every new insertion of party master data describing a legal entity.
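When an LEI is captured at on-boarding, a cheap first check is its format. ISO 17442 defines the LEI as 18 alphanumeric characters followed by two check digits computed with ISO/IEC 7064 MOD 97-10, the same scheme used for IBANs. A minimal sketch with hypothetical function names; passing the check only means the code is well-formed, not that it has been issued or belongs to the entity at hand.

```python
import string

_ALLOWED = set(string.digits + string.ascii_uppercase)


def _to_digits(code: str) -> str:
    """Digits stay as they are; letters A..Z become 10..35."""
    return "".join(str(int(ch, 36)) for ch in code)


def lei_check_digits(base: str) -> str:
    """Two check digits for an 18-character LEI prefix (MOD 97-10)."""
    if len(base) != 18 or not all(ch in _ALLOWED for ch in base):
        raise ValueError("LEI prefix must be 18 characters of 0-9 and A-Z")
    return f"{98 - int(_to_digits(base + '00')) % 97:02d}"


def is_valid_lei(lei: str) -> bool:
    """A 20-character LEI is well-formed when its numeric form mod 97 is 1."""
    return (len(lei) == 20
            and all(ch in _ALLOWED for ch in lei)
            and int(_to_digits(lei)) % 97 == 1)
```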

A good way to start preparing is to implement features in on-boarding business processes where available external reference data are captured when new party entities are added to your databases. Having the best available information about names, addresses and business entity identifiers today, and a culture of capturing such information, will be a great starting point.

And oh, the instant Data Quality concept is precisely all about doing that.


Some Kinds of Reference Data

The term “reference data” and the related Reference Data Management (RDM) are used commonly in the data quality and Master Data Management (MDM) realm.

As with most terms it may be used with slightly different meanings. Usually, but not necessarily always, reference data are core data entities defined outside a given organization.

I have come across the below discussed kinds of reference data:

Reference Data in Investment Banking

The term “reference data” is well established in investment banking. Reference data are core master data entities such as counterparties, securities and currencies. These are the things you deal with in investment banking. They are not made up for a given bank or other single financial institution but are shared across the whole market and should optimally be the same for every institution at exactly the same point in time.

Small Reference Data

In Master Data Management in general, we usually see reference data as value lists helping describe and standardize internal master data.

One example is a country list. A list of countries should be the same for every organization in the world. However, available lists do differ, though most variations usually don’t have any business impact, such as the academic question of whether Antarctica should be in the list or not.
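As a small reference data example, a country value list can be used to standardize free-text country values in internal master data. The sketch below uses a handful of ISO 3166-1 alpha-2 codes (Antarctica included, as `AQ`) and a hypothetical synonym list; `standardise_country` is an illustrative name.

```python
from typing import Optional

# Tiny excerpt of a country value list: ISO 3166-1 alpha-2 code -> name.
COUNTRIES = {
    "DK": "Denmark", "DE": "Germany", "GB": "United Kingdom",
    "IN": "India", "US": "United States", "AQ": "Antarctica",
}
# Hypothetical variants found in internal data.
SYNONYMS = {"danmark": "DK", "deutschland": "DE", "uk": "GB",
            "great britain": "GB", "usa": "US"}


def standardise_country(value: str) -> Optional[str]:
    """Map a free-text country value to an ISO 3166-1 alpha-2 code, or None."""
    text = value.strip()
    if text.upper() in COUNTRIES:  # already a code
        return text.upper()
    key = text.lower()
    by_name = {name.lower(): code for code, name in COUNTRIES.items()}
    return by_name.get(key) or SYNONYMS.get(key)
```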

A list of codes describing to which industry a given company belongs is another example of reference data. As examined in the post What are they doing? you may choose to standardize on SIC codes or standardise on NACE codes or develop your own set of codes for that purpose.

Big Reference Data

In geography, a country list is at the top level of defining locations. Further down, we may have postal code systems within each country, such as ZIP codes in the United States, PLZ codes in Germany and PIN codes in India. Further down still, we have every single valid postal address, ultimately all over the world. This is what I call big reference data.

One way of sourcing industry codes for your customers, suppliers and other business partners is picking them from, or enriching from, a business directory such as the D&B WorldBase or any of the many other business directories around. Such directories may also be seen as big reference data.

With the dramatic increase in the use of social media, social network profiles have emerged as a new kind of big reference data serving as links to our internal master data.


Business Entity Identifiers

The least cumbersome way of uniquely identifying a business partner being a company, government body or other form of organization is to use an externally provided number.

However, there are quite a lot of different numbers to choose from.

All-Purpose National Identification Numbers

In some countries, like those in Scandinavia, the public sector assigns a unique number to every company, to be used in every relation to the public sector and open for the private sector to use for identification purposes as well.

As reported in the post Single Company View I worked with the early implementation of such a number in Denmark way back in time.

Single-Purpose National Identification Numbers

In most countries there are multiple systems of numbers for companies each with an original special purpose. Examples are registration numbers, VAT numbers and employer identification numbers.

My current UK company has both a registration number and a VAT number and, very embarrassingly for a data quality and master data geek, these two numbers have different names and addresses attached.

Other Numbering Systems

The best known business entity numbering system around the world is probably the DUNS-number used by Dun & Bradstreet. As examined in the post Select Company_ID from External_Source Where Possible, the use of DUNS-numbers and similar business directory IDs is a very common way of uniquely identifying business partners.

In the manufacturing and retail world legal entities may, as part of the Global Data Synchronization Network, be identified with a Global Location Number (GLN).

There has been a lot of talk in the financial sector lately about implementing yet another numbering system for legal entities, with an identifier usually abbreviated as LEI. Wikipedia has the details about a Legal Entity Identification for Financial Contracts.

These are only some of the most used numbering systems for business entities.

So, the trend doesn’t seem to be a single source of truth but multiple sources making up some kind of the truth.
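Living with multiple numbering systems can be sketched as an index where a party may hold several identifiers, each keyed by scheme and value. The scheme labels (`DUNS`, `VAT`, `LEI`, `GLN`) and the `IdentifierIndex` class are assumptions made for this sketch, and the normalisation only strips spaces and upper-cases.

```python
from collections import defaultdict
from typing import Optional


def _key(scheme: str, value: str) -> tuple[str, str]:
    return scheme.upper(), value.replace(" ", "").upper()


class IdentifierIndex:
    """Resolve a party from any of several external identifiers."""

    def __init__(self) -> None:
        self._party_by_key: dict[tuple[str, str], str] = {}
        self._keys_by_party: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, party_id: str, scheme: str, value: str) -> None:
        key = _key(scheme, value)
        owner = self._party_by_key.get(key)
        if owner is not None and owner != party_id:
            raise ValueError(f"{key} already belongs to {owner}")
        self._party_by_key[key] = party_id
        self._keys_by_party[party_id].add(key)

    def find(self, scheme: str, value: str) -> Optional[str]:
        return self._party_by_key.get(_key(scheme, value))

    def identifiers(self, party_id: str) -> set[tuple[str, str]]:
        return set(self._keys_by_party.get(party_id, set()))
```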


Beyond Address Validation

The quality of contact master data is the number one data quality issue around.

Lately there has been a lot of momentum among data quality tool providers in offering services for getting at least the postal address in contact data right. The new services are improved by:

  • Being cloud-based, offering validation services that are applied at data entry and based on fresh reference data.
  • Being international and thus providing address validation for customer and other party data embracing a globalized world.

Capturing an address that is aligned with the real world may have a significant effect on business outcomes as reported by the tool vendor WorldAddresses in a recent blog post.

However, validating an address against address reference data only tells you whether the address is valid, not whether the addressee is (still) at the address, and you cannot be sure whether the name and other master data elements are accurate and complete. Therefore you often need to combine address reference data with other big reference data sources, such as business directories and consumer/citizen reference sources.
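Combining the two kinds of reference data could be sketched like this: the address is checked against address reference data, and the name against a directory listing who is known at that address. Both sources are hypothetical in-memory structures keyed by the same crudely normalised address string, and `ContactCheck` and `check_contact` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ContactCheck:
    address_valid: bool        # address found in address reference data
    addressee_confirmed: bool  # directory lists this name at this address


def _normalise(address: str) -> str:
    return " ".join(address.lower().split())


def check_contact(name: str, address: str,
                  address_reference: set[str],
                  directory: dict[str, set[str]]) -> ContactCheck:
    key = _normalise(address)
    known_names = {n.lower() for n in directory.get(key, set())}
    return ContactCheck(
        address_valid=key in address_reference,
        addressee_confirmed=name.lower() in known_names,
    )
```

A result with a valid address but an unconfirmed addressee is exactly the case address validation alone would miss.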

Using business directories is not new at all. Big reference sources as the D&B WorldBase and many other directories have been around for many years and been a core element in many data quality initiatives with customer data in business-to-business (B2B) environments and with supplier master data.

Combining address reference data and business entity reference data makes things even better, not least because business directories don’t always come with a valid address.

Using publicly available reference data when registering private consumers, employees and other citizen roles has until now been practiced in some industries and for special reasons. So the big reference data and the services are out there and are being used today in some business processes.

Mashing up address reference data, business entity reference data and consumer/citizen reference data is a big opportunity for many organizations in the quest for high quality contact master data, as most organizations actually interact with both companies and private persons if we look at the total mix of business processes.

The next big source is going to be exploiting social network profiles as well. As told in the post Social Master Data Management, social media will be an additional source of knowledge about our business partners. Again, you won’t find the full truth here either. You have to mash up all the sources.
