Reference Data Management (RDM) is an evolving discipline within data management. As organizations mature in reference data management, we often see a shift from relying on internally defined reference data to relying on externally defined reference data. This follows the good old saying about not reinventing the wheel. It also reflects that externally defined reference data is usually better at fulfilling multiple purposes of use, whereas internally defined reference data tends to cater only for the most important purpose of use within your organization.
Which standard to use then tends to be a matter of where in the world you are. Let’s look at three examples from the location domain, the party domain and the product domain.
Location reference data
If you read articles in English about reference data and about ensuring accuracy and other data quality dimensions for location data, you often meet remarks such as “be sure to check validity against US Postal Services” or “make sure to check against the Royal Mail PAF File”. This is all great if all your addresses are in the United States or the United Kingdom. If all your addresses are in another country, there will in many cases be similar services for that country. If your addresses are spread around the world, you have to look further.
There are some Data-as-a-Service offerings for international addresses out there. When it comes to having your own copy of location reference data, the Universal Postal Union has an offering called the Universal POST*CODE® DataBase. You may also look into open data solutions such as GeoNames.
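The choice described above can be sketched as a small routing table. This is only an illustration: the function names and the structure are assumptions made for the example, and the source names are simply the ones mentioned in this post, not a complete or authoritative list.

```python
from typing import Dict, Iterable, List

# Hypothetical routing table: ISO 3166-1 alpha-2 country code -> national source.
# Only the two national sources named in the post are listed here.
NATIONAL_SOURCES: Dict[str, str] = {
    "US": "US Postal Service",
    "GB": "Royal Mail PAF",
}

# Sources to fall back on when no national source is registered.
INTERNATIONAL_FALLBACKS: List[str] = ["Universal POST*CODE DataBase", "GeoNames"]


def pick_reference_sources(country_code: str) -> List[str]:
    """Return the reference sources to check an address against."""
    code = country_code.strip().upper()
    national = NATIONAL_SOURCES.get(code)
    if national is not None:
        return [national]
    # No national source registered: look further, as the post suggests.
    return list(INTERNATIONAL_FALLBACKS)


def sources_for_address_book(country_codes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each distinct country in an address book to its sources."""
    return {c.strip().upper(): pick_reference_sources(c) for c in country_codes}
```

An address book spread over several countries would then end up with a mix of national and international sources, which is exactly where maintaining the routing table becomes a reference data task of its own.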
Party reference data
Within party master data management for Business-to-Business (B2B) activities you want to classify your customers, prospects, suppliers and other business partners according to what they do. For that there are some frequently used coding systems in areas where I have been:
- Standard Industrial Classification (SIC) codes, the four-digit numerical codes assigned by the U.S. government to business establishments.
- The North American Industry Classification System (NAICS).
- NACE (Nomenclature of Economic Activities), the European statistical classification of economic activities.
As important economic activities change over time, these systems change to reflect the real world. As an example, my Danish company registration has changed NACE code three times since 1998 while I have been doing the same thing.
This doesn’t make conversion services between these systems any easier.
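One way to picture why revisions complicate conversion is a crosswalk that is keyed not only by the code but also by the version of the scheme it belongs to. The sketch below is a minimal illustration under that assumption. The version labels and all codes in it are made-up placeholders, not real SIC, NAICS or NACE values, and the names `Crosswalk` and `convert` are hypothetical.

```python
from typing import Dict, List, Tuple

# Key: (source scheme, source version, source code); value: candidate target codes.
# Every version label and code below is a made-up placeholder for illustration.
Crosswalk = Dict[Tuple[str, str, str], List[str]]

NACE_TO_NAICS: Crosswalk = {
    ("NACE", "v1", "A1"): ["N100"],
    ("NACE", "v2", "B7"): ["N100", "N205"],  # one-to-many after a revision
}


def convert(crosswalk: Crosswalk, scheme: str, version: str, code: str) -> List[str]:
    """Return candidate target codes; an empty list means no known mapping."""
    return list(crosswalk.get((scheme, version, code), []))
```

The same business may be coded as "A1" under one revision and "B7" under the next, so the version has to travel with the code, and a conversion may come back with several candidates or none at all.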
Product reference data
There is also a good choice of standardized and standardised classification systems for product data out there. To name a few:
- The United Nations Standard Products and Services Code® (UNSPSC®), managed by GS1 US™ for the UN Development Programme (UNDP).
- eCl@ss, which presents itself as: “THE cross-industry product data standard for classification and clear description of products and services that has established itself as the only ISO/IEC compliant industry standard nationally and internationally”. eCl@ss has its main support in Germany (the home of the Mercedes E-Class).
In addition to cross-industry standards there are heaps of industry-specific international, regional and national standards for product classification.
Nice post, Henrik.
As a player in a smaller country it is interesting to see what external reference data we have available by comparison to the big nations.
SIC codes are an interesting one. We do a lot of work in financial services and SIC codes are a key risk indicator. South Africa has its own SIC tables (in fact, a couple of versions thereof). These bear a resemblance to the US and UK tables but, for example, we have a large number of codes for various minerals due to our extremely diverse mining sector.
At one client we were provided a Data Quality Requirements Specification, developed by one of the Big Four consulting firms, that specified that we should test the SIC codes used by the client against the international SIC tables. Would have caused carnage had we not known better…
In many cases one needs to understand which external standard to use – as described in the post http://blog.masterdata.co.za/2011/10/12/bad-data-when-men-are-dressed-as-women/
Thanks a lot for commenting, Gary. Good example with the need for better granularity in industry codes for the mining sector in South Africa. Better or bespoke granularity is also often the reason why organizations use internally defined reference data instead of, or in addition to, publicly defined reference data.
Excellent post, as always.
Regarding international addresses, a number of private vendors provide address hygiene services to other companies, using the POST*CODE database supplemented by additional data from verified sources. While this is not reference data, it eliminates the need for complex programming and provides a solution for small numbers of addresses from multiple countries.
Nice post Henrik,
I feel reference data is one of those things that gets less play in MDM discussions than it should. We, at Verdantis, generally work with UNSPSC, but have also set up systems with clients’ home-grown classification rules. While it would be a great help if all of us could agree on a single classification standard, the chances of that happening look bleak. Do you see this happening anytime in the near future?
Thanks Merry and Vipul for adding in.
We are probably not going to see any world-wide standards for everything, including addresses and product classifications, as this planet can’t even agree on a standard for very common things such as a date format, the length of a thing and so much more.