With the risk of having the comment area on this blog filled up with SQL statements I will follow the track and tone from the last post called Create Table Homo_Sapiens.
In the last post some challenges around modelling people in databases was discussed with focus on uniqueness. Now we will have a look at the same challenges with companies – the other big part of party master data.
Companies often act in the same role as individual people in business processes – not at least in the role as a customer. Companies also behave as persons in a lot of ways like being born (establish), change name, relocate, marry (mergers and acquisitions), divorce (split) and decease (dissolve).
All over the world a lot of people spend the days entering and updating the data held on business partners in numerous databases. The world wide sum of B2B connections between a customer and a vendor each entering and maintaining the data about the other resembles (though less aggressive) the grains on a chessboard story:
- 2 companies both exchanging goodies makes 1+1 customers and 1+1 vendors = 4 rows
- 3 companies all exchanging goodies makes 2+2+2 customers and 2+2+2 vendors = 12 rows
- 4 companies all exchanging goodies makes 3+3+3+3 customers and 3+3+3+3 vendors = 24 rows
- 5 companies all exchanging goodies makes 4+4+4+4+4 customers and 4+4+4+4+4 vendors = 40 rows
- n companies all exchanging goodies makes n*(n-1) customers and n*(n-1) vendors = 2*n*(n-1) rows
Last time I checked the D&B WorldBase held more the 150 millions companies. Some are dissolved and fortunately? everyone doesn’t do business with everyone – but as said, the sum of B2B connections is huge and the work in entering and maintaining the master data seems overwhelming.
If we look at one single company and how it may be represented differently in databases around only taking basic data as name and address into account, there will be lots of variations. Even in the same table the same real world company often occupies several rows spelled differently.
One of the most effective methods for avoiding duplicates of company master data is plugging into a business directory. By using an external sourced company ID as a key in your master data you are able to hold a golden record of that real world entity. As a bonus you are offered updates and access to a lot of additional data you would never be able to collect yourself.
I’m not sure about other countries, but in the UK, if a registered company has been dissolved, you are free to re-use the company name. Perhaps both the dissolved, and the new company ordered office furniture from ‘Office Supplies PLC’. As you stated, please ensure a unique key is derived for each company record.
Another DQ point also worth pointing out, which I found in a rejection table within the ETL process at a UK Retail Bank – If you have customers with ‘,’s in their business name, ensure that your ETL processes are not set up to process as comma-delimited.
Goof post, and something I’ve written about myself in the past. Can anyone recommend organisation reference data other than D&B?!
* * *
Another excellent post Henrik,
I couldn’t help but make another SQL joke – perhaps there is a SQL Geek Support Group I could INNER JOIN?
Interesting – but does one:
a) Store the whole universe of D&B data
b) Do a period batch update of internalexternal data
3) Build a webservice
I am also intrigued by the problems of subtle misspellings which prevent a Boolean match.
For example (in the absence of MDM), companies are often suffixed GmBH, Ltd, PVY or labelled colloquially by each region/territory/sales rep.
This often involves the development of fuzzy matching techniques to bring together subtly different naming conventions.
Thanks Phil, Simon, Jim and Stuart.
Business directory integration most often has to start with that the existing customers/vendors are matched against the business directory in order to assign every row possible with the ID in the business directory, e.g. the Duns_Number.
Name is as Phil remarks of course not a very useful joining element and as Stuart says very often the name and address is not spelled exactly the same in the customer/vendor table to be matched and the business directory.
So theoretically you will have to compare every row in the customer/vendor table with every row in the business directory for fuzzy similarity comparison which should be done a number of times that equals Jim’s cartesian product. Therefore you also have to introduce fuzzy candidate selection – and as a person having built match solutions with the D&B WorldBase, I could talk hours about that subject.
Next step could be that you introduce a function where any new customer/vendor is picked from the business directory that may be stored on premise as a copy or accessed remote (in the cloud). But I have also participated in implementations were a daily production of new party records are automatically matched with the business directory using a webservice.
The D&B WorldBase is of course a very obvious choice if you have international data as the database has a consistent layout for all countries and have world wide business family trees.
Eurocontactpool covers a range of European countries.
For national data I think every country have a range of providers each having some pros and cons.
Most business directories are based on public sector company registrations which have very varying completeness and standardisation between countries – that fact is also seen in the D&B WorldBase.
Nice one Henrik!!!
As u rightly depict ‘golden record of that real world entity’ is what today’s organizations are struggling to achieve and thats truly a business challenge. If D & B or similar other information providers hold a universal database of clean records (i know am too ambitious!!!)it would make life simple for us 🙂
Good post ..
.. there is a few more dimensions that make this issue even more complex, for example:
– companies have different locations (e.g. branch offices) with different addresses, phone numbers, contact persons etc.
– companies are “related” to another .. e.g. company A owns company B or (still more complex) company D owns 50% of company E.
All of this is relevant when evaluating “creditworthiness” but might not even be public knowledge. I’m pretty sure that this information is not available from D&B or anywhere else.
So the next question then is .. what’s the business case for “make vs. buy” of the information? I’m still trying to figure that out …
Thanks Satesh and Thorsten
The points you are making is exactly what I mean with that such business directories are dependent on public sector data in different countries.
For example for Denmark uniqueness is pretty good in public sector data (except for governmental bodies), multiple business addresses are covered and even partly ownership is available.
But surely you will always find some internal registrations that are correct but not found in a business directory. I made a post on that subject a while ago called The art of Business Directory Matching.
It’s worth noting that D&B consider there to be an ownership link when one company owns 50% or more of another, so they do track that kind of part ownership information.