When matching data on company names, a basic challenge is that a proper company name, in most cultures and in most cases, has two elements:
- The actual company name
- The legal form
Some worldwide examples:
- Informatica Corporation
- Talend SA
- SAP Deutschland AG & Co. KG
- Sony Kabushiki Kaisha
- LEGO A/S
There are hundreds of different legal forms, each with full and abbreviated variants. Wikipedia has a list, where they are called types of business entity.
However, when company names are typed into databases, the legal form is often omitted. Even where legal forms are present, they may appear in full or abbreviated form, with varying spelling, punctuation and so on. As the actual company names suffer from the same fuzziness, the complexity is overwhelming.
A common way of handling this issue in data matching is to separate the legal form and then focus on comparing the remaining part, the actual company name. This has to be done country-specifically, or else you may remove the entire name of a company, as with an Italian company called Société Anonyme, which is a French legal form.
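The separation step could be sketched roughly as follows in Python. This is only an illustration under assumptions: the LEGAL_FORMS lookup is a hypothetical and deliberately tiny sample, the names normalize and split_legal_form are made up for this example, and a real solution would need a far larger list per country plus fuzzy comparison of the remaining name.

```python
import unicodedata

# Hypothetical, deliberately tiny lookup: country code -> legal-form variants
# (already lowercased, without accents or periods). Not a complete list.
LEGAL_FORMS = {
    "US": ["corporation", "corp", "incorporated", "inc", "llc"],
    "FR": ["societe anonyme", "sa", "sarl", "sas"],
    "DE": ["gmbh & co kg", "ag & co kg", "gmbh", "ag", "kg"],
    "JP": ["kabushiki kaisha", "kk"],
    "DK": ["a/s", "aps"],
    "IT": ["spa", "srl"],
}


def normalize(text):
    """Lowercase, strip accents, drop periods and commas, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_accents.lower().replace(".", "").replace(",", " ")
    return " ".join(no_punct.split())


def split_legal_form(name, country):
    """Return (actual_name, legal_form) using only the given country's forms.

    legal_form is None when no known form is found at the end of the name.
    """
    normalized = normalize(name)
    # Try longer variants first so "ag & co kg" wins over plain "kg".
    forms = sorted(LEGAL_FORMS.get(country, []), key=len, reverse=True)
    for form in forms:
        if normalized == form or normalized.endswith(" " + form):
            actual = normalized[: len(normalized) - len(form)].strip()
            return actual, form
    return normalized, None
```

With this sketch, split_legal_form("SAP Deutschland AG & Co. KG", "DE") gives ("sap deutschland", "ag & co kg"). The Italian example shows why the country matters: with "IT" the name Société Anonyme is left intact, while with "FR" the whole name is taken to be a legal form and nothing is left to compare.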
While the practice of having legal forms in company names may serve its original purpose well, namely knowing the risk of doing business with that entity, it certainly does not help with the uniqueness dimension of data quality.
One would think it is time to change the bad (legally required) practice of mixing legal forms with company names, and to serve the original purpose in another, more data-quality-friendly way.