The fuzzy post and its comments, including mine, circle around how the relation between “Bill” and “William” must be handled in data matching.
While “Bill” and “William” may be used interchangeably in modern Anglo-Saxon data, it may be a mistake in time (an anachronism) to use them interchangeably in relation to the grand old playwright.
It may also be a mistake in place to use them interchangeably in other cultures.
For example, in my home country Denmark, “Bill” and “William” are two different names. Globalization has been going on for a long time, as far more people are baptized (or otherwise given the name) William than the original Danish form Wilhelm. There are only 286 people named Wilhelm today, as opposed to 7,355 named William, including 800 added during the last year. And then there are 353 different people named Bill.
But the use of “Bill” as a nickname for “William” has not been adopted here yet.
So with Danish data, matching “Bill Nielsen” with “William Nielsen” is almost certainly a false positive.
It’s not that it’s a big problem; the risk of making the mistake is very low. The point is rather that the focus should be on other, more pressing issues, with the specific challenges (and possibilities) related to data from each culture and country.
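To make the Bill and William point concrete, here is a minimal sketch in Python of what culture-specific nickname handling could look like. Everything in it is an assumption made for illustration: the `NICKNAMES` table, its country codes and its entries are hypothetical, and the function names are not taken from any matching tool.

```python
# Hypothetical nickname tables keyed by country code (illustrative entries only).
NICKNAMES = {
    "US": {"bill": "william", "will": "william"},
    "DK": {},  # assumption: in Danish data, Bill and William are distinct names
}


def split_name(full_name):
    """Split a full name into (given names, family name), lowercased."""
    given, _, family = full_name.strip().lower().rpartition(" ")
    return given, family


def canonical_given_name(given, country):
    """Map a nickname to its formal form, but only where that culture uses it."""
    return NICKNAMES.get(country, {}).get(given, given)


def names_match(name_a, name_b, country):
    """Return True if the two names are match candidates in the given country."""
    given_a, family_a = split_name(name_a)
    given_b, family_b = split_name(name_b)
    if family_a != family_b:
        return False
    return (canonical_given_name(given_a, country)
            == canonical_given_name(given_b, country))


print(names_match("Bill Nielsen", "William Nielsen", "US"))
print(names_match("Bill Nielsen", "William Nielsen", "DK"))
```

With these assumed tables, the same pair of names is treated as a match candidate in the US data and as two different people in the Danish data. Real matching would of course also involve fuzzy comparison and more attributes than the name.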