Today I stumbled upon an article from Australia in the journal BMC Medical Informatics and Decision Making. The article is called “The effect of data cleaning on record linkage quality”.
The main finding of the research is:
“Data cleaning made little difference to the overall linkage quality, with heavy cleaning leading to a decrease in quality. Further examination showed that decreases in linkage quality were due to cleaning techniques typically reducing the variability – although correct records were now more likely to match, incorrect records were also more likely to match, and these incorrect matches outweighed the correct matches, reducing quality overall.”
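To make that mechanism concrete, here is a toy sketch in Python. It is not the method used in the article: the records, the names, and the two cleaning functions (`light_clean`, `heavy_clean`) are all invented for illustration, and it uses plain exact matching on a single field rather than a real linkage model. It only shows how a cleaning step that reduces variability can add more incorrect matches than correct ones.

```python
from itertools import combinations

# Invented toy records: (record_id, true_person_id, first_name).
# Records 1 and 2 are the same person; record 2 contains a typo.
records = [
    (1, "A", "Katherine"),
    (2, "A", "Kathrine"),
    (3, "B", "Kathleen"),
    (4, "C", "Kate"),
    (5, "D", "Katrina"),
]

def light_clean(name):
    # Hypothetical light cleaning: trim whitespace and lowercase.
    return name.strip().lower()

def heavy_clean(name):
    # Hypothetical heavy cleaning: additionally keep only the first 3 letters,
    # which removes most of the variability between names.
    return name.strip().lower()[:3]

def count_matches(records, clean):
    """Return (correct_matches, incorrect_matches) under exact matching."""
    correct = incorrect = 0
    for (_, person1, name1), (_, person2, name2) in combinations(records, 2):
        if clean(name1) == clean(name2):
            if person1 == person2:
                correct += 1
            else:
                incorrect += 1
    return correct, incorrect

print("light:", count_matches(records, light_clean))
print("heavy:", count_matches(records, heavy_clean))
```

With this made-up data, light cleaning finds no matches at all, while heavy cleaning finds the one correct pair but also links nine pairs of different people, because every name collapses to the same three letters.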
What are your experiences?