Do you mean deduplication or deduplication?

The term deduplication may mean two different things in computing:

The storage kind of deduplication
The data quality kind of deduplication

The storage kind of deduplication refers to reducing the volume of data stored and backed up by finding files (or other chunks of data, I guess) that are exactly the same and eliminating all but one copy.
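To make the storage kind concrete, here is a minimal sketch that assumes exact duplicates are detected by hashing the content. The function name `dedupe_exact` and the sample file names are made up for illustration, and real storage products typically work on blocks or chunks with far more machinery than this.

```python
import hashlib

def dedupe_exact(blobs):
    """Keep one stored copy per distinct byte content.

    `blobs` maps a (hypothetical) file name to its content as bytes.
    Returns the store of unique contents and an index from name to digest.
    """
    store = {}   # digest -> the single copy we keep
    index = {}   # file name -> digest of its content
    for name, content in blobs.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # only the first copy is stored
        index[name] = digest
    return store, index

# Illustrative data: two byte-identical files and one different file
blobs = {
    "report.doc": b"quarterly figures",
    "report_copy.doc": b"quarterly figures",
    "notes.txt": b"meeting notes",
}
store, index = dedupe_exact(blobs)
```

Here three file names end up pointing at two stored contents. Nothing fuzzy is going on: a single differing byte gives a different digest and therefore a separate copy.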

The data quality kind of deduplication is about finding entities in databases that don’t share a common unique key and are not spelled exactly the same, but are so similar that we may consider them to represent the same real-world object.

The result of the data quality kind of deduplication may be that all but one of the duplicate rows are eliminated, but most often we will actually add more bytes by linking the duplicate rows and perhaps creating a new golden record.
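The data quality kind can be sketched in a similarly small way. This sketch assumes a plain string similarity from Python's standard `difflib` module, a threshold of 0.85 picked only for the example, and a made-up survivorship rule (keep the longest name). The function names and sample rows are hypothetical, and real matching tools use much richer comparison logic.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive similarity ratio between 0.0 and 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_duplicates(rows, threshold=0.85):
    """Group rows whose name is similar to the first row of an existing group."""
    clusters = []
    for row in rows:
        for cluster in clusters:
            if similarity(row["name"], cluster[0]["name"]) >= threshold:
                cluster.append(row)
                break
        else:
            clusters.append([row])  # no similar group found, start a new one
    return clusters

def golden_record(cluster):
    # Assumed survivorship rule for illustration: the longest name wins,
    # and the golden record keeps links to all of its source rows.
    best = max(cluster, key=lambda r: len(r["name"]))
    return {"name": best["name"], "source_ids": [r["id"] for r in cluster]}

# Illustrative rows with no shared unique key
rows = [
    {"id": 1, "name": "Jon Smith"},
    {"id": 2, "name": "John Smith"},
    {"id": 3, "name": "Mary Jones"},
]
clusters = link_duplicates(rows)
golden = [golden_record(c) for c in clusters]
```

Note that no row is deleted here. The source rows stay, and the golden records with their links are added on top, which is why this kind of deduplication often ends up storing more bytes, not fewer.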

This ambiguity sometimes leads to mixing it all up.

I remember some years ago, when I started as employee number 1 at Omikron Data Quality in the Nordics, we ran a meeting booking campaign. This was done by a telemarketing bureau. They booked a lot of meetings for me, including one at a company that was very interested in tools for deduplication.

It was a very strange meeting until, after 12 minutes and 34 seconds, we concluded that there are indeed two kinds of deduplication in computing.

Also, I noticed lately that a leading vendor of tools for the data quality kind of deduplication promoted its product by referring to articles on cost savings and more that relate to the storage kind of deduplication.


Liliendahl on Data Quality

A blog about Master Data Management, Product Information Management, Data Quality Management and more
