Say you have a table with a lot of names, postal addresses, phone numbers and e-mail addresses, and you want to remove the duplicate rows. Duplicates may be spelled exactly the same, but they may also be spelled somewhat differently and still describe the same real-world individual or company.
You can do the deduplicating with a spreadsheet.
In the old days some spreadsheets had a limit on the number of rows they could process, like the 65,536-row limit in older versions of Excel, but today spreadsheets can process a lot of rows.
In this case you may have the following columns:
- Name (could be given name and surname or a company name)
- House number
- Street name
- Postal code
- City name
- Phone number
- E-mail address
First you sort the sheet by name, then postal code and then street name.
Then you browse down through all the rows, focusing on one row at a time, and look up and down to see whether the rows before or after seem to be duplicates. If so, you delete all but one of the rows describing the same real-world entity.
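If you would rather let a script do the looking up and down, one such pass could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the sample rows, the field names, the window of two rows and the 0.85 threshold. `difflib.SequenceMatcher` from the standard library stands in for the human eye, and it is only one of many ways to score how alike two names are.

```python
from difflib import SequenceMatcher

# Hypothetical sample rows; names and values are made up for illustration.
ROWS = [
    {"name": "John Smith", "postal_code": "2100", "street": "Main Street"},
    {"name": "Acme Corp", "postal_code": "1000", "street": "Harbour Road"},
    {"name": "Jon Smith", "postal_code": "2100", "street": "Main St"},
]


def similarity(a, b):
    """Score two strings from 0.0 (nothing in common) to 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbour_candidates(rows, sort_fields, window=2, threshold=0.85):
    """Sort by sort_fields, then compare each row's name with the next `window` rows."""
    order = sorted(
        range(len(rows)),
        key=lambda i: tuple(rows[i][field].lower() for field in sort_fields),
    )
    pairs = []
    for position, i in enumerate(order):
        # Looking only forward is enough: earlier rows have already looked at this one.
        for j in order[position + 1 : position + 1 + window]:
            if similarity(rows[i]["name"], rows[j]["name"]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


candidates = neighbour_candidates(ROWS, ["name", "postal_code", "street"])
print(candidates)  # [(0, 2)]
```

The pairs are only candidates. Someone, or some further rule, still has to decide whether "John Smith" and "Jon Smith" really are the same real-world entity before a row is deleted.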
When finished with all the rows sorted by name, postal code and street name, you make an alternate sort, because some possible duplicates may not begin with the same letters in the name field.
So you sort the sheet by postal code, then street name and then house number.
Then you browse down through all the rows, focusing on one row at a time, and look up and down to see whether the rows before or after seem to be duplicates. If so, you delete all but one of the rows describing the same real-world entity.
When finished with all the rows sorted by postal code, street name and house number, you make an alternate sort, because some possible duplicates may not have the proper postal code assigned, or the street name may not start with the same letters.
So you sort the sheet by city name, then house number and then name.
Then you browse down through all the rows, focusing on one row at a time, and look up and down to see whether the rows before or after seem to be duplicates. If so, you delete all but one of the rows describing the same real-world entity.
When finished with all the rows sorted by city name, house number and name, you make an alternate sort, because some duplicates may have moved or may have different addresses for other reasons.
So you sort the sheet by phone number, then name and then postal code.
Then you browse down through all the rows, focusing on one row at a time, and look up and down to see whether the rows before or after seem to be duplicates. If so, you delete all but one of the rows describing the same real-world entity.
When finished with all the rows sorted by phone number, name and postal code, you make an alternate sort, because some duplicates may not have a phone number or may have different phone numbers.
So you sort the sheet by e-mail address, then name and then postal code.
Then you browse down through all the rows, focusing on one row at a time, and look up and down to see whether the rows before or after seem to be duplicates. If so, you delete all but one of the rows describing the same real-world entity.
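For the curious, the whole routine of five sorts could be sketched in Python along these lines. Treat it as a rough illustration under stated assumptions, not a real matching engine: the rows are invented, the field names are hypothetical, the 0.75 threshold is arbitrary, and "seems to be a duplicate" is reduced to comparing names only after lowercasing, dropping punctuation and sorting the words. Rows flagged in any pass are grouped together, and the first row of each group is the one that survives.

```python
import re
from difflib import SequenceMatcher

FIELDS = ["name", "house_number", "street", "postal_code", "city", "phone", "email"]

# Hypothetical rows; every value is made up for illustration.
RAW = [
    ("John Smith", "12", "Main Street", "2100", "Copenhagen", "5550101", "john@example.com"),
    ("Smith, John", "12", "Main Street", "2100", "Copenhagen", "", ""),
    ("Acme Corp", "5", "Harbour Road", "1000", "Copenhagen", "5550199", "info@acme.example"),
    ("J. Smith", "7", "Oak Avenue", "8000", "Aarhus", "", "john@example.com"),
    ("Karen Jones", "3", "Mill Lane", "2100", "Copenhagen", "5550142", "karen@example.com"),
]
ROWS = [dict(zip(FIELDS, values)) for values in RAW]

# The five sort orders described in the text.
PASSES = [
    ["name", "postal_code", "street"],
    ["postal_code", "street", "house_number"],
    ["city", "house_number", "name"],
    ["phone", "name", "postal_code"],
    ["email", "name", "postal_code"],
]


def name_similarity(a, b):
    """Compare names after lowercasing, dropping punctuation and sorting the words."""
    def normalise(name):
        return " ".join(sorted(re.sub(r"[^a-z0-9 ]", "", name.lower()).split()))
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find(parent, i):
    """Follow parent links to the representative row of i's group."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def surviving_rows(rows, passes, threshold=0.75):
    """Run every sort pass, group adjacent look-alikes, keep one row per group."""
    parent = list(range(len(rows)))
    for sort_fields in passes:
        order = sorted(
            range(len(rows)),
            key=lambda i: tuple(rows[i][field].lower() for field in sort_fields),
        )
        # Compare each row with the row directly below it in this sort order.
        for a, b in zip(order, order[1:]):
            if name_similarity(rows[a]["name"], rows[b]["name"]) >= threshold:
                parent[find(parent, a)] = find(parent, b)
    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(parent, i), []).append(i)
    # Keep the first row of each group; the rest would be deleted.
    return sorted(min(group) for group in groups.values())


survivors = surviving_rows(ROWS, PASSES)
print([ROWS[i]["name"] for i in survivors])  # ['John Smith', 'Acme Corp', 'Karen Jones']
```

Note how blunt the rule is: it would happily merge two different people called John Smith. That judgement call is exactly what the human browsing the sheet is supposed to supply.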
You may:
- If you only have a few rows, finish this process within a few hours and possibly find all the duplicates
- If you have a lot of rows, finish this process within a few years and possibly find some of the duplicates
PS: The better option is of course to avoid having duplicates in the first place. Unfortunately that is not possible in many situations; see The Top 5 Reasons for Downstream Cleansing.