Follow

How To Clean Data In Excel -- Deleting Duplicates

If you want a dataset with unique rows, you will need to look for duplicates. 

First, sort your column alphabetically. Select a column, go to Data -> Sort , then select the column you want to organize and select 'A to Z'.

Now that everything is ordered alphabetically by the column you want to check, you can use the IF formula to highlight duplicates. The way you do it is by comparing one cell to the one above. The IF formula works like this: 

Excel formula:
=IF(logical_text, [value_if_true],[value_if_false])

Explained: =IF([logical test],[If logical is true then what?],[If logical test is false then what?])

In our case we want to start by comparing cell C3 to cell C2 to see if they are the same, this is the logical test. If they are the same we want the cell we are using to check for duplicates to write Duplicate (value_if_true) and if they are not the same (value_if_false), the cell should just be blank. The formula will look like this:

Excel formula: 
=IF(C3=C2,"Duplicate","")

The quotation marks tells Excel that it should be considered plain text, this still works even though you don't add any text between the quotation marks which is why the cell will appear blank. In the image below, you will see that 'Georg Lucas' is recognized as a duplicate, all the other cells are blank because the logical test is false and thus returns a blank cell. 

Your final step is to use the paste special -> values for the 'Check for duplicates' column, then you sort the column alphabetically so all the duplicates congregate before you highlight the rows and delete them.

Voila! You have now cleaned your dataset without spending hours. These are only some of the things you can do, what other tips do you know of that can clean your data? Let us know in the comment section below. 

Was this article helpful?
0 out of 0 found this helpful
Have more questions? Submit a request

Comments