Thursday, February 7, 2008

Find and remove duplicate rows from a table

Detecting and removing duplicate rows from a table is a common Oracle task. While many Oracle DBAs place primary key and referential integrity (RI) constraints on a table, many shops do not use RI because they need the flexibility.

An effective way to detect duplicate rows is to compare the table against itself with a correlated subquery on rowid, as shown below.

SELECT
   BOOK_UNIQUE_ID,
   PAGE_SEQ_NBR,
   IMAGE_KEY
FROM
   page_image A
WHERE
   rowid >
   (SELECT min(rowid) FROM page_image B
    WHERE
       B.key1 = A.key1
    and
       B.key2 = A.key2
    and
       B.key3 = A.key3
   );
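
As a complementary check, it can help to see how many copies of each key combination exist before touching any rows. The following is a minimal sketch, assuming the same placeholder columns key1, key2 and key3 used in the query above; substitute the columns that define a duplicate in your own table.

```sql
-- List each duplicated key combination and how many copies of it exist.
-- key1, key2 and key3 are placeholders, as in the detection query above.
SELECT key1, key2, key3, COUNT(*) AS copies
FROM page_image
GROUP BY key1, key2, key3
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

A combination that appears here with copies = 3 corresponds to two rows returned by the rowid query, since that query skips the row with the lowest rowid in each group.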

Please note that the WHERE clause must compare every column that makes a row a duplicate (key1, key2 and key3 above stand in for those columns). Once you have detected the duplicate rows, you can modify the SQL statement to remove the duplicates as shown below:

DELETE FROM
   table_name A
WHERE
   A.rowid >
   ANY (SELECT B.rowid
        FROM
           table_name B
        WHERE
           A.col1 = B.col1
        AND
           A.col2 = B.col2
       );
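
To try this delete safely, one option is to run it against a small throwaway table and inspect the result before committing. The sketch below assumes a hypothetical table named dup_demo with two columns; none of these names come from a real schema.

```sql
-- Hypothetical demo table; dup_demo, col1 and col2 are illustrative names.
CREATE TABLE dup_demo (col1 NUMBER, col2 VARCHAR2(10));

INSERT INTO dup_demo VALUES (1, 'a');
INSERT INTO dup_demo VALUES (1, 'a');  -- duplicate of the first row
INSERT INTO dup_demo VALUES (1, 'a');  -- second duplicate
INSERT INTO dup_demo VALUES (2, 'b');
COMMIT;                                -- make the test data permanent

-- Remove every row that has a same-valued row with a lower rowid.
DELETE FROM dup_demo A
WHERE A.rowid > ANY (SELECT B.rowid
                     FROM dup_demo B
                     WHERE A.col1 = B.col1
                       AND A.col2 = B.col2);

-- Inspect the result: one row per (col1, col2) pair, each with copies = 1.
SELECT col1, col2, COUNT(*) AS copies
FROM dup_demo
GROUP BY col1, col2;

-- COMMIT to keep the change, or ROLLBACK to restore the deleted rows.
ROLLBACK;
```

Because the delete is ordinary DML, nothing is permanent until the transaction commits, so the same check-then-commit pattern applies to a real table.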

You can also detect and delete duplicate rows using Oracle analytic functions:

delete from
   customer
where rowid in
   (select rowid from
      (select
         rowid,
         row_number()
         over
            (partition by custnbr order by custnbr) dup
       from customer)
    where dup > 1);
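
In the statement above, the ORDER BY inside the window uses the same column as the PARTITION BY, so which row in each group receives row number 1 (and is kept) is left to the database. A possible variant, offered as a sketch rather than the original author's method, orders by rowid so the row with the lowest rowid is the one kept. The alias rid is a hypothetical name; aliasing the rowid in the inline view is a common precaution.

```sql
-- Variant of the analytic delete that keeps the lowest rowid per custnbr.
-- rid is an illustrative alias for the base table's rowid.
DELETE FROM customer
WHERE rowid IN
  (SELECT rid
   FROM (SELECT rowid AS rid,
                ROW_NUMBER() OVER (PARTITION BY custnbr
                                   ORDER BY rowid) AS dup
         FROM customer)
   WHERE dup > 1);
```

If a duplicate is defined by more than one column, list all of those columns in the PARTITION BY clause.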

Simple syntax to delete duplicate rows from a table

DELETE FROM our_table
WHERE rowid not in
   (SELECT MIN(rowid)
    FROM our_table
    GROUP BY column1, column2, column3... );
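
One difference between these approaches concerns NULLs, a point a reader also raises in the comments. GROUP BY places NULL values in the same group, but a comparison such as A.col1 = B.col1 is never true when either side is NULL, so the equality-based statements above will not flag rows whose key columns contain NULLs. Note that in Oracle an empty string is treated as NULL, so wrapping the columns in NVL(col, '') does not help. A minimal sketch of a null-safe comparison, assuming the placeholder names table_name, col1 and col2 from earlier, uses DECODE, which treats two NULLs as equivalent:

```sql
-- Null-safe variant of the self-comparison delete.
-- table_name, col1 and col2 are the placeholders used earlier in this post.
-- DECODE(x, y, 1, 0) returns 1 when x equals y or when both are NULL.
DELETE FROM table_name A
WHERE A.rowid > ANY (SELECT B.rowid
                     FROM table_name B
                     WHERE DECODE(A.col1, B.col1, 1, 0) = 1
                       AND DECODE(A.col2, B.col2, 1, 0) = 1);
```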

1 comment:

Pavan Turlapati said...

At last, I was active and opened your blog, and I found it very useful because it shows how a database administrator thinks.

This particular post is very useful and is like a key to me.

I worked on the same logic over the past few months and kept failing miserably, creating new problems for myself.

But I was able to find the real problem: IsNull(column1,'') = IsNull(column2,'') should be used when comparing values instead of the regular column1 = column2.

Above all, great work, keep working; Rock your life because it is meant for that..
Cheers,
Pavan