Remove duplicate rows in MySQL

Question

I have a table with the following fields:

id (Unique)
url (Unique)
title
company
site_id

Now I need to remove rows that have the same title, company and site_id. One way to do it is to use the following SQL together with a script (PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

After running this query, I can remove the duplicates with a server-side script. But I want to know whether this can be done using only an SQL query.

Quick question: do you always want duplicate (title, company, site_id) combinations not to exist? If so, I'd set up a constraint in the database to enforce that title, company, and site_id are unique together. That would mean you wouldn't need a cleanup process, and it only takes a single line of SQL.
Please refer to this Stack Overflow link. It worked like a charm for me.

Chris Henry · Accepted Answer · 2010-07-22 18:24:05Z

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

ALTER IGNORE TABLE jobs ADD UNIQUE INDEX idx_name (site_id, title, company );

This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...

Interesting, but the assumptions the IGNORE clause makes when removing those duplicates are a concern and might not match your needs. Does truncating incorrect values to the closest acceptable match sound good to you? – OMG Ponies Jul 22 '10 at 18:32

In this particular case, that's definitely true. The collation of the title and company columns definitely matters. What, exactly, does "incorrect values" mean? I smell another question... – Chris Henry Jul 22 '10 at 19:08

this did the job, thanks a lot! – Chetan Jul 22 '10 at 19:26

Just for the record: if you're using InnoDB, you may have an issue with this. There is a known bug with using ALTER IGNORE TABLE on InnoDB tables. – DarkMantis Jan 7 at 16:57

The aforementioned bug @DarkMantis referred to and its solution. – Jordan Arseno Jan 23 at 20:47
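
For servers where ALTER IGNORE TABLE is unavailable or runs into the InnoDB issue mentioned above, roughly the same result can be reached in two steps. (The IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4.) This is a minimal sketch, assuming the jobs table from the question and that keeping the row with the lowest id in each group is acceptable. idx_name is the index name used in the answer above.

```sql
-- Step 1: delete every row for which a row with the same key and a lower id exists.
-- The multi-table DELETE form joins jobs to itself, so no subquery on the target table is needed.
DELETE j1
FROM jobs AS j1
JOIN jobs AS j2
  ON  j1.site_id = j2.site_id
  AND j1.title   = j2.title
  AND j1.company = j2.company
  AND j1.id      > j2.id;

-- Step 2: with the duplicates gone, the unique index can be added without IGNORE.
ALTER TABLE jobs ADD UNIQUE INDEX idx_name (site_id, title, company);
```

Rows where any of the three columns is NULL are not matched by = and are left in place. As with the ALTER IGNORE approach, taking a backup first is prudent.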

Andomar · Answer 2 · 2010-07-22 18:26:48Z

MySQL has restrictions about referring to the table you are deleting from. You can work around that with a temporary table, like:

-- Collect the ids of rows that have a duplicate with a higher id
create temporary table tmpTable (id int);

insert  tmpTable
        (id)
select  id
from    YourTable yt
where   exists
        (
        select  *
        from    YourTable yt2
        where   yt2.title = yt.title
                and yt2.company = yt.company
                and yt2.site_id = yt.site_id
                and yt2.id > yt.id
        );

-- Delete those rows; the highest id in each group is kept
delete
from    YourTable
where   id in (select id from tmpTable);

@andomar, this works fine except when one of the fields in the where clause contains NULLs. Example: sqlfiddle.com/#!2/983f3/1
Is the INSERT an expensive query? I'm asking because it times out on my MySQL database.
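
On the NULL case raised in the comment above: a plain = never matches when either side is NULL, so such rows are not detected as duplicates. MySQL's NULL-safe equality operator <=> treats two NULLs as equal. A sketch of the same INSERT with that operator swapped in, assuming the tmpTable and YourTable names from the answer:

```sql
-- Same duplicate test as in the answer, but NULL = NULL now counts as a match.
insert  tmpTable
        (id)
select  yt.id
from    YourTable yt
where   exists
        (
        select  *
        from    YourTable yt2
        where   yt2.title   <=> yt.title
                and yt2.company <=> yt.company
                and yt2.site_id <=> yt.site_id
                and yt2.id > yt.id
        );
```

Whether NULLs should count as equal is a decision about the data, so this is a variant to consider and not a drop-in fix. On the timeout question, an index covering site_id, title and company may help the correlated subquery, though that depends on the column types and sizes.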

kamylko · Answer 3 · 2013-01-31 10:10:53Z

If the IGNORE clause doesn't work for you, as in my case, you can use:

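# Create an empty copy with the same structure
CREATE TABLE your_table_deduped LIKE your_table;
# Keep one row per (index1_id, index2_id); requires ONLY_FULL_GROUP_BY to be disabled
INSERT your_table_deduped SELECT * FROM your_table GROUP BY index1_id, index2_id;
# Swap the tables
RENAME TABLE your_table TO your_table_with_dupes;
RENAME TABLE your_table_deduped TO your_table;
# OPTIONAL: prevent new duplicates
ALTER TABLE `your_table` ADD UNIQUE `unique_index` (`index1_id`, `index2_id`);
# OPTIONAL: drop the old table once the result is verified
DROP TABLE your_table_with_dupes;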

eiefai · Answer 4 · 2010-07-22 18:33:41Z

I have this query snippet for SQL Server, but I think it can be used in other DBMSs with small changes:

DELETE
FROM Table
WHERE Table.idTable IN  (  
    SELECT MAX(idTable)
    FROM Table
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1)

I forgot to mention that this query doesn't remove the row with the lowest id among the duplicated rows. If that works for you, try this query:

DELETE
FROM jobs
WHERE jobs.id IN  (  
    SELECT MAX(id)
    FROM jobs
    GROUP BY site_id, company, title, location
    HAVING COUNT(*) > 1)

That won't work if there are more than two duplicates in a group. – OMG Ponies Jul 22 '10 at 18:23

Unfortunately, MySQL does not allow you to select from the table you are deleting from: "ERROR 1093: You can't specify target table 'Table' for update in FROM clause". – Andomar Jul 22 '10 at 18:29

OMG Ponies, I know that. This is just a snippet that I use sometimes and it seemed to fit the question; that's why I said it needed to be changed. Thanks for the comment. Andomar, I didn't know that. Thanks to you too. – eiefai Jul 22 '10 at 18:43
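
One way around ERROR 1093 that stays in plain SQL is to wrap the grouping subquery in a derived table, so the DELETE no longer reads the target table directly. The sketch below also keeps MIN(id) per group and deletes everything else, so it handles groups with more than two duplicates. It assumes the jobs table from the question and groups on the three columns the question names (location is left out). keepers and keep_id are made-up names for illustration.

```sql
DELETE FROM jobs
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        -- lowest id of every (site_id, company, title) group: the rows to keep
        SELECT MIN(id) AS keep_id
        FROM jobs
        GROUP BY site_id, company, title
    ) AS keepers
);
```

Because GROUP BY places NULLs in the same group, rows whose key columns match only through NULLs are treated as duplicates here. Running the inner SELECT on its own first shows which ids would survive.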

Michael Tel · Answer 5 · 2013-05-21 20:51:56Z

I like to be a bit more specific about which records I delete, so here is my solution:

delete
from jobs c1
where not c1.location = 'Paris'
and  c1.site_id > 64218
and exists 
(  
select * from jobs c2 
where c2.site_id = c1.site_id
and   c2.company = c1.company
and   c2.location = c1.location
and   c2.title = c1.title
and   c2.site_id > 63412
and   c2.site_id < 64219
)
