What's the best way to delete duplicate records in a mysql database using rails or mysql queries?
|
What you can do is copy the distinct records into a new table by:
|
|||||||
|
Here's another idea in no particular language:
Edit: Kudos to Olaf for that "having" hint :) |
||||
well, if it's a small table, from rails console you can do
|
|||
|
Check for Duplicate entries :
Remove Duplicate Queries :
Replace req_field and table_name - should work without any issues. |
|||
|
New to SQL :-) This is a classic question - often asked in interviews:-) I don't know whether it'll work in MYSQL but it works in most databases -
|
|||||
|
If you have PK (id) in table (EMP) and want to older delete duplicate records with name column. For large data following query may be good approach.
|
||||
|
I am using
|
||||
|
suppose we have a table name tbl_product and there is duplicacy in the field p_pi_code and p_nats_id in maximum no of count then
first create a new table insert the data from existing table ...
after that we see all the duplicacy in the field is removed |
||||
|
I had to do this recently on Oracle, but the steps would have been the same on MySQL. It was a lot of data, at least compared to what I'm used to working with, so my process to de-dup was comparatively heavyweight. I'm including it here in case someone else comes along with a similar problem. My duplicate records had different IDs, different I used a combination of Rails logic and SQL to get it done. Step one: run a rake script to identify the IDs of the duplicate records, using model logic. IDs go in a text file. Step two: create a temporary table with one column, the IDs to delete, loaded from the text file. Step three: create another temporary table with all the records I'm going to delete (just in case!).
Step four: actual deleting.
|
|||
|
You can use: http://lenniedevilliers.blogspot.com/2008/10/weekly-code-find-duplicates-in-sql.html to get the duplicates and then just delete them via Ruby code or SQL code (I would do it in SQL code but thats up to you :-) |
|||
|
If your table has a PK (or you can easily give it one), you can specify any number of columns in the table to be equal (to qualify is as a duplicate) with the following query (may be a bit messy looking but it works):
) This will leave the first record entered into the database, deleting the 'newest' duplicates. If you want to keep the last record, switch the > to <. |
|||
|
In MySql when I put something like
mySql said something like "you can't use the same table in the select part of the delete operation." I've just have to delete some duplicate records, and I have succeeded with a .php program like that
|
|||
|
I used @krukid's answer above to do the following on a table with around 70,000 entries:
|
|||
|