I wrote this Python script to import a specific XLS file into MySQL. It works fine, but if it's run twice on the same data it creates duplicate entries. I'm pretty sure I need to use a MySQL JOIN, but I'm not clear on how to do that.
Also, is executemany() going to have the same overhead as doing inserts in a loop? I'm obviously trying to avoid that.
Here's the code in question...
    for row in range(sheet.nrows):
        # name is in the 0th col. email is the 4th col.
        name = sheet.cell(row, 0).value
        email = sheet.cell(row, 4).value
        if name and email:
            mailing_list[name.lstrip()] = email.strip()

    for n, e in sorted(mailing_list.iteritems()):
        rows.append((n, e))

    db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
    cursor = db.cursor()
    cursor.executemany("""
        INSERT IGNORE INTO mailing_list (name, email) VALUES (%s,%s)""", rows)
CLARIFICATION...
I read here that...
To be sure, executemany() is effectively the same as simple iteration. However, it is typically faster. It provides an optimized means of affecting INSERT and REPLACE across multiple rows.
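My understanding is that the "optimized means" for INSERT amounts to sending many rows in one multi-row statement instead of one statement per row. A minimal sketch of that idea follows; `build_multirow_insert` is a hypothetical helper written for illustration, not part of MySQLdb, and the sample rows are made up.

```python
def build_multirow_insert(table, columns, rows):
    """Hypothetical helper: collapse many rows into one INSERT statement."""
    # One "(%s,%s)" group per row, sized to the number of columns.
    placeholders = "(" + ",".join(["%s"] * len(columns)) + ")"
    sql = "INSERT IGNORE INTO {0} ({1}) VALUES {2}".format(
        table, ", ".join(columns), ",".join([placeholders] * len(rows)))
    # Flatten the row tuples into one parameter sequence for the driver.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "mailing_list", ("name", "email"),
    [("Ann", "ann@example.com"), ("Bob", "bob@example.com")])
print(sql)
print(params)
```

The point of the sketch is that the server parses one statement and the client makes one round trip, which is where the saving over a plain loop would come from.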
Also, I took Unode's suggestion and used the UNIQUE constraint. But the IGNORE keyword is better than ON DUPLICATE KEY UPDATE because I want it to fail silently.
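To make the silent-failure behavior concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in so it runs without a MySQL server; SQLite spells the statement INSERT OR IGNORE, while the MySQL form is INSERT IGNORE. Putting the UNIQUE constraint on the email column and the sample rows are both assumptions for illustration.

```python
import sqlite3

# Stand-in for the MySQL table. In MySQL the constraint could be added with:
#   ALTER TABLE mailing_list ADD UNIQUE (email);
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailing_list (name TEXT, email TEXT UNIQUE)")

rows = [("Ann", "ann@example.com"), ("Bob", "bob@example.com")]


def import_rows(conn, rows):
    # SQLite syntax; the MySQL equivalent is
    #   INSERT IGNORE INTO mailing_list (name, email) VALUES (%s, %s)
    conn.executemany(
        "INSERT OR IGNORE INTO mailing_list (name, email) VALUES (?, ?)",
        rows)
    conn.commit()


import_rows(conn, rows)
import_rows(conn, rows)  # second run: duplicate emails are skipped silently

count = conn.execute("SELECT COUNT(*) FROM mailing_list").fetchone()[0]
print(count)
```

Running the import twice leaves one row per email, with no error raised on the second pass.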
TL;DR
1. What's the best way to prevent duplicate inserts?
ANSWER 1: A UNIQUE constraint on the column, with INSERT IGNORE to fail silently or ON DUPLICATE KEY UPDATE to update the existing row instead of inserting a duplicate.
2. Is executemany() as expensive as INSERT in a loop?
@Unode says it's not, but my research tells me otherwise. I would like a definitive answer.
3. Is this the best way, or is it going to be really slow with bigger tables, and how would I test to be sure?
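One way to approach the last two questions is to time both approaches on the same rows. This is a rough sketch only: it assumes an in-memory SQLite database as a stand-in (hence INSERT OR IGNORE and `?` placeholders), so the numbers say nothing about MySQL network round trips. The row count and the generated names are arbitrary. To measure the real thing, the same harness could be pointed at a MySQLdb connection with the original INSERT IGNORE statement.

```python
import sqlite3
import time

SQL = "INSERT OR IGNORE INTO mailing_list (name, email) VALUES (?, ?)"
rows = [("user%d" % i, "user%d@example.com" % i) for i in range(10000)]


def make_table():
    # Fresh table per run so both approaches start from the same state.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mailing_list (name TEXT, email TEXT UNIQUE)")
    return conn


def insert_loop(conn):
    # One execute() call per row.
    for row in rows:
        conn.execute(SQL, row)
    conn.commit()


def insert_many(conn):
    # A single executemany() call for all rows.
    conn.executemany(SQL, rows)
    conn.commit()


def timed(func):
    conn = make_table()
    start = time.time()
    func(conn)
    elapsed = time.time() - start
    count = conn.execute("SELECT COUNT(*) FROM mailing_list").fetchone()[0]
    return elapsed, count


loop_time, loop_count = timed(insert_loop)
many_time, many_count = timed(insert_many)
print("loop: %.4fs, executemany: %.4fs" % (loop_time, many_time))
```

Repeating the measurement with larger row counts should show whether either approach degrades as the table grows.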
Put a UNIQUE constraint on the column where you want to avoid duplicates.