So let's say I have csv files which I have no control over:
a.csv
b.csv
c.csv
and they all have different header names. I dumped all the data into tables a, b, and c. Now, if I get another a.csv with new values (same header fields), how can I insert only the values of the new a which are not in the old one?
For example: The a table has headers name and age:
'Bob' 25
'Mary' 50
And I get a new a.csv which parses out to:
'Bob' 25
'Susie' 60
How do I add only the lines that are unique to the current table (e.g. only add Susie and not Bob to the a table)? I do not have a specific unique ID for each one, so I can't use primary key. There are also multiple header fields, so if I try to use all the header fields as the primary keys it returns with "Specified key was too long".
I need to check that the entire line is unique, then if it is, add it to the table. I tried INSERT IGNORE, but with the lack of a unique key I can't get it to work correctly. Any suggestions? I'll post any additional info if it would help.
Current attempt:
cursor.execute("ALTER TABLE temp ADD PRIMARY KEY" + uniqueline)
cursor.execute("INSERT IGNORE INTO " + tablename + " SELECT * FROM temp")
Where tablename is the name of the table, temp is where the csv code was sent, and uniqueline is CURRENTLY the first 5 fields in the form (field1, field2, field 3, field4, field5). If there is less than 5 fields, then all the fields.
Thanks!
Edit:
cursor.execute("INSERT INTO " + tablename + " SELECT * FROM temp where " + uniqueline + " NOT IN (SELECT * FROM " + tablename + ")")
It works once (with empty tables), but if I run it again to test it essentially freezes, and doesn't ever finish. Now I have these "phantom tables" that if I try to drop it says "unknown table" but if I try to create it it says "table already exists". I also can't add or delete anything from the table without it freezing. I'm going to try giving it a unique index again. Thanks for all your guys' help though!