Dismiss
Announcing Stack Overflow Documentation

We started with Q&A. Technical documentation is next, and we need your help.

Whether you're a beginner or an experienced developer, you can contribute.

Sign up and start helping → Learn more about Documentation →

I Start new Project in Python with MySQL.

I just try to insert millions of record from CSV to MySQL through MySQLdb package.

My Code:

 import pandas as pd 
 import MySQLdb

 #Connect with MySQL
 db = MySQLdb.connect('localhost','root','****','MY_DB')
 cur = db.cursor()

 #Reading CSV
 df = pd.read_csv('/home/shankar/LAB/Python/Rough/******.csv')

 for i in df.COMPANY_NAME:
    i = i.replace("'","")
    i = i.replace("\\","")
   #i = i.encode('latin-1', 'ignore')
    cur.execute("INSERT INTO polls_company (name) VALUES ('" + i + "')")
 db.commit()

This code working fine in some sort of CSV files, but having issues in few CSV files.

Errors :

 ---------------------------------------------------------------------------
 UnicodeEncodeError                        Traceback (most recent call last)
 <ipython-input-7-aac849862588> in <module>()
 13     i = i.replace("\\","")
 14     #i = i.encode('latin-1', 'ignore')
 ---> 15     cur.execute("INSERT INTO polls_company (name) VALUES ('" + i + "')")
 16 db.commit()

 /home/shankar/.local/lib/python3.5/site-packages/MySQLdb/cursors.py in execute(self, query, args)
 211 
 212         if isinstance(query, unicode):
 --> 213             query = query.encode(db.unicode_literal.charset, 'surrogateescape')
 214 
 215         res = None

 UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 49: ordinal not in range(256)

Here, this "Character Encoding" issue is occurred in some CSV files only, but i want automatic Insertion with common encoding techniques.

Because CSV Files encoded works with "utf-8", "latin-1" and more...

If i use utf-8 : then i got error in latin-1 and vise versa

So, Is there any ways to operate all kind of CSV file with common encoding

or

any other ways to solve this ?

[Thanks in Advance...]

share|improve this question

I would let the pandas take care of encoding and you don't need to loop through your DF. Let's do it pandas way:

import pandas as pd 
import MySQLdb

#Connect with MySQL
db = MySQLdb.connect('localhost','root','****','MY_DB')
cur = db.cursor()

#Reading CSV
df = pd.read_csv('/home/shankar/LAB/Python/Rough/******.csv')

df.COMPANY_NAME.str.replace(r"['\]*", "").rename(columns={'COMPANY_NAME':'name'}).to_sql('polls_company', db, if_exists='append', index=False)
share|improve this answer
    
Then what abt, if i process more than one column means ? – Shankar Thiyagaraajan Aug 22 at 15:30
    
But this code generate error as "error: unterminated character set at position 0" – Shankar Thiyagaraajan Aug 22 at 15:31

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.