I am trying to import CSV files into a sqlite3 database from a Python program I'm writing. I'm fairly new to importing data this way and to sqlite3 in general.

The data files I've got tend to have an irregular format at the beginning, but the rows I want will always begin with a date field (dd/mm/yyyy) and a time field (hh:mm:ss). An example file is below:

Hello I am a file, , , ,
I am a type of csv file, , , ,
Date, Time, ID number, Message
12/12/2012, 13:12:13, 1, Hello World
13/12/2012, 13:12:13, 2, Goodbye

So I want to import just the 4th and 5th rows (and all subsequent rows that begin with a date), but skip the first 3, which describe the file and give the column headers.

So far I've got code which reads in the data and puts it into a table along with a unique identifier (logID), but at the moment I'm just reading from line 4 onwards. I'd like a more general solution for the case where a file doesn't have the 3 leading rows of blurb.

#code above here opens the database and creates a cursor
with open(file) as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    for row in reader:
        if reader.line_num >= 4:
            values = logID + str(row[0]) + str(row[1]) + str(row[2]) + str(row[3])
            cursor.execute("INSERT INTO table VALUES(" +values+ ");")

Thanks in advance for your help!


Try converting the first item in each row into a Python datetime.datetime object. If it works, process the line; if not, continue to the next row:

import csv
import datetime as DT

with open(file, 'rb') as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    for row in reader:
        try:
            # only rows whose first field parses as a dd/mm/yyyy date get inserted
            date = DT.datetime.strptime(row[0], '%d/%m/%Y')
        except ValueError:
            continue
        args = [logID] + row
        sql = "INSERT INTO table VALUES ({})".format(', '.join(['?'] * len(args)))
        cursor.execute(sql, args)

The above will work for small to moderate-sized CSV files. If you have a huge CSV file, calling cursor.execute once per line can be slow. To improve performance, read a chunk of (say, 1000) lines at a time, accumulate the values to be inserted into a list of lists, manyargs, and then call

cursor.executemany("INSERT INTO table VALUES(?,?,?,?,?)", manyargs)

once per chunk.
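
A rough sketch of that chunked approach might look like this (assuming cursor, file, and logID are set up as in the question, and a fixed 5-column table so the number of ? placeholders is known in advance):

import csv
import datetime as DT

CHUNK_SIZE = 1000                      # arbitrary; tune as needed
manyargs = []                          # rows waiting to be inserted

with open(file, 'rb') as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    for row in reader:
        try:
            DT.datetime.strptime(row[0], '%d/%m/%Y')
        except ValueError:
            continue                   # not a data row; skip it
        manyargs.append([logID] + row)
        if len(manyargs) >= CHUNK_SIZE:
            cursor.executemany("INSERT INTO table VALUES(?,?,?,?,?)", manyargs)
            manyargs = []              # start the next chunk

if manyargs:                           # flush whatever is left over
    cursor.executemany("INSERT INTO table VALUES(?,?,?,?,?)", manyargs)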


By the way, constructing SQL commands with string formatting such as:

cursor.execute("INSERT INTO table VALUES(" +values+ ");")

can be a security risk. A maliciously constructed CSV file could take advantage of this line to run arbitrary SQL commands (see SQL injection and Little Bobby Tables). To protect against this, use parametrized SQL:

cursor.execute("INSERT INTO table VALUES(?,?,?,?,?)", args)

The ? is a placeholder -- one placeholder for each field value. args is a sequence (such as a tuple or list) containing the values you wish to insert. Note that args is passed as the second argument to cursor.execute. You do not have to convert the values in args to strings (though in this case they are strings anyway, since csv.reader yields lists of strings). The sqlite3 module handles the quoting of the arguments for you.
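
For example, with a single parsed row the call looks like this (the row literal is just for illustration; in the loop above it comes straight from csv.reader):

row = ['12/12/2012', '13:12:13', '1', 'Hello World']   # one data row from the CSV
args = [logID] + row                                   # logID as in the question's code
# sqlite3 binds each value to its matching ? -- no quoting or string concatenation needed
cursor.execute("INSERT INTO table VALUES(?,?,?,?,?)", args)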

    
Thanks for the help - I'd spotted the datetime conversion in other solutions but wasn't sure how to implement it. I'm expecting this software to be passed data for different tables with differing numbers of columns. Can I still use parameterized SQL or do you have to be able to define the number of "?" in the statement? – Cara Dec 9 '13 at 13:04
    
I've modified the code above to show how you can handle an arbitrary number of columns. – unutbu Dec 9 '13 at 13:24
    
thank you for your help – Cara Dec 9 '13 at 13:53
