
I am trying to insert a pandas DataFrame into a PostgreSQL database (9.1) in the most efficient way possible (using Python 2.7).
Using "cursor.execute_many" is really slow, so is "DataFrame.to_csv(buffer,...)" together with "copy_from".
I found a much faster solution on the web (http://eatthedots.blogspot.de/2008/08/faking-read-support-for-psycopgs.html), which I adapted to work with pandas.
My code can be found below.
My question is whether the method from this related question (using "copy from stdin with binary") can easily be transferred to work with DataFrames, and whether that would be much faster.
Use binary COPY table FROM with psycopg2
Unfortunately my Python skills aren't sufficient to understand the implementation of that approach (I've put a rough, untested guess at what it might look like after my code below).
This is my approach:


import psycopg2
import connectDB # this is simply a module that returns a connection to the db
from datetime import datetime

class ReadFaker:
    """
    This could be extended to include the index column optionally. Right now the index
    is not inserted
    """
    def __init__(self, data):
        self.iter = data.itertuples()

    def readline(self, size=None):
        try:
            line = self.iter.next()[1:]  # element 0 is the index, which is skipped
            # in my case all strings in line are unicode objects
            row = '\t'.join(x.encode('utf8') if isinstance(x, unicode) else str(x) for x in line) + '\n'
        except StopIteration:
            return ''  # an empty string tells copy_from that the stream is exhausted
        else:
            return row

    read = readline

def insert(df, table, con=None, columns=None):

    time1 = datetime.now()
    close_con = False
    if not con:
        try:
            con = connectDB.getCon()  # connectDB.getCon() returns a connection with my settings
            close_con = True
        except psycopg2.Error, e:
            print e.pgerror
            print e.pgcode
            return "failed"
    inserted_rows = df.shape[0]
    data = ReadFaker(df)

    try:
        curs = con.cursor()
        print 'inserting %s entries into %s ...' % (inserted_rows, table)
        if columns is not None:
            curs.copy_from(data, table, null='nan', columns=list(columns))
        else:
            curs.copy_from(data, table, null='nan')
        con.commit()
        curs.close()
        if close_con:
            con.close()
    except psycopg2.Error, e:
        print e.pgerror
        print e.pgcode
        con.rollback()
        if close_con:
            con.close()
        return "failed"

    time2 = datetime.now()
    print time2 - time1
    return inserted_rows
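
For completeness, this is roughly the direction I think the binary approach points in, translated to a DataFrame. It is an untested sketch: it assumes every column is a 4-byte integer (int4), 'my_table' is a placeholder, and real code would need one encoder per PostgreSQL column type.

    import struct
    from io import BytesIO

    def df_to_pgbinary(df):
        buf = BytesIO()
        buf.write(b'PGCOPY\n\377\r\n\0')         # 11-byte binary COPY signature
        buf.write(struct.pack('!ii', 0, 0))      # flags field and header extension length
        ncols = df.shape[1]
        for tup in df.itertuples():
            row = tup[1:]                        # element 0 is the index, as in ReadFaker
            buf.write(struct.pack('!h', ncols))  # number of fields in this tuple
            for val in row:
                buf.write(struct.pack('!i', 4))         # field length in bytes (int4 assumed)
                buf.write(struct.pack('!i', int(val)))  # field value, big-endian
        buf.write(struct.pack('!h', -1))         # file trailer
        buf.seek(0)
        return buf

    # usage sketch:
    # curs.copy_expert('COPY my_table FROM STDIN WITH BINARY', df_to_pgbinary(df))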
  • Any update on which of the answers yielded the best performance? Commented May 17, 2014 at 5:41
  • None of the answers actually answered my question. Commented Sep 29, 2015 at 8:57
  • Check this question: stackoverflow.com/questions/41875817/…. There is a quite complete answer there. Commented Jan 29, 2017 at 12:24

2 Answers


Pandas DataFrames now have a .to_sql method. PostgreSQL is not supported yet, but there's a patch for it that looks like it works. See the issues here and here.
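
Once that lands, the call would presumably look something like this (sketch only; the connection URL and table name are placeholders, and the exact signature may differ from the patched version):

    from sqlalchemy import create_engine

    engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder connection URL
    df.to_sql('my_table', engine, if_exists='append', index=False)            # append rows, do not write the index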


2 Comments

That does not help since it simply uses a standard method which is significantly slower.
.to_sql is just as slow. I'm using Python 2.7.

I have not tested the performance, but maybe you can use something like this:

  1. Iterate through the rows of the DataFrame, yielding a string representing each row (see below)
  2. Convert this iterable into a stream, using for example Python: Convert an iterable to a stream? (a minimal sketch of such a wrapper follows the code below)
  3. Finally, use psycopg's copy_from on this stream.

To yield the rows of a DataFrame efficiently, use something like:

    def r(df):
        for idx, row in df.iterrows():
            yield ','.join(map(str, row))
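
A minimal file-like wrapper for step 2 could look like the sketch below (untested; it assumes, like the ReadFaker in the question, that copy_from only ever calls read/readline):

    class IterStream(object):
        # wraps a row iterator so psycopg2's copy_from can treat it as a file
        def __init__(self, iterable):
            self.iterable = iter(iterable)

        def readline(self, size=None):
            try:
                return next(self.iterable) + '\n'  # copy_from expects newline-terminated rows
            except StopIteration:
                return ''                          # empty string signals end of stream

        read = readline

    # usage sketch: curs.copy_from(IterStream(r(df)), 'my_table', sep=',')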

1 Comment

This also streams a CSV which is what my class above is doing. My question is about whether a binary representation would yield a faster result when creating the stream.
