How can I copy data from a SQL_ASCII to a UTF8 postgresql db using SQLAlchemy?

Question

I am writing a db data migration tool using SQLAlchemy's expression language as the basic tool.

My source database may be in UTF8, or it may be in SQL_ASCII. My target DB will always be in UTF8.

I'm using the psycopg2 driver in SQLAlchemy 0.6.6

My general migration process looks like this:

for t in target_tables:
    log.info("Migrating data from %s", t.fullname)
    source = self.source_md.tables[self.source_schema + "." + t.name]
    for row in source.select().execute():
        with sql_logging(logging.INFO):
            conn.execute(t.insert(), row)

If I don't set anything encoding-related on the engines, I get this when I iterate over the select() results:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

If I set use_native_unicode=True, encoding='utf-8' on the engines, I get this when I attempt to insert the new row:

sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xeb6d20
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
 'INSERT INTO project_ghtests_survey000005.employees (first_name, employee_id) VALUES (%(first_name)s, %(employee_id)s)' {'first_name': 'Art\xebm', 'employee_id': '1234'}

Update details

To make inquires a bit quicker, here's the software stack in play:

source_db encoding: SQL_ASCII
target_db encoding: UTF8
python 2.7
sqlalchemy 0.6.6
psycopg2 2.2.2
PostgreSQL 8.2 server

Chris R · Accepted Answer · 2010-12-16 19:38:19Z

1

It turns out that the solution was to set the connection client_encoding to 'latin1'

I accomplished this using a PoolListener like so:

class EncodingListener(PoolListener):

    def connect(self, dbapi_con, con_record):
        with closing(dbapi_con.cursor()) as cur:
            cur.execute('show client_encoding')
            encoding = cur.fetchone()[0]

        if encoding.upper() == 'UTF8':
            return

        dbapi_con.set_client_encoding('latin1')

answered Dec 16, 2010 at 19:38

Chris R

18k26 gold badges110 silver badges175 bronze badges

Add a comment |

Wolph · Accepted Answer · 2010-12-15 00:53:42Z

0

Since UTF-8 is backwards compatible with UTF-8, why use SQL_ASCII isntead of UTF8?

I think your encoding problems are probably more along the lines of latin1 or similar encodings. Not with ASCII to UTF8.

answered Dec 15, 2010 at 0:53

Wolph

80.3k12 gold badges142 silver badges152 bronze badges

What do you mean by that? The DB is already in SQL_ASCII, I don't have a lot of say in that.

Chris R
– Chris R

12/15/2010 17:49:59
Commented Dec 15, 2010 at 17:49
1

@Chris R: I mean that it doesn't matter if you specify SQL_ASCII or UTF8 for the source database since UTF8 is backwards compatible with ASCII.

Wolph
– Wolph

12/15/2010 22:25:17
Commented Dec 15, 2010 at 22:25

Add a comment |

Collectives™ on Stack Overflow

How can I copy data from a SQL_ASCII to a UTF8 postgresql db using SQLAlchemy?

Update details

2 Answers 2

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Update details

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Related