I'm having a problem with encoding in my application and couldn't find a solution anywhere on the web.

Here is the scenario:

  • PostgreSQL with UTF-8 encoding (CREATE DATABASE xxxx WITH ENCODING 'UTF8')

  • Python logic also with UTF-8 encoding (# -*- coding: utf-8 -*-)

  • Jinja2 to show my HTML pages. Python and Jinja2 are used on Flask, which is the microframework I'm using.

The header of my pages has: <meta http-equiv="content-type" content="text/html; charset=utf-8"/>

When I use psycopg2 to run a simple query and print the result in Jinja2, this is what I get:

{% for company in list %}
    <li>
        {{ company }}
    </li>
{% endfor %}

(1, 'Casa das M\xc3\xa1quinas', 'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s')

(2, 'Ar do Z\xc3\xa9', 'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s')

If I try to go deeper into the fields:

{% for company in list %}
    <li>
        {% for field in company %}
            <li>
                {{ field }}
            </li>
        {% endfor %}
    </li>
 {% endfor %}

I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

However, if I print the list fields before sending them to Jinja2, I get the expected result (which is also how it is presented in PostgreSQL):

1 Casa das Máquinas R. Três, Mineiros - Goiás

2 Ar do Zé Av. Sétima, Mineiros - Goiás

When I get the error, Flask offers an option to "debug". This is where the code breaks: File "/home/anonimou/Desktop/flask/lib/python2.7/site-packages/jinja2/_markupsafe/_native.py", line 21, in escape return Markup(unicode(s)

And I can also do:

[console ready]

>>> print s
Casa das Máquinas

>>> s
'Casa das M\xc3\xa1quinas'

>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

>>> s.decode('utf-8')
u'Casa das M\xe1quinas'

>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

>>> s.decode('utf-8').encode('utf-8')
'Casa das M\xc3\xa1quinas'

>>> print s.decode('utf-8').encode('utf-8')
Casa das Máquinas

>>> print s.decode('utf-8')
Casa das Máquinas

I've already tried breaking up the list and decoding or encoding it in Python code before sending it to Jinja2. I get the same error.

Sooo, not sure what I can do here. =(

Thanks in advance!


1 Answer


The issue is that psycopg2 returns byte strings by default in Python 2:

When reading data from the database, in Python 2 the strings returned are usually 8 bit str objects encoded in the database client encoding
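To make the failure concrete, here is a minimal sketch that needs no database. It assumes the value psycopg2 returned is the UTF-8 byte string shown in the question. The names `raw`, `ascii_error` and `text` are only illustrative. In Python 2, `unicode(s)` on a `str` decodes with the default ASCII codec, so decoding as ASCII explicitly should reproduce the same error, while naming the real encoding works:

```python
# -*- coding: utf-8 -*-
# Written to run on both Python 2 and Python 3; the b'' and u'' prefixes
# make it explicit which values are bytes and which are Unicode text.

# The bytes returned for "Casa das Máquinas" (UTF-8 encoded), as in the question.
raw = b'Casa das M\xc3\xa1quinas'

# Python 2's unicode(raw) implicitly decodes with the ASCII codec.
# Doing the same decode explicitly fails on the first non-ASCII byte.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the encoding the database actually uses gives Unicode text,
# which is what the template engine expects to receive.
text = raw.decode('utf-8')
```

The failing byte is at index 10, matching "position 10" in the traceback, and `text` is the same `u'Casa das M\xe1quinas'` value the console session produced with `s.decode('utf-8')`.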

So you can either:

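  • Manually decode all of the data from UTF-8 into Unicode objects:

    # Decode the byte strings into Unicode objects using
    # the encoding you know that your database is using.
    # Rows are tuples, so decode each str field and keep the rest as-is.
    companies = [
        tuple(f.decode("utf-8") if isinstance(f, str) else f for f in company)
        for company in companies
    ]
    return render_template("companies.html", companies=companies)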
    

or

  • Register the Unicode typecasters when you first import psycopg2, as per the note in the same section of the manual:

    Note: In Python 2, if you want to uniformly receive all your database input in Unicode, you can register the related typecasters globally as soon as Psycopg is imported:

    import psycopg2
    import psycopg2.extensions
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
    

    and then forget about this story.

Thank you so much! I used the 2nd alternative, which is less "code invasive" for me. Worked fine. – anonimou 2 days ago
