I'm running a Django 1.4.2/Python 2.7.3/MySQL 5.5.28 site. One of the features of the site is that the admin can send an email to the server which calls a Python script via procmail that parses the email and tosses it into the DB. I maintain two versions of the site - a development and a production site. Both sites use different but identical vitualenvs (I even deleted them both and reinstalled all packages just to make sure).
I'm experiencing a weird issue. The exact same script succeeds on the dev server and fails on the production server. It fails with this error:
...django/db/backends/mysql/base.py:114: Warning: Incorrect string value: '\x92t kno...' for column 'message' at row 1
I'm well aware of the unicode issues Django has, and I know there are a ton of questions here on SO about this error, but I made sure to setup the database as UTF-8 from the beginning:
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
Additionally, I know that each column can have its own charset, but the message
column is indeed UTF-8:
mysql> show full columns in listserv_post;
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| thread_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
| timestamp | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| from_name | varchar(100) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
| from_email | varchar(75) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
| message | longtext | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
6 rows in set (0.00 sec)
Does anyone have any idea why I'm getting this error? Why is it happening under the production config but not the dev config?
Thanks!
[edit 1]
To be clear, the data are the same as well. I send a single email to the server, and procmail sends it off. This is what the .procmailrc looks like:
VERBOSE=off
:0
{
:0c
| <path>/dev/ein/scripts/process_new_mail.py dev > outputdev
:0
| <path>/prd/ein/scripts/process_new_mail.py prd > outputprd
}
There are 2 copies of process_new_mail.py, but that's just because it's version controlled so that I can maintain two separate environments. If I diff the two output files (which contain the message received), they're identical.
[edit 2]
I actually just discovered that both dev and prd configs are failing. The difference is that the dev config fails silently (maybe having to do with the DEBUG
setting?). The problem is that there are some unicode characters in one of the messages, and Django is choking on them for some reason. I'm making progress....
I've tried editing the code to explicitly encode the message as ASCII and UTF-8, but it's still not working. I'm getting closer, though.
dev
environment access theprd
database and vice-versa, and see which one fails. – mgibsonbr Nov 20 '12 at 0:03charset
anduse_unicode
will make a difference. – Pedro Romano Nov 20 '12 at 0:13