I'm loading large amounts of data into a PostgreSQL database using Perl and the Perl DBI. I have been getting errors because my file is improperly encoded. I have the PostgreSQL encoding set to 'utf8', and the Debian 'file' command reports my file as "Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators". When I run my program, the DBI fails due to an "invalid byte sequence". I already added a line to my Perl program that strips the '\r' carriage returns, but how can I convert my files to 'utf8' or get PostgreSQL to accept my file encoding? Thanks.
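When you connect to PostgreSQL using DBI, the strings you pass in need to be properly decoded text. There are many ways to achieve that, and they all depend on how you read the file in the first place. The most basic one is to use open with an encoding layer that names the file's actual encoding.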
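As a minimal sketch of reading through an encoding layer: the helper name read_decoded_lines is hypothetical, and cp1252 in the usage line is only a guess at what "Non-ISO extended-ASCII" might mean for this file. Substitute the encoding your data is really in.

```perl
use strict;
use warnings;

# Read a file through an :encoding() layer so that every line arrives as
# decoded Perl characters rather than raw bytes.
# $encoding is supplied by the caller; it must be the file's real encoding.
sub read_decoded_lines {
    my ($path, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # drop the LF or CRLF line terminator
        push @lines, $line;
    }
    close $fh;
    return @lines;
}
```

A call such as `my @rows = read_decoded_lines('data.txt', 'cp1252');` (file name and encoding are placeholders) returns character strings with the carriage returns already removed.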
This is basically equivalent to opening the file without an encoding layer and converting manually using the decode function from the Encode module.
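The manual route might look roughly like this. The helper name slurp_and_decode is hypothetical, and passing FB_CROAK is my own choice for the sketch, so that a wrong encoding guess fails loudly instead of producing replacement characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Read the whole file as raw bytes, then decode those bytes explicitly.
# $encoding is supplied by the caller; it must be the file's real encoding.
sub slurp_and_decode {
    my ($path, $encoding) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };    # slurp mode: read everything
    close $fh;
    # FB_CROAK makes decode die on byte sequences that are invalid in
    # $encoding instead of silently substituting characters.
    return decode($encoding, $bytes, Encode::FB_CROAK);
}
```

Wrapping the call in eval is a cheap way to test a hypothesis about the encoding: decoding as 'UTF-8' will die on a file that is not valid UTF-8.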
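A lot of other modules that receive data from external sources and return it (think of HTTP downloads) raise the same question of whether you are getting raw bytes or decoded text. So what you have to do is find out which encoding your file is really in and decode the data accordingly before handing it to the database.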
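For the "convert my files to utf8" half of the question, one possible sketch is to re-encode the file once, up front. The helper name convert_to_utf8 is hypothetical, and it assumes you already know the source encoding; it also normalises CRLF to LF so the separate '\r' substitution is no longer needed.

```perl
use strict;
use warnings;

# Re-encode a file to UTF-8, turning CRLF line endings into LF on the way.
# $from_encoding is supplied by the caller; it must be the file's real
# encoding, otherwise the output will be valid UTF-8 but wrong text.
sub convert_to_utf8 {
    my ($in_path, $out_path, $from_encoding) = @_;
    open my $in, "<:encoding($from_encoding)", $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/\r\n\z/\n/;    # CRLF -> LF
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

The output file can then be read with a UTF-8 encoding layer, and its contents should no longer trigger the "invalid byte sequence" error, provided the source encoding was named correctly.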
'file' won't reliably detect the encoding. You need to actually know what the text encoding is in order to convert it correctly. If you've really got no idea, try to use a tool that does more robust encoding detection than 'file'. – Craig Ringer Aug 25 '13 at 6:01