I'm loading large amounts of data into a PostgreSQL database using Perl and DBI, and the inserts fail because my file is not encoded the way PostgreSQL expects. The database encoding is set to 'utf8'. The Debian 'file' command reports my file as "Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators", and when I run my program, DBI fails with an "invalid byte sequence" error. I already added a line to my Perl program that substitutes the '\r' carriage returns with nothing, but how can I convert my files to 'utf8', or get PostgreSQL to accept my file's encoding? Thanks.

file won't reliably detect the encoding. You need to actually know what the text encoding is in order to convert it correctly. If you really have no idea, try a tool that does more robust encoding detection than file. – Craig Ringer Aug 25 '13 at 6:01
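To illustrate why a validity check alone cannot tell you the encoding, here is a minimal Perl sketch using the core Encode module. The helper name `plausible_encodings` and the default candidate list are assumptions for illustration. It reports which candidate encodings decode a byte string without errors. Single-byte encodings such as cp1252 (Windows-1252) accept almost any byte sequence, so they "succeed" even on UTF-8 data; only a failure is conclusive.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the candidate encodings under which $bytes
# decodes without error. FB_CROAK makes decode() die on invalid input,
# and LEAVE_SRC stops decode() from consuming $bytes in place.
sub plausible_encodings {
    my ($bytes, @candidates) = @_;
    @candidates = ('UTF-8', 'cp1252', 'iso-8859-1') unless @candidates;
    my @ok;
    for my $enc (@candidates) {
        push @ok, $enc
            if eval { decode($enc, $bytes, FB_CROAK | LEAVE_SRC); 1 };
    }
    return @ok;
}

# 0xE9 followed by a space is not valid UTF-8, but is e-acute in cp1252/Latin-1.
print join(', ', plausible_encodings("caf\xe9 au lait")), "\n";
# prints "cp1252, iso-8859-1"

# Valid UTF-8 bytes also decode "successfully" (as mojibake) in the
# single-byte encodings, so success does not prove the encoding.
print join(', ', plausible_encodings("caf\xc3\xa9")), "\n";
# prints "UTF-8, cp1252, iso-8859-1"
```

In other words, the check can rule UTF-8 out, but choosing between the remaining candidates still requires knowing where the data came from.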

When you connect to PostgreSQL using DBI->connect(..., { pg_enable_utf8 => 1 }), all data you pass to modifying DBI calls (SQL INSERT, UPDATE, DELETE, every placeholder value in a query, etc.) has to be a decoded Perl character string (Perl's internal encoding), so that the driver (DBD::Pg) can encode it correctly for the wire protocol.
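To make the difference between raw bytes and decoded character strings concrete, here is a minimal sketch using the core Encode module (the variable names are illustrative). The same word has a different length before and after decoding; with pg_enable_utf8 on, the decoded form is the one you should hand to DBI.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they would come from a file opened without an encoding layer:
# "cafe" with e-acute, in UTF-8, is c, a, f, 0xC3, 0xA9.
my $bytes = "caf\xc3\xa9";

# Decoding turns the byte string into a character string of four characters,
# the last being U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 5, characters: 4"

# Encoding goes the other way; this is roughly what the driver does
# when it sends a character string to the server.
my $round_trip = encode('UTF-8', $chars);
print $round_trip eq $bytes ? "round trip ok\n" : "mismatch\n";
```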

There are many ways to achieve that, and which one fits depends on how you read the file in the first place. The most basic case is reading with open (or a method built directly on it, such as IO::File->open). You can then use Perl's I/O layers (see the documentation for open) and let Perl do the decoding for you. Assuming your file is already encoded in UTF-8, this is enough:

open(my $fh, "<:encoding(UTF-8)", "filename")
  or die "Cannot open filename: $!";
while (my $line = <$fh>) {
  # $line is now a decoded character string; process query
}
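If your file turns out to be in a Windows code page rather than UTF-8 (Windows-1252 is often behind text that file calls "Non-ISO extended-ASCII", but that is an assumption you must verify against your data), only the layer name changes. This sketch writes a small sample file with the core File::Temp module so it runs on its own; the sample contents and the `read_lines` helper are hypothetical. It also strips the CR from CRLF line endings after decoding, which replaces the manual '\r' substitution from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a text file in the given encoding and return
# its lines as decoded character strings without line terminators.
sub read_lines {
    my ($path, $encoding) = @_;
    open(my $fh, "<:encoding($encoding)", $path)
        or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # drop LF or CRLF
        push @lines, $line;
        # ... this is where you would call $sth->execute(...) ...
    }
    close($fh);
    return @lines;
}

# Sample data in cp1252 with CRLF endings: in cp1252, 0x80 is the euro sign
# (ISO-8859-1 would wrongly map it to a control character) and 0xE9 is e-acute.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "5 \x80\r\ncaf\xe9\r\n";
close($out);

my @lines = read_lines($path, 'cp1252');
```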

This is basically equivalent to opening the file without an encoding layer and decoding each line manually with Encode::decode:

use Encode;

open(my $fh, "<", "filename")
  or die "Cannot open filename: $!";
while (my $line = <$fh>) {
  $line = Encode::decode('UTF-8', $line);
  # process query
}
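With the manual approach you can also pass a CHECK argument to Encode::decode so that bad input fails loudly in Perl, with a line number, instead of surfacing later as PostgreSQL's "invalid byte sequence" error. A minimal sketch; the `decode_lines_strict` helper and its error message format are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode each byte-string line with $encoding and
# die with the 1-based number of the first line that is invalid.
sub decode_lines_strict {
    my ($encoding, @byte_lines) = @_;
    my @decoded;
    for my $i (0 .. $#byte_lines) {
        # FB_CROAK: die on malformed input; LEAVE_SRC: don't modify the input.
        my $line = eval { decode($encoding, $byte_lines[$i], FB_CROAK | LEAVE_SRC) };
        die sprintf("line %d is not valid %s: %s", $i + 1, $encoding, $@)
            unless defined $line;
        push @decoded, $line;
    }
    return @decoded;
}

my @good = decode_lines_strict('UTF-8', "plain ascii\n", "caf\xc3\xa9\n");

# The second line holds a cp1252/Latin-1 byte (0xE9), which is not valid UTF-8.
my $err = eval { decode_lines_strict('UTF-8', "plain ascii\n", "caf\xe9 au lait\n"); 1 }
    ? '' : $@;
print $err;    # reports that line 2 is not valid UTF-8
```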

Many other modules that fetch data from external sources can hand it back already decoded into Perl's internal encoding. With LWP, for example, HTTP::Response's decoded_content method returns decoded text, whereas its content method returns the raw bytes.

So what you have to do is:

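  • Figure out which encoding your file actually uses (e.g. on the shell, try converting it with iconv from a few candidate encodings and inspect the result; iconv converts, it does not detect)
  • Tell DBD::Pg to enable UTF-8 (connect with pg_enable_utf8 => 1)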
  • Open the file with the correct encoding
  • Read line(s), process query, repeat
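For the "convert my files to utf8" half of the question, the same I/O layers can re-encode a file on disk, roughly what `iconv -f CP1252 -t UTF-8` does, plus normalizing CRLF to LF. A minimal sketch assuming the source encoding is cp1252 (substitute whatever you determined in the first step); `convert_to_utf8` and the temporary demo files are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: re-encode $in_path from $from_encoding to UTF-8,
# writing $out_path with LF line endings.
sub convert_to_utf8 {
    my ($in_path, $out_path, $from_encoding) = @_;
    open(my $in, "<:encoding($from_encoding)", $in_path)
        or die "Cannot read $in_path: $!";
    open(my $out, ">:encoding(UTF-8)", $out_path)
        or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/\r\n\z/\n/;    # CRLF -> LF
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Demo with a throwaway cp1252 input file (0xEF is i-diaeresis, 0xE9 is e-acute).
my ($tmp_in, $in_path) = tempfile(UNLINK => 1);
binmode($tmp_in);
print {$tmp_in} "na\xefve\r\ncaf\xe9\r\n";
close($tmp_in);
my (undef, $out_path) = tempfile(UNLINK => 1);

convert_to_utf8($in_path, $out_path, 'cp1252');

# Read the result back as raw bytes to check the UTF-8 encoding.
open(my $check, '<:raw', $out_path) or die "Cannot read $out_path: $!";
my $result = do { local $/; <$check> };
close($check);
```

Converting once up front lets the loading script open every file with :encoding(UTF-8); converting on the fly while reading, as in the steps above, avoids the extra copy of the data.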