Change file encoding for PostgreSQL w/Perl

Question

I'm loading large amounts of data into a PostgreSQL database using Perl and the Perl DBI, and I keep getting errors because my file is improperly encoded. The PostgreSQL encoding is set to 'utf8'. The Debian 'file' command reports my file as "Non-ISO extended-ASCII text, with very long lines, with CRLF line terminators", and when I run my program the DBI fails with an "invalid byte sequence" error. I already added a line to my Perl program that strips the '\r' carriage returns, but how can I convert my files to 'utf8', or get PostgreSQL to accept my file's encoding? Thanks.

file won't reliably detect the encoding. You need to actually know what the text encoding is in order to convert it correctly. If you've really got no idea, try to use a tool that does more robust encoding detection than file. — Craig Ringer, Aug 25 '13 at 6:01

Moritz Bunkus · Answer 1 · 2013-08-25 07:48:21Z

When you connect to PostgreSQL using DBI->connect(..., { pg_enable_utf8 => 1 }), the data passed to all modifying DBI calls (SQL INSERT, UPDATE, DELETE, anywhere you use placeholders in queries, etc.) has to be in Perl's internal encoding, that is, decoded character strings rather than raw bytes, so that DBI itself can convert to the wire protocol correctly.
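To make "Perl's internal encoding" a bit more concrete, here is a minimal sketch contrasting a raw byte string with its decoded character string. The sample word is purely illustrative and not taken from the question.

```perl
use strict;
use warnings;
use Encode ();

# "café" as it sits in a UTF-8 file: 5 bytes (the é takes two bytes).
my $bytes = "caf\xC3\xA9";

# Decoding turns the bytes into a Perl character string of 4 characters.
my $chars = Encode::decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

It is the decoded form ($chars here) that should be handed to placeholders when pg_enable_utf8 is switched on.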

There are many ways to achieve that, and they all depend on how you read the file in the first place. The most basic case is reading with open (or one of the methods based directly on it, like IO::File->open). You can then use Perl's I/O layers (documented under open in perlfunc, and in PerlIO) and let Perl do the decoding for you. Assuming your file is already encoded in UTF-8, you'll get away with:

open(my $fh, "<:encoding(UTF-8)", "filename") or die "Cannot open file: $!";
while (my $line = <$fh>) {
  # $line is already decoded here; process query
}

This is basically equivalent to opening the file without an encoding layer and converting manually using Encode::decode, e.g. like this:

use Encode ();

open(my $fh, "<", "filename") or die "Cannot open file: $!";
while (my $line = <$fh>) {
  $line = Encode::decode('UTF-8', $line);
  # process query
}
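If the file turns out not to be UTF-8, only the layer name should need to change. The sketch below assumes Windows-1252 (cp1252), which is merely a guess prompted by the "extended-ASCII ... CRLF" output of file and not something the question confirms. It writes a small sample file first so that it can be run standalone.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a sample file: "café" in cp1252 (é is the single byte 0xE9), CRLF ending.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "caf\xE9\r\n";
close($out) or die "Cannot close $filename: $!";

# Read it back, letting the I/O layer decode cp1252 into Perl characters.
# ASSUMPTION: cp1252 is only a guess at the real encoding of the asker's file.
open(my $fh, "<:encoding(cp1252)", $filename) or die "Cannot open $filename: $!";
my @lines;
while (my $line = <$fh>) {
    $line =~ s/\r?\n\z//;    # strip LF or CRLF line terminators
    push @lines, $line;
}
close($fh);

printf "read %d line(s); first line has %d characters\n",
    scalar(@lines), length($lines[0]);
```

Stripping the line terminator after decoding also takes care of the CRLF endings mentioned in the question.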

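Many other modules that receive data from external sources (think of HTTP downloads with LWP, for example) can hand back values that have already been decoded into Perl's internal encoding, e.g. via the decoded_content method of HTTP::Response, so check each module's documentation before decoding a second time.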

So what you have to do is:

Figure out which encoding your file actually uses (iconv on the shell can help you test candidate encodings)
Tell DBI to enable UTF-8 (pg_enable_utf8 => 1)
Open the file with the correct encoding layer
Read line(s), process query, repeat
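The steps above could be sketched roughly as follows. Everything specific here is hypothetical: the sub names decode_line and load_file, the table "items" with its column "name", the connection parameters, and the cp1252 fallback. Each line is tried as strict UTF-8 first and decoded with the fallback encoding only if that fails, which is a heuristic and no substitute for knowing the real encoding. DBI is loaded inside load_file so that decode_line can be tried without a database.

```perl
use strict;
use warnings;
use Encode ();

# Decode one line of raw bytes into Perl characters.
# Tries strict UTF-8 first; falls back to $fallback (an assumed encoding).
sub decode_line {
    my ($bytes, $fallback) = @_;
    my $chars = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    $chars = Encode::decode($fallback, $bytes) unless defined $chars;
    $chars =~ s/\r?\n\z//;    # drop LF or CRLF line terminators
    return $chars;
}

# Hypothetical loader: table "items" and column "name" are placeholders.
sub load_file {
    my ($dsn, $user, $pass, $filename) = @_;
    require DBI;
    my $dbh = DBI->connect($dsn, $user, $pass,
        { pg_enable_utf8 => 1, RaiseError => 1, AutoCommit => 0 });
    my $sth = $dbh->prepare('INSERT INTO items (name) VALUES (?)');

    # Read raw bytes and decode each line ourselves.
    open(my $fh, '<:raw', $filename) or die "Cannot open $filename: $!";
    while (my $line = <$fh>) {
        $sth->execute(decode_line($line, 'cp1252'));
    }
    close($fh);
    $dbh->commit;
    $dbh->disconnect;
}

# Quick check of the decoding helper on both kinds of input.
print decode_line("caf\xC3\xA9\r\n", 'cp1252') eq "caf\x{E9}"
    ? "utf8 ok\n" : "utf8 failed\n";
print decode_line("caf\xE9\r\n", 'cp1252') eq "caf\x{E9}"
    ? "fallback ok\n" : "fallback failed\n";
```

If the encoding is known for certain, opening the file with the matching :encoding(...) layer is simpler than decoding line by line.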
