I'm loading data from files, and some of them contain ISO-8859 encoded text mixed in with UTF-8 encoded text. Postgres aborts the load with an error when it encounters an ISO-8859 encoded line. For small files I can fix or edit out the few bad lines by hand, but that doesn't scale to very large files (hundreds of MB). So here is the question:
Can Postgres skip the ISO-8859 encoded lines and load only the UTF-8 ones?
Here is the command I use to load files:
COPY words (word, source_id) FROM '/path/to/file.txt' WITH DELIMITER E'\t';
Here is the error Postgres reports when it hits a bad line:
ERROR: invalid byte sequence for encoding "UTF8": 0xa2
CONTEXT: COPY words, line 2268
Thanks for any advice.
Edit: There are also other errors (besides encoding) that stop the load, for example when a string is longer than its column permits. I could write a program that loops through the file, loads each line individually, and logs the errors, but I was curious whether Postgres has something built in to handle this. I edited the title to be more generic.
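For reference, the workaround program I have in mind would look roughly like this. It is only a sketch: the tab-separated two-column layout matches my COPY command, but the 256-character limit is an assumption standing in for whatever my actual column allows.

```python
# Sketch of a pre-filter: keep lines that are valid UTF-8 and fit the
# column, log the rest. The 256-character limit is an assumed example
# (e.g. word varchar(256)); adjust it to the real column definition.
MAX_WORD_LEN = 256

def split_lines(raw_lines):
    """Partition raw byte lines into loadable rows and (lineno, reason) rejects."""
    good, bad = [], []
    for lineno, raw in enumerate(raw_lines, start=1):
        try:
            line = raw.decode("utf-8")  # this is where the 0xa2 lines fail
        except UnicodeDecodeError as exc:
            bad.append((lineno, f"bad encoding: {exc}"))
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            bad.append((lineno, "wrong number of columns"))
            continue
        if len(fields[0]) > MAX_WORD_LEN:
            bad.append((lineno, "word longer than column permits"))
            continue
        good.append(line)
    return good, bad

# Usage sketch: write `good` to a clean file and COPY that, dump `bad` to a log:
# with open("/path/to/file.txt", "rb") as f:
#     good, bad = split_lines(f)
```

The clean file would then load with the same COPY command as before, but I'd rather avoid rewriting hundreds of MB if Postgres can do this natively.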