137

I get the following error when inserting data from mysql into postgres.

Do I have to manually remove all null characters from my input data? Is there a way to get postgres to do this for me?

ERROR: invalid byte sequence for encoding "UTF8": 0x00

8 Answers 8

120

PostgreSQL doesn't support storing NULL (\0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).

Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

If you need to store the NULL character, you must use a bytea field - which should store anything you want, but won't support text operations on it.

Given that PostgreSQL doesn't support it in text values, there's no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it's likely going to be easier to do that in preprocessing before you load it.

Sign up to request clarification or add additional context in comments.

4 Comments

Example: CREATE TABLE store_bytes ( key INTEGER NOT NULL, data bytea NOT NULL );
Is NULL code point the only one not supported? I couldn't find the part in the linked documentation that states either way
I think I've found it! "Quoted identifiers can contain any character, except the character with code zero."
I think this might be a better documentation reference: postgresql.org/docs/current/… . "Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored."
50

If you are using Java, you could just replace the x00 characters before the insert like following:

myValue.replaceAll("\u0000", "")

The solution was provided and explained by Csaba in following post:

https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

Respectively:

in Java you can actually have a "0x0" character in your string, and that's valid unicode. So that's translated to the character 0x0 in UTF8, which in turn is not accepted because the server uses null terminated strings... so the only way is to make sure your strings don't contain the character '\u0000'.

2 Comments

So, postgres is not UTF-8 compliant. Looks like I'll be up all night fixing a production issue because they lied.
it is better to use myValue.replace("\u0000", ""). It will do exactly the same, but it doesn't use regexp. It will be faster.
22

Just regex out null bytes:

s/\x00//g;

2 Comments

is empty string considered as a null byte? Won't replaceAll("s/\x00//g","") result in replacing them with other nulls?
Empty strings are not considered as null bytes. Null byte values are actual characters, but invisible.
2

You can first insert data into blob field and then copy to text field with the folloing function

CREATE OR REPLACE FUNCTION blob2text() RETURNS void AS $$
Declare
    ref record;
    i integer;
Begin
    FOR ref IN SELECT id, blob_field FROM table LOOP

          --  find 0x00 and replace with space    
      i := position(E'\\000'::bytea in ref.blob_field);
      WHILE i > 0 LOOP
        ref.bob_field := set_byte(ref.blob_field, i-1, 20);
        i := position(E'\\000'::bytea in ref.blobl_field);
      END LOOP

    UPDATE table SET field = encode(ref.blob_field, 'escape') WHERE id = ref.id;
    END LOOP;

End; $$ LANGUAGE plpgsql; 

--

SELECT blob2text();

Comments

2

Only this regex worked for me:

sed 's/\\0//g'

So as you get your data do this: $ get_data | sed 's/\\0//g' which will output your data without 0x00

Comments

2

If you need to store null characters in text fields and don't want to change your data type other than text then you can follow my solution too:

Before insert:

myValue = myValue.replaceAll("\u0000", "SomeVerySpecialText")

After select:

myValue = myValue.replaceAll("SomeVerySpecialText","\u0000")

I've used "null" as my SomeVerySpecialText which I am sure that there will be no any "null" string in my values at all.

Comments

0

If anyone is arriving here looking how to remove the 0x00 in Python:

new_row = row.replace("\x00", "")

Comments

0

if anyone arriving here facing error in the spring boot application while performing Jpa operation engine.jdbc.spi.SqlExceptionHelper : ERROR: invalid byte sequence for encoding "UTF8": 0x00 Where: unnamed portal parameter $12 ..

If this error occurs check the table column exist in the entity . At least if any one of the column not present in the entity means it throw the error like this .. or apply the column value default null in the table

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.