gh-ost aborts on trying to UPDATE, errors on invalid character #290

mizioumt · 2016-10-26T16:04:30Z

v1.0.21

gh-ost with --test-on-replica failed like this:

FATAL Error 1366: Incorrect string value: '\xE1...' for column 'column_name' at row 1; query=

column_name is a latin1 VARCHAR and the value coming in the UPDATE statement contained this problematic character:

á

So this may be related to the safe SQL mode settings.

The binlogs showed this:

first, a row with the relevant id is INSERTed into the ghost table. At the moment, the data contains no \xE1 character.
eventually, there is incoming UPDATE for the original table that introduces the \xE1 character. gh-ost refuses to handle it and aborts

mizioumt · 2016-10-26T16:09:28Z

I attempted to gh-ost ALTER a one row table with \xE1 already in the original table. This finished fine. So this sort of character is properly handled by gh-ost when it is INSERTing rows as part of table to table copy. But when the character comes from a binlog UPDATE it is not handled properly.

So it may be possible to construct a simple testcase over a small table when a problematic character is fed through the binlog. I need to learn how stop gh-ost at exactly the right moment. There will likely be no time for interactive command if the table is small.

shlomi-noach · 2016-10-26T17:48:59Z

@mizioumt thank you.

This is strange, because this bug was fixed in 1.0.21, or at least so I thought. See #236 and #227.

gh-ost supports localtests, which emulate the exact behavior you'd expect. Let me see if updates are properly covered.

shlomi-noach · 2016-10-26T17:51:51Z

please share show create table and the gh-ost invocation you attempted.
If table description is too long, please just share the essentials.

shlomi-noach · 2016-10-26T18:04:12Z

See #291

mizioumt · 2016-10-27T16:01:41Z

the table description is too long and I can't guess what stuff is more essential than whichever other stuff.
So here's the entire thing:

*************************** 1. row ***************************
       Table: Table_name
Create Table: CREATE TABLE `Table_name` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `another_column` bigint(20) unsigned NOT NULL DEFAULT '0',
  `column_1` int(10) unsigned NOT NULL DEFAULT '0',
  `column_2` date NOT NULL DEFAULT '0000-00-00',
  `column_3` date NOT NULL DEFAULT '0000-00-00',
  `column_4` int(10) unsigned DEFAULT NULL,
  `column_5` int(10) unsigned DEFAULT NULL,
  `column_6` varchar(80) CHARACTER SET utf8mb4 DEFAULT NULL,
  `column_7` smallint(5) unsigned NOT NULL DEFAULT '0',
  `column_8` double NOT NULL DEFAULT '0',
  `column_9` varchar(1024) DEFAULT NULL,
  `column_10` varchar(1024) DEFAULT NULL,
  `column_11` varchar(1024) DEFAULT NULL,
  `column_12` enum(value_list) NOT NULL DEFAULT 'a_string',
  `column_13` char(3) NOT NULL DEFAULT '',
  `last_change` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `column_14` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `column_15` varchar(255) DEFAULT NULL,
  `column_16` float NOT NULL DEFAULT '0',
  `column_17` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `column_18` int(11) DEFAULT NULL,
  `column_19` int(11) NOT NULL DEFAULT '0',
  `column_20` double DEFAULT NULL,
  `column_21` double DEFAULT NULL,
  `column_22` date DEFAULT NULL,
  `column_23` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `column_24` varchar(40) DEFAULT NULL,
  `column_25` char(1) DEFAULT NULL,
  `COLUMN_THAT_FAILED_TO_BE_UPDATED` text,
  `column_26` enum(value_list) DEFAULT NULL,
  `column_27` text CHARACTER SET utf8mb4,
  `column_28` text CHARACTER SET utf8mb4,
  `column_29` tinyint(4) NOT NULL DEFAULT '1',
  `column_30` double NOT NULL DEFAULT '0',
  `column_31` varchar(512) DEFAULT NULL,
  `column_32` varchar(255) DEFAULT NULL,
  `column_33` tinyint(4) NOT NULL DEFAULT '1',
  `column_34` int(11) DEFAULT NULL,
  `column_35` tinyint(3) unsigned DEFAULT NULL,
  `column_36` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `column_37` bigint(20) unsigned DEFAULT NULL,
  `column_38` varchar(8) DEFAULT NULL,
  `column_39` bigint(20) unsigned NOT NULL DEFAULT '0',
  `column_40` int(10) unsigned NOT NULL DEFAULT '0',
  `column_41` smallint(5) unsigned NOT NULL DEFAULT '0',
  `column_42` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'points_to_elsewhere',
  `column_43` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'points_to_elsewhere',
  `column_44` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'points_to_elsewhere',
  PRIMARY KEY (`id`),
  KEY `column_3` (`column_3`,`column_2`),
  KEY `column_12` (`column_12`),
  KEY `another_column` (`another_column`),
  KEY `column_29` (`column_29`),
  KEY `column_14` (`column_14`),
  KEY `column_1` (`column_1`),
  KEY `column_2` (`column_2`,`column_3`),
  KEY `column_39` (`column_39`),
  KEY `last_change` (`last_change`),
  KEY `column_40_2` (`column_40`,`last_change`),
  KEY `column_42` (`column_42`),
  KEY `column_40` (`column_40`,`column_2`)
) ENGINE=InnoDB AUTO_INCREMENT=N DEFAULT CHARSET=latin1

and gh-ost was invoked as follows:

 ./gh-ost-20161013142859 -conf ~/.my_gh-ost.cnf -database database_name -table Table_name 
-alter " modify column_1 bigint unsigned NOT NULL DEFAULT '0', drop column column_6, 
add column x tinyint not null default 1 , drop key column_2, drop key column_39,
 add key k(column_39,column_1), add key l(column_3,column_12,another_column,column_2)" 
-test-on-replica -host `hostname` -initially-drop-ghost-table -initially-drop-socket-file 
-serve-socket-file ~/gh-ost.hr.socket -assume-rbr --default-retries 1 --debug 
--verbose --stack -execute

shlomi-noach · 2016-10-27T17:12:30Z

Not saying your alter isn't valid, but OMG.

I'm curious whether this is a production DDL you must perform, or is this a random collection of modifications, or maybe this is the accumulation of migration requests over months, or...?

It is very difficult to debug with such complexity. Again, this is valid and on my side I should try and find the problem. But in your side, you're experimenting with gh-ost, don't you feel it would be easier to try and isolate those migrations? Throwing out everything at once turns this hunt into needle in a haystack.

mizioumt · 2016-10-28T11:11:58Z

the table structure is real but the ALTER is not. I'm just testing gh-ost.
I think this character bug will happen for any alter, even for a noop one as long as the right UPDATE comes through the binlog which is not a gimme. But I may be wrong. I'll try to reproduce it with something simple next week.

shlomi-noach · 2016-10-28T12:49:21Z

Is that the same setup where you don't have log_slave_updates, as in #287 (comment) ?

mizioumt · 2016-10-28T13:53:01Z

no, on this one I always had log slave updates. In particular, when I browsed through the binlogs for the last reoccurrence I saw record for the offending id for both the original and ghost tables.
FYI the sequence was:

INSERT into dbname._TableName_gho
UPDATE dbname.TableName -> that had the problematic character, gh-ost died
more of UPDATE dbname.TableName but no more on the gh-ost table because gh-osd had died

mizioumt · 2016-10-28T14:23:42Z

replicationwise this is a sibling of the host from #287 however on this one MySQL version is lower

 select @@version,@@log_slave_updates;
+------------+---------------------+
| @@version  | @@log_slave_updates |
+------------+---------------------+
| 5.6.33-log |                   1 |
+------------+---------------------+

I'm trying to ensure gh-ost works on 5.6 and 5.7, also in different TZ

jeisen · 2016-10-31T22:14:15Z

I believe we've just run into the same issue while trying to run a migration on a large latin1-encoded table that contains UTF8 characters. The plan has been to use gh-ost to run migrations which will convert our tables so that they are stored and labeled correctly as UTF8. However, until we reach that state, we have many rows containing UTF8 data stored in latin1-labeled columns. That data is perfectly readable over a latin1 connection, but would become incorrectly re-converted under a UTF8 connection, rendering garbage characters.

The problem we're facing is that during a migration run ("MODIFY mycol TINYINT(1) DEFAULT '0'"), gh-ost reached one of those rows containing UTF8 data, and we observed the following:

gh-ost correctly copied the original row to _mytable_gho
The app updated the row, changing two non-character columns
gh-ost picked up the change over the replication log and attempted to rewrite the entire row, UTF8 data included, and hit an encoding error:

`2016-10-31 19:20:28 ERROR Error 1366: Incorrect string value: '\xEF\xBF\xBD\xC3\xA0\xC2...' for column 'mycol' at row 1; query=                        update /* gh-ost `mydb`.`_mytable_gho` */                                        `mydb`.`_mytable_gho`
                                set
                                        `id`=?, <columns = ?>..., `mycol`=?                                where
                                        ((`id` = ?))
                ; args=[64938571<...> à¸ªà¸³à¸«à¸£à¸±à¸š "à¸šà¸²à¸‡à¸„à¸™" à¹€à¸�à¸´à¸”à¸¡à¸²à¹€à¸žà
¸·à¹ˆà¸à¹ƒà¸«à¹‰à¹€à¸£à¸²à¸£à¸±à¸�]

The garbage text above is how it appears rendered under a UTF8 connection; under latin1 is it:
สำหรับ "บางคน" เกิดมาเพื่อให้เรารัก

The hex value is:
E0B8AAE0B8B3E0B8ABE0B8A3E0B8B1E0B89A2022E0B89AE0B8B2E0B887E0B884E0B8992220E0B980E0B881E0B8B4E0B894E0B8A1E0B8B2E0B980E0B89EE0B8B7E0B988E0B8ADE0B983E0B8ABE0B989E0B980E0B8A3E0B0B8B1E0B881

After that error, gh-ost kept repeating the following:

2016-10-31 19:21:20 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction; query=                        update /* gh-ost `mydb`.`_mytable_gho` */                                        `mydb`.`_mytable_gho` ... <same update as above>

Eventually, I had to kill the migration.

Here is the command I used:

gh-ost --max-load=Threads_running=7 --critical-load=Threads_running=25 --chunk-size=1000 --max-lag-millis=2000 --verbose --assume-rbr --cut-over=default --ok-to-drop-table --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --serve-socket-file=/tmp/gh-ost.sock --initially-drop-ghost-table --throttle-control-replicas="<slave1>,<slave2>" --user=<user> --password=<pass> --host=<slave1> --database=<mydb> --table=<mytable> --alter="MODIFY <mycol> tinyint(1) DEFAULT '0'" --execute

It's also worth noting that the database and server default charset is utf8, but all apps connect to it as latin1 at this time.

shlomi-noach · 2016-11-01T13:52:28Z

@jeisen thank you for the detailed comment. I'm going to try and reproduce this; however I'm not too certain I understand the meaning of having utf8 characters in a latin1 charset text. Are you able to provide me with a single utf8 character that you store in latin1 that I can experiment with?

shlomi-noach · 2016-11-01T14:06:51Z

@jeisen I'm able to reproduce this now. I'll look into it; right now I'm unsure how to correctly solve generically.

Reproduced in #303

jeisen · 2016-11-01T16:17:40Z

To give a little more context, the database is entirely latin1 for historical reasons that no longer apply. The application connects to the DB using a latin1 connection, but receives UTF8 data as user input. The app has no idea the data is in a different character set, and just sees it as a sequence of bytes, so it writes those bytes out to the DB in "latin1", because MySQL doesn't check to see if the characters are valid when in latin1. So the HEX() value of this data in latin1 would be identical to if it were written over a UTF8 connection to a UTF8-labeled column.

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> select mycol from mytable where id=1\G
*************************** 1. row ***************************
mycol: สำหรับ "บางคน" เกิดมาเพื่อให้เรารัก
1 row in set (0.00 sec)

The main problem is that if you were to connect over a UTF8 connection, MySQL will attempt to convert the "latin1" data. Instead of returning the correct string of bytes, it would send a series of garbage characters that are the result of running them through the UTF8 conversion.

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select mycol, from mytable where id=1\G
*************************** 1. row ***************************
mycol: à¸ªà¸³à¸«à¸£à¸±à¸š "à¸šà¸²à¸‡à¸„à¸™" à¹€à¸à¸´à¸”à¸¡à¸²à¹€à¸žà¸·à¹ˆà¸à¹ƒà¸«à¹‰à¹€à¸£à¸²à¸£à¸±à¸
1 row in set (0.00 sec)

You can see the severity of the problem by looking at the HEX() values before and after running it through CONVERT():

mysql> select hex(mycol), hex(convert(mycol using utf8)) from mytable where id=1\G
*************************** 1. row ***************************
                    hex(mycol): E0B8AAE0B8B3E0B8ABE0B8A3E0B8B1E0B89A2022E0B89AE0B8B2E0B887E0B884E0B8992220E0B980E0B881E0B8B4E0B894E0B8A1E0B8B2E0B980E0B89EE0B8B7E0B988E0B8ADE0B983E0B8ABE0B989E0B980E0B8A3E0B8B2E0B8A3E0B8B1E0B881
hex(convert(mycol using utf8)): C3A0C2B8C2AAC3A0C2B8C2B3C3A0C2B8C2ABC3A0C2B8C2A3C3A0C2B8C2B1C3A0C2B8C5A12022C3A0C2B8C5A1C3A0C2B8C2B2C3A0C2B8E280A1C3A0C2B8E2809EC3A0C2B8E284A22220C3A0C2B9E282ACC3A0C2B8C281C3A0C2B8C2B4C3A0C2B8E2809DC3A0C2B8C2A1C3A0C2B8C2B2C3A0C2B9E282ACC3A0C2B8C5BEC3A0C2B8C2B7C3A0C2B9CB86C3A0C2B8C2ADC3A0C2B9C692C3A0C2B8C2ABC3A0C2B9E280B0C3A0C2B9E282ACC3A0C2B8C2A3C3A0C2B8C2B2C3A0C2B8C2A3C3A0C2B8C2B1C3A0C2B8C281
1 row in set (0.00 sec)

Worse, if you try to directly run an ALTER changing the charset of the column, it would automatically run that CONVERT() step, permanently corrupting the data. The solution is that we need to force MySQL to forget its charset label by temporarily changing the VARCHAR into a VARBINARY. Once that's done, the app can start connecting via a UTF8 connection and the data will still be in the correct encoding. Then, we can do a second migration to change the VARBINARY back into a VARCHAR, this time with the right label, and we're good moving forward.

ggunson · 2016-11-01T17:07:32Z

Note to self: We will likely want to add a note about the outcome of this to the docs (whether double encoding is a caveat, or requires a special flag, or if there's a feature to fix it).

I'm not sure how common double encoding still is, but it certainly was a problem years ago.

shlomi-noach · 2016-11-02T09:38:43Z

@mizioumt can you confirm your case is similar to @jeisen's? i.e. you have utf8 characters in your latin1 charset column?
If not, these are two different issues.

w/r utf8 in a latin1 charset, I'm unsure yet what the correct handling should be. How can I know in advance what's written in the column? An option might be letting the user specify a --default-charset=latin1 flag or similar. Another option might be to attempt to copy texts as binaries, in the event a column's charset is identical before/after migration.

shlomi-noach · 2016-11-02T10:52:29Z

I have the code to be able to write back the binary value of a text column in the event it never changed. If succeeds for utf8 column, succeeds for utf8 characters in latin1 columns, and fails on pure-latin1 columns.

Here's the problem: I cannot identify those cases beforehand. Even if the user submits some flag, there may still be different columns in the migrated table, which behave differently. So "fix" to one would break another. I would certainly not want the user to specify per-column configs such as --oh-and-turn-this-column-to-utf8-latin1=name_of_column.

By the time I read the column value it has already been converted to utf8 in the binary log (binlog events always write strings in utf8).

Unsure how this should be tackled.

mizioumt · 2016-11-02T13:18:29Z

I think my problem is kind of independent from problem reported by @jeisen. My problem is that gh-ost refuses to handle at least one character that does belong to latin1:

E1  á  00E1    LATIN SMALL LETTER A WITH ACUTE

based on http://dev.mysql.com/doc/refman/5.7/en/charset-we-sets.html and http://www.cp1252.com/
and indeed it's a part of a common name (not English, of course) which contains that

So I simply want this character to stay the same. The UTF considerations I think are beyond the scope of gh-ost I think. If the column is latin1 and we do not ask gh-ost to change the charset on it, then it should stay binary identical to the original. If we do change the charset and/or how to deal with a situation that the latin1 looking characters are part of UTF sequences then I could express an opinion. However in my case the column stays latin1 so UTF does not really play a role.

shlomi-noach · 2016-11-02T14:58:33Z

@mizioumt then I will need you to be able to reproduce this. Please consider existing tests that confirm gh-ost does indeed support latin1. There could be an edge case that I don't understand, but then again, I cannot reproduce the problem on your side.

jeisen · 2016-11-02T16:42:40Z

@shlomi-noach If I were to strip away any unicode beforehand and leave only "pure" latin1 in the VARCHARs, would you consider gh-ost still safe to use on an ALTER table CHARACTER SET utf8 COLLATE utf8_unicode_ci (when the table starts as latin1)?

shlomi-noach · 2016-11-02T19:32:15Z

would you consider gh-ost still safe to use on an ALTER table CHARACTER SET utf8 COLLATE utf8_unicode_ci (when the table starts as latin1)?

Yes, this conversion is supported and tested. I strongly urge you to --test-on-replica (I don't think you have done so in the above) and feel safe about this operation before running this on master.

shlomi-noach · 2016-11-03T09:16:51Z

Unless I get good insight on how to reliably detect latin1 columns that happen to have utf8 characters inside, as opposed to latin1 columns that do not have utf8 characters inside, I'm going to put the utf8-in-latin1 issue aside, as it is currently not in our focus.

If anyone else want to pick this up, there's initial code to check/support this scenario, in the form of "not converting when reading and not converting when writing", which breaks "normal" latin1. See some messy experiments in #303

mizioumt · 2016-11-04T09:53:00Z

reproduces with 1.0.28, will try to find time to create a simple testcase

mizioumt · 2016-11-11T09:25:21Z

looks like to get a simple testcase is not as easy as I thought...

shlomi-noach · 2016-11-11T10:10:20Z

@mizioumt let's take a step back, then.

The particular character you mention, á, is explicitly tested in https://github.com/github/gh-ost/blob/master/localtests/latin1/create.sql

Do you know what the character set is for the connection that generates the UPDATE statement? Is it utf8?
Is it a perl client? Other?

Answering a previous question of yours:

So I simply want this character to stay the same. The UTF considerations I think are beyond the scope of gh-ost I think. If the column is latin1 and we do not ask gh-ost to change the charset on it, then it should stay binary identical to the original. If we do change the charset and/or how to deal with a situation that the latin1 looking characters are part of UTF sequences then I could express an opinion. However in my case the column stays latin1 so UTF does not really play a role.

As per a previous comment I made, utf8 does play a role, because strings are always utf8 in the binary log. So whether you and I like it or not, your text is converted to utf8 in the binlogs, from where gh-ost reads, and converts back to latin1.

Lastly, can you confirm once again table definition on master & replica is identical? Perhaps a character set is changed between the two?

mizioumt · 2016-11-11T12:13:50Z

we do SET NAMES utf8mb4 upon connection
the table structures are identical, the tables have DEFAULT CHARSET=latin1, the column is defined as simply 'text'

mizioumt · 2016-11-17T13:17:55Z

I found out a little more about the application:

column T1.c1 is utf8mb4, column T2.c2 is latin1
a value is inserted into T1.c1 that contains the offending character such as a with acute accent, or similar
this value is copied to T2.c2. This is done in such a way as to trigger an SQL warning on 'invalid character'. The result is still a good latin1
It is T2 that I'm trying to ALTER with gh-ost and fail.

In my attempts to introduce I was unable to trigger that warning and I think this is why I was unable to reproduce the issue in a simple way so far. So as soon as I manage to get that warning I must be able to reproduce this. A direct insert in mysql client in to a latin1 column of a value such as 'EL ESPAÑOL' does not trigger the warning.

So the warningless and warningful assignment must be replicated differently somehow

shlomi-noach · 2016-11-17T13:34:38Z

@mizioumt allow me to break down and reorder your own comment, to confirm I understand it correctly:

You are running gh-ost on table T2, which has column c2 ... CHARACTER SET latin1
gh-ost exits with error when you try to copy data from T1 to T2, where the data comes from a utf8mb4 column, using a direct SQL statement such as:

UPDATE T1, T2 SET T2.c2 = T1.c1 WHERE <something>

Did I make this right?

mizioumt · 2016-11-18T13:45:43Z

well on the second point I do not know how that copy is implemented exactly but logically this is correct. This chain has ROW replication throughout so it's difficult to get the exact SQL involved. But not impossible, I just need to find time to use sources other than binlogs.
I'm a bit at a loss as to how this error/warning can show up upon a latin1 column, though, if latin1 covers all of 0-255 single characters.

shlomi-noach · 2016-11-18T13:56:29Z

@mizioumt if the scenario is roughly as I describe, then I think it is similar to the case presented by @jeisen. You are copying utf8mb4 values onto latin1 and I suspect there's double encoding there.

I will potentially not fix this; but let me see how easy for me it is to build a test case for this, next week,

mizioumt · 2016-11-18T14:25:29Z

my problem is that there is not double encoding in my case, that is the latin1 value stays at say ''EL ESPAÑOL' in latin1 sense. I checked it with SELECT HEX(whatever), the 'Ñ' is internally one byte.
This is another thing I can't understand, it is possible to trigger this warning but the data would end up mangled while it is not mangled in this case.
This is similar to entirely unrelated issue which I recently saw in an application: C is VARBINARY and the app would try to do ... WHERE C = ?. The ? was passed as a utf8mb4 string apparently, in any case MySQL was tricked into evaluating VARBINARY = utf8mb4 by casting VARBINARY to utf8mb4 and not the other way around. This would trigger that same warning.

mizioumt · 2016-11-18T15:42:57Z

I noticed that modern gh-ost dumps the offending value as [n1 n2 ...]. This is what it dumped on the last occurrence:

.... 71 252 110 ...

then complained about 252, u-umlaut. This does not look like double encoding to me.

mizioumt · 2016-11-21T11:59:09Z

I'm yet to build a testacase but I think I have an idea of what the problem is. gh-ost uses the replication and it fails to replicate exactly. If the binlog sets SQL_MODE then gh-ost should follow that. So in this case enforcing STRICT_ALL_TABLES on top of that may result in trouble.
So if I'm right the probem can be redefined as follows: gh-ost fails to follow MySQL replication.

shlomi-noach · 2016-11-21T12:31:05Z

gh-ost fails to follow MySQL replication.

I'm not sure what this means.

As per STRICT_ALL_TABLES: this is a protection mechanism that makes sure gh-ost doesn't do the wrong thing. This has proven useful in managing character sets, bigints and more.
gh-ost will keep on using strict mode. The problem we get right now is that gh-ost attempts to write a character that is invalid, and this should be either fixed or not-fixed, but never ignored.

mizioumt · 2016-11-28T10:44:08Z

there is a mirror hypothesis: MySQL fails to replicate some variables that should be replicated and this tricks gh-ost to being too strict.
If I sound confused it is because I am. Still I think I have something that I hope will help me make a testcase tonight.

mizioumt · 2016-11-28T15:37:31Z

no luck ...

ffdlx · 2017-01-10T09:32:06Z

About the character problem, i have an idea.

We can encode any string to hex string.

For example

root@test 04:46:36>show create table test_char\G
*************************** 1. row ***************************
       Table: test_char
Create Table: CREATE TABLE `test_char` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `content` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk
1 row in set (0.00 sec)

root@test 04:46:43>select * from test_char;
Empty set (0.00 sec)

root@test 04:47:03>insert into test_char(content) value('你好');
Query OK, 1 row affected (0.00 sec)

root@test 04:47:09>select * from test_char;
+----+---------+
| id | content |
+----+---------+
|  2 | 你好    |
+----+---------+
1 row in set (0.00 sec)

root@test 04:47:10>insert into test_char(content) value(x'C4E3BAC3');
Query OK, 1 row affected (0.00 sec)

root@test 04:47:24>select * from test_char;
+----+---------+
| id | content |
+----+---------+
|  2 | 你好    |
|  3 | 你好    |
+----+---------+
2 rows in set (0.00 sec)

In the example,
SQL "insert into test_char(content) value('你好');"
is equal to
SQL "insert into test_char(content) value(x'C4E3BAC3');"

In my practice, the idea seems to be worked. I am glad to share it.

Thank you

MrMauricioLeite · 2017-07-19T17:19:37Z

I may have hit the same error.
Here is the command on my output:

replace /* gh-ost `zeropaper`.`_expenses_gho` */ into
				`zeropaper`.`_expenses_gho`
					(`id`, `user_account_id`, `suplier`, `price`, `duedate`, `paydate`, `comment`, `expense_category_id`, `processed`, `scan`, `created_at`, `document_id`, `expense_sub_category_id`, `payed`, `instalment_id`, `split_id`, `split`, `expense_part_id`, `instalment_type`, `instalment_number`, `instalment_total`, `updated_at`, `ofx_name`, `ofx_id`, `user_account_cost_center_id`, `expense_type`, `transfer_id`, `expense_payment_type_id`, `expense_payment_type_card_acquire_id`, `document_number`)
				values
					(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
		; args=[393798300 1579084 025 - EMISSÃO_GABRIELA JESUS 338.9 <nil> 2017-07-17 03:00:00 [68 101 115 116 105 110 111 58 32 67 71 72 47 83 68 85 10 68 97 116 97 58 32 49 56 32 74 85 76 10 67 243 100 105 103 111 32 100 101 32 82 101 115 101 114 118 97 58 32 85 82 70 50 83 89 10 69 109 105 115 115 111 114 58 32 65 69 82 79 80 10 70 111 114 109 97 32 100 101 32 80 97 103 97 109 101 110 116 111 58 32 70 65 84 85 82 65 68 65 32 10 69 45 84 105 99 107 101 116 58 32 50 52 55 32 50 52 51 49 52 51 54 50 49 51 10 84 97 120 97 32 100 101 32 69 109 98 97 114 113 117 101 58 32 82 36 32 50 57 44 57 48] 5 1 0 2017-07-19 17:12:40 <nil> <nil> 0 <nil> <nil> 0 20726871 <nil> <nil> <nil> 2017-07-19 17:12:40 <nil> <nil> 1740932 S <nil> <nil> <nil> <nil>]

this is the create statement for the original and ghost table

CREATE TABLE `expenses` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_account_id` int(11) NOT NULL,
  `suplier` varchar(150) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  `duedate` datetime DEFAULT NULL,
  `paydate` datetime DEFAULT NULL,
  `comment` text,
  `expense_category_id` int(11) NOT NULL,
  `processed` tinyint(1) NOT NULL,
  `scan` tinyint(1) NOT NULL,
  `created_at` datetime DEFAULT NULL,
  `document_id` int(11) DEFAULT NULL,
  `expense_sub_category_id` int(11) DEFAULT NULL,
  `payed` tinyint(1) DEFAULT '0',
  `instalment_id` int(11) DEFAULT NULL,
  `split_id` int(11) DEFAULT NULL,
  `split` tinyint(1) DEFAULT NULL,
  `expense_part_id` int(11) DEFAULT NULL,
  `instalment_type` varchar(255) DEFAULT NULL,
  `instalment_number` int(11) DEFAULT NULL,
  `instalment_total` int(11) DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `ofx_name` varchar(255) DEFAULT NULL,
  `ofx_id` varchar(255) DEFAULT NULL,
  `user_account_cost_center_id` int(11) DEFAULT NULL,
  `expense_type` varchar(1) DEFAULT NULL,
  `transfer_id` int(11) DEFAULT NULL,
  `expense_payment_type_id` int(11) DEFAULT NULL,
  `expense_payment_type_card_acquire_id` int(11) DEFAULT NULL,
  `document_number` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_expense_category` (`expense_category_id`),
  KEY `fk_expense_account` (`user_account_id`),
  KEY `fk_expense_paydate` (`paydate`),
  KEY `idx_split_id` (`split_id`),
  KEY `idx_transfer_id` (`transfer_id`),
  KEY `idx_cost_center` (`user_account_cost_center_id`),
  KEY `idx_instalment_id` (`instalment_id`)
) ENGINE=InnoDB AUTO_INCREMENT=393802251 DEFAULT CHARSET=latin1;

CREATE TABLE `_expenses_gho` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_account_id` int(11) NOT NULL,
  `suplier` varchar(150) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  `duedate` datetime DEFAULT NULL,
  `paydate` datetime DEFAULT NULL,
  `comment` text,
  `expense_category_id` int(11) NOT NULL,
  `processed` tinyint(1) NOT NULL,
  `scan` tinyint(1) NOT NULL,
  `created_at` datetime DEFAULT NULL,
  `document_id` int(11) DEFAULT NULL,
  `expense_sub_category_id` int(11) DEFAULT NULL,
  `payed` tinyint(1) DEFAULT '0',
  `instalment_id` int(11) DEFAULT NULL,
  `split_id` int(11) DEFAULT NULL,
  `split` tinyint(1) DEFAULT NULL,
  `expense_part_id` int(11) DEFAULT NULL,
  `instalment_type` varchar(255) DEFAULT NULL,
  `instalment_number` int(11) DEFAULT NULL,
  `instalment_total` int(11) DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `ofx_name` varchar(255) DEFAULT NULL,
  `ofx_id` varchar(255) DEFAULT NULL,
  `user_account_cost_center_id` int(11) DEFAULT NULL,
  `expense_type` varchar(1) DEFAULT NULL,
  `transfer_id` int(11) DEFAULT NULL,
  `expense_payment_type_id` int(11) DEFAULT NULL,
  `expense_payment_type_card_acquire_id` int(11) DEFAULT NULL,
  `document_number` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_expense_category` (`expense_category_id`),
  KEY `fk_expense_account` (`user_account_id`),
  KEY `fk_expense_paydate` (`paydate`),
  KEY `idx_split_id` (`split_id`),
  KEY `idx_transfer_id` (`transfer_id`),
  KEY `idx_cost_center` (`user_account_cost_center_id`),
  KEY `idx_instalment_id` (`instalment_id`),
  KEY `idx_company_bacc_category_subcategory` (`user_account_id`,`user_account_cost_center_id`,`expense_category_id`,`expense_sub_category_id`),
  KEY `idx_company_bacc_subcategory` (`user_account_id`,`user_account_cost_center_id`,`expense_sub_category_id`),
  KEY `idx_company_bacc_category_paydate` (`user_account_id`,`user_account_cost_center_id`,`expense_category_id`,`paydate`),
  KEY `idx_company_bacc_category_duedate` (`user_account_id`,`user_account_cost_center_id`,`expense_category_id`,`duedate`),
  KEY `idx_company_category` (`user_account_id`,`expense_category_id`,`expense_sub_category_id`),
  KEY `idx_company_subcategory` (`user_account_id`,`expense_sub_category_id`),
  KEY `idx_company_paydate` (`user_account_id`,`paydate`),
  KEY `idx_company_supplier` (`user_account_id`,`suplier`),
  KEY `idx_company_duedate` (`user_account_id`,`duedate`),
  KEY `idx_company_exppart` (`user_account_id`,`expense_part_id`),
  KEY `idx_company_exppart_paydate` (`user_account_id`,`expense_part_id`,`paydate`),
  KEY `idx_company_exppart_duedate` (`user_account_id`,`expense_part_id`,`duedate`),
  KEY `idx_updated_at` (`updated_at`)
) ENGINE=InnoDB AUTO_INCREMENT=393801719 DEFAULT CHARSET=latin1;

I've tried to change the connection default charset to latin1 on my.cnf but had no luck. Do you guys have a workaround for it?

MrMauricioLeite · 2017-07-19T22:31:25Z

@shlomi-noach I've just got it to work.
But the solution is terrible.... I needed to MODIFY comment TEXT charset binary.

Now my column have mixed results. What was inserted by the copy routine is in plain text (charset working fine) but what was inserted by the UPDATE/REPLACE/INSERTis in binary format and is only readable if I convert using latin1.

From what I see the REPLACE and UPDATE commands need simply to have a convert( [04 37 21 84 32] using latin1) or similar approach. What do you think?

MrMauricioLeite · 2017-07-20T00:12:02Z

Last suggestion, and I promise to stop bugging until you find time to answer.
I saw that to fix timezone conversion you added a customization for affected columns on buildColumnsPreparedValues function (go/sql/builder.go:39)

I am no pro, but can we have the same for charset customization? Letting me control if all columns with type varchar and text should be forced on a convert(? using latin1) (or any other charset)? Maybe using a new command line option --force-text-columns-to-charset=latin1.

This would definitely fix my case and potentially the other's too.
Let me know your thoughts.

shlomi-noach · 2017-07-20T14:19:23Z

can we have the same for charset customization? Letting me control if all columns with type varchar and text should be forced on a convert(? using latin1) (or any other charset)? Maybe using a new command line option --force-text-columns-to-charset=latin1.

@MrMauricioLeite To clarify, you're suggesting a --force-text-columns-to-charset=X flag that would enforce encoding of all textual columns to charset X?

I think this would be doable. There was a past attempt at doing convert(? using charset) a while back. To be honest, it only made me even more confused, and it didn't solve the case I was looking at.

I need a precise test case though. Is #455 a good test case? Can you please suggest another?

MrMauricioLeite · 2017-07-21T01:13:38Z

Well the issue here happens when the INSERT prepared statement receive arguments like:
insert into table _xyz_gho (text_column) values (?): args ([01 03 87 30 95 17 85])

As you may notice it's sending the binary representation of a text value (maybe with special chars), not the value in plain text. Can you reproduce INSERT or UPDATE statements doing that? Any test that gives you that is the perfect test case. All you will need is a table using latin1 charset and a statement like this one and the table will refuse to insert this value.

I can try to build a test case for you next week.

To solve my problem here I compiled my own gh-ost code that forces any column with name comment (my problematic column) to use convert(? using latin1).

About your question: Exactly, I am suggesting a --force-text-columns-to-charset=X that will enforce all VARCHAR and TEXT columns to use convert(? using charset).

shlomi-noach · 2017-07-23T08:32:08Z

To solve my problem here I compiled my own gh-ost code that forces any column with name comment (my problematic column) to use convert(? using latin1).

Can you please submit a pull request so I can see the change?

shlomi-noach self-assigned this Oct 26, 2016

shlomi-noach mentioned this issue Oct 26, 2016

more DML tests for latin1 #291

Merged

shlomi-noach added the bug label Nov 1, 2016

shlomi-noach mentioned this issue Nov 1, 2016

WIP: adding tests for latin1 charset column that has utf8 characters #303

Open

shlomi-noach mentioned this issue Dec 23, 2016

Missing --binary-mode alike parameter? Tool fails with FATAL Error 1366: Incorrect string value. #342

Open

shlomi-noach mentioned this issue Jul 20, 2017

latin1 tests with TEXT columns #455

Merged

cenkore mentioned this issue Jan 3, 2018

error 1366 on invalid character #532

Closed

github / gh-ost

Join GitHub today

GitHub is where the world builds software

gh-ost aborts on trying to UPDATE, errors on invalid character #290

gh-ost aborts on trying to UPDATE, errors on invalid character #290

Comments

mizioumt commented Oct 26, 2016 • edited by shlomi-noach

mizioumt commented Oct 26, 2016

shlomi-noach commented Oct 26, 2016

shlomi-noach commented Oct 26, 2016 • edited

shlomi-noach commented Oct 26, 2016

mizioumt commented Oct 27, 2016

shlomi-noach commented Oct 27, 2016 • edited

mizioumt commented Oct 28, 2016

shlomi-noach commented Oct 28, 2016

mizioumt commented Oct 28, 2016 • edited

mizioumt commented Oct 28, 2016

jeisen commented Oct 31, 2016 • edited by shlomi-noach

shlomi-noach commented Nov 1, 2016

shlomi-noach commented Nov 1, 2016

jeisen commented Nov 1, 2016

ggunson commented Nov 1, 2016

shlomi-noach commented Nov 2, 2016

shlomi-noach commented Nov 2, 2016

mizioumt commented Nov 2, 2016

shlomi-noach commented Nov 2, 2016

jeisen commented Nov 2, 2016

shlomi-noach commented Nov 2, 2016

shlomi-noach commented Nov 3, 2016

mizioumt commented Nov 4, 2016

mizioumt commented Nov 11, 2016

shlomi-noach commented Nov 11, 2016

mizioumt commented Nov 11, 2016

mizioumt commented Nov 17, 2016

shlomi-noach commented Nov 17, 2016

mizioumt commented Nov 18, 2016

shlomi-noach commented Nov 18, 2016

mizioumt commented Nov 18, 2016

mizioumt commented Nov 18, 2016

mizioumt commented Nov 21, 2016 • edited

shlomi-noach commented Nov 21, 2016

mizioumt commented Nov 28, 2016

mizioumt commented Nov 28, 2016

ffdlx commented Jan 10, 2017

MrMauricioLeite commented Jul 19, 2017 • edited

MrMauricioLeite commented Jul 19, 2017 • edited

MrMauricioLeite commented Jul 20, 2017 • edited

shlomi-noach commented Jul 20, 2017 • edited

MrMauricioLeite commented Jul 21, 2017

shlomi-noach commented Jul 23, 2017

Essential cookies

Always active

Analytics cookies

mizioumt commented Oct 26, 2016 •

edited by shlomi-noach

shlomi-noach commented Oct 26, 2016 •

edited

shlomi-noach commented Oct 27, 2016 •

edited

mizioumt commented Oct 28, 2016 •

edited

jeisen commented Oct 31, 2016 •

edited by shlomi-noach

mizioumt commented Nov 21, 2016 •

edited

MrMauricioLeite commented Jul 19, 2017 •

edited

MrMauricioLeite commented Jul 19, 2017 •

edited

MrMauricioLeite commented Jul 20, 2017 •

edited

shlomi-noach commented Jul 20, 2017 •

edited