Replication of big5, cp932, gbk, sjis strings makes wrong values on slave

Description

This does look to be a legitimate bug. It applies to any character
set whose charset_info_st field escape_with_backslash_is_dangerous is
true; currently these are big5, cp932, gbk, and sjis.
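
As background on why these character sets are singled out (the report
does not spell this out, so treat it as an assumption about the flag's
intent): in such encodings the trailing byte of a multi-byte character
can be 0x5C, the ASCII backslash, so escaping with backslashes byte by
byte can corrupt the string. A small Python sketch, using the standard
shift_jis and cp932 codecs and the character U+8868 as an illustration:

```python
# Illustration only: a multi-byte character whose trailing byte equals
# the ASCII backslash (0x5C) in sjis-family encodings.
BACKSLASH = 0x5C
SAMPLE_CHAR = "\u8868"  # a kanji chosen for illustration


def trailing_byte_is_backslash(ch: str, encoding: str) -> bool:
    """Return True if ch is multi-byte in `encoding` and ends in 0x5C."""
    encoded = ch.encode(encoding)
    return len(encoded) > 1 and encoded[-1] == BACKSLASH


for enc in ("shift_jis", "cp932"):
    # Print the encoding name, the encoded bytes in hex, and the check.
    print(enc, SAMPLE_CHAR.encode(enc).hex(),
          trailing_byte_is_backslash(SAMPLE_CHAR, enc))
```

A hex literal sidesteps this because it contains no backslash bytes at
all, which is presumably why the binlogging code falls back to it.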

The problem is that string parameters coming from prepared statements
are converted into 0xHHHH form indiscriminately in append_query_string,
which produces the string to be binlogged for statement-based
replication. That works for inserting strings into string fields. But
when a string is inserted into an integer field, the master performs a
string-to-integer conversion that the slave does not, since the 0xHHHH
form is more properly an integer than a string.
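
The mismatch can be sketched outside the server. The Python below is a
simplified model, not the server code: to_hex_literal is a hypothetical
stand-in for the 0xHHHH text that ends up in the binlog, and the other
two helpers model how the master and the slave arrive at a value for an
integer column. Real string-to-integer conversion in the server is more
lenient than Python's int(); that detail is ignored here.

```python
def to_hex_literal(value: bytes) -> str:
    """Hypothetical stand-in: render bytes in 0xHHHH form."""
    return "0x" + value.hex().upper()


def master_int_value(param: str) -> int:
    """Master side: the string parameter is converted string-to-integer."""
    return int(param)


def slave_int_value(hex_literal: str) -> int:
    """Slave side: the 0xHHHH literal is read as a number."""
    return int(hex_literal, 16)


param = "1"
literal = to_hex_literal(param.encode("ascii"))
# Print the binlogged literal, then the master and slave values.
print(literal, master_int_value(param), slave_int_value(literal))
```

Under this model the string '1' is binlogged as 0x31, so the master
stores 1 while the slave stores 49 for the same statement.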

This can be captured by setting a breakpoint at str_to_hex and running
this test case:

SHOW BINLOG EVENTS shows that the problem is on the server (binlogging) side:

0xHHHH is a MySQL extension. It is a hybrid: it can behave as a number
or as a string depending on context.

The binary log could use the X'HHHH' notation instead:
INSERT INTO t1 (a) VALUES (X'31');

which is standard SQL and must always be a string.

However, it seems the behaviour of X'HHHH' and of 0xHHHH
is exactly the same, and X'HHHH' can also act as a number:

Proposed fix:
1. Fix X'HHHH' to always work as a string.
2. Fix the binlog to use X'HHHH'.

Environment

None

Assignee

Alexander Barkov

Reporter

Alexander Barkov

Labels

None

Fix versions

Affects versions

Priority

Major