Replication of big5, cp932, gbk, sjis strings makes wrong values on slave

Description

This looks like a legitimate bug. It applies to any character set
whose charset_info_st field escape_with_backslash_is_dangerous is
true, which currently means big5, cp932, gbk and sjis.

The problem is that string parameters coming from prepared statements
are converted into 0xHHHH form indiscriminately in append_query_string,
which produces the statement text that is binlogged for statement-based
replication. That works for inserting strings into string fields. For
inserting a string into an integer field, however, the master performs
a string-to-integer conversion that never happens on the slave, because
the slave sees a 0xHHHH literal, which is more properly an integer than
a string.
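The divergence can be illustrated with a small model. This is only a sketch, not server code: both function names are hypothetical, and it assumes the original parameter is the string '1' (byte 0x31), so that the binlogged literal is 0x31 and the slave evaluates it as a number.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model, not server code: the value an integer column receives
// on the master, where the original string parameter is converted to an
// integer (for example "1" becomes 1).
std::int64_t master_value_from_string(const std::string& param) {
    return std::stoll(param);  // decimal string-to-integer conversion
}

// Hypothetical model: the value the slave computes when the binlogged
// 0xHHHH literal is evaluated as a number (for example "0x31" becomes 49).
std::int64_t slave_value_from_hex_literal(const std::string& literal) {
    // Skip the "0x" prefix and read the remaining digits as base 16.
    return static_cast<std::int64_t>(
        std::stoull(literal.substr(2), nullptr, 16));
}
```

Under these assumptions the master stores 1 for the parameter '1', while the slave stores 49 for the literal 0x31, which is the kind of mismatch described above.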

This can be captured by setting a breakpoint at str_to_hex and running
this test case:

Using SHOW BINLOG EVENTS shows that the problem is from the server (binlogging) side:

0xHHHH is a MySQL extension. It is a hybrid: it can behave as a number
or as a string depending on context.

The binary log could use the X'HHHH' notation instead:
INSERT INTO t1 (a) VALUES (X'31');

which is standard SQL, and which must always be a string.

However, it seems the behaviour of X'HHHH' and of 0xHHHH
is exactly the same, and X'HHHH' can also act as a number:

Proposed fix:
1. Fix X'HHHH' to always work as a string.
2. Fix binlog to use X'HHHH'.
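The second step of the proposed fix amounts to changing the literal form written to the binary log. The sketch below shows both forms side by side. The helper hex_literal and its standard_form flag are hypothetical and are not the actual append_query_string or str_to_hex code.

```cpp
#include <string>

// Hypothetical helper, not the server's append_query_string: renders the
// bytes of a string parameter as a hex literal for the binary log.
//   standard_form == false  gives the current 0xHHHH form
//   standard_form == true   gives the proposed X'HHHH' form
std::string hex_literal(const std::string& bytes, bool standard_form) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out = standard_form ? "X'" : "0x";
    for (unsigned char c : bytes) {
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    if (standard_form) {
        out += '\'';  // close the quoted literal
    }
    return out;
}
```

For the parameter '1' this yields 0x31 in the current form and X'31' in the proposed form. Changing the notation alone is not sufficient: as noted above, X'HHHH' can currently also act as a number, so step 1 has to land for step 2 to have any effect.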

Environment

None

Status

Assignee

Alexander Barkov

Reporter

Alexander Barkov

Labels

None

External issue ID

None

Fix versions

Affects versions

Priority

Major