Details
-
Type:
Bug
-
Status: Open
-
Priority:
Minor
-
Resolution: Unresolved
-
Affects Version/s: 5.3.12, 5.5.39, 10.0.13
-
Fix Version/s: 10.0
-
Component/s: None
-
Labels:None
Description
SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('a'),('b');
SELECT
GROUP_CONCAT(a SEPARATOR '😃') AS a,
GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
GROUP_CONCAT(a SEPARATOR X'FF') AS c,
GROUP_CONCAT(a SEPARATOR B'11111111') AS d
FROM t1;
The output is:
+--------+------+------+------+ | a | b | c | d | +--------+------+------+------+ | a😃b | a�b | a�b | a�b | +--------+------+------+------+
Note, '😃' is a 4-byte UTF-8 character, which is not supported by MariaDB's utf8 (one should use utf8mb4 to make 4-byte characters work).
So all separators in the above query are wrong MariaDB's utf8 sequences.
They should be replaced to question marks, instead of being copied
into the result as is.
The same result is returned if I do "SET NAMES binary":
SET NAMES binary; DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8); INSERT INTO t1 VALUES ('a'),('b'); SELECT GROUP_CONCAT(a SEPARATOR '😃') AS a, GROUP_CONCAT(a SEPARATOR 0xFF) AS b, GROUP_CONCAT(a SEPARATOR X'FF') AS c, GROUP_CONCAT(a SEPARATOR B'11111111') AS d FROM t1;
The output is:
+--------+------+------+------+ | a | b | c | d | +--------+------+------+------+ | a😃b | a�b | a�b | a�b | +--------+------+------+------+
The invalid separators are in the output again.
Note, CHARSET() for all above GROUP_COINCATs:
SET NAMES binary; DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8); INSERT INTO t1 VALUES ('a'),('b'); SELECT CHARSET(GROUP_CONCAT(a SEPARATOR '😃')) AS a, CHARSET(GROUP_CONCAT(a SEPARATOR 0xFF)) AS b, CHARSET(GROUP_CONCAT(a SEPARATOR X'FF')) AS c, CHARSET(GROUP_CONCAT(a SEPARATOR B'11111111')) AS d FROM t1;
is reported as "utf8", not "binary":
+------+------+------+------+ | a | b | c | d | +------+------+------+------+ | utf8 | utf8 | utf8 | utf8 | +------+------+------+------+
It should be fixed:
- either to return BINARY
This is probably not a good idea, because the SEPATATOR syntax accepts only quoted string,
hex and bit literals, but does not accept neither expressions nor literals with character set introducers.
Therefore, the syntax implies that SEPARATOR should not affect the result character set. - or to replace bad separators to question marks.
- or allow strings with introducers (or even all constant expressions) in SEPARATOR and make separator participate in character set and collation aggregation
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions