Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6598

GROUP_CONCAT accepts malformed SEPARATOR string

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.3.12, 5.5.39, 10.0.13
    • Fix Version/s: 10.0
    • Component/s: None
    • Labels:
      None

      Description

      SET NAMES utf8;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        GROUP_CONCAT(a SEPARATOR '😃') AS a,
        GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
        GROUP_CONCAT(a SEPARATOR X'FF') AS c,
        GROUP_CONCAT(a SEPARATOR B'11111111') AS d
      FROM t1;
      

      The output is:

      +--------+------+------+------+
      | a      | b    | c    | d    |
      +--------+------+------+------+
      | a😃b     | a�b   | a�b   | a�b   |
      +--------+------+------+------+
      

      Note, '😃' is a 4-byte UTF-8 character, which is not supported by MariaDB's utf8 (one should use utf8mb4 to make 4-byte characters work).

      So all separators in the above query are wrong MariaDB's utf8 sequences.

      They should be replaced to question marks, instead of being copied
      into the result as is.

      The same result is returned if I do "SET NAMES binary":

      SET NAMES binary;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        GROUP_CONCAT(a SEPARATOR '😃') AS a,
        GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
        GROUP_CONCAT(a SEPARATOR X'FF') AS c,
        GROUP_CONCAT(a SEPARATOR B'11111111') AS d
      FROM t1;
      

      The output is:

      +--------+------+------+------+
      | a      | b    | c    | d    |
      +--------+------+------+------+
      | a😃b     | a�b   | a�b   | a�b   |
      +--------+------+------+------+
      

      The invalid separators are in the output again.

      Note, CHARSET() for all above GROUP_COINCATs:

      SET NAMES binary;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        CHARSET(GROUP_CONCAT(a SEPARATOR '😃')) AS a,
        CHARSET(GROUP_CONCAT(a SEPARATOR 0xFF)) AS b,
        CHARSET(GROUP_CONCAT(a SEPARATOR X'FF')) AS c,
        CHARSET(GROUP_CONCAT(a SEPARATOR B'11111111')) AS d
      FROM t1;
      

      is reported as "utf8", not "binary":

      +------+------+------+------+
      | a    | b    | c    | d    |
      +------+------+------+------+
      | utf8 | utf8 | utf8 | utf8 |
      +------+------+------+------+
      

      It should be fixed:

      • either to return BINARY
        This is probably not a good idea, because the SEPATATOR syntax accepts only quoted string,
        hex and bit literals, but does not accept neither expressions nor literals with character set introducers.
        Therefore, the syntax implies that SEPARATOR should not affect the result character set.
      • or to replace bad separators to question marks.
      • or allow strings with introducers (or even all constant expressions) in SEPARATOR and make separator participate in character set and collation aggregation

        Gliffy Diagrams

          Attachments

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                bar Alexander Barkov
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: