  1. MariaDB Server
  2. MDEV-8362

dash '-' is not recognized in charset armscii8 on select where query

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.1.67, 5.2.14, 5.3.12, 10.1, 10.0, 5.5
    • Fix Version/s: 10.1.6
    • Component/s: Character Sets
    • Labels:
    • Environment:
      # mysql --version
      mysql Ver 15.1 Distrib 5.5.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.1
    • Sprint:
      10.1.6-2

      Description

      It looks like the db server cannot match a value in an equality comparison if the value contains a dash "-". As far as I know, the affected charset is armscii8.

      To reproduce, see the commands below; the db dump is attached:

      MariaDB [bugtest]> create table test(columnname varchar(64) CHARACTER SET armscii8);
      Query OK, 0 rows affected (0.07 sec)
      
      MariaDB [bugtest]> insert into test values ('abc-def');
      Query OK, 1 row affected (0.04 sec)
      
      MariaDB [bugtest]> select * from test where columnname = 'abc-def';
      Empty set (0.00 sec)
      
      MariaDB [bugtest]> select * from test where columnname like 'abc%';
      +------------+
      | columnname |
      +------------+
      | abc-def    |
      +------------+
      1 row in set (0.00 sec)
      
      

            Activity

            Elena Stepanova (elenst) added a comment (edited):

            Thanks for the report.

            Same on MySQL 5.7, so if it's a bug, it's an upstream issue.
            Alexander Barkov,
            It does look like a bug to me, but I don't know how much this charset is supported.
            If you decide it should be fixed, but prefer to treat it as an upstream bug, please report it at bugs.mysql.com (or maybe you know it has already been reported?). Alternatively, it can be fixed directly in MariaDB.

            Alexander Barkov (bar) added a comment (edited):

            It seems that the problem happens during utf8-to-armscii8 conversion because
            the following ASCII characters have double encoding in the 8-bit range (0x80..0xFF):

            0xA4   U+0029   RIGHT PARENTHESIS
            0xA5   U+0028   LEFT PARENTHESIS
            0xA9   U+002E   FULL STOP
            0xAB   U+002C   COMMA
            0xAC   U+002D   HYPHEN-MINUS
            0xFF   U+0027   APOSTROPHE
            

            So the utf8 dash '-' is erroneously converted to armscii8 0xAC instead of 0x2D:

            MariaDB [test]> SELECT HEX(CONVERT(_utf8 0x2D USING armscii8));
            +-----------------------------------------+
            | HEX(CONVERT(_utf8 0x2D USING armscii8)) |
            +-----------------------------------------+
            | AC                                      |
            +-----------------------------------------+
            1 row in set (0.00 sec)
            

            This should be fixed.
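
To make the double-encoding problem concrete, here is a minimal Python sketch of one way this kind of symptom can arise. It is an illustration under assumptions, not the server's actual conversion code: the table holds only the bytes listed above plus their ASCII twins, and the function names `build_reverse_last_wins` and `build_reverse_lowest_wins` are hypothetical.

```python
# Hypothetical excerpt of an armscii8-to-Unicode table: the six double-coded
# bytes from the list above, plus the ASCII bytes for the same characters.
TO_UNICODE = {
    0x27: 0x0027, 0x28: 0x0028, 0x29: 0x0029,
    0x2C: 0x002C, 0x2D: 0x002D, 0x2E: 0x002E,
    0xA4: 0x0029, 0xA5: 0x0028, 0xA9: 0x002E,
    0xAB: 0x002C, 0xAC: 0x002D, 0xFF: 0x0027,
}


def build_reverse_last_wins(table):
    """Build Unicode-to-byte map; a later (higher) byte overwrites an earlier one."""
    reverse = {}
    for byte in sorted(table):
        reverse[table[byte]] = byte
    return reverse


def build_reverse_lowest_wins(table):
    """Build Unicode-to-byte map; the first (lowest, ASCII) byte is kept."""
    reverse = {}
    for byte in sorted(table):
        reverse.setdefault(table[byte], byte)
    return reverse


naive = build_reverse_last_wins(TO_UNICODE)
fixed = build_reverse_lowest_wins(TO_UNICODE)

print(hex(naive[0x002D]))  # 0xac: the dash lands in the 8-bit range
print(hex(fixed[0x002D]))  # 0x2d: the dash keeps its ASCII byte
```

With the "lowest wins" rule, every character that has an ASCII byte converts back to that byte, which is the result expected from `CONVERT(_utf8 0x2D USING armscii8)`.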

            There is also a problem in the collation definition. It should probably sort the double-encoded characters as equal (e.g. armscii8 0x2D should be equal to 0xAC).
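
The collation idea can be sketched the same way: give both encodings of a character the same sort weight. This is only an illustration in Python, assuming a simple byte-to-weight mapping. The names `DOUBLE_CODED`, `sort_key` and `equal_under_collation` are hypothetical and do not correspond to anything in the server source.

```python
# Hypothetical weight overrides: each 8-bit duplicate gets the weight of its
# ASCII twin (pairs taken from the list of double-coded characters above).
DOUBLE_CODED = {
    0xA4: 0x29,  # RIGHT PARENTHESIS
    0xA5: 0x28,  # LEFT PARENTHESIS
    0xA9: 0x2E,  # FULL STOP
    0xAB: 0x2C,  # COMMA
    0xAC: 0x2D,  # HYPHEN-MINUS
    0xFF: 0x27,  # APOSTROPHE
}


def sort_key(data: bytes) -> bytes:
    """Map every byte to its weight; bytes not listed keep their own value."""
    return bytes(DOUBLE_CODED.get(b, b) for b in data)


def equal_under_collation(a: bytes, b: bytes) -> bool:
    """Two strings compare equal when their weight sequences match."""
    return sort_key(a) == sort_key(b)


with_ascii_dash = b"abc\x2ddef"   # dash stored as 0x2D
with_8bit_dash = b"abc\xacdef"    # dash stored as 0xAC

print(equal_under_collation(with_ascii_dash, with_8bit_dash))  # True
print(equal_under_collation(b"abc-def", b"abc.def"))           # False
```

Under such a rule, a comparison between the two byte forms of 'abc-def' would succeed regardless of which encoding of the dash each side carries.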

            Yingyu Cheng (winguse) added a comment:

            @Elena Stepanova, I have not tested this on upstream MySQL or reported it there. Do I need to do that, or could you help?

            Elena Stepanova (elenst) added a comment:

            Yingyu Cheng,
            No problem, if we decide it's worth trying, then either Alexander Barkov or I will do that.


              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                winguse Yingyu Cheng