  MariaDB Server / MDEV-6643

Improve performance of string processing in the parser

    Details

    • Type: Task
    • Status: In Review
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 10.2
    • Component/s: None
    • Labels: None

      Description

      There is a bottleneck in how string literals are processed in the parser.
      It especially affects long BLOB/TEXT values.

      1. The function get_text() in sql_lex.cc allocates an unescaped copy of every string literal (unescaping backslash escapes and doubled quotes, if any). Strangely, the copy is made even if the string contains no escapes at all.

      2. The syntax parser in sql_yacc.yy then creates an Item_string from the new unescaped buffer.
      Moreover, with a multi-byte connection character set (e.g. utf8), the Item_string constructor makes another pass over the unescaped buffer to calculate its length in characters, which is needed to set max_length properly.

      It would be nice to create Items directly from the SQL fragment, without making a copy, even for values that contain escapes.

      Length in characters can also be calculated during the very first pass in get_text(), without any additional loops in the Item constructors.
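A minimal sketch of such a single pass, assuming a utf8 connection character set and simplified escape rules (a backslash plus the next character, or a doubled quote, each yield one character; the real rules differ in places, e.g. `\%` and `\_` keep their backslash). `LiteralView` and `scan_literal()` are hypothetical names, not MariaDB code. The scanner records where the literal sits in the query text, its length in characters, and whether it contains escapes, all without copying anything.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view of a string literal inside the query text (not MariaDB code).
struct LiteralView {
  const char *body = nullptr;   // first byte after the opening quote
  std::size_t byte_length = 0;  // raw bytes up to the closing quote, escapes included
  std::size_t char_length = 0;  // characters after unescaping (utf8 assumed)
  bool has_escapes = false;     // true if unescaping would change the bytes
};

// One pass over the literal: finds the closing quote, counts characters and
// notes whether escapes occur. `pos` points just after the opening quote.
// Simplified rules (assumption): "\x" and a doubled quote each give one
// character. Returns false for an unterminated literal.
bool scan_literal(const char *pos, const char *end, char quote, LiteralView *out) {
  assert(pos <= end && out != nullptr);
  *out = LiteralView();
  out->body = pos;
  while (pos < end) {
    unsigned char c = static_cast<unsigned char>(*pos);
    if (c == '\\' || (*pos == quote && pos + 1 < end && pos[1] == quote)) {
      if (pos + 1 >= end) return false;  // dangling backslash
      out->has_escapes = true;
      out->char_length++;
      pos += 2;  // utf8 continuation bytes of the escaped char are not counted below
      continue;
    }
    if (*pos == quote) {  // closing quote
      out->byte_length = static_cast<std::size_t>(pos - out->body);
      return true;
    }
    if ((c & 0xC0) != 0x80) out->char_length++;  // count utf8 lead bytes only
    pos++;
  }
  return false;
}
```

Because the view only points into the query buffer, an Item built from it could defer unescaping, and when `has_escapes` is false the raw bytes already are the final value.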

      Unescaping can be done in the very end, when the value is actually needed:

      • Either in Field::store(), if the string value is used for:
        INSERT INTO t1 VALUES('string');
        

        Unescaping should be done directly to the Field buffer, without any intermediary temporary storage.

      • Or in val_str(), if the string value is used elsewhere (in SELECT list, functions, operators, etc).
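A sketch of unescaping straight into a destination buffer, in the spirit of the Field::store() case. Here `dst` merely stands in for the Field's record buffer; `unescape_into()` and its escape table are hypothetical and cover only a simplified subset of the MariaDB rules (for example, the real rules keep the backslash in `\%` and `\_`). Since unescaping never makes a value longer, no temporary storage is needed.

```cpp
#include <cassert>
#include <cstddef>

// Simplified escape table (assumption; not the complete MariaDB rules).
static char unescape_char(char c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case 'b': return '\b';
    case '0': return '\0';
    case 'Z': return '\032';  // Ctrl-Z
    default:  return c;       // \\ -> \, \' -> ', \" -> "
  }
}

// Unescapes raw literal bytes directly into `dst` (a stand-in for the
// Field's record buffer) without any intermediate copy. Unescaping never
// lengthens the value, so `length` bytes of room are always enough.
// Returns the number of bytes written.
std::size_t unescape_into(const char *src, std::size_t length, char quote,
                          char *dst, std::size_t dst_capacity) {
  assert(dst_capacity >= length);
  const char *end = src + length;
  char *out = dst;
  while (src < end) {
    if (*src == '\\' && src + 1 < end) {
      *out++ = unescape_char(src[1]);  // backslash escape
      src += 2;
    } else if (*src == quote && src + 1 < end && src[1] == quote) {
      *out++ = quote;                  // doubled quote
      src += 2;
    } else {
      *out++ = *src++;                 // ordinary byte, copied as-is
    }
  }
  return static_cast<std::size_t>(out - dst);
}
```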

      The unescaped value should be cached, so that val_str() does not unescape it repeatedly (e.g. once per row), as in:

      SELECT * FROM t1 WHERE a='string with backslash or quote escapes';
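A sketch of this caching, using a hypothetical `LazyStringLiteral` class in place of Item_string (its interface is invented for illustration, it needs C++17 for `std::string_view`, and the escape rules are simplified to `\x` -> `x` and `''` -> `'`). A literal without escapes is served straight from the query text; one with escapes is unescaped on the first val_str() call, and the cached result is returned for every later row.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a string-literal Item (not MariaDB's Item_string).
class LazyStringLiteral {
 public:
  LazyStringLiteral(const char *body, std::size_t length, bool has_escapes)
      : raw_(body, length), has_escapes_(has_escapes) {
    assert(body != nullptr || length == 0);
  }

  // May be called once per row; unescapes at most once, then serves the cache.
  std::string_view val_str() {
    if (!has_escapes_) return raw_;  // no escapes: use the query text as-is
    if (!cached_) {
      cache_ = unescape(raw_);
      cached_ = true;
    }
    return cache_;
  }

  int unescape_count() const { return unescape_count_; }

 private:
  // Simplified rules (assumption): "\x" -> "x", "''" -> "'".
  std::string unescape(std::string_view s) {
    ++unescape_count_;
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
      bool pair = i + 1 < s.size() &&
                  (s[i] == '\\' || (s[i] == '\'' && s[i + 1] == '\''));
      if (pair) ++i;  // skip the escape character, keep the escaped one
      out.push_back(s[i]);
    }
    return out;
  }

  std::string_view raw_;
  bool has_escapes_;
  bool cached_ = false;
  std::string cache_;
  int unescape_count_ = 0;
};
```

In the SELECT above, a per-row comparison would call val_str() repeatedly but pay the unescaping cost only once; `unescape_count()` exists only to make that observable.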
      


              Activity

              Alexander Barkov (bar) added a comment:

              Sent a new version for review, with the most important problems fixed:

              • removed a number of small classes (grouped them into bigger ones)
              • moved most of the new code into a separate file sql_strconv.h

              Now it should be somewhat easier to review.


                People

                • Assignee:
                  Michael Widenius (monty)
                  Reporter:
                  Alexander Barkov (bar)
                • Votes:
                  0
                  Watchers:
                  2

                  Dates

                  • Created:
                    Updated:

                    Time Tracking

                    Estimated: Not Specified
                    Remaining: 0m
                    Logged: 6h