Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6878

Use of uninitialized saved_primary_key in Mrr_ordered_index_reader::resume_read()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.12
    • Fix Version/s: 10.0.15
    • Component/s: Optimizer
    • Labels:

      Description

      I believe that Mrr_ordered_index_reader::resume_read() is using saved_primary_key uninitialized if the current read hasn't been interrupted. This manifests itself in our case with the following case:

      SET SESSION
        optimizer_switch="mrr=on,mrr_sort_keys=on",
        join_cache_level=8; 
      
      SELECT DISTINCT a.x FROM a LEFT JOIN b ON (a.x = b.x) WHERE ...
      

      Our query produces an EXPLAIN containing:

      "Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan"
      

      With the result of the query we get many warnings like:

      "Warning 1366 Incorrect string value: '\xE6S\x01\x00\x00\x00...' for column 'y' at row 93"
      

      This is because the buffer being purported to be column 'y' is uninitialized or random data, and doesn't pass as valid UTF-8.

      Unfortunately I don't have a minimal test case for this yet, but I am able to reproduce it with sensitive data locally and can prove that the below patch fixes the symptom.

      Patch follows:

      --- sql/multi_range_read.cc        2014-07-10 23:01:30.000000000 -0700
      +++ sql/multi_range_read.cc        2014-10-15 19:34:56.000000000 -0700
      @@ -467,6 +467,9 @@ void Mrr_ordered_index_reader::position(
       
       void Mrr_ordered_index_reader::resume_read()
       {
      +  if (have_saved_rowid == FALSE)
      +    return;
      +
         TABLE *table= file->get_table();
         KEY *used_index= &table->key_info[file->active_index];
         key_restore(table->record[0], saved_key_tuple, 
      @@ -477,6 +480,8 @@ void Mrr_ordered_index_reader::resume_re
                       &table->key_info[table->s->primary_key],
                       table->key_info[table->s->primary_key].key_length);
         }
      +
      +  have_saved_rowid= FALSE;
       }
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment - - edited

            I could create a small testcase:

            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
            
            create table t14 (
              pk varchar(32) character set utf8 primary key,
              kp1 char(32) not null,
              col1 varchar(32),
              key (kp1)
            );
            
            insert into t14 
            select 
              concat('pk-', 1000 +A.a),
              concat('kp1-', 1000 +A.a),
              concat('val-', 1000 +A.a)
            from test.ten A ;
            
            create table t16 as select kp1 as a from t14;
            
            set join_cache_level=8;
            set optimizer_switch='mrr=on,mrr_sort_keys=on';
            select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
            show warnings;
            +---------+------+---------------------------------------------------------------------------------+
            | Level   | Code | Message                                                                         |
            +---------+------+---------------------------------------------------------------------------------+
            | Warning | 1366 | Incorrect string value: '\xA5\xA5\xA5\xA5\xA5\xA5...' for column 'pk' at row 11 |
            +---------+------+---------------------------------------------------------------------------------+
            
            explain select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
            +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra                                                               |
            +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
            |    1 | SIMPLE      | t16   | ALL  | NULL          | NULL | NULL    | NULL      |   10 |                                                                     |
            |    1 | SIMPLE      | t14   | ref  | kp1           | kp1  | 32      | j12.t16.a |    1 | Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan |
            +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
            
            Show
            psergey Sergei Petrunia added a comment - - edited I could create a small testcase: create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table t14 ( pk varchar(32) character set utf8 primary key, kp1 char(32) not null, col1 varchar(32), key (kp1) ); insert into t14 select concat('pk-', 1000 +A.a), concat('kp1-', 1000 +A.a), concat('val-', 1000 +A.a) from test.ten A ; create table t16 as select kp1 as a from t14; set join_cache_level=8; set optimizer_switch='mrr=on,mrr_sort_keys=on'; select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a; show warnings; +---------+------+---------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xA5\xA5\xA5\xA5\xA5\xA5...' for column 'pk' at row 11 | +---------+------+---------------------------------------------------------------------------------+ explain select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a; +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+ | 1 | SIMPLE | t16 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t14 | ref | kp1 | kp1 | 32 | j12.t16.a | 1 | Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan | +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
            Hide
            psergey Sergei Petrunia added a comment -

            I'm a little concerned about checking have_saved_rowid. It is defined as

              /* TRUE <=> saved_rowid has the last saved rowid */
              bool have_saved_rowid;
            

            which I suspect is not exactly the same as "whether saved_key_tuple and saved_primary_key point to valid tuples that resume_read() should restore"

            Show
            psergey Sergei Petrunia added a comment - I'm a little concerned about checking have_saved_rowid. It is defined as /* TRUE <=> saved_rowid has the last saved rowid */ bool have_saved_rowid; which I suspect is not exactly the same as "whether saved_key_tuple and saved_primary_key point to valid tuples that resume_read() should restore"
            Hide
            psergey Sergei Petrunia added a comment -

            The patch is here:

            http://lists.askmonty.org/pipermail/commits/2014-October/006793.html

            @ Jeremy Cole, could you test your case with this patch?

            Show
            psergey Sergei Petrunia added a comment - The patch is here: http://lists.askmonty.org/pipermail/commits/2014-October/006793.html @ Jeremy Cole , could you test your case with this patch?
            Hide
            jeremycole Jeremy Cole added a comment -

            Sergei Petrunia : I was wondering about that. I thought about adding a new boolean was_interrupted but it seemed like (to my uninformed self) that have_saved_rowid was already for that purpose. It looks like your patch also corrects the symptom we were seeing with production data/query as well, so the patch looks fine to me.

            I am a bit concerned that none of this fairly important optimizer code (it would seem most of MRR) has any meaningful test coverage, such that even my potentially incorrect patch doesn't cause any tests to fail, and so that these bugs can exist up to now.

            Show
            jeremycole Jeremy Cole added a comment - Sergei Petrunia : I was wondering about that. I thought about adding a new boolean was_interrupted but it seemed like (to my uninformed self) that have_saved_rowid was already for that purpose. It looks like your patch also corrects the symptom we were seeing with production data/query as well, so the patch looks fine to me. I am a bit concerned that none of this fairly important optimizer code (it would seem most of MRR) has any meaningful test coverage, such that even my potentially incorrect patch doesn't cause any tests to fail, and so that these bugs can exist up to now.
            Hide
            jeremycole Jeremy Cole added a comment -

            All tests pass here, with the new patch.

            Show
            jeremycole Jeremy Cole added a comment - All tests pass here, with the new patch.
            Hide
            psergey Sergei Petrunia added a comment -

            (didn't see the comments)

            Show
            psergey Sergei Petrunia added a comment - (didn't see the comments)
            Hide
            psergey Sergei Petrunia added a comment -

            Jeremy Cole, there was some RQG-based testing before (the interrupt_read()/resume_read() functions were added as fix for https://bugs.launchpad.net/maria/+bug/671340, before 5.3/5.5 was GA). MP AB's jira shows there was more RQG testing done after 5.3/5.5 GA, also.

            I agree with your argument that mtr tests do not have much coverage - to fully test code around interrupt/resume_read one needs to run out of buffer space and/or lookup keys at some particular points, etc. This makes it hard to test with MTR - one needs to run loads of similar queries.

            I've asked Elena Stepanova to run RQG tests again on fix for this bug.

            Show
            psergey Sergei Petrunia added a comment - Jeremy Cole , there was some RQG-based testing before (the interrupt_read()/resume_read() functions were added as fix for https://bugs.launchpad.net/maria/+bug/671340 , before 5.3/5.5 was GA). MP AB's jira shows there was more RQG testing done after 5.3/5.5 GA, also. I agree with your argument that mtr tests do not have much coverage - to fully test code around interrupt/resume_read one needs to run out of buffer space and/or lookup keys at some particular points, etc. This makes it hard to test with MTR - one needs to run loads of similar queries. I've asked Elena Stepanova to run RQG tests again on fix for this bug.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                jeremycole Jeremy Cole
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: