Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5980

EITS: if condition is used for REF access, its selectivity is still in filtered%

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.9
    • Fix Version/s: 10.0.11
    • Component/s: None
    • Labels:

      Description

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1(key1 int, col1 int, key(key1));
      insert into t1 select A.a, A.a from ten A, ten B, ten C;
      
      set histogram_size=100;
      set use_stat_tables='preferably';
      set optimizer_use_condition_selectivity=4;
      analyze table t1 persistent for all;
      
      mysql> explain extended select * from t1 where col1=2;
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |     9.90 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
      

      This is ok.

      mysql> explain extended select * from t1 where key1=2;
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      |    1 | SIMPLE      | t1    | ref  | key1          | key1 | 5       | const |  100 |    10.00 |       |
      +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
      

      This is not ok. "key1=2" is used for ref access. However, its selectivity is still present in "filtered%". Ref access is employed when a condition is very selective, counting its selectivity twice will cause gross under-estimation of output cardinality.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -
            • so it is ref(const).
            • matching_candidates_in_table() is not called.

            We should recongize it as an equality that was bound by ref and so discount its
            selectivity (the selectivity comes from ... range... how do we figure out that
            ref-bound equality has contributed as a possible range scan?)

            Idea: keep array of induced selectivity

            table->added_selectivity[field_no] = 0.123456

            and if $field_no is used for ref access, than it's not contributing to
            selectivity, and we remove its selectivity from the product.

            Q: what to do about range accesses that span multiple key parts? It's not
            possible to remove one key part from selectivity calculations...

            Show
            psergey Sergei Petrunia added a comment - so it is ref(const). matching_candidates_in_table() is not called. We should recongize it as an equality that was bound by ref and so discount its selectivity (the selectivity comes from ... range... how do we figure out that ref-bound equality has contributed as a possible range scan?) Idea: keep array of induced selectivity table->added_selectivity [field_no] = 0.123456 and if $field_no is used for ref access, than it's not contributing to selectivity, and we remove its selectivity from the product. Q: what to do about range accesses that span multiple key parts? It's not possible to remove one key part from selectivity calculations...
            Hide
            psergey Sergei Petrunia added a comment -

            Modified the testcase a little bit so that data distribution is not uniform:

            create table t1a like t1;
            insert into t1a select * from t1;
            update t1a set key1=9 where key1=5 limit 50;
            update t1a set col1=9 where col1=5 limit 50;
            analyze table t1a persistent for all;
            
            explain extended select * from t1a where key1=5;
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            
            explain extended select * from t1a where key1=9;
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |  150 |    15.00 |       |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            

            Now, will debug this query:

            explain extended select * from t1a where key1=5;
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            |    1 | SIMPLE      | t1a   | ref  | key1          | key1 | 5       | const |   50 |     5.00 |       |
            +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+
            

            == calculate_cond_selectivity_for_table() ==
            finds quick_cond_selectivity=0.05
            sets it as table->cond_selectivity
            never attempts to perform records_in_column_ranges()-based analysis (this is
            the expected behavior)

            == table_cond_selectivity ==
            Starts with

            double sel= s->table->cond_selectivity; // = 0.05

            Eventually it runs this code:

                  if (keyuse->val->const_item())
                    sel*= table->field[fldno]->cond_selectivity; 
            

            However, field->cond_selectivity=1 there (because we've used range access
            estimates, not records_in_column_ranges() estimates).

            The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05
            We return to best_extension_by_limited_search() and see

            (gdb) p partial_join_cardinality
              $8 = 2.5
            (gdb) print current_record_count
              $9 = 50
            (gdb) print pushdown_cond_selectivity
              $10 = 0.050000000000000003
            (gdb) print partial_join_cardinality
              $11 = 2.5
            
            Show
            psergey Sergei Petrunia added a comment - Modified the testcase a little bit so that data distribution is not uniform: create table t1a like t1; insert into t1a select * from t1; update t1a set key1=9 where key1=5 limit 50; update t1a set col1=9 where col1=5 limit 50; analyze table t1a persistent for all; explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ explain extended select * from t1a where key1=9; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 150 | 15.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ Now, will debug this query: explain extended select * from t1a where key1=5; +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ | 1 | SIMPLE | t1a | ref | key1 | key1 | 5 | const | 50 | 5.00 | | +------+-------------+-------+------+---------------+------+---------+-------+------+----------+-------+ == calculate_cond_selectivity_for_table() == finds quick_cond_selectivity=0.05 sets it as table->cond_selectivity never attempts to perform records_in_column_ranges()-based analysis (this is the expected behavior) == table_cond_selectivity == Starts with double sel= s->table->cond_selectivity; // = 0.05 Eventually it runs this code: if (keyuse->val->const_item()) sel*= table->field[fldno]->cond_selectivity; However, field->cond_selectivity=1 there (because we've used range access estimates, not records_in_column_ranges() estimates). The rest of the function (including table_multi_eq_cond_selectivity() call) doesn't modify 'sel'. sel remains 0.05 We return to best_extension_by_limited_search() and see (gdb) p partial_join_cardinality $8 = 2.5 (gdb) print current_record_count $9 = 50 (gdb) print pushdown_cond_selectivity $10 = 0.050000000000000003 (gdb) print partial_join_cardinality $11 = 2.5
            Hide
            psergey Sergei Petrunia added a comment -

            Patch submitted for review

            Show
            psergey Sergei Petrunia added a comment - Patch submitted for review

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: