Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4144

simple subquery causes full scan instead of range scan

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.0, 5.5.29, 5.3.12
    • Fix Version/s: 10.0.2, 5.5.31, 5.3.13
    • Component/s: None
    • Labels:

      Description

      Test case:

      --source include/have_innodb.inc
      
      CREATE TABLE t (
        id int not null auto_increment,
        x int not null,
        primary key(id)
      )engine=innodb;
      
      insert into t (x) values(0),(0),(0);
      insert into t (x) select 0 from t as t1,t as t2;
      insert into t (x) select 0 from t as t1,t as t2;
      insert into t (x) select 0 from t as t1,t as t2;
      
      SELECT (SELECT MAX(id) - 1000 FROM t) INTO @a;
      FLUSH STATUS;
      SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      SHOW STATUS LIKE 'handler_read%';
      FLUSH STATUS;
      SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
      SHOW STATUS LIKE 'handler_read%';
      DROP TABLE t;
      

      output on mariadb (tested 5.3.5, 5.5.28, 5.5.29 and 10.0.0):

      mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
      
      mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
      +--------------------------+-------+
      | Variable_name            | Value |
      +--------------------------+-------+
      | Handler_read_first       | 0     |
      | Handler_read_key         | 1     |
      | Handler_read_last        | 0     |
      | Handler_read_next        | 1000  |
      | Handler_read_prev        | 0     |
      | Handler_read_rnd         | 0     |
      | Handler_read_rnd_deleted | 0     |
      | Handler_read_rnd_next    | 0     |
      +--------------------------+-------+
      8 rows in set (0.00 sec)
      
      mysql [localhost] {msandbox} (test) > FLUSH STATUS;
      Query OK, 0 rows affected (0.00 sec)
      
      mysql [localhost] {msandbox} (test) > SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x DESC LIMIT 1;
      +---+
      | x |
      +-- +
      | 0 |
      +---+
      1 row in set (0.00 sec)
      
      mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'handler_read%';
      +--------------------------+-------+
      | Variable_name            | Value |
      +--------------------------+-------+
      | Handler_read_first       | 0     |
      | Handler_read_key         | 0     |
      | Handler_read_last        | 1     |
      | Handler_read_next        | 0     |
      | Handler_read_prev        | 0     |
      | Handler_read_rnd         | 0     |
      | Handler_read_rnd_deleted | 0     |
      | Handler_read_rnd_next    | 24493 |
      +--------------------------+-------+
      

      output on mysql (tested 5.0 and 5.1):

      mysql> SELECT x FROM t WHERE id > @a ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
      
      mysql> SHOW STATUS LIKE 'handler_read%';
      +-----------------------+-------+
      | Variable_name         | Value |
      +-----------------------+-------+
      | Handler_read_first    | 0     |
      | Handler_read_key      | 3     |
      | Handler_read_next     | 1000  |
      | Handler_read_prev     | 0     |
      | Handler_read_rnd      | 0     |
      | Handler_read_rnd_next | 0     |
      +-----------------------+-------+
      6 rows in set (0.00 sec)
      
      mysql> FLUSH STATUS;
      Query OK, 0 rows affected (0.00 sec)
      
      mysql> SELECT x FROM t WHERE id > (SELECT MAX(id) - 1000 FROM t) ORDER BY x LIMIT 1;
      +---+
      | x |
      +---+
      | 0 |
      +---+
      1 row in set (0.00 sec)
      
      mysql> SHOW STATUS LIKE 'handler_read%';
      +-----------------------+-------+
      | Variable_name         | Value |
      +-----------------------+-------+
      | Handler_read_first    | 0     |
      | Handler_read_key      | 5     |
      | Handler_read_next     | 1000  |
      | Handler_read_prev     | 0     |
      | Handler_read_rnd      | 0     |
      | Handler_read_rnd_next | 0     |
      +-----------------------+-------+
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            Agree.. The subquery is considered expensive, and range optimizer doesn't consider the "id > (SELECT ...)" predicate.

            Show
            psergey Sergei Petrunia added a comment - Agree.. The subquery is considered expensive, and range optimizer doesn't consider the "id > (SELECT ...)" predicate.
            Hide
            psergey Sergei Petrunia added a comment -

            What is weird, is why is this subquery considered expensive. The code in item_subselect.cc: Item_subselect::is_expensive() looks as if it was written to be able to distinguish between the expensive and in-expensive subqueries. Yet, debugging shows that execution in Item_subselect::is_expensive() returns here:

            /* If a subquery is not optimized we cannot estimate its cost. */
            if (!cur_join->join_tab)
            return true;

            I'm wondering whether Patryk's addition may hit the subqueries that are not-yet-optimized (Do those subqueries may have cur_join->table_count == cur_join->const_tables?).

            Will discuss with Timour (his code).

            Show
            psergey Sergei Petrunia added a comment - What is weird, is why is this subquery considered expensive. The code in item_subselect.cc: Item_subselect::is_expensive() looks as if it was written to be able to distinguish between the expensive and in-expensive subqueries. Yet, debugging shows that execution in Item_subselect::is_expensive() returns here: /* If a subquery is not optimized we cannot estimate its cost. */ if (!cur_join->join_tab) return true; I'm wondering whether Patryk's addition may hit the subqueries that are not-yet-optimized (Do those subqueries may have cur_join->table_count == cur_join->const_tables?). Will discuss with Timour (his code).
            Hide
            pomyk Patryk Pomykalski added a comment -

            There's actually 'optimized' flag in join so maybe it should be something like this:

            — sql/item_subselect.cc 2013-03-17 10:41:25 +0000
            +++ sql/item_subselect.cc 2013-03-27 14:03:06 +0000
            @@ -548,8 +548,11 @@
            continue;

            /* If a subquery is not optimized we cannot estimate its cost. */
            + if (!cur_join->optimized)
            + return true;
            +
            if (!cur_join->join_tab)

            • return true;
              + continue;

            if (sl->first_inner_unit())
            {

            Show
            pomyk Patryk Pomykalski added a comment - There's actually 'optimized' flag in join so maybe it should be something like this: — sql/item_subselect.cc 2013-03-17 10:41:25 +0000 +++ sql/item_subselect.cc 2013-03-27 14:03:06 +0000 @@ -548,8 +548,11 @@ continue; /* If a subquery is not optimized we cannot estimate its cost. */ + if (!cur_join->optimized) + return true; + if (!cur_join->join_tab) return true; + continue; if (sl->first_inner_unit()) {
            Hide
            timour Timour Katchaounov added a comment -

            Patryk, you are very close to the solution, however JOIN::optimized is set in the very beginning of JOIN::optimize.
            As a result it may happen that JOIN::optimize didn't produce an executable plan. At the same time, one of the
            ideas behind ::is_expensive() is to tell the optimizer that an expression is not expensive, and thus it can be evaluated
            during optimization. So this method should rule out subquery plans that cannot or should not be evaluated during optimization.

            There are much more extensive flags that describe the state of a query plan in a development branch of a 10.0 feature,
            however these cannot be backported to a GA version.

            Show
            timour Timour Katchaounov added a comment - Patryk, you are very close to the solution, however JOIN::optimized is set in the very beginning of JOIN::optimize. As a result it may happen that JOIN::optimize didn't produce an executable plan. At the same time, one of the ideas behind ::is_expensive() is to tell the optimizer that an expression is not expensive, and thus it can be evaluated during optimization. So this method should rule out subquery plans that cannot or should not be evaluated during optimization. There are much more extensive flags that describe the state of a query plan in a development branch of a 10.0 feature, however these cannot be backported to a GA version.
            Hide
            timour Timour Katchaounov added a comment -

            Pushed to 5.5.31

            Show
            timour Timour Katchaounov added a comment - Pushed to 5.5.31

              People

              • Assignee:
                timour Timour Katchaounov
                Reporter:
                pomyk Patryk Pomykalski
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 2 hours
                  2h