Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-317

CHEAP SQ: A query with EXISTS subquery, aggregate function in subquery, ORDER BY and LIMIT takes much longer than on main tree if optimizer_prune_level=0

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 5.5.27
    • Component/s: None
    • Labels:
      None

      Description

      Note: with the provided test case, the problem is only reproducible when optimizer_prune_level=0. If you decide that it's acceptable and isn't worth fixing, I won't object, but I'm filing it so you could check if it's expected.

      The following query

      SELECT a FROM t1, t2, t3 AS t3_alias
      WHERE b <= ( SELECT SUM(a) FROM t1 )
        AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
      ORDER BY a LIMIT 1;
      

      takes ~18 times longer on MDEV-193 tree comparing to main/5.5 revno 3426 (and to 3402).

      Reproducible with MyISAM and Aria; with InnoDB it's sporadic.
      Reproducible with the default optimizer_switch.

      EXPLAIN on MDEV-193:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
      1	PRIMARY	t2	range	b	b	5	NULL	96	100.00	Using where; Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
      2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
      Warnings:
      Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
      Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1
      

      EXPLAIN on maria/5.5:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
      1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t2	index	b	b	5	NULL	100	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
      2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
      Warnings:
      Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
      Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1
      

      Test case:

      SET optimizer_prune_level = 0;
      
      CREATE TABLE t1 (a INT, KEY(a)) ENGINE=MyISAM;
      INSERT INTO t1 VALUES
        (0),(8),(1),(8),(9),(24),(6),(1),(6),(2),
        (4),(8),(4),(4),(7),(4),(1),(9),(4),(8);
      INSERT INTO t1 SELECT * FROM t1;
      
      CREATE TABLE t2 (b INT, KEY(b)) ENGINE=MyISAM;
      INSERT INTO t2 VALUES
        (4),(8),(0),(0),(0),(7),(7),(5),(3),(188),
        (4),(9),(6),(1),(5),(6),(2),(4),(231),(4),
        (3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),
        (194),(2),(3),(8),(4),(9),(4),(5),(5),(9),
        (3),(8),(0),(98),(3),(1),(0),(189),(8),(3),
        (3),(9),(6),(8),(3),(9),(5),(9),(2),(2),(5),
        (8),(6),(9),(0),(3),(6),(5),(8),(2),(120),
        (25),(1),(3),(1),(3),(153),(5),(9),(1),(8),
        (7),(6),(2),(4),(7),(3),(8),(4),(6),(1),(7),
        (1),(0),(2),(7),(2),(1),(5);
      
      CREATE TABLE t3 (c INT, KEY(c)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES
        (7),(0),(9),(3),(4),(2),(5),(3),(1),(3),(6),
        (7),(5),(1),(204),(224),(9),(5),(0),(3);
      INSERT INTO t3 SELECT * FROM t3;
      
      SELECT a FROM t1, t2, t3 AS t3_alias
      WHERE b <= ( SELECT SUM(a) FROM t1 )
        AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
      ORDER BY a LIMIT 1;
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              timour Timour Katchaounov added a comment -

              This is not a bug, but rather a deficiency of the current optimizer that is
              completely unrelated to mdev-193. It is a regression only with respect to
              MariaDB 5.5/5.3. Detailed explanation follows.

              1. What happens in mdev-193

              The patch for mdev-193 makes MariaDB 5.5 behave like any MySQL version, or
              MariaDB 5.2, namely - it allows constant subqueries to be evaluated during
              optimization, and the resulting value to be used by the optimizer.

              Specifically in this test case, the subquery (SELECT SUM(a) FROM t1) is
              evaluated by the range optimizer, and the final query plan uses "range"
              access to table "t2".

              If we set "optimizer_prune_level = 0" (that is, not pruning, consider all
              plans), the optimizer determines that the optimal query plan is:

              ------------------------------------------------------------------------------------------------------------------------------------------------

              id select_type table type possible_keys key key_len ref rows filtered Extra

              ------------------------------------------------------------------------------------------------------------------------------------------------

              1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
              1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (flat, BNL join)
              1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
              3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
              2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

              ------------------------------------------------------------------------------------------------------------------------------------------------
              join->best_read = 32396

              If we set "optimizer_prune_level = 1" (that is prune some plans), the optimizer
              prunes the above plan, and finds a worse plan with respect to its cost model:

              ------------------------------------------------------------------------------------------------------------------------------------------------

              id select_type table type possible_keys key key_len ref rows filtered Extra

              ------------------------------------------------------------------------------------------------------------------------------------------------

              1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
              1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (flat, BNL join)
              1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
              3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
              2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

              ------------------------------------------------------------------------------------------------------------------------------------------------
              join->best_read = 66869 (~ two times higher cost)

              However, the second plan in reality is much faster. The reason for this is that
              table "t3_alias" appears in second place in the plan instead of in third, as a
              result the correlated EXISTS subquery is executed fewer times (1531 times). The
              first "optimal" plan puts table "t3_alias" in third place in the join, which
              results in the EXISTS subquery being executed 152011 times. At the same time
              the result of the constant subquery is cached, so it doesn't matter much how
              many times it is evaluated.

              2. Exposing the same problem in the main 5.5 branch

              If we substitute manually the aggregate subquery with its constant value, we can
              reproduce the problem in 5.5 via the following test case:

              – better plan
              SET optimizer_prune_level = 1;

              EXPLAIN
              SELECT count FROM t1, t2, t3 AS t3_alias
              WHERE b <= 236
              AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
              ORDER BY a LIMIT 1;

              – worse plan
              SET optimizer_prune_level = 0;

              EXPLAIN
              SELECT count FROM t1, t2, t3 AS t3_alias
              WHERE b <= 236
              AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
              ORDER BY a LIMIT 1;

              Show
              timour Timour Katchaounov added a comment - This is not a bug, but rather a deficiency of the current optimizer that is completely unrelated to mdev-193. It is a regression only with respect to MariaDB 5.5/5.3. Detailed explanation follows. 1. What happens in mdev-193 The patch for mdev-193 makes MariaDB 5.5 behave like any MySQL version, or MariaDB 5.2, namely - it allows constant subqueries to be evaluated during optimization, and the resulting value to be used by the optimizer. Specifically in this test case, the subquery (SELECT SUM(a) FROM t1) is evaluated by the range optimizer, and the final query plan uses "range" access to table "t2". If we set "optimizer_prune_level = 0" (that is, not pruning, consider all plans), the optimizer determines that the optimal query plan is: ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- 1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index 1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (flat, BNL join) 1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index 2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- join->best_read = 32396 If we set "optimizer_prune_level = 1" (that is prune some plans), the optimizer prunes the above plan, and finds a worse plan with respect to its cost model: ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- 1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index 1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (flat, BNL join) 1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index 2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- join->best_read = 66869 (~ two times higher cost) However, the second plan in reality is much faster. The reason for this is that table "t3_alias" appears in second place in the plan instead of in third, as a result the correlated EXISTS subquery is executed fewer times (1531 times). The first "optimal" plan puts table "t3_alias" in third place in the join, which results in the EXISTS subquery being executed 152011 times. At the same time the result of the constant subquery is cached, so it doesn't matter much how many times it is evaluated. 2. Exposing the same problem in the main 5.5 branch If we substitute manually the aggregate subquery with its constant value, we can reproduce the problem in 5.5 via the following test case: – better plan SET optimizer_prune_level = 1; EXPLAIN SELECT count FROM t1, t2, t3 AS t3_alias WHERE b <= 236 AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c ) ORDER BY a LIMIT 1; – worse plan SET optimizer_prune_level = 0; EXPLAIN SELECT count FROM t1, t2, t3 AS t3_alias WHERE b <= 236 AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c ) ORDER BY a LIMIT 1;
              Hide
              timour Timour Katchaounov added a comment -

              As explained in the last comment this is not a bug, but a deficiency of the current optimizer.
              This problem was filed as a new task MDEV-338.

              Show
              timour Timour Katchaounov added a comment - As explained in the last comment this is not a bug, but a deficiency of the current optimizer. This problem was filed as a new task MDEV-338 .

                People

                • Assignee:
                  timour Timour Katchaounov
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 5 hours
                    5h