Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-321

CHEAP SQ: A query with inner joins and EXISTS subquery takes several times longer on MDEV-193 tree than on 5.5 main tree

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 5.5.27
    • Component/s: None
    • Labels:
      None

      Description

      The following query

      SELECT MAX(al1.a) 
      FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
      WHERE al4.a = al3.a 
        AND (
          EXISTS ( SELECT 1 FROM t1 )
          OR al1.a > al2.a
        )
      

      takes several times longer on MDEV-193 tree revno 3407 comparing to maria/5.5 tree 3426.

      Reproducible with the default optimizer_switch as well as all OFF values except for in_to_exists required to execute the query.
      Reproducible with MyISAM (~3 times longer, 30 vs 10 sec on my machine), Aria and InnoDB (~6 times longer, 60 vs 10 sec).

      EXPLAIN on MDEV-193 (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	al1	index	NULL	a	5	NULL	90	100.00	Using index
      1	PRIMARY	al2	index	NULL	a	5	NULL	90	100.00	Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	al3	index	a	a	5	NULL	90	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      1	PRIMARY	al4	ref	a	a	5	test.al3.a	10	100.00	Using index
      2	SUBQUERY	t1	index	NULL	a	5	NULL	90	100.00	Using index
      Warnings:
      Note	1003	select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where (`test`.`al4`.`a` = `test`.`al3`.`a`)
      

      EXPLAIN on maria/5.5 (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	al3	index	a	a	5	NULL	90	100.00	Using where; Using index
      1	PRIMARY	al4	ref	a	a	5	test.al3.a	10	100.00	Using index
      1	PRIMARY	al1	index	a	a	5	NULL	90	100.00	Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	al2	index	a	a	5	NULL	90	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      2	SUBQUERY	t1	index	NULL	a	5	NULL	90	100.00	Using index
      Warnings:
      Note	1003	select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where ((`test`.`al4`.`a` = `test`.`al3`.`a`) and (exists(select 1 from `test`.`t1`) or (`test`.`al1`.`a` > `test`.`al2`.`a`)))
      

      Test case:

      SET optimizer_switch = 'in_to_exists=on';
      
      CREATE TABLE t1 (a INT, KEY(a));
      INSERT INTO t1 VALUES
      (7),(5),(1),(204),(224),(9),(5),(0),(3);
      
      INSERT INTO t1 SELECT al1.* FROM t1 al1, t1 al2;
      
      SELECT MAX(al1.a) 
      FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
      WHERE al4.a = al3.a 
        AND (
          EXISTS ( SELECT 1 FROM t1 )
          OR al1.a > al2.a
        );
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              timour Timour Katchaounov added a comment -

              This is a generic join optimizer problem that is completely unrelated to mdev-193.
              The relationship with mdev-193 is that it makes it possible to control whether some
              constant predicate is expensive or not. MDEV-193 implements this control for
              subquery predicates. When a constant subquery predicate is considered expensive,
              then it cannot be used in constant optimization during the rewrite optimization phase.

              In the example above, if the predicate is considered non-expensive, then it is evaluated
              to TRUE, and the whole disjunctive condition is optimized away. The where clause is
              thus reduced to "al4.a = al3.a".

              If one runs the simplified query directly:

              SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;

              It executes equally slowly. This can also be verified with mysql-trunk (both with the simplified
              query and the original query from the test case).

              The reason for the slow execution is the join order chosen by the optimizer:

              -------------------------------------------------------------------------------------------------------------------------------------+

              id select_type table type possible_keys key key_len ref rows Extra

              -------------------------------------------------------------------------------------------------------------------------------------+

              1 SIMPLE al1 index NULL a 5 NULL 90 Using index
              1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
              1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join)
              1 SIMPLE al4 ref a a 5 md321.al3.a 5 Using index

              -------------------------------------------------------------------------------------------------------------------------------------+

              This order coincides with the order of the tables in the FROM clause. The reason why this plan is slow, is because
              the only join condition is attached to last two tables in the plan. This means that the condition is executed as many
              times as the size of the cross-product of the first three tables (90^3 times).

              If we reorder the tables in the FROM clause (or use STRAIGHT_JOIN) as follows:

              MariaDB [md321]> explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
              ------------------------------------------------------------------------------------------------------------------------+

              id select_type table type possible_keys key key_len ref rows Extra

              ------------------------------------------------------------------------------------------------------------------------+

              1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index
              1 SIMPLE al3 ref a a 5 md321.al4.a 5 Using index
              1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
              1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join)

              ------------------------------------------------------------------------------------------------------------------------+

              This plan takes ~4 sec instead of ~50 sec of the previous plan.

              If we run the original test case with the correct table order, the plan is also very fast ~9 sec.

              Finally, if the test case query is run with 5.5 main, or with mdev-193 and set expensive_subquery_limit=0,
              execution is fast again because the optimizer keeps all join conditions. As a result, the optimizer chooses
              the optimal join order 3-4-1-2, and the selective condition "al4.a = al3.a" is evaluated with the first join in the
              plan.

              In summary, mdev-193 allows the optimizer to correctly simplify the WHERE clause, but it exposes a
              weakness in the JOIN optimizer. Most likely the problem appears when all tables are of the same
              (or similar) size.

              Show
              timour Timour Katchaounov added a comment - This is a generic join optimizer problem that is completely unrelated to mdev-193. The relationship with mdev-193 is that it makes it possible to control whether some constant predicate is expensive or not. MDEV-193 implements this control for subquery predicates. When a constant subquery predicate is considered expensive, then it cannot be used in constant optimization during the rewrite optimization phase. In the example above, if the predicate is considered non-expensive, then it is evaluated to TRUE, and the whole disjunctive condition is optimized away. The where clause is thus reduced to "al4.a = al3.a". If one runs the simplified query directly: SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a; It executes equally slowly. This can also be verified with mysql-trunk (both with the simplified query and the original query from the test case). The reason for the slow execution is the join order chosen by the optimizer: ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- --------------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- --------------------------------------------------------------------+ 1 SIMPLE al1 index NULL a 5 NULL 90 Using index 1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join) 1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join) 1 SIMPLE al4 ref a a 5 md321.al3.a 5 Using index ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- --------------------------------------------------------------------+ This order coincides with the order of the tables in the FROM clause. The reason why this plan is slow, is because the only join condition is attached to last two tables in the plan. This means that the condition is executed as many times as the size of the cross-product of the first three tables (90^3 times). If we reorder the tables in the FROM clause (or use STRAIGHT_JOIN) as follows: MariaDB [md321] > explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a; ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- -------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- -------------------------------------------------------+ 1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index 1 SIMPLE al3 ref a a 5 md321.al4.a 5 Using index 1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join) 1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join) ----- ----------- ----- ----- ------------- ---- ------- ----------- ---- -------------------------------------------------------+ This plan takes ~4 sec instead of ~50 sec of the previous plan. If we run the original test case with the correct table order, the plan is also very fast ~9 sec. Finally, if the test case query is run with 5.5 main, or with mdev-193 and set expensive_subquery_limit=0, execution is fast again because the optimizer keeps all join conditions. As a result, the optimizer chooses the optimal join order 3-4-1-2, and the selective condition "al4.a = al3.a" is evaluated with the first join in the plan. In summary, mdev-193 allows the optimizer to correctly simplify the WHERE clause, but it exposes a weakness in the JOIN optimizer. Most likely the problem appears when all tables are of the same (or similar) size.
              Hide
              igor Igor Babaev added a comment - - edited

              Timour,

              If it exposes a weakness of the join optimizer then report a new bug.

              From your analysis it's not clear why the cost of the first plan is less than the cost of the second plan.
              The first plan performs (90^3) look-ups while the second only 90.
              We need the cost estimates calculated by the optimizer (including the estimates for the partial plans)
              to see the its reasons.

              Show
              igor Igor Babaev added a comment - - edited Timour, If it exposes a weakness of the join optimizer then report a new bug. From your analysis it's not clear why the cost of the first plan is less than the cost of the second plan. The first plan performs (90^3) look-ups while the second only 90. We need the cost estimates calculated by the optimizer (including the estimates for the partial plans) to see the its reasons.
              Hide
              igor Igor Babaev added a comment - - edited

              Here's what I had in the latest MariaDB 5.5:

              MariaDB [test]> explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
              -----------------------------------------------------------------------------------------------------------------------+

              id select_type table type possible_keys key key_len ref rows Extra

              -----------------------------------------------------------------------------------------------------------------------+

              1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index
              1 SIMPLE al3 ref a a 5 test.al4.a 5 Using index
              1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
              1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join)

              -----------------------------------------------------------------------------------------------------------------------+
              4 rows in set (0.00 sec)

              MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
              ------------------------------+

              Variable_name Value

              ------------------------------+

              Last_query_cost 737304.481750

              ------------------------------+
              1 row in set (0.00 sec)

              MariaDB [test]> EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
              ------------------------------------------------------------------------------------------------------------------------------------+

              id select_type table type possible_keys key key_len ref rows Extra

              ------------------------------------------------------------------------------------------------------------------------------------+

              1 SIMPLE al1 index NULL a 5 NULL 90 Using index
              1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
              1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join)
              1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index

              ------------------------------------------------------------------------------------------------------------------------------------+
              4 rows in set (0.00 sec)

              MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
              -------------------------------+

              Variable_name Value

              -------------------------------+

              Last_query_cost 1609351.275728

              -------------------------------+
              1 row in set (0.01 sec)

              MariaDB [test]> set optimizer_prune_level=1;
              Query OK, 0 rows affected (0.00 sec)

              MariaDB [test]> EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
              ---------------------------------------------------------------------------------------------------------------------------------+

              id select_type table type possible_keys key key_len ref rows Extra

              ---------------------------------------------------------------------------------------------------------------------------------+

              1 SIMPLE al1 index NULL a 5 NULL 90 Using index
              1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (f, BNL join)
              1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using j buffer (incremental, BNL join)
              1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index

              ---------------------------------------------------------------------------------------------------------------------------------+
              4 rows in set (0.00 sec)

              MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
              -------------------------------+

              Variable_name Value

              -------------------------------+

              Last_query_cost 1609351.275728

              -------------------------------+
              1 row in set (0.00 sec)

              Show
              igor Igor Babaev added a comment - - edited Here's what I had in the latest MariaDB 5.5: MariaDB [test] > explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a; ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -------------------------------------------------------+ 1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index 1 SIMPLE al3 ref a a 5 test.al4.a 5 Using index 1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join) 1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join) ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -------------------------------------------------------+ 4 rows in set (0.00 sec) MariaDB [test] > SHOW SESSION STATUS LIKE 'Last_query_cost'; ---------------- --------------+ Variable_name Value ---------------- --------------+ Last_query_cost 737304.481750 ---------------- --------------+ 1 row in set (0.00 sec) MariaDB [test] > EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a; ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- --------------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- --------------------------------------------------------------------+ 1 SIMPLE al1 index NULL a 5 NULL 90 Using index 1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join) 1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join) 1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- --------------------------------------------------------------------+ 4 rows in set (0.00 sec) MariaDB [test] > SHOW SESSION STATUS LIKE 'Last_query_cost'; ---------------- ---------------+ Variable_name Value ---------------- ---------------+ Last_query_cost 1609351.275728 ---------------- ---------------+ 1 row in set (0.01 sec) MariaDB [test] > set optimizer_prune_level=1; Query OK, 0 rows affected (0.00 sec) MariaDB [test] > EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a; ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -----------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -----------------------------------------------------------------+ 1 SIMPLE al1 index NULL a 5 NULL 90 Using index 1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (f, BNL join) 1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using j buffer (incremental, BNL join) 1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index ----- ----------- ----- ----- ------------- ---- ------- ---------- ---- -----------------------------------------------------------------+ 4 rows in set (0.00 sec) MariaDB [test] > SHOW SESSION STATUS LIKE 'Last_query_cost'; ---------------- ---------------+ Variable_name Value ---------------- ---------------+ Last_query_cost 1609351.275728 ---------------- ---------------+ 1 row in set (0.00 sec)
              Hide
              igor Igor Babaev added a comment -

              The problem is repeatable in MariaDB 5.2.

              Show
              igor Igor Babaev added a comment - The problem is repeatable in MariaDB 5.2.
              Hide
              timour Timour Katchaounov added a comment -

              As said above, this problem is unrelated to mdev-193.
              I filed the problem as bug lp:1010395 (https://bugs.launchpad.net/maria/+bug/1010395).

              Show
              timour Timour Katchaounov added a comment - As said above, this problem is unrelated to mdev-193. I filed the problem as bug lp:1010395 ( https://bugs.launchpad.net/maria/+bug/1010395 ).

                People

                • Assignee:
                  timour Timour Katchaounov
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 5 hours
                    5h