Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Won't Fix
-
Affects Version/s: None
-
Fix Version/s: 5.5.27
-
Component/s: None
-
Labels:None
Description
The following query
SELECT MAX(al1.a)
FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
WHERE al4.a = al3.a
AND (
EXISTS ( SELECT 1 FROM t1 )
OR al1.a > al2.a
)
takes several times longer on MDEV-193 tree revno 3407 comparing to maria/5.5 tree 3426.
Reproducible with the default optimizer_switch as well as all OFF values except for in_to_exists required to execute the query.
Reproducible with MyISAM (~3 times longer, 30 vs 10 sec on my machine), Aria and InnoDB (~6 times longer, 60 vs 10 sec).
EXPLAIN on MDEV-193 (with the default optimizer_switch):
id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY al1 index NULL a 5 NULL 90 100.00 Using index 1 PRIMARY al2 index NULL a 5 NULL 90 100.00 Using index; Using join buffer (flat, BNL join) 1 PRIMARY al3 index a a 5 NULL 90 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 1 PRIMARY al4 ref a a 5 test.al3.a 10 100.00 Using index 2 SUBQUERY t1 index NULL a 5 NULL 90 100.00 Using index Warnings: Note 1003 select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where (`test`.`al4`.`a` = `test`.`al3`.`a`)
EXPLAIN on maria/5.5 (with the default optimizer_switch):
id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY al3 index a a 5 NULL 90 100.00 Using where; Using index 1 PRIMARY al4 ref a a 5 test.al3.a 10 100.00 Using index 1 PRIMARY al1 index a a 5 NULL 90 100.00 Using index; Using join buffer (flat, BNL join) 1 PRIMARY al2 index a a 5 NULL 90 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 2 SUBQUERY t1 index NULL a 5 NULL 90 100.00 Using index Warnings: Note 1003 select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where ((`test`.`al4`.`a` = `test`.`al3`.`a`) and (exists(select 1 from `test`.`t1`) or (`test`.`al1`.`a` > `test`.`al2`.`a`)))
Test case:
SET optimizer_switch = 'in_to_exists=on';
CREATE TABLE t1 (a INT, KEY(a));
INSERT INTO t1 VALUES
(7),(5),(1),(204),(224),(9),(5),(0),(3);
INSERT INTO t1 SELECT al1.* FROM t1 al1, t1 al2;
SELECT MAX(al1.a)
FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
WHERE al4.a = al3.a
AND (
EXISTS ( SELECT 1 FROM t1 )
OR al1.a > al2.a
);
Gliffy Diagrams
Attachments
Issue Links
- relates to
-
MDEV-193 LP:944706 - Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization
-
- Closed
-
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
This is a generic join optimizer problem that is completely unrelated to mdev-193.
The relationship with mdev-193 is that it makes it possible to control whether some
constant predicate is expensive or not.
MDEV-193implements this control forsubquery predicates. When a constant subquery predicate is considered expensive,
then it cannot be used in constant optimization during the rewrite optimization phase.
In the example above, if the predicate is considered non-expensive, then it is evaluated
to TRUE, and the whole disjunctive condition is optimized away. The where clause is
thus reduced to "al4.a = al3.a".
If one runs the simplified query directly:
SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
It executes equally slowly. This can also be verified with mysql-trunk (both with the simplified
query and the original query from the test case).
The reason for the slow execution is the join order chosen by the optimizer:
-----
--------------------------------------------------------------------------------------------------------------------------------+-----
--------------------------------------------------------------------------------------------------------------------------------+-----
--------------------------------------------------------------------------------------------------------------------------------+This order coincides with the order of the tables in the FROM clause. The reason why this plan is slow, is because
the only join condition is attached to last two tables in the plan. This means that the condition is executed as many
times as the size of the cross-product of the first three tables (90^3 times).
If we reorder the tables in the FROM clause (or use STRAIGHT_JOIN) as follows:
MariaDB [md321]> explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
-----
-------------------------------------------------------------------------------------------------------------------+-----
-------------------------------------------------------------------------------------------------------------------+-----
-------------------------------------------------------------------------------------------------------------------+This plan takes ~4 sec instead of ~50 sec of the previous plan.
If we run the original test case with the correct table order, the plan is also very fast ~9 sec.
Finally, if the test case query is run with 5.5 main, or with mdev-193 and set expensive_subquery_limit=0,
execution is fast again because the optimizer keeps all join conditions. As a result, the optimizer chooses
the optimal join order 3-4-1-2, and the selective condition "al4.a = al3.a" is evaluated with the first join in the
plan.
In summary, mdev-193 allows the optimizer to correctly simplify the WHERE clause, but it exposes a
weakness in the JOIN optimizer. Most likely the problem appears when all tables are of the same
(or similar) size.