Poor optimization of JOIN and ORDER BY ... LIMIT

Description

A long standing (and informally known) issue:

Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

1 2 3 4 5 6 7 select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000;
1 2 3 4 5 6 7 +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

This uses filesort and takes ~8 sec.
Now, let's force the right join order:

1 2 3 4 5 6 7 select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000;
1 2 3 4 5 6 7 +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500;

Environment

None

Status

Assignee

Sergei Petrunia

Reporter

Sergei Petrunia

External issue ID

None

External issue ID

None

Components

Fix versions

Affects versions

10.0.19

Priority

Major