Details
Description
Task setting
MDEV-83 decides which is the best join_tab where a subquery should be
pushed in order to reduce the number of times the subquery is evaluated.
This is done statically, at optimization time. This approach however
may result in performance regressions. When a subquery predicate is
sufficiently selective, it may turn out that its high selectivity may
reduce the partial join result sizes of subsequent joins. This reduction
may outweigh the reduced cost of subquery executions.
Since it is currently impossible to estimate the selectivity of a
subquery predicate, it is not possible to pick at optimization time
the right place in the join order such that it takes into account both
the cost and the selectivity of the subquery predicate, and the number
of times it will be evaluated.
This task will implement a way to decide dynamically at query execution
time where in the join plan to evaluate a subquery predicate. The decision
will be based on measuring at runtime what is the predicate selectivity
based on some sample (% of result rows).
What exactly can be moved
We can't move the subquery predicates themselves. For example, in Q20 the WHERE clause has form
ps_availqty > (select sum(l_quantity) ...) AND ...
Apparently, it is not possible to move the subquery, we can only move the whole "ps_availqty > (...)" condition.
Making this condition formal: A non-trivial WHERE clause has form of
cond1 AND cond2 AND ... AND condN
Here, individual cond_i (we call them "AND-parts of WHERE") can be attached from one join_tab or another.
Conclusion: we should wrap/move AND-parts of WHERE. AND-parts of WHERE can be arbitrary items (Item_func_xxx, etc), but we will consider moving only those items that have one or more subquery items within them.
How to achieve movable conditions
We will need to have certain conditions "jump" from one join_tab to another (possibly multiple times) in course of query execution. We intend to achieve it as follows:
- The movable condition is wrapped inside a trigger condition (an Item_func_trig_cond object (or a similar object))
- The same Item_func_trig_cond object is attached to each join_tab where we could potentially put it.
Switching condition on/off is to be done as follows:
- JOIN object will have a JOIN::curent_join_tab which is a pointer to the join_tab that we're currently evaluating condition for.
- Movable predicate will have a "JOIN_TAB *active_join_tab" member which will say where the condition should be attached
- Item_func_trig_cond (or its knock-off class) will do:
if (active_join_tab == join->current_join_tab)
return wrapped_item->val_int();
else
return 1;
Thus, there will be no "moving" of predicates during execution.
When to wrap with triggers dynamic pushdown conditions
It can be done after make_join_select() call. The procedure is as follows:
for ($T = each join_tab in query plan)
{
for ($I= each AND-part of $T->select_cond) // note: we scan top-level AND only, w/o recursion
{
if ($I->with_subquery && $I->is_expensive && !$I->is_constant)
{
// Make $I movable
$WI= Item_func_trig_cond2($I);
}
}
}
TODO: take care of semi-join nests and outer joins.
When and under what conditions to perform dynamic pushdown
- Predicate selectivity should be estimated based on some number of predicate
executions S. This number S may be reached much before the query produced
even one result row. Since some of the latest partial joins may not be executed,
there may be no dynamic statistics for these partial joins. In this case we will
use the partial join size estimates produced by the optimizer. - TODO:
- What is the exact condition under which the dynamic pushdown condition is moved?
- Care should be taken in the future when a query plan employes blocked join algorithms.
If a subquery predicate is moved after a blocked join in the plan, and it turns out
the move was based on incorrect selectivity estimates, the whole pass of filling the
join buffers will not be able to use the moved condition. If in fact the condition
was selective the join buffer will have to process a lot more rows that it would if
the condition was attached earlier in the JOIN plan. With blocked joins, moving a
condition in the plan has effect only for the subsequent block refill, but not for
the current.
The solution is to make blocked execution to cooperate with dynamic condition pushdown.
Initially run the join with very small block sizes to collect statistics. Once there
is sufficient statistics, move the predicate accordingly, and continue join execution
with its normal block size.
EXPLAIN output
The dynamic pushdown triggers will be created before EXPLAIN prints the query plan.
Based on static optimizer analysis, one of the triggers will be chosen as the best
starting point. EXPLAIN will skip all other triggers except the active one.
For the active one EXPLAIN EXTENDED will print instead of "Using where" - "Dynamic where".
Future work will implement printout of the actual predicate pushdown state during query
execution for SHOW EXPLAIN.
Implementation details
The selectivities and costs needed to decide where in the join plan to evaluate a subquery are:
- Add the following predicate counters to the trigger condition that wraps the moveable AND-part:
- the total number of subquery executions
- the number of times subquery was TRUE
- the number of times the whole pushdown predicate was executed for the JOIN_TAB it is currently attached
- the number of times the whole pushdown predicate attached to a particular join returned TRUE
The above counters allow to compute the selectivity of the subquery, and the whole pushdown predicate.
- add counters to count the sizes of the intermediate join results (one counter per JOIN_TAB)
- compute the actual number of times the subquery will be executed, thus its total cost
The decision where to execute the subquery will be done based on the selectivity of the subquery, and the actual cardinality of the partial joins.
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
It seems,
Another concerns are