Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4170

Dynamic pushdown of subquery predicates

    Details

    • Type: Task
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 10.2
    • Component/s: None
    • Labels:

      Description

      Task setting

      MDEV-83 decides which is the best join_tab where a subquery should be
      pushed in order to reduce the number of times the subquery is evaluated.
      This is done statically, at optimization time. This approach however
      may result in performance regressions. When a subquery predicate is
      sufficiently selective, it may turn out that its high selectivity may
      reduce the partial join result sizes of subsequent joins. This reduction
      may outweigh the reduced cost of subquery executions.

      Since it is currently impossible to estimate the selectivity of a
      subquery predicate, it is not possible to pick at optimization time
      the right place in the join order such that it takes into account both
      the cost and the selectivity of the subquery predicate, and the number
      of times it will be evaluated.

      This task will implement a way to decide dynamically at query execution
      time where in the join plan to evaluate a subquery predicate. The decision
      will be based on measuring at runtime what is the predicate selectivity
      based on some sample (% of result rows).

      What exactly can be moved

      We can't move the subquery predicates themselves. For example, in Q20 the WHERE clause has form

      ps_availqty > (select sum(l_quantity) ...) AND ...
      

      Apparently, it is not possible to move the subquery, we can only move the whole "ps_availqty > (...)" condition.

      Making this condition formal: A non-trivial WHERE clause has form of

      cond1 AND cond2 AND ... AND condN
      

      Here, individual cond_i (we call them "AND-parts of WHERE") can be attached from one join_tab or another.

      Conclusion: we should wrap/move AND-parts of WHERE. AND-parts of WHERE can be arbitrary items (Item_func_xxx, etc), but we will consider moving only those items that have one or more subquery items within them.

      How to achieve movable conditions

      We will need to have certain conditions "jump" from one join_tab to another (possibly multiple times) in course of query execution. We intend to achieve it as follows:

      • The movable condition is wrapped inside a trigger condition (an Item_func_trig_cond object (or a similar object))
      • The same Item_func_trig_cond object is attached to each join_tab where we could potentially put it.

      Switching condition on/off is to be done as follows:

      • JOIN object will have a JOIN::curent_join_tab which is a pointer to the join_tab that we're currently evaluating condition for.
      • Movable predicate will have a "JOIN_TAB *active_join_tab" member which will say where the condition should be attached
      • Item_func_trig_cond (or its knock-off class) will do:
        if (active_join_tab == join->current_join_tab)
           return wrapped_item->val_int();
        else
           return 1;
      

      Thus, there will be no "moving" of predicates during execution.

      When to wrap with triggers dynamic pushdown conditions

      It can be done after make_join_select() call. The procedure is as follows:

      for ($T = each join_tab in query plan)
      {
        for ($I= each AND-part of $T->select_cond)  // note: we scan top-level AND only, w/o recursion
        {
          if ($I->with_subquery && $I->is_expensive && !$I->is_constant)
          {
             // Make $I movable
             $WI= Item_func_trig_cond2($I);
          }
        }
      }
      

      TODO: take care of semi-join nests and outer joins.

      When and under what conditions to perform dynamic pushdown

      • Predicate selectivity should be estimated based on some number of predicate
        executions S. This number S may be reached much before the query produced
        even one result row. Since some of the latest partial joins may not be executed,
        there may be no dynamic statistics for these partial joins. In this case we will
        use the partial join size estimates produced by the optimizer.
      • TODO:
        • What is the exact condition under which the dynamic pushdown condition is moved?
        • Care should be taken in the future when a query plan employes blocked join algorithms.
          If a subquery predicate is moved after a blocked join in the plan, and it turns out
          the move was based on incorrect selectivity estimates, the whole pass of filling the
          join buffers will not be able to use the moved condition. If in fact the condition
          was selective the join buffer will have to process a lot more rows that it would if
          the condition was attached earlier in the JOIN plan. With blocked joins, moving a
          condition in the plan has effect only for the subsequent block refill, but not for
          the current.

      The solution is to make blocked execution to cooperate with dynamic condition pushdown.
      Initially run the join with very small block sizes to collect statistics. Once there
      is sufficient statistics, move the predicate accordingly, and continue join execution
      with its normal block size.

      EXPLAIN output

      The dynamic pushdown triggers will be created before EXPLAIN prints the query plan.
      Based on static optimizer analysis, one of the triggers will be chosen as the best
      starting point. EXPLAIN will skip all other triggers except the active one.
      For the active one EXPLAIN EXTENDED will print instead of "Using where" - "Dynamic where".

      Future work will implement printout of the actual predicate pushdown state during query
      execution for SHOW EXPLAIN.

      Implementation details

      The selectivities and costs needed to decide where in the join plan to evaluate a subquery are:

      • Add the following predicate counters to the trigger condition that wraps the moveable AND-part:
        • the total number of subquery executions
        • the number of times subquery was TRUE
        • the number of times the whole pushdown predicate was executed for the JOIN_TAB it is currently attached
        • the number of times the whole pushdown predicate attached to a particular join returned TRUE
          The above counters allow to compute the selectivity of the subquery, and the whole pushdown predicate.
      • add counters to count the sizes of the intermediate join results (one counter per JOIN_TAB)
      • compute the actual number of times the subquery will be executed, thus its total cost

      The decision where to execute the subquery will be done based on the selectivity of the subquery, and the actual cardinality of the partial joins.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            It seems,

            • Predicate counters should be kept in Item_func_trig_cond

            Another concerns are

            • EXPLAIN
            • user-visible tracking of execution
            Show
            psergey Sergei Petrunia added a comment - It seems, Predicate counters should be kept in Item_func_trig_cond Another concerns are EXPLAIN user-visible tracking of execution
            Hide
            timour Timour Katchaounov added a comment - - edited
            = Task description
            
            Let QEP is some query plan for query Q with tables T_1 ... T_n. Let's assume
            that among other predicates the query contains a subquery predicate subq_j.
            
            Depending on the cost and selectivity of subq_j, the total cost of the plan
            depends on the join operator where subq_j is pushed. Let plan QEP_1 has a
            subquery subq_j pushed to table T_m, and QEP_2 is exactly the same, but the
            subquery is pushed instead to table T_q.
            
            The costs of these two plans may differ significantly. After some number of
            subquery executions L it is possible to estimate the cost and selectivity of
            the subquery. When this number is reached, this task will:
            - Compute predicate costs and selectivities and join fanouts based on
              execution statistics,
            - Estimate the cost of all equivalent QEP_i where subq_j is pushed to
              each possible join,
            - Compare the costs of alternative QEP_i plans, and select the plan with
              sufficiently lower cost than the current placement. A plan QEP_i will be
              considered to be cheaper than QEP_j if the ratio of the plan costs is less
              than some configurable cost ratio "CR":
              (QEP_i / QEP_j < CR)
            - Dynamically "push" subq_j as prescribed by the best plan.
            
            
            = Cost model parameters:
            
            * C_cond_i
            The total cost of the conditions pushed down to JOIN_TAB i.
            This is the condition evaluated for each join record after applying the
            selection pushed inside the table access operation (e.g. index range scan).
            ** Optimize-time
               The cost of any non-subquery predicate is the constant "1/5", meaning that
               the cost of checking the where clause is 1/5 of the cost of reading one
               table record.
               MWL#83 adds the cost of each subquery to the total plan cost, but there is
               no separate logic to compute the cost of a condition that contains a subquery.
            ** Execution-time
               Same as at optimization time.
            
            * C_subq_j
            Cost of subquery 'j' which may appear in some AND-part of select_cond (pushed
            to join_tab 'i').
            ** Optimize-time
               The cost is JOIN::best_read.
            ** Execution-time
               TODO: It is not clear how to compute the actual execution cost in a way
               compatible with the optimize-time cost. Should it be the number of examined
               rows?
            
            * C_access_i - Cost of table access operator 'i'.
            Implemented as JOIN_TAB::read_time.
            
            * S_cond_i - Selectivity of select_cond, the condition pushed to join_tab 'i'
            ** Optimize-time: a combination of range and ref estimates produced by the
               optimizer.
            ** Execution-time: S_cond_i = number_of_true_results / number_of_executions
            
            * S_subq_i - Selectivity of the subquery pushed to join_tab 'i'.
            ** Optimize-time: Not available.
            ** Execution-time: S_subq_i = number_of_true_results / number_of_executions
            
            * Fj_i
            The fanout of join 'i' in the join plan. The "fanout" is the average number of
            records of table T_i that match one record of the intermediate join
            J_[i-1]. The fanout also takes into account the selectivity of the access
            method for table T_i.
            
            * Fj_1
            The fanout of the first table is 1 (this is not a join, just a selection).
            
            
            = Cost formulas:
            
            1. Partial join result size |J_k|
            
            The fanouts are relative estimates, they are not related to the actual sizes
            of the partial join results. The size of a partial join result can be computed
            by multipluing the number of rows produced by the first plan operator J_1, and
            all subsequent fanouts:
            
            |J_k| = |J_1| * MULT(1, i , Fj_i)
            
            The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
                  record_count *= tab->records_read;
            
            2. Query plan cost
            
            2.1 Plan cost assuming that all conditions are cheap, and the selectivity of
            the JOIN conditions not used by the table access method is not known.
            
            Each select_cond has the cost: C_cond_i = (1/TIME_FOR_COMPARE).
            The selectivity of all conditions is taken into account inside the estimate
            for each fanout Fj_i.
            
            Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |J_i| * C_cond_i))
            
            The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
                  read_time += tab->read_time + record_count / (double) TIME_FOR_COMPARE;
            
            2.2 Plan cost that takes into account subquery predicate costs, and predicate
            selectivities.
            
            In this case the cost of each pushdown condition depends on the subqueries it
            contains. Assume:
            - conjunctive conditions only
            - one subquery only
            
            Let |J_i| is the cardinality of the partial join after applying the access
            method for table T_i. This is the number of times the pushdown condition cond_i
            will be evaluated.
            
            Let |PJ_i| is the cardinality of the partial join J_i after applying all pushed
            conditions. The two cardinalities are related as follows:
            
            |PJ_i| = |J_i| * S_cond_i
            
            The cost of the plan is:
            
            Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |PJ_i| * C_cond_i)) =
                              = SUM(1, k, (C_access_i + |J_i| * S_cond_i * C_cond_i))
            
            If pushdown condition i contains a subquery subq_j in one of its AND-parts,
            then its cost and selectivity are:
            
            C_cond_i = C_subc_i + C_subq_j
            S_cond_i = S_subc_i * S_subq_j
            
            Notice that the selectivity and cost of the subuery is independent on the table
            it is attached to, so it is not indexed with the table index.
            
            Where subc_i is the conjunction of all AND-parts that do not contain the
            subquery, and subq_i is the AND-part with the subquery.
            
            
            3. Comparing the costs of two plans with alternative placement of a subquery
            
            Let plan QEP_1 has a subquery subq_j pushed to table T_m, and QEP_2 is exactly
            the same, but the subquery is pushed to table T_q. The table access costs are
            the same for both plans. Let the sum of the access costs be: C_access = SUM(1,
            k, (C_access_i)).
            
            Cost(QEP_1) = C_access +
                          (|J_1| * S_cond_1 * C_cond_1) +
                          .......
            	      (|J_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
                          .......
            	      (|J_q| * S_cond_q * C_cond_q) +
                          .......
            	      (|J_k| * S_cond_k * C_cond_k)
            
            Cost(QEP_2) = C_access +
                          (|J_1| * S_cond_1 * C_cond_1) +
                          .......
            	      (|J_m| * S_cond_m * C_cond_m) +
                          .......
            	      (|J_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
                          .......
            	      (|J_k| * S_cond_k * C_cond_k)
            
            The costs above depend on the partial join cardinalities which are absolute
            values. Before query execution is complete these cardinalities cannot be known
            exactly. What can be known is an estimate of the join fanouts based on the
            current number of rows retrieved by each partial join.
            
            In order to make the two costs comparable during execution, they have to be
            expressed in terms of the join fanouts Fj_i. Predicate costs and selectivity
            can be estimated based on collected execution statistics at any time during
            execution.
            
            From point (1) above it follows we can divide the costs of both plans by |J_1|,
            where |J_1| is optimizer estimate of the size of the result of the first query
            plan operator. These resulting "relative" costs are sufficient for this task,
            because we don't need the absolute cost values, we only need to be able to
            compare the ratio of the costs of alternative placements of an expensive predicate.
            The relative costs are:
            
            RelCost(QEP_1) = (C_access / |J_1|) + 
                          (|Fj_1| * S_cond_1 * C_cond_1) +
                          .......
            	      (|Fj_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
                          .......
            	      (|Fj_q| * S_cond_q * C_cond_q) +
                          .......
            	      (|Fj_k| * S_cond_k * C_cond_k)
            
            RelCost(QEP_2) = (C_access / |J_1|) + 
                          (|Fj_1| * S_cond_1 * C_cond_1) +
                          .......
            	      (|Fj_m| * S_cond_m * C_cond_m) +
                          .......
            	      (|Fj_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
                          .......
            	      (|Fj_k| * S_cond_k * C_cond_k)
            
            
            4. Estimating join fanouts and predicate cost/selectivity during execution
            
            After some number of subquery executions we assume that the collected
            cost/selectivity/fanout statistics is representative. Predicate costs and
            selectivities are:
              (number_of_true_results / number_of_executions).
            
            Let CPJ_i is the current partial join result size. Then:
              Fj_i = CPJ_i / CPJ_[i-1]
            where i > 1.
              Fj_1 = 1.
            
            
            Show
            timour Timour Katchaounov added a comment - - edited = Task description Let QEP is some query plan for query Q with tables T_1 ... T_n. Let's assume that among other predicates the query contains a subquery predicate subq_j. Depending on the cost and selectivity of subq_j, the total cost of the plan depends on the join operator where subq_j is pushed. Let plan QEP_1 has a subquery subq_j pushed to table T_m, and QEP_2 is exactly the same, but the subquery is pushed instead to table T_q. The costs of these two plans may differ significantly. After some number of subquery executions L it is possible to estimate the cost and selectivity of the subquery. When this number is reached, this task will: - Compute predicate costs and selectivities and join fanouts based on execution statistics, - Estimate the cost of all equivalent QEP_i where subq_j is pushed to each possible join, - Compare the costs of alternative QEP_i plans, and select the plan with sufficiently lower cost than the current placement. A plan QEP_i will be considered to be cheaper than QEP_j if the ratio of the plan costs is less than some configurable cost ratio "CR": (QEP_i / QEP_j < CR) - Dynamically "push" subq_j as prescribed by the best plan. = Cost model parameters: * C_cond_i The total cost of the conditions pushed down to JOIN_TAB i. This is the condition evaluated for each join record after applying the selection pushed inside the table access operation (e.g. index range scan). ** Optimize-time The cost of any non-subquery predicate is the constant "1/5", meaning that the cost of checking the where clause is 1/5 of the cost of reading one table record. MWL#83 adds the cost of each subquery to the total plan cost, but there is no separate logic to compute the cost of a condition that contains a subquery. ** Execution-time Same as at optimization time. * C_subq_j Cost of subquery 'j' which may appear in some AND-part of select_cond (pushed to join_tab 'i'). ** Optimize-time The cost is JOIN::best_read. ** Execution-time TODO: It is not clear how to compute the actual execution cost in a way compatible with the optimize-time cost. Should it be the number of examined rows? * C_access_i - Cost of table access operator 'i'. Implemented as JOIN_TAB::read_time. * S_cond_i - Selectivity of select_cond, the condition pushed to join_tab 'i' ** Optimize-time: a combination of range and ref estimates produced by the optimizer. ** Execution-time: S_cond_i = number_of_true_results / number_of_executions * S_subq_i - Selectivity of the subquery pushed to join_tab 'i'. ** Optimize-time: Not available. ** Execution-time: S_subq_i = number_of_true_results / number_of_executions * Fj_i The fanout of join 'i' in the join plan. The "fanout" is the average number of records of table T_i that match one record of the intermediate join J_[i-1]. The fanout also takes into account the selectivity of the access method for table T_i. * Fj_1 The fanout of the first table is 1 (this is not a join, just a selection). = Cost formulas: 1. Partial join result size |J_k| The fanouts are relative estimates, they are not related to the actual sizes of the partial join results. The size of a partial join result can be computed by multipluing the number of rows produced by the first plan operator J_1, and all subsequent fanouts: |J_k| = |J_1| * MULT(1, i , Fj_i) The corresponding implementation is in JOIN::get_partial_cost_and_fanout(): record_count *= tab->records_read; 2. Query plan cost 2.1 Plan cost assuming that all conditions are cheap, and the selectivity of the JOIN conditions not used by the table access method is not known. Each select_cond has the cost: C_cond_i = (1/TIME_FOR_COMPARE). The selectivity of all conditions is taken into account inside the estimate for each fanout Fj_i. Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |J_i| * C_cond_i)) The corresponding implementation is in JOIN::get_partial_cost_and_fanout(): read_time += tab->read_time + record_count / (double) TIME_FOR_COMPARE; 2.2 Plan cost that takes into account subquery predicate costs, and predicate selectivities. In this case the cost of each pushdown condition depends on the subqueries it contains. Assume: - conjunctive conditions only - one subquery only Let |J_i| is the cardinality of the partial join after applying the access method for table T_i. This is the number of times the pushdown condition cond_i will be evaluated. Let |PJ_i| is the cardinality of the partial join J_i after applying all pushed conditions. The two cardinalities are related as follows: |PJ_i| = |J_i| * S_cond_i The cost of the plan is: Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |PJ_i| * C_cond_i)) = = SUM(1, k, (C_access_i + |J_i| * S_cond_i * C_cond_i)) If pushdown condition i contains a subquery subq_j in one of its AND-parts, then its cost and selectivity are: C_cond_i = C_subc_i + C_subq_j S_cond_i = S_subc_i * S_subq_j Notice that the selectivity and cost of the subuery is independent on the table it is attached to, so it is not indexed with the table index. Where subc_i is the conjunction of all AND-parts that do not contain the subquery, and subq_i is the AND-part with the subquery. 3. Comparing the costs of two plans with alternative placement of a subquery Let plan QEP_1 has a subquery subq_j pushed to table T_m, and QEP_2 is exactly the same, but the subquery is pushed to table T_q. The table access costs are the same for both plans. Let the sum of the access costs be: C_access = SUM(1, k, (C_access_i)). Cost(QEP_1) = C_access + (|J_1| * S_cond_1 * C_cond_1) + ....... (|J_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) + ....... (|J_q| * S_cond_q * C_cond_q) + ....... (|J_k| * S_cond_k * C_cond_k) Cost(QEP_2) = C_access + (|J_1| * S_cond_1 * C_cond_1) + ....... (|J_m| * S_cond_m * C_cond_m) + ....... (|J_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) + ....... (|J_k| * S_cond_k * C_cond_k) The costs above depend on the partial join cardinalities which are absolute values. Before query execution is complete these cardinalities cannot be known exactly. What can be known is an estimate of the join fanouts based on the current number of rows retrieved by each partial join. In order to make the two costs comparable during execution, they have to be expressed in terms of the join fanouts Fj_i. Predicate costs and selectivity can be estimated based on collected execution statistics at any time during execution. From point (1) above it follows we can divide the costs of both plans by |J_1|, where |J_1| is optimizer estimate of the size of the result of the first query plan operator. These resulting "relative" costs are sufficient for this task, because we don't need the absolute cost values, we only need to be able to compare the ratio of the costs of alternative placements of an expensive predicate. The relative costs are: RelCost(QEP_1) = (C_access / |J_1|) + (|Fj_1| * S_cond_1 * C_cond_1) + ....... (|Fj_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) + ....... (|Fj_q| * S_cond_q * C_cond_q) + ....... (|Fj_k| * S_cond_k * C_cond_k) RelCost(QEP_2) = (C_access / |J_1|) + (|Fj_1| * S_cond_1 * C_cond_1) + ....... (|Fj_m| * S_cond_m * C_cond_m) + ....... (|Fj_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) + ....... (|Fj_k| * S_cond_k * C_cond_k) 4. Estimating join fanouts and predicate cost/selectivity during execution After some number of subquery executions we assume that the collected cost/selectivity/fanout statistics is representative. Predicate costs and selectivities are: (number_of_true_results / number_of_executions). Let CPJ_i is the current partial join result size. Then: Fj_i = CPJ_i / CPJ_[i-1] where i > 1. Fj_1 = 1.
            Hide
            psergey Sergei Petrunia added a comment - - edited

            Meeting results:

            • optimize subquery predicates independently (process each one separately, ignoring existence of any "sibling", neighbor predicates)
            • for subquery cost, use the values produced by the optimizer (i.e. don't recalculate on runtime)
            • for subquery predicate selectivity, use runtime information
            • use #define for threshold value to move the subquery

            After the above is done:

            • re-calculate subquery cost on runtime using the same cost formula as join optimizer uses.
            Show
            psergey Sergei Petrunia added a comment - - edited Meeting results: optimize subquery predicates independently (process each one separately, ignoring existence of any "sibling", neighbor predicates) for subquery cost, use the values produced by the optimizer (i.e. don't recalculate on runtime) for subquery predicate selectivity, use runtime information use #define for threshold value to move the subquery After the above is done: re-calculate subquery cost on runtime using the same cost formula as join optimizer uses.
            Hide
            timour Timour Katchaounov added a comment -

            TODO:
            Consider reordering dynamically the conjuncts in a pushed condition according to their dynamic selectivity.
            For this it is necessary to record the selectivity of all conjuncts.

            Show
            timour Timour Katchaounov added a comment - TODO: Consider reordering dynamically the conjuncts in a pushed condition according to their dynamic selectivity. For this it is necessary to record the selectivity of all conjuncts.
            Hide
            timour Timour Katchaounov added a comment -

            When/how to move dynamic conditions earlier in the plan:

            = Suppose that:

            • a join plan is: <t1, ... t_i, ..., t_j, ... t_n>, and
            • a dynamic condition D is attached to t_j, while it can be attached the earliest to table t_i, (i < j)

            = Nested loops join (no blocking):
            When D is evaluated for the first partial join result <t1, ... t_i, ..., t_j>, its value will not change for any subsequent partial join results with the same prefix <t1, ... t_i>. We can remember the result of D for this prefix.

            • If the result is FALSE, join execution should restart from the next prefix <t1, ... t_i>, there is no need to evaluate any of the continuations of the current <t1, ... t_i>.
              Move D to its best new position before restarting the join with the next prefix <t1, ... t_i>.
            • If the result is TRUE,
              Move D to its new best position, because it will not affect any of the continuations of the current <t1, ... t_i>, it will be TRUE for any of them.

            = Blocked joins:

            • While filling the buffer of table t_i sample D for a certain % of the partial rows in t_i's buffer.
            • While sampling D, collect statistics of D's selectivity and filter the sampled partial records.
            • If the condition is sufficiently selective (e.g. >50%), re-evaluate it for the whole join cache, and reduce the size of the cache and the whole partial join.
            • Now D can be moved freely to the left where needed, because D will be true for all continuations of all records in t_i's cache.
            Show
            timour Timour Katchaounov added a comment - When/how to move dynamic conditions earlier in the plan: = Suppose that: a join plan is: <t1, ... t_i, ..., t_j, ... t_n>, and a dynamic condition D is attached to t_j, while it can be attached the earliest to table t_i, (i < j) = Nested loops join (no blocking): When D is evaluated for the first partial join result <t1, ... t_i, ..., t_j>, its value will not change for any subsequent partial join results with the same prefix <t1, ... t_i>. We can remember the result of D for this prefix. If the result is FALSE, join execution should restart from the next prefix <t1, ... t_i>, there is no need to evaluate any of the continuations of the current <t1, ... t_i>. Move D to its best new position before restarting the join with the next prefix <t1, ... t_i>. If the result is TRUE, Move D to its new best position, because it will not affect any of the continuations of the current <t1, ... t_i>, it will be TRUE for any of them. = Blocked joins: While filling the buffer of table t_i sample D for a certain % of the partial rows in t_i's buffer. While sampling D, collect statistics of D's selectivity and filter the sampled partial records. If the condition is sufficiently selective (e.g. >50%), re-evaluate it for the whole join cache, and reduce the size of the cache and the whole partial join. Now D can be moved freely to the left where needed, because D will be true for all continuations of all records in t_i's cache.
            Hide
            timour Timour Katchaounov added a comment -

            The system variables to control this feature are:
            optimizer_switch = "expensive_pred_dynamic_pushdown=on/off";
            (notice that his feature interacts with MDEV-83, which is controlled by optimizer_switch="expensive_pred_static_pushdown=on/off")

            The number of partial result rows after which to calculate a new optimal position for a predicate:
            dynamic_pushdown_max_evals = 0 ... inf;

            The ratio by which the new position must improve the total plan cost in order to move the predicate:
            dynamic_pushdown_cost_ratio = 10;

            Show
            timour Timour Katchaounov added a comment - The system variables to control this feature are: optimizer_switch = "expensive_pred_dynamic_pushdown=on/off"; (notice that his feature interacts with MDEV-83 , which is controlled by optimizer_switch="expensive_pred_static_pushdown=on/off") The number of partial result rows after which to calculate a new optimal position for a predicate: dynamic_pushdown_max_evals = 0 ... inf; The ratio by which the new position must improve the total plan cost in order to move the predicate: dynamic_pushdown_cost_ratio = 10;

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                timour Timour Katchaounov
              • Votes:
                2 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - Not Specified
                  Not Specified
                  Logged:
                  Time Spent - 1 week, 1 day, 7 hours
                  1w 1d 7h