Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-446

Reuse quick_condition_rows in fanout computations

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Fix Version/s: 10.0.2
    • Component/s: None
    • Labels:
      None

      Description

      Currently:

      • range optimizer produces quick_condition_rows
      • they are used by best_access_path(), but only if it considers
        full table scan acccess (see matching_candidates_in_table() call)

      We need to make it so that quick_condition_rows is used for any access
      method.

      === Details ===
      Currently, quick_condition_rows is only used for full table scans. This makes the code simple.
      When we start using quick_condition_rows with arbitrary acess method, we will be faced with this problem:

      1. quick_condition_rows estimate was made from a certain part of WHERE clause (the part of WHERE that was used to construct a range)
      2. access method that we're using also was made from a certain part of WHERE clause.

      question: are these two correlated? When does the estimate of the fanout of an access method take into account quick_condition_rows?

      We will adopt this answer:

      • Don't take quick_condition_rows into account if we're using a quick select for table access,
        or if we're using ref(const) (in this case we would construct a quick select and take its estimates into account).
      • Do adopt it otherwise.

      THere are more complex cases like
      t.keypart1=const AND t.keypart2 BETWEEN 'foo' AND 'bar' AND t.keypart2=othertbl.col

      Here:
      quick_condition_rows is made from "t.keypart1=const AND t.keypart2 BETWEEN ..."
      ref access will be on "t.keypart1=const AND t.keypart2=othertbl.col"

      when part of WHERE will be appled,

      • t.keypart1=const – will not filter anything out (because it's guaranteed to be true
        by ref access)
      • t.keypart2 BETWEEN ... – will filter something out.

      Available estimates:
      E1. rec_per_key(t.keypart1=...)
      E2. rec_per_key("t.keypart1=... AND t.keypart2=...")
      E3. records_in_range("t.keypart1=const AND t.keypart2 BETWEEN ...")

      Note, that the estimates will not necessary be in sync. For instance, we saw real-world cases where E3 > E1, even if it is clear that E3 < E1 for any dataset.

      How to compute the fanout of the WHERE predicates not covered by an access method:

      • if an access method covers all WHERE predicates, then the WHERE clause doesn't have any selectivity on its own.
      • if WHERE's selectivity hasn't been "absorbed" by the employed access method, then it has selecivity on its own.

      We assume that:

      • ref(non-const) access's selecitivity is orthogonal to quick_condition_rows selecitivity.
      • full table scan has no selectivity => WHERE condition will apply quick_condition_rows selecitivity.
      • That's it, for now. More complex analysis is possible but will be done outside of this task.
        For any type of access other than the above two, assume quick_condition_rows doesn't have any additional selectivity - selectivity(WHERE)=1.

      == Implementation ==

      • All code should be in best_access_path().
      • QUESTIONABLE:: The result should be recorded in POSITION::records_read.
        (Yes, there will be cases where POSITION::records_read < 1)

      APPARENT: POSITION::prefix_record_count is not what is needed.

      • The result should be visible in EXPLAIN EXTENDED, in "filtered" column.

      == Implementation step #1 ==

      • Take the problematic query from mdev-402
      • Remove the scalar subquery predicate.
      • Convert both IN-subqueries into a JOIN ( I don't care if it's not equivalent)
      • Use STRAIGHT_JOIN hints, if necessary to produce the join order of
        nation,supplier,partsupp,part.
      • Make it so that for table part, in EXPLAIN:
        = access is eq_ref
        = rows==1 (it's eq_ref)
        = "filtered shows" 0.0118 * 100 = 1.18 (percent).
        This may require introducing extra members in POSITION (investigate).

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              timour Timour Katchaounov added a comment -

              Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey):

              explain extended
              select sql_calc_found_rows
              s_name, s_address
              from nation straight_join supplier straight_join part straight_join partsupp
              where s_suppkey = ps_suppkey
              and ps_partkey = p_partkey
              and p_name like 'forest%'
              and s_nationkey = n_nationkey
              and n_name = 'CANADA'
              order by s_name
              limit 10;

              There is an index on table nation(n_name).

              The plan for DBT3 scale 10 is:

              ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              id select_type table type possible_keys key key_len ref rows filtered Extra

              ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              1 SIMPLE nation ref PRIMARY,i_n_name i_n_name 26 const 1 100.00 Using where; Using index; Using temporary; Using filesort
              1 SIMPLE supplier ref PRIMARY,i_s_nationkey i_s_nationkey 5 dbt3.nation.n_nationkey 1070 100.00  
              1 SIMPLE partsupp ref PRIMARY,i_ps_partkey,i_ps_suppkey i_ps_suppkey 4 dbt3.supplier.s_suppkey 39 100.00 Using index
              1 SIMPLE part eq_ref PRIMARY,i_p_name PRIMARY 4 dbt3.partsupp.ps_partkey 0 0.00 Using where

              ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double.
              The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are:

              (gdb) p records
              $20 = 1
              (gdb) p s->table->quick_condition_rows
              $21 = 48334
              (gdb) p s->table->file->stats.records
              $22 = 1996869

              The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name.
              The actual value is 24037.

              The numbers for scale 1 are:

              (gdb) p s->table->quick_condition_rows
              $73 = 2387
              (gdb) p s->table->alias->Ptr
              $74 = 0x7f73120 "part"
              (gdb) p s->table->file->stats.records
              $75 = 200252
              (gdb) p pos->where_part_selectivity
              $76 = 0.011919980824161557

              This is 1.19% selectivity, which was the expected result.

              Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).

              Show
              timour Timour Katchaounov added a comment - Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey): explain extended select sql_calc_found_rows s_name, s_address from nation straight_join supplier straight_join part straight_join partsupp where s_suppkey = ps_suppkey and ps_partkey = p_partkey and p_name like 'forest%' and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name limit 10; There is an index on table nation(n_name). The plan for DBT3 scale 10 is: ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- 1 SIMPLE nation ref PRIMARY,i_n_name i_n_name 26 const 1 100.00 Using where; Using index; Using temporary; Using filesort 1 SIMPLE supplier ref PRIMARY,i_s_nationkey i_s_nationkey 5 dbt3.nation.n_nationkey 1070 100.00   1 SIMPLE partsupp ref PRIMARY,i_ps_partkey,i_ps_suppkey i_ps_suppkey 4 dbt3.supplier.s_suppkey 39 100.00 Using index 1 SIMPLE part eq_ref PRIMARY,i_p_name PRIMARY 4 dbt3.partsupp.ps_partkey 0 0.00 Using where ----- ----------- -------- ------ --------------------------------- ------------- ------- ------------------------ ---- -------- ---------------------------------------------------------- Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double. The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are: (gdb) p records $20 = 1 (gdb) p s->table->quick_condition_rows $21 = 48334 (gdb) p s->table->file->stats.records $22 = 1996869 The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name. The actual value is 24037. The numbers for scale 1 are: (gdb) p s->table->quick_condition_rows $73 = 2387 (gdb) p s->table->alias->Ptr $74 = 0x7f73120 "part" (gdb) p s->table->file->stats.records $75 = 200252 (gdb) p pos->where_part_selectivity $76 = 0.011919980824161557 This is 1.19% selectivity, which was the expected result. Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).
              Hide
              timour Timour Katchaounov added a comment -

              This task is no longer needed because MWL#253 provides a generic solution for predicate selectivity.

              Show
              timour Timour Katchaounov added a comment - This task is no longer needed because MWL#253 provides a generic solution for predicate selectivity.

                People

                • Assignee:
                  timour Timour Katchaounov
                  Reporter:
                  psergey Sergei Petrunia
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - 2 hours Original Estimate - 2 hours
                    2h
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 1 week, 1 day, 4 hours
                    1w 1d 4h