Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6015

Measure impact of optimizer_switch='exists_to_in=on' in 10.0

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      See https://mariadb.atlassian.net/browse/MDEV-38?focusedCommentId=28609&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-28609.

      I am looking at Q4.

      ts ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      | id   | select_type        | table    | type  | possible_keys                              | key           | key_len | ref                       | rows   | filtered | Extra                                                               |
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      |    1 | PRIMARY            | orders   | range | i_o_orderdate                              | i_o_orderdate | 4       | NULL                      | 137994 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |
      |    2 | DEPENDENT SUBQUERY | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | dbt3sf1.orders.o_orderkey |      2 |   100.00 | Using where                                                         |
      +------+--------------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+---------------------------------------------------------------------+
      2 rows in set, 2 warnings (0.00 sec)
      
      MariaDB [dbt3sf1]> set optimizer_switch='exists_to_in=on';
      Query OK, 0 rows affected (0.00 sec)
      
      MariaDB [dbt3sf1]> explain extended select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-06-06' and o_orderdate < date_add('1995-06-06', interval 3 month) and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      | id   | select_type | table    | type  | possible_keys                              | key           | key_len | ref                       | rows   | filtered | Extra                                                  |
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      |    1 | PRIMARY     | orders   | range | PRIMARY,i_o_orderdate                      | i_o_orderdate | 4       | NULL                      | 137994 |   100.00 | Using index condition; Using temporary; Using filesort |
      |    1 | PRIMARY     | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | dbt3sf1.orders.o_orderkey |      2 |   100.00 | Using where; FirstMatch(orders)                        |
      +------+-------------+----------+-------+--------------------------------------------+---------------+---------+---------------------------+--------+----------+--------------------------------------------------------+
      2 rows in set, 2 warnings (0.00 sec)
      

      The query plan is similar. However, the second query should be able to use BKA. Run the query with and without BKA and measure the impact.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            The Q4 variant I am using:

            select
            	o_orderpriority,
            	count(*) as order_count
            from
            	orders
            where
            	o_orderdate >= '1995-06-06'
            	and o_orderdate < date_add('1995-06-06', interval 3 month)
            	and exists (
            		select
            			*
            		from
            			lineitem
            		where
            			l_orderkey = o_orderkey
            			and l_commitdate < l_receiptdate
            	)
            group by
            	o_orderpriority
            order by
            	o_orderpriority;
            
            Show
            psergey Sergei Petrunia added a comment - The Q4 variant I am using: select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-06-06' and o_orderdate < date_add('1995-06-06', interval 3 month) and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
            Hide
            psergey Sergei Petrunia added a comment -

            Query Q22, EXPLAINs with exists_to_in=on|off.

            The query uses WHERE .... AND NOT EXISTS(SELECT * FROM orders WHERE o_custkey = c_custkey).

            It seems, that the best way run the subquery is to use index lookup on "o_custkey = c_custkey". This is used for both exists_to_in=ON and exists_to_in=OFF.
            BKA is not possible in either case, because the subquery uses NOT EXISTS. It is not a semi-join.

            Show
            psergey Sergei Petrunia added a comment - Query Q22, EXPLAINs with exists_to_in=on|off. The query uses WHERE .... AND NOT EXISTS(SELECT * FROM orders WHERE o_custkey = c_custkey). It seems, that the best way run the subquery is to use index lookup on "o_custkey = c_custkey". This is used for both exists_to_in=ON and exists_to_in=OFF. BKA is not possible in either case, because the subquery uses NOT EXISTS. It is not a semi-join.
            Hide
            axel Axel Schwenke added a comment -

            Results from running DBT3/Q4/SF30 on MariaDB-10.0.10 with exits-to-in either on or off. Execution times:

            exists_to_in=off -> 284, 276, 280 (sec)
            exists_to_in=on -> 261, 255, 257 (sec)

            EXPLAIN plans attached.

            Show
            axel Axel Schwenke added a comment - Results from running DBT3/Q4/SF30 on MariaDB-10.0.10 with exits-to-in either on or off. Execution times: exists_to_in=off -> 284, 276, 280 (sec) exists_to_in=on -> 261, 255, 257 (sec) EXPLAIN plans attached.
            Hide
            psergey Sergei Petrunia added a comment -

            Thanks.

            Looking at the EXPLAINs. The query plans have the same structure. Exists-to-in conversion has enabled BKA for table lineitem. However, it didn't bring a lot of speedup.

            Show
            psergey Sergei Petrunia added a comment - Thanks. Looking at the EXPLAINs. The query plans have the same structure. Exists-to-in conversion has enabled BKA for table lineitem. However, it didn't bring a lot of speedup.
            Hide
            psergey Sergei Petrunia added a comment -

            Graphic trace of query's io when running with scale=1, cold buffer cache.

            Show
            psergey Sergei Petrunia added a comment - Graphic trace of query's io when running with scale=1, cold buffer cache.
            Hide
            psergey Sergei Petrunia added a comment -

            IO trace for exists_to_in=off. Things to note
            1. there is no BKA (reads go back and forth)
            2 the data seems to be highly correlated - lookups in lineitem are linear.

            Show
            psergey Sergei Petrunia added a comment - IO trace for exists_to_in=off. Things to note 1. there is no BKA (reads go back and forth) 2 the data seems to be highly correlated - lookups in lineitem are linear.

              People

              • Assignee:
                axel Axel Schwenke
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: