Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-287

CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.25
    • Component/s: None
    • Labels:
      None

      Description

      The following query

      SELECT ( 
        SELECT MIN(b) FROM t1, t2 
        WHERE b = a AND 
        ( b = alias1.b OR 
           EXISTS ( SELECT * FROM t3 )
        )
      ) 
      FROM t2 alias1, t1 alias2, t1 alias3;
      

      on my machine takes 80 sec and more on the work tree vs 0.5 sec and less on the main 5.5 tree (tried revno 3413 and revno 3402).

      bzr version-info

      revision-id: timour@askmonty.org-20120518115201-s6byggvesqxcntkd
      date: 2012-05-18 14:52:01 +0300
      revno: 3404
      

      Reproducible with the default optimizer_switch as well as with all OFF values (except for in_to_exists which is required to execute the query).
      Reproducible with MyISAM, Aria, InnoDB.

      EXPLAIN on the work tree (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
      1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
      1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
      2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	
      2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
      3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      Warnings:
      Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
      Note	1003	select <expr_cache><>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`b` = `test`.`t1`.`a`))) AS `( 
      SELECT MIN(b) FROM t1, t2 
      WHERE b = a AND 
      ( b = alias1.b OR 
      EXISTS ( SELECT * FROM t3 ) 
      )
      )` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`
      

      EXPLAIN on the main tree (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
      1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
      1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
      2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	Using where
      2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
      3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      Warnings:
      Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
      Note	1003	select <expr_cache><`test`.`alias1`.`b`>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`b` = `test`.`t1`.`a`) and ((`test`.`t1`.`a` = `test`.`alias1`.`b`) or exists(select 1 from `test`.`t3`))))) AS `( 
      SELECT MIN(b) FROM t1, t2 
      WHERE b = a AND 
      ( b = alias1.b OR 
      EXISTS ( SELECT * FROM t3 ) 
      )
      )` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`
      

      Test case:

      CREATE TABLE t1 (a INT);
      INSERT INTO t1 VALUES
      (1),(7),(4),(7),(0),(2),(9),(4),(0),(9),
      (1),(3),(8),(8),(18),(84),(6),(3),(6),(6);
      
      CREATE TABLE t2 (b INT);
      INSERT INTO t2 VALUES
      (4),(5),(2),(5),(1),(1),(2),(2),(2),(197),
      (4),(5),(3),(1),(2),(7),(6),(1),(156),(8),
      (7),(2),(6),(2),(1),(0),(7),(5),(7),(2),(1),
      (80),(3),(8),(5),(0),(9),(9),(7),(0),(5),
      (6),(9),(3),(91),(6),(7),(3),(161),(7),(7),
      (6),(5),(8),(7),(2),(1),(3),(6),(6),(5),(0),
      (7),(7),(6),(0),(0),(8),(0),(4),(0),(213),
      (248),(1),(6),(6),(3),(140),(0),(7),(6),(6),
      (8),(5),(8),(7),(3),(7),(3),(8),(0),(1),(3),
      (6),(8),(1),(1),(9),(0),(6);
      
      CREATE TABLE t3 (c INT);
      INSERT INTO t3 VALUES (8),(3);
      
      SELECT ( 
        SELECT MIN(b) FROM t1, t2 
        WHERE b = a AND 
        ( b = alias1.b OR 
           EXISTS ( SELECT * FROM t3 )
        )
      ) 
      FROM t2 alias1, t1 alias2, t1 alias3;
      
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              timour Timour Katchaounov added a comment -

              Analysis:

              The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery.

              • If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache.
              • If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated.
                • Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns.
                • As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache.

              The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

              Show
              timour Timour Katchaounov added a comment - Analysis: The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery. If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache. If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated. Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns. As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache. The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.
              Hide
              timour Timour Katchaounov added a comment -

              This problem is fixed by the following patch waiting for review:

              revno: 3406
              revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23
              parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e
              committer: timour@askmonty.org
              branch nick: 5.5-lpb944706
              timestamp: Wed 2012-05-30 00:18:53 +0300
              message:
              Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

              Analysis:

              The fix for lp:944706 introduces early subquery optimization.
              While a subquery is being optimized some of its predicates may be
              removed. In the test case, the EXISTS subquery is constant, and is
              evaluated to TRUE. As a result the whole OR is TRUE, and thus the
              correlated condition "b = alias1.b" is optimized away. The subquery
              becomes non-correlated.

              The subquery cache is designed to work only for correlated subqueries.
              If constant subquery optimization is disallowed, then the constant
              subquery is not evaluated, the subquery remains correlated, and its
              execution is cached. As a result execution is fast.

              However, when the constant subquery was optimized away, it was neither
              cached by the subquery cache, nor it was cached by the internal subquery
              caching. The latter was due to the fact that the subquery still appeared
              as correlated to the subselect_XYZ_engine::exec methods, and they
              re-executed the subquery on each call to Item_subselect::exec.

              Solution:

              The solution is to update the correlated status of the subquery after it has
              been optimized. This status consists of:

              • st_select_lex::is_correlated
              • Item_subselect::is_correlated
              • SELECT_LEX::uncacheable
              • SELECT_LEX_UNIT::uncacheable
                The status is updated by st_select_lex::update_correlated_cache(), and its
                caller st_select_lex::optimize_unflattened_subqueries. The solution relies
                on the fact that the optimizer already called
                st_select_lex::update_used_tables() for each subquery. This allows to
                efficiently update the correlated status of each subquery without walking
                the whole subquery tree.

              Notice that his patch is an improvement over MySQL 5.6 and older, where
              subqueries are not pre-optimized, and the above analysis is not possible.

              Show
              timour Timour Katchaounov added a comment - This problem is fixed by the following patch waiting for review: revno: 3406 revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23 parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e committer: timour@askmonty.org branch nick: 5.5-lpb944706 timestamp: Wed 2012-05-30 00:18:53 +0300 message: Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer Analysis: The fix for lp:944706 introduces early subquery optimization. While a subquery is being optimized some of its predicates may be removed. In the test case, the EXISTS subquery is constant, and is evaluated to TRUE. As a result the whole OR is TRUE, and thus the correlated condition "b = alias1.b" is optimized away. The subquery becomes non-correlated. The subquery cache is designed to work only for correlated subqueries. If constant subquery optimization is disallowed, then the constant subquery is not evaluated, the subquery remains correlated, and its execution is cached. As a result execution is fast. However, when the constant subquery was optimized away, it was neither cached by the subquery cache, nor it was cached by the internal subquery caching. The latter was due to the fact that the subquery still appeared as correlated to the subselect_XYZ_engine::exec methods, and they re-executed the subquery on each call to Item_subselect::exec. Solution: The solution is to update the correlated status of the subquery after it has been optimized. This status consists of: st_select_lex::is_correlated Item_subselect::is_correlated SELECT_LEX::uncacheable SELECT_LEX_UNIT::uncacheable The status is updated by st_select_lex::update_correlated_cache(), and its caller st_select_lex::optimize_unflattened_subqueries. The solution relies on the fact that the optimizer already called st_select_lex::update_used_tables() for each subquery. This allows to efficiently update the correlated status of each subquery without walking the whole subquery tree. Notice that his patch is an improvement over MySQL 5.6 and older, where subqueries are not pre-optimized, and the above analysis is not possible.
              Hide
              timour Timour Katchaounov added a comment -

              Merged, tested, pushed.

              Show
              timour Timour Katchaounov added a comment - Merged, tested, pushed.

                People

                • Assignee:
                  timour Timour Katchaounov
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 2 days, 2 hours
                    2d 2h