Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-2372

LP:944706 - Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      The following query

      SELECT MAX( alias2.a ) AS field
      FROM t1 AS alias1, t1 AS alias2, t1 AS alias3
      WHERE alias1.a = alias2.a OR alias1.a = 'y'
      HAVING field>'B' AND ( 'Moscow' ) IN ( SELECT a FROM t1 );

      works almost instantly on MariaDB 5.2, but takes quite long, depending on the amount of data in t1, on MariaDB 5.3.

      bzr version-info
      revision-id: <email address hidden>
      date: 2012-02-29 23:28:16 -0800
      build-date: 2012-03-02 14:57:35 +0400
      revno: 3451

      bzr version-info
      revision-id: <email address hidden>
      date: 2012-02-28 13:50:30 +0200
      build-date: 2012-02-29 03:39:46 +0400
      revno: 3116
      branch-nick: maria-5.2

      EXPLAIN in 5.3:

      id select_type table type possible_keys key key_len ref rows filtered Extra
      1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index
      1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join)
      1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
      2 MATERIALIZED t1 index a a 19 NULL 133 100.00 Using index
      Warnings:
      Note 1003 select max(`test`.`alias2`.`a`) AS `field` from `test`.`t1` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3` where ((`test`.`alias1`.`a` = `test`.`alias2`.`a`) or (`test`.`alias1`.`a` = 'y')) having ((`field` > 'B') and <expr_cache><'Moscow'>(<in_optimizer>('Moscow','Moscow' in ( <materialize> (select `test`.`t1`.`a` from `test`.`t1` ), <primary_index_lookup>('Moscow' in <temporary table> on distinct_key where (('Moscow' = `<subquery2>`.`a`)))))))

      optimizer_switch in 5.3 (default):
      index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on
      join_cache_level=2 (default)

      EXPLAIN in 5.2:

      id select_type table type possible_keys key key_len ref rows filtered Extra
      1 PRIMARY NULL NULL NULL NULL NULL NULL NULL NULL Impossible HAVING
      2 SUBQUERY t1 index_subquery a a 19 const 1 100.00 Using index; Using where
      Warnings:
      Note 1003 select max(`test`.`alias2`.`a`) AS `field` from `test`.`t1` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3` where (multiple equal(`test`.`alias1`.`a`, `test`.`alias2`.`a`) or multiple equal('y', `test`.`alias1`.`a`)) having 0

      optimizer_switch in 5.2 (default):
      index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,table_elimination=on

      Test case:

      CREATE TABLE t1 ( a VARCHAR(16), KEY (a) );
      INSERT INTO t1 VALUES
      ('Abilene'),('Akron'),('Albany'),('Albuquerque'),
      ('Alexandria'),('Allentown'),('Amarillo'),('Anaheim'),
      ('Anchorage'),('Ann Arbor'),('Arden-Arcade'),
      ('Arlington'),('Arlington'),('Arvada'),
      ('Athens-Clarke County'),('Atlanta'),
      ('Augusta-Richmond County'),('Aurora'),('Aurora'),
      ('Austin'),('Bakersfield'),('Baltimore'),
      ('Baton Rouge'),('Beaumont'),('Bellevue'),
      ('Berkeley'),('Billings'),('Birmingham'),
      ('Boise City'),('Boston'),('Boulder'),('Bridgeport'),
      ('Brockton'),('Brownsville'),('Buffalo'),('Burbank'),
      ('Cambridge'),('Cape Coral'),('Carrollton'),
      ('Carson'),('Cary'),('Cedar Rapids'),('Chandler'),
      ('Charleston'),('Charlotte'),('Chattanooga'),
      ('Chesapeake'),('Chicago'),('Chula Vista'),
      ('Cincinnati'),('Citrus Heights'),('Clarksville'),
      ('Clearwater'),('Cleveland'),('Colorado Springs'),
      ('Columbia'),('Columbus'),('Columbus'),('Compton'),
      ('Concord'),('Coral Springs'),('Corona'),
      ('Corpus Christi'),('Costa Mesa'),('Dallas'),
      ('Daly City'),('Davenport'),('Dayton'),('Denver'),
      ('Des Moines'),('Detroit'),('Downey'),('Durham'),
      ('East Los Angeles'),('El Cajon'),('El Monte'),
      ('El Paso'),('Elgin'),('Elizabeth'),('Erie'),
      ('Escondido'),('Eugene'),('Evansville'),('Fairfield'),
      ('Fall River'),('Fayetteville'),('Flint'),('Fontana'),
      ('Fort Collins'),('Fort Lauderdale'),('Fort Wayne'),
      ('Fort Worth'),('Fremont'),('Fresno'),('Fullerton'),
      ('Gainesville'),('Garden Grove'),('Garland'),('Gary'),
      ('Gilbert'),('Glendale'),('Glendale'),
      ('Grand Prairie'),('Grand Rapids'),('Green Bay'),
      ('Greensboro'),('Hampton'),('Hartford'),('Hayward'),
      ('Henderson'),('Hialeah'),('Hollywood'),('Honolulu'),
      ('Houston'),('Huntington Beach'),('Huntsville'),
      ('Independence'),('Indianapolis'),('Inglewood'),
      ('Irvine'),('Irving'),('Jackson'),('Jacksonville'),
      ('Jersey City'),('Joliet'),('Kansas City'),
      ('Kansas City'),('Kenosha'),('Knoxville'),
      ('Lafayette'),('Lakewood'),('Lancaster'),('Lansing')
      ;
      SELECT MAX( alias2.a ) AS field
      FROM t1 AS alias1, t1 AS alias2, t1 AS alias3
      WHERE alias1.a = alias2.a OR alias1.a = 'y'
      HAVING field>'B' AND ( 'Moscow' ) IN ( SELECT a FROM t1 );

      1. End of test case

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            timour Timour Katchaounov added a comment -

            Re: Query with impossible HAVING takes 1 millisec on 5.2 and 8 sec on 5.3
            If subqueries are not expensive (5.2, or changed 5.3) the query is optimized as follows:

            • JOIN::optimize for the outer query calls:
              having= optimize_cond(this, having, join_list, &having_value, &having_equal);
            • optimize_cond() calls remove_eq_conds()
            • remove_eq_conds() calls eval_const_cond() as follows:
              else if (cond->const_item() && !cond->is_expensive()) { *cond_value= eval_const_cond(cond) ? Item::COND_TRUE : Item::COND_FALSE; return (COND*) 0; }
            • eval_const_cond() calls Item_in_optimizer::val_int(), which in turn
              optimizes and executes the subquery, and returns FALSE to remove_eq_conds().
            • remove_eq_conds() returns a NULL item, and sets JOIN::having_value = Item::COND_FALSE.
            • JOIN::optimize checks that having_value is Item::COND_FALSE, and sets
              zero_result_cause= "Impossible HAVING"
            • JOIN::exec checks for zero_result_cause and returns empty result without executing.
              EXPLAIN shows "Impossible HAVING".

            If subqueries are expensive (5.3), the query is processed as follows:

            • optimize_cond for the outer JOIN doesn't remove the HAVING clause.
            • The optimizer optimizes both the query and the subquery.
            • Execution proceeds normally by executing the three-way join in the
              query until end_send_group evaluates the HAVING clause once, which
              results in a single subquery execution.

            Since the subquery is evaluated only once, one may wonder why this query
            takes so much time. The reason is an inefficient JOIN plan that is chosen
            with the default join_cache_level=2. The plan is:
            -----------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            -----------------------------------------------------------------------------------------------------------------------------------------------

            1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index
            1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join)
            1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index; Using join buffer (flat, BNL join)
            2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where

            -----------------------------------------------------------------------------------------------------------------------------------------------

            This plan takes 1.8 sec.

            With join_cache_level=0 there is a much better plan:
            -----------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            -----------------------------------------------------------------------------------------------------------

            1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using index
            1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using where; Using index
            1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index
            2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where

            -----------------------------------------------------------------------------------------------------------

            This plan takes 0.07 sec.

            If we allow subqueries to be used for constant optimization we mask the problem
            with the inneficient JOIN plan.

            Show
            timour Timour Katchaounov added a comment - Re: Query with impossible HAVING takes 1 millisec on 5.2 and 8 sec on 5.3 If subqueries are not expensive (5.2, or changed 5.3) the query is optimized as follows: JOIN::optimize for the outer query calls: having= optimize_cond(this, having, join_list, &having_value, &having_equal); optimize_cond() calls remove_eq_conds() remove_eq_conds() calls eval_const_cond() as follows: else if (cond->const_item() && !cond->is_expensive()) { *cond_value= eval_const_cond(cond) ? Item::COND_TRUE : Item::COND_FALSE; return (COND*) 0; } eval_const_cond() calls Item_in_optimizer::val_int(), which in turn optimizes and executes the subquery, and returns FALSE to remove_eq_conds(). remove_eq_conds() returns a NULL item, and sets JOIN::having_value = Item::COND_FALSE. JOIN::optimize checks that having_value is Item::COND_FALSE, and sets zero_result_cause= "Impossible HAVING" JOIN::exec checks for zero_result_cause and returns empty result without executing. EXPLAIN shows "Impossible HAVING". If subqueries are expensive (5.3), the query is processed as follows: optimize_cond for the outer JOIN doesn't remove the HAVING clause. The optimizer optimizes both the query and the subquery. Execution proceeds normally by executing the three-way join in the query until end_send_group evaluates the HAVING clause once, which results in a single subquery execution. Since the subquery is evaluated only once, one may wonder why this query takes so much time. The reason is an inefficient JOIN plan that is chosen with the default join_cache_level=2. The plan is: --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------------------------------------------- 1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index 1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join) 1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index; Using join buffer (flat, BNL join) 2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------------------------------------------- This plan takes 1.8 sec. With join_cache_level=0 there is a much better plan: --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------- 1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using index 1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using where; Using index 1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index 2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- ------------------------- This plan takes 0.07 sec. If we allow subqueries to be used for constant optimization we mask the problem with the inneficient JOIN plan.
            Hide
            timour Timour Katchaounov added a comment -

            Re: Query with impossible HAVING takes 1 millisec on 5.2 and 8 sec on 5.3
            Let's analyze a simpler query:

            SELECT MAX(alias2.a)
            FROM t1 AS alias1, t1 AS alias2, t1 AS alias3
            WHERE alias1.a = alias2.a OR ('Moscow') IN ( SELECT a FROM t1 );

            If subqueries are not expensive (5.2, or changed 5.3) the query is optimized as the
            original test case, with the only difference, that optimize_cond is run for the WHERE
            clause, and the WHERE clause is transformed into: "alias1.a = alias2.a".

            This allows the JOIN optimizer to find a good plan that takes 0.02 sec:
            -----------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            -----------------------------------------------------------------------------------------------------------------------------------------------

            1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index
            1 PRIMARY alias2 ref a a 19 lpb944706.alias1.a 1 100.00 Using index
            1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join)
            2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where

            -----------------------------------------------------------------------------------------------------------------------------------------------

            In 5.3 where subqueries are expensive, the IN cannot be optimized away, and the OR condition doesn't allow a plan with ref access. The resulting plan takes ~ 3 sec:

            ------------------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            ------------------------------------------------------------------------------------------------------------------------------------------------------

            1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index
            1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join)
            1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
            2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where

            ------------------------------------------------------------------------------------------------------------------------------------------------------

            Since there is subquery caching, the subquery itself is executed only once, but the function
            that tests the cache is called for each join row.

            Finally, as in the original query the above plan can be improved by setting join_cache_level=0 down to 0.07 sec:

            -------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            -------------------------------------------------------------------------------------------

            1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using index
            1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using where; Using index
            1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index
            2 MATERIALIZED t1 index a a 19 NULL 133 100.00 Using index

            -------------------------------------------------------------------------------------------

            Show
            timour Timour Katchaounov added a comment - Re: Query with impossible HAVING takes 1 millisec on 5.2 and 8 sec on 5.3 Let's analyze a simpler query: SELECT MAX(alias2.a) FROM t1 AS alias1, t1 AS alias2, t1 AS alias3 WHERE alias1.a = alias2.a OR ('Moscow') IN ( SELECT a FROM t1 ); If subqueries are not expensive (5.2, or changed 5.3) the query is optimized as the original test case, with the only difference, that optimize_cond is run for the WHERE clause, and the WHERE clause is transformed into: "alias1.a = alias2.a". This allows the JOIN optimizer to find a good plan that takes 0.02 sec: --- ------------------ ------ -------------- ------------- ---- ------- ------------------ ---- -------- ------------------------------------------------ id select_type table type possible_keys key key_len ref rows filtered Extra --- ------------------ ------ -------------- ------------- ---- ------- ------------------ ---- -------- ------------------------------------------------ 1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index 1 PRIMARY alias2 ref a a 19 lpb944706.alias1.a 1 100.00 Using index 1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join) 2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where --- ------------------ ------ -------------- ------------- ---- ------- ------------------ ---- -------- ------------------------------------------------ In 5.3 where subqueries are expensive, the IN cannot be optimized away, and the OR condition doesn't allow a plan with ref access. The resulting plan takes ~ 3 sec: --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- -------------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- -------------------------------------------------------------------- 1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index 1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using index; Using join buffer (flat, BNL join) 1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 2 DEPENDENT SUBQUERY t1 index_subquery a a 19 const 10 100.00 Using index; Using where --- ------------------ ------ -------------- ------------- ---- ------- ----- ---- -------- -------------------------------------------------------------------- Since there is subquery caching, the subquery itself is executed only once, but the function that tests the cache is called for each join row. Finally, as in the original query the above plan can be improved by setting join_cache_level=0 down to 0.07 sec: --- ------------ ------ ----- ------------- ---- ------- ---- ---- -------- ------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra --- ------------ ------ ----- ------------- ---- ------- ---- ---- -------- ------------------------- 1 PRIMARY alias1 index a a 19 NULL 133 100.00 Using index 1 PRIMARY alias2 index a a 19 NULL 133 100.00 Using where; Using index 1 PRIMARY alias3 index NULL a 19 NULL 133 100.00 Using index 2 MATERIALIZED t1 index a a 19 NULL 133 100.00 Using index --- ------------ ------ ----- ------------- ---- ------- ---- ---- -------- -------------------------
            Hide
            elenst Elena Stepanova added a comment -

            Re: Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization
            Also filed in JIRA as MDEV-193

            Show
            elenst Elena Stepanova added a comment - Re: Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization Also filed in JIRA as MDEV-193
            Hide
            timour Timour Katchaounov added a comment -

            Re: Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization
            Pushed to MariaDB 5.5.25.

            Show
            timour Timour Katchaounov added a comment - Re: Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization Pushed to MariaDB 5.5.25.
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 944706

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 944706

              People

              • Assignee:
                timour Timour Katchaounov
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: