Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-610

LP:898747 - Join optimizer pruning seems to work poorly for semi-joins

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Join optimizer pruning seems to be too aggressive in pruning query plans with semi-joins. Quick investigation in debugger hints at that it is not comparing apples-to-apples when comparing record counts.

      As a result, one can observe effects like this:

      create table ten (a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      
      create table one_k (a int);
      insert into one_k select A.a + 10*B.a + 100*C.a from ten A, ten B, ten C;
      
      MariaDB [test]> set optimizer_prune_level=0;
      Query OK, 0 rows affected (0.00 sec)
      
      MariaDB [test]>  explain select * from one_k A where a in (select B.a from  ten B, ten C where C.a < A.a);
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
      | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                      |
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
      |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 |                            |
      |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                |
      |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; FirstMatch(A) |
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
      3 rows in set (0.01 sec)
      
      MariaDB [test]> set optimizer_prune_level=1;
      Query OK, 0 rows affected (0.00 sec)
      
      MariaDB [test]>  explain select * from one_k A where a in (select B.a from  ten B, ten C where C.a < A.a);
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------------------+
      | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                                          |
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------------------+
      |  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Start temporary                                                |
      |  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                                                    |
      |  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where; End temporary; Using join buffer (flat, BNL join) |
      +----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------------------+
      3 rows in set (0.00 sec)
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            Re: Join optimizer pruning seems to work poorly for semi-joins
            Last_query_cost values:

            • The plan with A,C,B and FirstMatch: 22391.696266
            • The plan with B,C,A and Start/End Temorary: 50707.742164
            Show
            psergey Sergei Petrunia added a comment - Re: Join optimizer pruning seems to work poorly for semi-joins Last_query_cost values: The plan with A,C,B and FirstMatch: 22391.696266 The plan with B,C,A and Start/End Temorary: 50707.742164
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 898747

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 898747
            Hide
            elenst Elena Stepanova added a comment -

            The problem revealed by this test case was gone between 5.3.5 and 5.3.6, particularly with the following revision:

                    ------------------------------------------------------------
                    revno: 3479.1.1
                    revision-id: psergey@askmonty.org-20120402174154-8y0lzcwc0qycoj3n
                    parent: psergey@askmonty.org-20120327104326-uezxq6pz8ulfl3fn
                    committer: Sergey Petrunya <psergey@askmonty.org>
                    branch nick: 5.3-look2
                    timestamp: Mon 2012-04-02 21:41:54 +0400
                    message:
                      BUG#913030: Optimizer chooses a suboptimal excution plan for Q18 from DBT-3
                      - When doing join optimization, pre-sort the tables so that they mimic the execution
                        order we've had with 'semijoin=off'. 
                      - That way, we will not get regressions when there are two query plans (the old and the
                        new) that have indentical costs but different execution times (because of factors that
                        the optimizer was not able to take into account).
            

            If it's just a special case, then probably another example is needed. Otherwise, maybe the bug should be closed?

            Show
            elenst Elena Stepanova added a comment - The problem revealed by this test case was gone between 5.3.5 and 5.3.6, particularly with the following revision: ------------------------------------------------------------ revno: 3479.1.1 revision-id: psergey@askmonty.org-20120402174154-8y0lzcwc0qycoj3n parent: psergey@askmonty.org-20120327104326-uezxq6pz8ulfl3fn committer: Sergey Petrunya <psergey@askmonty.org> branch nick: 5.3-look2 timestamp: Mon 2012-04-02 21:41:54 +0400 message: BUG#913030: Optimizer chooses a suboptimal excution plan for Q18 from DBT-3 - When doing join optimization, pre-sort the tables so that they mimic the execution order we've had with 'semijoin=off'. - That way, we will not get regressions when there are two query plans (the old and the new) that have indentical costs but different execution times (because of factors that the optimizer was not able to take into account). If it's just a special case, then probably another example is needed. Otherwise, maybe the bug should be closed?
            Hide
            elenst Elena Stepanova added a comment -

            Closing for now on the reason given above, please re-open if needed.

            Show
            elenst Elena Stepanova added a comment - Closing for now on the reason given above, please re-open if needed.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: