Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-3574

LP:951283 - Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      The following query

      SELECT * FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
      WHERE alias3.d IN (
      SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
      WHERE alias5.b = alias4.b
      AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
      )

      on test data returns 2940 rows if it is executed with semijoin=on, firstmatch=on, optimizer_prune_level=0, and 3724 rows otherwise. The latter is correct.
      In the test case SELECT * is replaced with SELECT COUNT for convenience, it does not change the result.

      bzr version-info
      revision-id: <email address hidden>
      date: 2012-03-05 22:33:46 -0800
      build-date: 2012-03-10 05:45:32 +0400
      revno: 3455

      Also reproducible on MariaDB 5.5 (revno 3316). Not reproducible on MySQL 5.6 (revno 3706).
      Could not reproduce on MyISAM or Aria.

      EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=0 (wrong result):

      id select_type table type possible_keys key key_len ref rows filtered Extra
      1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where
      1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index; FirstMatch(alias3)
      1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
      1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00
      1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias2)
      Warnings:
      Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
      Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
      Note 1003 select count(0) AS `COUNT` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

      EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=1 (correct result):

      id select_type table type possible_keys key key_len ref rows filtered Extra
      1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
      1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00 Using join buffer (flat, BNL join)
      1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where; Using join buffer (incremental, BNL join)
      1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index
      1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias3)
      Warnings:
      Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
      Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
      Note 1003 select count(0) AS `COUNT` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

      Minimal optimizer_switch: firstmatch=on,semijoin=on
      Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

      Test case:

      --source include/have_innodb.inc

      SET optimizer_switch='firstmatch=on,semijoin=on';
      SET optimizer_prune_level=0;

      CREATE TABLE t1 ( a INT ) ENGINE=InnoDB;
      INSERT INTO t1 VALUES
      (10),(11),(12),(13),(14),(15),(16),
      (17),(18),(19),(20),(21),(22),(23);

      CREATE TABLE t2 (
      b INT PRIMARY KEY,
      c VARCHAR(1),
      d VARCHAR(1),
      KEY(c)
      ) ENGINE=InnoDB;

      INSERT INTO t2 VALUES
      (1,'j','j'),(2,'v','v'),(3,'c','c'),(4,'m','m'),
      (5,'d','d'),(6,'d','d'),(7,'y','y'),(8,'t','t'),
      (9,'d','d'),(10,'s','s'),(11,'r','r'),(12,'m','m'),
      (13,'b','b'),(14,'x','x'),(15,'g','g'),(16,'p','p'),
      (17,'q','q'),(18,'w','w'),(19,'d','d');

      SELECT COUNT FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
      WHERE alias3.d IN (
      SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
      WHERE alias5.b = alias4.b
      AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
      );

      1. End of test case
      1. Expected result
      2. COUNT
      3. 3724
      1. Actual result
      2. COUNT
      3. 2940

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
            While the provided test requires optimizer_prune_level=0 and InnoDB tables, later I had similar mismatches with MyISAM tables and default optimizer_prune_level, so I am removing these conditions from the bug summary.

            Show
            elenst Elena Stepanova added a comment - Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery While the provided test requires optimizer_prune_level=0 and InnoDB tables, later I had similar mismatches with MyISAM tables and default optimizer_prune_level, so I am removing these conditions from the bug summary.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
            The query plan seems to be valid.

            I've compared the query output resultsets:

            • semi-join's resultset is a proper subset of non-semijoin's resultset
            • whether a row is missing is a function of value of alias3.b: rows with values of 6,9,12,19 are missing. All others are present.
            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery The query plan seems to be valid. I've compared the query output resultsets: semi-join's resultset is a proper subset of non-semijoin's resultset whether a row is missing is a function of value of alias3.b: rows with values of 6,9,12,19 are missing. All others are present.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
            The join order is:

            O alias3
            I alias4 FirstMatch(alias3)
            n alias1
            n alias2
            I alias5 FirstMatch(alias2)

            Here "O" means outer, "I" means inner, "n" - outer, not correlated with the subquery table.

            When I debug to see why records with alias3.b=6 do not make it to the output, I see this scenario:

            • the code arrives at sub_select(...,alias4, .. ) call
            • it sets join->return_tab= {alias3}

              // this what FirstMatch is about.

            • then it continues execution. The code builds full record combinations, but neither of them matches the condition attached to table alias5.
            • at some point, execution will return into sub_select(...,alias4, .. ) function.
              At this point:
            • It should search for next matching record, because no match have been found yet.
            • It does not do that, though. Instead, it jumps out here:

            if (join->return_tab < join_tab)
            DBUG_RETURN(NESTED_LOOP_OK);

            Apparently, FirstMatch code is malfunctioning when it is handling the case where inner tables are interleaved with outer tables.

            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery The join order is: O alias3 I alias4 FirstMatch(alias3) n alias1 n alias2 I alias5 FirstMatch(alias2) Here "O" means outer, "I" means inner, "n" - outer, not correlated with the subquery table. When I debug to see why records with alias3.b=6 do not make it to the output, I see this scenario: the code arrives at sub_select(...,alias4, .. ) call it sets join->return_tab= {alias3} // this what FirstMatch is about. then it continues execution. The code builds full record combinations, but neither of them matches the condition attached to table alias5. at some point, execution will return into sub_select(...,alias4, .. ) function. At this point: It should search for next matching record, because no match have been found yet. It does not do that, though. Instead, it jumps out here: if (join->return_tab < join_tab) DBUG_RETURN(NESTED_LOOP_OK); Apparently, FirstMatch code is malfunctioning when it is handling the case where inner tables are interleaved with outer tables.
            Hide
            psergey Sergei Petrunia added a comment -

            Alternate proposal: unfinished patch and exploration
            bug951283.tgz
            LPexportBug951283_bug951283.tgz

            Show
            psergey Sergei Petrunia added a comment - Alternate proposal: unfinished patch and exploration bug951283.tgz LPexportBug951283_bug951283.tgz
            Hide
            psergey Sergei Petrunia added a comment -

            Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
            Alternate proposal: unfinished patch and exploration

            Show
            psergey Sergei Petrunia added a comment - Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery Alternate proposal: unfinished patch and exploration
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 951283

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 951283

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: