Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4173

Wrong result (extra row) with semijoin=on, joins in outer query, LEFT JOIN in the subquery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.1, 5.5.29
    • Fix Version/s: 10.0.4, 5.5.32
    • Component/s: None
    • Labels:
      None

      Description

      The following test case produces COUNT = 23712 when it is executed with exists_to_in=off, and 23713 when it is executed with exists_to_in=on.
      Postgres, MySQL 5.6 and older versions of MariaDB all return 23712, so I assume it is the correct result.

      Test case:

      set optimizer_switch='exists_to_in=on';
      
      CREATE TABLE t1 (i1 INT) ENGINE=MyISAM;
      INSERT INTO t1 VALUES 
      (4),(8),(0),(0),(0),(7),(7),(5),(3),(4),(9),(6),(1),(5),(6),(2),(4),(4),
      (4),(3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),(2),(3),(8),(4),(1),(7),
      (9),(4),(5),(5),(9),(3),(8),(0),(3),(1),(0),(8),(3),(3),(9),(6),(1),(0),
      (8),(3),(9),(5),(9),(2),(5),(9),(1),(8),(7),(6),(2),(4),(7),(3),(8),(6);
      
      CREATE TABLE t2 (i2 INT, j2 INT) ENGINE=MyISAM;
      INSERT INTO t2 VALUES 
      (7,1),(0,7),(9,4),(3,7),(4,0),(2,2),(5,9),(3,4),(1,0),(3,9),
      (5,8),(1,8),(204,18),(224,84),(9,6),(5,3),(0,6),(6,1),(7,3);
      
      CREATE TABLE t3 (i3 INT, KEY(i3)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES 
      (0),(8),(1),(8),(9),(24),(6),(1),(6),
      (2),(4),(8),(4),(4),(7),(4),(1),(9),(4);
      
      SELECT COUNT(*) FROM t1 outer_t1, t2 outer_t2, t3 
      WHERE EXISTS ( 
        SELECT 1 FROM t2 LEFT JOIN t3 ON ( i3 = j2 ) 
        WHERE j2 = outer_t1.i1 AND i2 <> outer_t2.j2 
      );
      

      Result with exists_to_in=off:

      COUNT(*)
      23712
      

      Result with exists_to_in=on:

      COUNT(*)
      23713
      

      EXPLAIN with exists_to_in=on, otherwise default optimizer_switch:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	outer_t2	ALL	NULL	NULL	NULL	NULL	19	100.00	
      1	PRIMARY	t3	index	NULL	i3	5	NULL	19	100.00	Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	19	100.00	Using where; Start temporary; Using join buffer (incremental, BNL join)
      1	PRIMARY	t3	ref	i3	i3	5	test.t2.j2	2	100.00	Using where; Using index
      1	PRIMARY	outer_t1	ALL	NULL	NULL	NULL	NULL	72	100.00	Using where; End temporary; Using join buffer (flat, BNL join)
      Warnings:
      Note	1276	Field or reference 'test.outer_t1.i1' of SELECT #2 was resolved in SELECT #1
      Note	1276	Field or reference 'test.outer_t2.j2' of SELECT #2 was resolved in SELECT #1
      Note	1003	select count(0) AS `COUNT(*)` from `test`.`t1` `outer_t1` semi join (`test`.`t2` left join `test`.`t3` on(((`test`.`t3`.`i3` = `test`.`t2`.`j2`) and (`test`.`t2`.`j2` is not null)))) join `test`.`t2` `outer_t2` join `test`.`t3` where ((`test`.`outer_t1`.`i1` = `test`.`t2`.`j2`) and (`test`.`t2`.`i2` <> `test`.`outer_t2`.`j2`))
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            pomyk Patryk Pomykalski added a comment -

            I have analyzed and fixed this. The problem was due to ignoring overflowing duplicate value when converting from heap to aria/myisam temp table.
            It could lead to extra rows in semijoin or missing row in union query.

            Patch for 5.3 branch:
            http://bazaar.launchpad.net/~pomyks/maria/5.3-mdev-4173/revision/3658

            Show
            pomyk Patryk Pomykalski added a comment - I have analyzed and fixed this. The problem was due to ignoring overflowing duplicate value when converting from heap to aria/myisam temp table. It could lead to extra rows in semijoin or missing row in union query. Patch for 5.3 branch: http://bazaar.launchpad.net/~pomyks/maria/5.3-mdev-4173/revision/3658
            Hide
            psergey Sergei Petrunia added a comment -

            Confirm Patryk's analysis. Apparently, DuplicateElimination strategy is the only part of SQL layer that cares about whether a write to temporary table resulted in HA_ERR_DUPP_KEY or not. The patch seems big, but most of the changes are trivial. MySQL 5.6 also has such change. They have introduced it as a part of some big patch.

            Show
            psergey Sergei Petrunia added a comment - Confirm Patryk's analysis. Apparently, DuplicateElimination strategy is the only part of SQL layer that cares about whether a write to temporary table resulted in HA_ERR_DUPP_KEY or not. The patch seems big, but most of the changes are trivial. MySQL 5.6 also has such change. They have introduced it as a part of some big patch.
            Hide
            psergey Sergei Petrunia added a comment - - edited

            Patryk, can I assume that the patch you've posted above is available to MariaDB under the New BSD licence?

            Show
            psergey Sergei Petrunia added a comment - - edited Patryk, can I assume that the patch you've posted above is available to MariaDB under the New BSD licence?
            Hide
            pomyk Patryk Pomykalski added a comment -

            Yes, all my patches are available under New BSD license. This one is mostly from mysql code.

            Show
            pomyk Patryk Pomykalski added a comment - Yes, all my patches are available under New BSD license. This one is mostly from mysql code.
            Hide
            psergey Sergei Petrunia added a comment -

            Thanks.

            Show
            psergey Sergei Petrunia added a comment - Thanks.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: