Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6982

LEFT JOIN table elimination is not always used when it could

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.14
    • Fix Version/s: 10.0
    • Component/s: None
    • Labels:
      None

      Description

      If I use a LEFT JOIN query on tables with INT and ENUM columns with primary keys, t2 is correctly eliminated from the query plan:

      DROP TABLE IF EXISTS t1,t2;
      CREATE TABLE t1 (a INT PRIMARY KEY);
      INSERT INTO t1 VALUES (1);
      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 (a ENUM('a','b'));
      INSERT INTO t2 VALUES ('a'),('b');
      SELECT t1.* FROM t1 JOIN t2 USING (a);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      ALTER TABLE t2 ADD PRIMARY KEY(a);
      SELECT t1.* FROM t1 JOIN t2 USING (a);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      

      The EXPLAIN query returns:

      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | index | NULL          | PRIMARY | 4       | NULL |    1 | Using index |
      +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
      1 row in set (0.00 sec)
      

      Now if I change the data type of t1.a from INT to DOUBLE and run EXPLAIN again:

      ALTER TABLE t1 MODIFY a DOUBLE;
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      

      or, change column type to DECIMAL:

      ALTER TABLE t1 MODIFY a DECIMAL(10,1);
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      

      it does not use table elimination any more:

      +------+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
      | id   | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                    |
      +------+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
      |    1 | SIMPLE      | t1    | index  | NULL          | PRIMARY | 8       | NULL      |    1 | Using index              |
      |    1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 1       | test.t1.a |    1 | Using where; Using index |
      +------+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
      2 rows in set (0.00 sec)
      

      There should not be any problems to eliminate t2 in case of DOUBLE+ENUM.
      DOUBLE+ENUM are compared as DOUBLE. val_real() of a ENUM column return numbers in the range 0..65535. If ENUM is known to have unique values (due to the primary key), the set of its val_real() values is also guaranteed to have only unique values. The range of returned real values is small, so no duplicates can appear due to a precision loss at INT->DOUBLE conversion in ENUM's val_real().

      The same is true for DECIMAL+ENUM.

      Exactly the same behaviour is observed if I change t2.a from ENUM to SET('a','b').
      t2.a can be safely eliminated in DOUBLE+SET and DECIMAL+SET combinations.
      Currently it's not eliminated. This script demonstrates:

      DROP TABLE IF EXISTS t1,t2;
      CREATE TABLE t1 (a INT PRIMARY KEY);
      INSERT INTO t1 VALUES (1);
      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 (a SET('a','b'));
      INSERT INTO t2 VALUES ('a'),('b');
      SELECT t1.* FROM t1 JOIN t2 USING (a);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      ALTER TABLE t2 ADD PRIMARY KEY(a);
      SELECT t1.* FROM t1 JOIN t2 USING (a);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      
      ALTER TABLE t1 MODIFY a DOUBLE;
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      
      ALTER TABLE t1 MODIFY a DECIMAL(10,1);
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING (a);
      

      A similar problem is observed with temporal columns in t2.
      According to EXPLAIN, table elimination is not applied for t2:

      DROP TABLE IF EXISTS t1,t2;
      CREATE TABLE t1 (c1 INT PRIMARY KEY);
      INSERT INTO t1 VALUES (20010101),(20010102);
      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 (c1 DATE);
      INSERT INTO t2 VALUES ('2001-01-01'),('2001-01-02');
      SELECT t1.* FROM t1 JOIN t2 USING(c1);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING(c1);
      ALTER TABLE t2 ADD PRIMARY KEY(c1);
      SELECT t1.* FROM t1 JOIN t2 USING(c1);
      SELECT t1.* FROM t1 LEFT JOIN t2 USING(c1);
      EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 USING(c1);
      

      It should be safe to eliminate t2, because INT+DATE are compared as DATE. Thus non of the records in t2 can produce duplicate values for the LEFT JOIN.
      It should be safe to eliminate t2 in case of DOUBLE and DECIMAL data types in t1, as DOUBLE+DATE and DECIMAL+DATE are compared as DATE.
      It should be safe to eliminate t2 in case of CHAR/VARCHAR/TEXT/BLOB/ENUM/SET data types in t1, because all of them are compared as DATE against a DATE column.

        Gliffy Diagrams

          Attachments

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                bar Alexander Barkov
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: