Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 5.5.32
-
Fix Version/s: 5.5.33
-
Component/s: None
-
Labels:None
-
Environment:openSUSE 12.1
mariadb-5.5.28a-1.4.1.x86_64
Description
This is a copy of this MySQL bug: http://bugs.mysql.com/bug.php?id=69359
I am filing it here in hopes it will get more attention than from the MySQL folks.
Note, this bug occurs in BOTH MySQL and MariaDB.
Restatement of bug follows...
See
http://stackoverflow.com/questions/16848190/mysql-why-isnt-foo-is-null-optimized-away
for a description. Quoting that here:
I have two tables Person and Message and the latter has a foreign key to the former. Each table has id as the primary key column, and the Person table also has a column personId which is (uniquely) indexed.
The query below should take advantage of the personId key index, but instead MySQL requires scanning the entire Message table for some reason:
mysql> EXPLAIN SELECT `m`.*
-> FROM
-> `Message` AS `m`
-> LEFT JOIN
-> `Person` AS `p` ON (`m`.`person` = `p`.`id`)
-> WHERE
-> 'M002649397' IS NULL OR
-> `p`.`personId` = 'M002649397';
+----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 273220 | |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 8 | pcom.m.person | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
2 rows in set (0.00 sec)
But when I comment out the 'M002649397' IS NULL OR clause (which has no effect on the result), the query suddenly gets more efficient:
mysql> EXPLAIN SELECT `m`.*
-> FROM
-> `Message` AS `m`
-> LEFT JOIN
-> `Person` AS `p` ON (`m`.`person` = `p`.`id`)
-> WHERE
-> -- 'M002649397' IS NULL OR
-> `p`.`personId` = 'M002649397';
+----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
| 1 | SIMPLE | p | const | PRIMARY,personId | personId | 767 | const | 1 | Using index |
| 1 | SIMPLE | m | ref | FK9C2397E7A0F6ED11 | FK9C2397E7A0F6ED11 | 9 | const | 3 | Using where |
+----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
2 rows in set (0.01 sec)
The bug is that the 'M002649397' IS NULL expression, which is always false, is not being optimized away.
Here is a schema to test with:
create table Message (
type char(1) not null,
id bigint not null auto_increment,
createTime datetime not null,
updateTime datetime not null,
person bigint,
primary key (id)
) ENGINE=InnoDB default charset=utf8 collate=utf8_bin;
create table Person (
id bigint not null auto_increment,
createTime datetime not null,
updateTime datetime not null,
firstName varchar(255),
lastName varchar(255),
middleName varchar(255),
personId varchar(255) not null unique,
primary key (id)
) ENGINE=InnoDB default charset=utf8 collate=utf8_bin;
create index idx_Message_createTime on Message (createTime);
alter table Message
add index FK9C2397E7A0F6ED11 (person),
add constraint FK9C2397E7A0F6ED11
foreign key (person)
references Person (id);
create index idx_Person_lastName on Person (lastName);
Gliffy Diagrams
Attachments
Issue Links
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
The query has form
m LEFT JOIN p ON ...
LEFT JOIN allows only one join order:
m, p.
(This is because LEFT JOIN needs to find all records in `m`, regardless
of whether they have a match in `p`).
When we use that join order, the ON expression allows to access table `p`
using eq_ref access method (that is, a lookup on a primary key).
However, the condition `p`.`personId` = 'M002649397' is not useful.
If we take a careful look at the WHERE clause, we can see that is only
satisfied when
`p`.`personId` = 'M002649397'
This means that we're not interested in results of LEFT JOIN that have a
NULL-complemented row for table `p`. In other words, we can replace the
LEFT JOIN with INNER JOIN.
Once we have INNER JOIN, we can use both join orders:
When we use the join order "p, m" (like the second EXPLAIN does), we can
to limit the number of rows we get from table `p`.
use condition
The problem here seems to be that the optimizer is unable to convert left
join into inner join.
When the WHERE clause is just
`p`.`personId` = 'M002649397'
then conversion works (check out EXPLAIN EXTENDED - it shows "JOIN")
When the WHERE clause is
'M002649397' IS NULL OR `p`.`personId` = 'M002649397';
then conversion doesn't work.