Very suboptimal join order is generated for a simple query even if MariaDB query planner knows the other is better in every sense

Description

This is the summary of the bug reported here: https://phabricator.wikimedia.org/T113901

This is the exact CREATE syntax of revision and page:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 CREATE TABLE `revision` ( `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `rev_page` int(8) unsigned NOT NULL DEFAULT '0', `rev_text_id` int(8) unsigned NOT NULL DEFAULT '0', `rev_comment` varbinary(255) DEFAULT NULL, `rev_user` int(5) unsigned NOT NULL DEFAULT '0', `rev_user_text` varbinary(255) NOT NULL DEFAULT '', `rev_timestamp` varbinary(14) NOT NULL DEFAULT '', `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT '0', `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0', `rev_len` int(8) unsigned DEFAULT NULL, `rev_parent_id` int(8) unsigned DEFAULT NULL, `rev_sha1` varbinary(32) NOT NULL DEFAULT '', `rev_content_model` varbinary(32) DEFAULT NULL, `rev_content_format` varbinary(64) DEFAULT NULL, PRIMARY KEY (`rev_id`), KEY `rev_page_id` (`rev_page`,`rev_id`), KEY `rev_timestamp` (`rev_timestamp`,`rev_id`), KEY `page_timestamp` (`rev_page`,`rev_timestamp`,`rev_id`), KEY `user_timestamp` (`rev_user`,`rev_timestamp`,`rev_id`), KEY `usertext_timestamp` (`rev_user_text`,`rev_timestamp`,`rev_id`) ) ENGINE=InnoDB AUTO_INCREMENT=683122218 DEFAULT CHARSET=binary
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 CREATE TABLE `page` ( `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `page_namespace` int(11) NOT NULL DEFAULT '0', `page_title` varbinary(255) NOT NULL DEFAULT '', `page_restrictions` tinyblob NOT NULL, `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0', `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_random` double unsigned NOT NULL DEFAULT '0', `page_touched` varbinary(14) NOT NULL DEFAULT '', `page_links_updated` varbinary(14) DEFAULT NULL, `page_latest` int(8) unsigned NOT NULL DEFAULT '0', `page_len` int(8) unsigned NOT NULL DEFAULT '0', `page_content_model` varbinary(32) DEFAULT NULL, PRIMARY KEY (`page_id`), UNIQUE KEY `name_title` (`page_namespace`,`page_title`), KEY `page_random` (`page_random`), KEY `page_len` (`page_len`), KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`) ) ENGINE=InnoDB AUTO_INCREMENT=47992700 DEFAULT CHARSET=binary

Regular explain:

Tables have approximately the numbers or rows you see on the autoinc.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 mysql> EXPLAIN SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id)) ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: index possible_keys: PRIMARY key: name_title key_len: 261 ref: NULL rows: 33836157 Extra: Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: revision type: ref possible_keys: rev_page_id,page_timestamp key: page_timestamp key_len: 4 ref: enwiki.page.page_id rows: 8 Extra: Using index

Forcing the other plan:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 mysql> EXPLAIN SELECT STRAIGHT_JOIN rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` ON ((rev_page = page_id)) ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: revision type: index possible_keys: rev_page_id,page_timestamp key: rev_timestamp key_len: 20 ref: NULL rows: 11 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra:

The actual query parsed has no issue:

1 2 3 4 select `enwiki`.`revision`.`rev_id` AS `rev_id`,`enwiki`.`revision`.`rev_timestamp` AS `rev_timestamp`,`enwiki`.`page`.`page_id` AS `page_id`,`enwiki`.`page`.`page_title` AS `page_title`,`enwiki`.`page`.`page_namespace` AS `page_namespace` from `enwiki`.`revision` join `enwiki`.`page` where (`enwiki`.`revision`.`rev_page` = `enwiki`.`page`.`page_id`) order by `enwiki`.`revision`.`rev_timestamp` desc,`enwiki`.`revision`.`rev_id` desc limit 11

Even ignoring the indexes does not work (it is not related to covering index):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 mysql> EXPLAIN EXTENDED SELECT rev_id, rev_timestamp, page_id, page_title, page_namespace FROM `revision` INNER JOIN `page` IGNORE INDEX(name_title, PRIMARY) ON ((rev_page = page_id)) ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 33836157 filtered: 100.00 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: revision type: ref possible_keys: rev_page_id,page_timestamp key: page_timestamp key_len: 4 ref: enwiki.page.page_id rows: 8 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.00 sec)

I think this is a bug on the optimizer.

Running ANALYZE on both tables, even with use-stat-tables=PREFERABLY does not help (and it shouldn't, row counts are ok, it is that for some reason maria refuses to use the clearly better plan -that itself understands).

This could be a specific configuration we have, you can see most of it at:
https://git.wikimedia.org/blob/operations%2Fpuppet.git/1423255184712f7fde17dee338c7ac1519ee44ee/templates%2Fmariadb%2Fproduction.my.cnf.erb

Environment

Linux with MariaDB-like packages

Status

Assignee

Sergei Petrunia

Reporter

Jaime Crespo

External issue ID

None

External issue ID

None

Components

Sprint

None

Fix versions

Affects versions

10.0
10.1
10.0.16

Priority

Major