InnoDB uses non-covering index on bool column when estimate is 50% of the table

Description

When there is a tinyint column with 0/1 values and only handful of rows contain one of them, optimizer computes the rows extimate for using index on that column as 50% of the rows instead of 95%+ and moreover it actually uses the index instead of a table scan even when there are additional conditions on a non-indexed column (so PK ref access will be needed anyway).

Selecting the "rare" value gives much better estimate (and using index is the right choice in that case).

I tested it between 15k - 1M rows and did observe it in production on 3.5M table. It seemed to behave correctly (optimizer prefering tablescan) somewhere under 15k rows.

I tested it on 10.0.22 an 10.1.6 (debug build I had at hand). Not using histogram-based stats. set global innodb_stats_traditional=off; did not make a difference.

I reported this to MySQL too as I found 5.6 doing the same (did not test 5.7) - http://bugs.mysql.com/bug.php?id=79271

Testcase attached, my results here:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 MariaDB [test]> EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 1; +------+-------------+-------------+------+---------------+------+---------+-------+------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +------+-------------+-------------+------+---------------+------+---------+-------+------+-------------+ | 1 | SIMPLE | skewed_bool | REF | flag | flag | 1 | const | 12 | USING WHERE | +------+-------------+-------------+------+---------------+------+---------+-------+------+-------------+ 1 ROW IN SET (0.00 sec) MariaDB [test]> EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 0; +------+-------------+-------------+------+---------------+------+---------+-------+-------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +------+-------------+-------------+------+---------------+------+---------+-------+-------+-------------+ | 1 | SIMPLE | skewed_bool | REF | flag | flag | 1 | const | 18768 | USING WHERE | +------+-------------+-------------+------+---------------+------+---------+-------+-------+-------------+ 1 ROW IN SET (0.00 sec) MariaDB [test]> SELECT COUNT(1), flag FROM skewed_bool GROUP BY flag; +----------+------+ | COUNT(1) | flag | +----------+------+ | 36850 | 0 | | 13 | 1 | +----------+------+ 2 ROWS IN SET (0.01 sec)

Environment

Ubuntu 12.04, Centos 6.6

Status

Assignee

Sergei Petrunia

Reporter

Jiri Kavalik

Labels

External issue ID

None

External issue ID

None

Fix versions

Affects versions

10.0
10.0.22
10.1

Priority

Minor