Details
Description
Create a dataset:
create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t1 (a int, b int); insert into t1 select NULL, a from one_k; set optimizer_use_condition_selectivity=4; SET use_stat_tables = PREFERABLY; set histogram_size=100; analyze table t1 persistent for all;
Ok, so we've got EITS statistics for t1.A. t1.a has only NULL values. Let's see how selectivity estimates work
MariaDB [test]> explain extended select * from t1 A straight_join t1 B where A.a < 5; +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1000 | 0.00 | Using where | | 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
So far, good. Now, let's try an "IS NULL":
MariaDB [test]> explain extended select * from t1 A straight_join t1 B where A.a is null; +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+ | 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1000 | 0.00 | Using where | | 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------------------------+
We get filtered=0 even if it should have been 100.
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
The reason is this code in get_column_range_cardinality:
if (col_non_nulls < 1) res= 0;it basically means: 'if there are sufficiently many NULLs in the table, assume #matching_rows=0'. It ignores the possibility that the predicate is satisfied by NULL values.