Description
Create the dataset:
create table t5 (col1 int);
set @a=-1;
create table one_k (a int) select (@a:=@a+1) as a from information_schema.session_variables A, information_schema.session_variables B limit 1000;
insert into t5 select A.a from one_k A, one_k B where A.a < 100 and B.a < 100;
set histogram_size=100;
analyze table t5 persistent for all;
select *, hex(histogram) from mysql.column_stats where table_name='t5'\G
*************************** 1. row ***************************
db_name: j10
table_name: t5
column_name: col1
min_value: 0
max_value: 99
nulls_ratio: 0.0000
avg_length: 4.0000
avg_frequency: 100.0000
hist_size: 100
hist_type:
histogram: (100 bytes here)
Ok, so we've got a table with 100 rows of 0, 100 rows of 1, and so forth up to 99.
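To make the baseline concrete, here is a small C++ sketch that rebuilds the same distribution in memory (values 0..99, 100 rows each, mirroring the SQL above) and counts the exact fraction of rows matched by an IN-list or a `<=` bound. The function names are hypothetical. The sketch only counts rows and does not model the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rebuild t5: every value 0..99 appears 100 times (10000 rows), like the
// cross join of one_k with itself filtered on a < 100.
std::vector<int> make_t5()
{
  std::vector<int> rows;
  for (int a = 0; a < 100; a++)
    for (int b = 0; b < 100; b++)
      rows.push_back(a);
  return rows;
}

// Exact selectivity of "col1 IN (list)": matching rows / total rows.
double true_in_selectivity(const std::vector<int>& rows, const std::vector<int>& list)
{
  std::size_t hits = 0;
  for (int v : rows)
    for (int x : list)
      if (v == x)
      {
        hits++;
        break;
      }
  return static_cast<double>(hits) / rows.size();
}

// Exact selectivity of "col1 <= bound".
double true_le_selectivity(const std::vector<int>& rows, int bound)
{
  std::size_t hits = 0;
  for (int v : rows)
    if (v <= bound)
      hits++;
  return static_cast<double>(hits) / rows.size();
}
```

With this data, `col1 in (1,2,3)` matches 300 of 10000 rows (3%), while `col1 in (-1,-2,-3)` and `col1<=-1` match none (0%).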
Let's see how the estimates look:
MariaDB [j10]> explain extended select * from t5 where col1 in (1,2,3);
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.79 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (10.64 sec)
The real selectivity is 3%; the estimate is 3.79%. Good.
Now, let's try values that are certainly not in the table:
MariaDB [j10]> explain extended select * from t5 where col1 in (-1,-2,-3);
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     3.79 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
Oops, again 3.79%, although none of these values is in the table.
Let's see what a non-equality range shows:
MariaDB [j10]> explain extended select * from t5 where col1<=-1;
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
|    1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL | 10000 |     0.99 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+
1%. That's better.
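The 0.99% figure is roughly what one histogram bucket is worth. The C++ sketch below is a simplified model of an equi-height histogram, not MariaDB's code. It assumes one stored endpoint per histogram byte (so n = 100 here), that n endpoints split the rows into n+1 equally full buckets, that endpoints are normalized positions in [min_value, max_value] (kept as doubles, whereas the server stores them quantized), and that a range is charged every bucket it touches. All function names are hypothetical. In this model the bound `col1<=-1` clamps to position 0 and still gets one full bucket, 1/101 ≈ 0.0099. That agrees with the 0.99 above, which is suggestive but does not prove what the server does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized position of v inside [lo, hi], clamped to [0, 1].
double pos_in_interval(double v, double lo, double hi)
{
  if (hi <= lo)
    return 0.0;
  return std::min(1.0, std::max(0.0, (v - lo) / (hi - lo)));
}

// n endpoints of an equi-height histogram over sorted values:
// endpoint i is the value at row (i+1)*N/(n+1), normalized to [0, 1].
std::vector<double> build_endpoints(const std::vector<int>& sorted, std::size_t n)
{
  const double lo = sorted.front(), hi = sorted.back();
  std::vector<double> ep;
  for (std::size_t i = 0; i < n; i++)
  {
    std::size_t row = (i + 1) * sorted.size() / (n + 1);
    ep.push_back(pos_in_interval(sorted[row], lo, hi));
  }
  return ep;
}

// Bucket containing pos: the first endpoint >= pos, or the last bucket (index n).
std::size_t find_bucket(const std::vector<double>& ep, double pos)
{
  std::size_t i = 0;
  while (i < ep.size() && ep[i] < pos)
    i++;
  return i;
}

// Range estimate for min_pos <= max_pos: every touched bucket counts fully.
double range_selectivity(const std::vector<double>& ep, double min_pos, double max_pos)
{
  const double bucket_sel = 1.0 / (ep.size() + 1);
  return bucket_sel * (find_bucket(ep, max_pos) - find_bucket(ep, min_pos) + 1);
}
```

Because a range always touches at least one bucket in this model, even an impossible bound such as `col1<=-1` is charged about 1% instead of 0.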
I consider the selectivity obtained for "where col1 in (-1,-2,-3)" to be a bug.
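For reference, the expected answer can be derived from the statistics already in mysql.column_stats: min_value=0 and max_value=99 rule out -1, -2 and -3, and avg_frequency=100 over 10000 rows gives 0.01 per in-range value. The C++ sketch below applies that range check per IN-list element. Its names are hypothetical, and it is neither the server's code nor the patch proposed in the comments.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-element estimate for "col IN (list)": values outside
// [min_value, max_value] contribute nothing; others get avg_frequency / rows.
double in_list_selectivity(const std::vector<double>& list,
                           double min_value, double max_value,
                           double avg_frequency, double rows)
{
  double sel = 0.0;
  for (double v : list)
    if (v >= min_value && v <= max_value)
      sel += avg_frequency / rows;
  return sel < 1.0 ? sel : 1.0;  // a selectivity cannot exceed 1
}
```

With the statistics shown above, `in (1,2,3)` gives 0.03 and `in (-1,-2,-3)` gives 0.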
Issue Links
- relates to: MDEV-4145 Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan (Closed)
idea:
--- sql/sql_statistics.h 2013-04-16 05:43:07 +0000
+++ sql/sql_statistics.h 2013-04-20 14:42:57 +0000
@@ -248,6 +248,10 @@
 1.0 : get_value(max) * inv_prec_factor) -
 (min == 0 ?
 0.0 : get_value(min-1) * inv_prec_factor);
+
+ if (width <= DBL_EPSILON)
+ return 0.0;
+
 sel= avg_sel * (bucket_sel * (max + 1 - min)) / width;
 return sel;
 }
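The patched formula can be tried in isolation. The C++ sketch below re-expresses the hunk as a free function over a simplified histogram model that is assumed, not taken from the server: n normalized endpoints in [0, 1] define n+1 equal-height buckets, stored as doubles rather than quantized bytes. `min`/`max` are the first and last buckets ending at the value's endpoint, `width` is the fraction of the [min_value, max_value] domain they span, and the guard from the patch returns 0 instead of dividing by a zero width. The function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Simplified model of the patched point estimate. `ep` holds n normalized
// bucket endpoints; bucket i spans (ep[i-1], ep[i]] and bucket n ends at 1.0.
double point_selectivity(const std::vector<double>& ep, double pos, double avg_sel)
{
  const std::size_t n = ep.size();
  const double bucket_sel = 1.0 / (n + 1);
  // First bucket whose right endpoint is >= pos (bucket n if none is).
  std::size_t min = 0;
  while (min < n && ep[min] < pos)
    min++;
  // Extend over following buckets that end at the same position.
  std::size_t max = min;
  while (max + 1 < n && ep[max + 1] == ep[max])
    max++;
  const double width = (max == n ? 1.0 : ep[max]) - (min == 0 ? 0.0 : ep[min - 1]);
  // The guard added by the patch: without it a zero width would divide by zero.
  if (width <= DBL_EPSILON)
    return 0.0;
  return avg_sel * (bucket_sel * (max + 1 - min)) / width;
}
```

For example, with endpoints {0.0, 0.25, 0.5, 0.75} and avg_sel = 0.01, a value at position 0.3 falls in a bucket of width 0.25 and gets 0.01 × 0.2 / 0.25 = 0.008, while a value at position 0 lands in a zero-width first bucket and, with the guard, gets 0.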