Details
Description
Create the dataset as specified by MDEV-4363.
MariaDB [j10]> set use_stat_tables='preferably'; Query OK, 0 rows affected (0.00 sec) MariaDB [j10]> set optimizer_use_condition_selectivity=4; Query OK, 0 rows affected, 1 warning (0.00 sec)
The point of this test is to check skewed data distributions.
The value of 178 is very frequent:
MariaDB [j10]> select (select count(*) from t5 where col2 = 178 ) /(select count(*) from t5); +--------------------------------------------------------------------------+ | (select count(*) from t5 where col2 = 178 ) /(select count(*) from t5) | +--------------------------------------------------------------------------+ | 0.3301 | +--------------------------------------------------------------------------+ 1 row in set (0.18 sec)
What does EXPLAIN think:
MariaDB [j10]> explain extended select count(*) from t5 where col2 = 178 ; +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | SIMPLE | t5 | ALL | NULL | NULL | NULL | NULL | 10000 | 3.60 | Using where | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)
it thinks that selectivity is 3.6%.
If i try it with a regular, rare value:
MariaDB [j10]> select (select count(*) from t5 where col2 = 179 ) /(select count(*) from t5); +--------------------------------------------------------------------------+ | (select count(*) from t5 where col2 = 179 ) /(select count(*) from t5) | +--------------------------------------------------------------------------+ | 0.0001 | +--------------------------------------------------------------------------+ 1 row in set (0.20 sec) MariaDB [j10]> explain extended select count(*) from t5 where col2 = 179 ; +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | SIMPLE | t5 | ALL | NULL | NULL | NULL | NULL | 10000 | 3.60 | Using where | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)
It looks like histograms do not allow to distinguish between a frequent and a rare value?
Gliffy Diagrams
Attachments
Issue Links
- relates to
-
MDEV-4145 Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan
-
- Closed
-
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
One can have an example with non "RQG data", also:
MariaDB [dbt3sf10]> select *, hex(histogram) from mysql.column_stats where column_name='c_mktsegment' and table_name='customer'\G *************************** 1. row *************************** db_name: dbt3sf10 table_name: customer column_name: c_mktsegment min_value: AUTOMOBILE max_value: MACHINERY nulls_ratio: 0.0000 avg_length: 8.9998 avg_frequency: 300000.0000 hist_size: 200 hist_type: histogram: jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj�������������������������������������������������������������������������������� hex(histogram): 00000000000000000000000000000000000000000000000000000000000000000000000000000000151515151515151515151515151515151515151515151515151515151515151515151515151515156A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A6A95959595959595959595959595959595959595959595959595959595959595959595959595959595FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF