Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-588

LP:1005898 - index_merge/intersection is used when ref(const) is faster

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.2.14, 5.3.12
    • Fix Version/s: None
    • Component/s: Optimizer
    • Labels:

      Description

      The queries were provided by Stephane Varoqui here: http://pastebin.com/1pqidvq6

      The dataset is lots.tgz, uploaded to ftp.askmonty.org/private/

      EXPLAIN outputs and query time:

      MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
      +----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
      | id | select_type | table | type        | possible_keys           | key                     | key_len | ref  | rows | Extra                                                              |
      +----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
      |  1 | SIMPLE      | lots  | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5     | NULL | 3257 | Using intersect(contractNumber,tsClosed); Using where; Using index |
      +----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
      1.85 sec.
      
      MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(tsClosed) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
      +----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
      | id | select_type | table | type | possible_keys  | key            | key_len | ref   | rows  | Extra       |
      +----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
      |  1 | SIMPLE      | lots  | ref  | contractNumber | contractNumber | 5       | const | 28600 | Using where |
      +----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
      0.30 sec
      
      MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(contractNumber) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
      +----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
      | id | select_type | table | type | possible_keys | key      | key_len | ref   | rows   | Extra                              |
      +----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
      |  1 | SIMPLE      | lots  | ref  | tsClosed      | tsClosed | 5       | const | 243422 | Using index condition; Using where |
      +----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
      4.50 sec
      

      As one can see, index_merge/intersect is about 1.85/0.30=6 times slower than ref access on contractNumber.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            Let's explore the dataset: here's numbers of matching records for both parts of the WHERE:

            Total rows: 2 137 152
            lots.tsClosed IS NULL - 544 288 (estimate: 243 422)
            contractNumber='1478876' - 30 000 (estimate: 28 600)
            contractNumber='1478876' AND lots.tsClosed IS NULL - 10 000 (index_merge's estimate: 3257)

            • index_merge's estimate of 3257 was obtained assuming both parts of WHERE are not correlated. In fact, they're strongly correlated.
            • Estimate for "lots.tsClosed IS NULL" misses the real value by the order of two.
            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster Let's explore the dataset: here's numbers of matching records for both parts of the WHERE: Total rows: 2 137 152 lots.tsClosed IS NULL - 544 288 (estimate: 243 422) contractNumber='1478876' - 30 000 (estimate: 28 600) contractNumber='1478876' AND lots.tsClosed IS NULL - 10 000 (index_merge's estimate: 3257) index_merge's estimate of 3257 was obtained assuming both parts of WHERE are not correlated. In fact, they're strongly correlated. Estimate for "lots.tsClosed IS NULL" misses the real value by the order of two.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            If I put a breakpoint in ha_myisam::records_in_range() and return the exact number for records_in_range("lots.tsClosed IS NULL") call, index_merge will still be used:

            MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 7283 Using intersect(contractNumber,tsClosed); Using where; Using index

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster If I put a breakpoint in ha_myisam::records_in_range() and return the exact number for records_in_range("lots.tsClosed IS NULL") call, index_merge will still be used: MariaDB [lots] > explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL; --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ 1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 7283 Using intersect(contractNumber,tsClosed); Using where; Using index --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            In MariaDB 5.2, I get:

            MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876';
            ---------------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            ---------------------------------------------------------------------------------------------+

            1 SIMPLE lots ref contractNumber contractNumber 5 const 28600 Using where; Using index

            ---------------------------------------------------------------------------------------------+
            1 row in set (0.02 sec)

            MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE lots.tsClosed IS NULL;
            ---------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            ---------------------------------------------------------------------------------------+

            1 SIMPLE lots ref tsClosed tsClosed 5 const 243422 Using where; Using index

            ---------------------------------------------------------------------------------------+
            1 row in set (0.01 sec)

            MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 3257 Using intersect(contractNumber,tsClosed); Using where; Using index

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+
            contractNumber: 0.30 sec
            tsClosed: 4.34 sec
            Index_merge: 2.00 sec

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster In MariaDB 5.2, I get: MariaDB [lots] > explain SELECT count AS amount FROM lots WHERE contractNumber='1478876'; --- ----------- ----- ---- -------------- -------------- ------- ----- ----- -------------------------+ id select_type table type possible_keys key key_len ref rows Extra --- ----------- ----- ---- -------------- -------------- ------- ----- ----- -------------------------+ 1 SIMPLE lots ref contractNumber contractNumber 5 const 28600 Using where; Using index --- ----------- ----- ---- -------------- -------------- ------- ----- ----- -------------------------+ 1 row in set (0.02 sec) MariaDB [lots] > explain SELECT count AS amount FROM lots WHERE lots.tsClosed IS NULL; --- ----------- ----- ---- ------------- -------- ------- ----- ------ -------------------------+ id select_type table type possible_keys key key_len ref rows Extra --- ----------- ----- ---- ------------- -------- ------- ----- ------ -------------------------+ 1 SIMPLE lots ref tsClosed tsClosed 5 const 243422 Using where; Using index --- ----------- ----- ---- ------------- -------- ------- ----- ------ -------------------------+ 1 row in set (0.01 sec) MariaDB [lots] > explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL; --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ 1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 3257 Using intersect(contractNumber,tsClosed); Using where; Using index --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ contractNumber: 0.30 sec tsClosed: 4.34 sec Index_merge: 2.00 sec
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            Conclusion: MariaDB 5.2 is the same as 5.3 for this bug.

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster Conclusion: MariaDB 5.2 is the same as 5.3 for this bug.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            === Range analyzer ===
            lots.tsClosed IS NULL
            (gdb) print found_records
            $59 = 243422
            (gdb) p cost
            $58 =

            {io_count = 243423, avg_io_cost = 1, cpu_cost = 48684.410000000003, mem_cost = 0, import_cost = 0}

            contractNumber='1478876'
            (gdb) print found_records
            $60 = 28600
            (gdb) p cost
            $61 =

            {io_count = 28601, avg_io_cost = 1, cpu_cost = 5720.0100000000002, mem_cost = 0, import_cost = 0}

            As a result, we have:
            (gdb) print read_time
            $62 = 34321.010000000002
            (gdb) print best_records
            $63 = 28600

            === Index Merge
            T@10 : | | | | | | | | | | | | info: ROR key scans (original): tsClosed,contractNumber
            T@10 : | | | | | | | | | | | | info: ROR key scans (ordered): contractNumber,tsClosed

            T@10 : | | | | | | | | | | | >ror_intersect_add
            T@10 : | | | | | | | | | | | | info: Current out_rows= 2.13715e+06
            T@10 : | | | | | | | | | | | | info: Adding scan on contractNumber
            T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
            T@10 : | | | | | | | | | | | | >ror_scan_selectivity
            T@10 : | | | | | | | | | | | | | info: sel_arg step
            T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.0133823
            T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.0133823
            T@10 : | | | | | | | | | | | | <ror_scan_selectivity
            (gdb) p intersect->out_rows
            $69 = 28600
            (gdb) p intersect->index_records
            $70 = 28600
            (gdb) p intersect->index_scan_costs
            $71 = 602.50860420650099
            (gdb) p intersect->total_cost
            $72 = 17919.86261827108

            T@10 : | | | | | | | | | | | >ror_intersect_add
            T@10 : | | | | | | | | | | | | info: Current out_rows= 28600
            T@10 : | | | | | | | | | | | | info: Adding scan on tsClosed
            T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
            T@10 : | | | | | | | | | | | | >ror_scan_selectivity
            T@10 : | | | | | | | | | | | | | info: sel_arg step
            T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.1139
            T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.1139
            T@10 : | | | | | | | | | | | | <ror_scan_selectivity
            T@10 : | | | | | | | | | | | | info: ROR-intersect is covering now
            (gdb) p intersect->out_rows
            $73 = 3257.5451816248915
            (gdb) p intersect->index_records
            $74 = 272022
            (gdb) p intersect->index_scan_costs
            $75 = 5723.2619502868065
            (gdb) p intersect->total_cost
            $76 = 5723.2619502868065

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster === Range analyzer === lots.tsClosed IS NULL (gdb) print found_records $59 = 243422 (gdb) p cost $58 = {io_count = 243423, avg_io_cost = 1, cpu_cost = 48684.410000000003, mem_cost = 0, import_cost = 0} contractNumber='1478876' (gdb) print found_records $60 = 28600 (gdb) p cost $61 = {io_count = 28601, avg_io_cost = 1, cpu_cost = 5720.0100000000002, mem_cost = 0, import_cost = 0} As a result, we have: (gdb) print read_time $62 = 34321.010000000002 (gdb) print best_records $63 = 28600 === Index Merge T@10 : | | | | | | | | | | | | info: ROR key scans (original): tsClosed,contractNumber T@10 : | | | | | | | | | | | | info: ROR key scans (ordered): contractNumber,tsClosed T@10 : | | | | | | | | | | | >ror_intersect_add T@10 : | | | | | | | | | | | | info: Current out_rows= 2.13715e+06 T@10 : | | | | | | | | | | | | info: Adding scan on contractNumber T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0 T@10 : | | | | | | | | | | | | >ror_scan_selectivity T@10 : | | | | | | | | | | | | | info: sel_arg step T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.0133823 T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.0133823 T@10 : | | | | | | | | | | | | <ror_scan_selectivity (gdb) p intersect->out_rows $69 = 28600 (gdb) p intersect->index_records $70 = 28600 (gdb) p intersect->index_scan_costs $71 = 602.50860420650099 (gdb) p intersect->total_cost $72 = 17919.86261827108 T@10 : | | | | | | | | | | | >ror_intersect_add T@10 : | | | | | | | | | | | | info: Current out_rows= 28600 T@10 : | | | | | | | | | | | | info: Adding scan on tsClosed T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0 T@10 : | | | | | | | | | | | | >ror_scan_selectivity T@10 : | | | | | | | | | | | | | info: sel_arg step T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.1139 T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.1139 T@10 : | | | | | | | | | | | | <ror_scan_selectivity T@10 : | | | | | | | | | | | | info: ROR-intersect is covering now (gdb) p intersect->out_rows $73 = 3257.5451816248915 (gdb) p intersect->index_records $74 = 272022 (gdb) p intersect->index_scan_costs $75 = 5723.2619502868065 (gdb) p intersect->total_cost $76 = 5723.2619502868065
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            Final comparison of range-vs-index_merge:
            (gdb) print min_cost ## this is index_merge
            $77 = 5723.2619502868065
            (gdb) print read_time ## this is read_time
            $78 = 34321.010000000002

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster Final comparison of range-vs-index_merge: (gdb) print min_cost ## this is index_merge $77 = 5723.2619502868065 (gdb) print read_time ## this is read_time $78 = 34321.010000000002
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            Index_merge code assume conditions to be uncorrelated and "tsClosed IS NULL" adds selectivity of " Selectivity multiplier: 0.1139".

            The real value of the multiplier is 0.3. If I set that, I ge this plan:

            MariaDB [lots]> explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 8580 Using intersect(contractNumber,tsClosed); Using where; Using index

            --------------------------------------------------------------------------------------------------------------------------------------------------------------+

            #rows is closer to reality now (8580 is closer to the real value of 10,000), but still, index_merge is chosen.

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster Index_merge code assume conditions to be uncorrelated and "tsClosed IS NULL" adds selectivity of " Selectivity multiplier: 0.1139". The real value of the multiplier is 0.3. If I set that, I ge this plan: MariaDB [lots] > explain SELECT count AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL; --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ 1 SIMPLE lots index_merge tsClosed,contractNumber contractNumber,tsClosed 5,5 NULL 8580 Using intersect(contractNumber,tsClosed); Using where; Using index --- ----------- ----- ----------- ----------------------- ----------------------- ------- ---- ---- -------------------------------------------------------------------+ #rows is closer to reality now (8580 is closer to the real value of 10,000), but still, index_merge is chosen.
            Hide
            psergey Sergei Petrunia added a comment -

            Re: index_merge/intersection is used when ref(const) is faster
            The problem seems to be the mismatch between range and index_merge costs:

            Range access costs

            • are calculated in handler::multi_range_read_info_const(), the formula is roughly

            handler::read_time() + #rows / TIME_FOR_COMPARE

            index_merge costs:

            • are calculated in opt_range.cc.
            • for a 2-way intersection over a MyISAM table the formula is roughly:

            handler->keyread_time(index1, ...) + handler->keyread_time(index2, ...) + get_sweep_read_cost(...)

            note that the formula only includes index costs, #rows / TIME_FOR_COMPARE was not added.

            Show
            psergey Sergei Petrunia added a comment - Re: index_merge/intersection is used when ref(const) is faster The problem seems to be the mismatch between range and index_merge costs: Range access costs are calculated in handler::multi_range_read_info_const(), the formula is roughly handler::read_time() + #rows / TIME_FOR_COMPARE index_merge costs: are calculated in opt_range.cc. for a 2-way intersection over a MyISAM table the formula is roughly: handler->keyread_time(index1, ...) + handler->keyread_time(index2, ...) + get_sweep_read_cost(...) note that the formula only includes index costs, #rows / TIME_FOR_COMPARE was not added.
            Hide
            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 1005898

            Show
            ratzpo Rasmus Johansson added a comment - Launchpad bug id: 1005898

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated: