Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6713

A join query is faster with index_condition_pushdown=off

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.5.39, 10.0.14
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      An example from customer: a join query is faster when one runs it with index_condition_pushdown=off. The difference is not big, but it's repeatable, so it is worth to investigate it.

      Query and dataset were provided by Joffrey MICHAIE.

      Details are here in private comments (it's customer data).

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            Tried on current 10.0 tree, release build. Query execution time:

            • index_condition_pushdown=off: 0.31 .. 0.33 sec
            • index_condition_pushdown=on: 0.40..0.42 sec.
            Show
            psergey Sergei Petrunia added a comment - Tried on current 10.0 tree, release build. Query execution time: index_condition_pushdown=off: 0.31 .. 0.33 sec index_condition_pushdown=on: 0.40..0.42 sec.
            Hide
            elenst Elena Stepanova added a comment -

            Values from the current 10.0 tree revno 4383 (even scarier, but maybe it's a difference in build options; built as cmake && make):
            ICP:
            1 row in set (1.87 sec)
            1 row in set (1.79 sec)
            1 row in set (1.70 sec)
            1 row in set (1.69 sec)
            1 row in set (1.66 sec)
            no ICP:
            1 row in set (1.11 sec)
            1 row in set (1.02 sec)
            1 row in set (1.09 sec)
            1 row in set (1.08 sec)
            1 row in set (1.21 sec)
            ICP:
            1 row in set (1.71 sec)
            1 row in set (1.67 sec)
            1 row in set (1.69 sec)
            1 row in set (1.77 sec)
            1 row in set (1.61 sec)
            no ICP:
            1 row in set (1.48 sec)
            1 row in set (1.01 sec)
            1 row in set (1.09 sec)
            1 row in set (1.13 sec)
            1 row in set (0.96 sec)
            Averages:
            no ICP: 1,118
            ICP: 1,716

            Show
            elenst Elena Stepanova added a comment - Values from the current 10.0 tree revno 4383 (even scarier, but maybe it's a difference in build options; built as cmake && make): ICP: 1 row in set (1.87 sec) 1 row in set (1.79 sec) 1 row in set (1.70 sec) 1 row in set (1.69 sec) 1 row in set (1.66 sec) no ICP: 1 row in set (1.11 sec) 1 row in set (1.02 sec) 1 row in set (1.09 sec) 1 row in set (1.08 sec) 1 row in set (1.21 sec) ICP: 1 row in set (1.71 sec) 1 row in set (1.67 sec) 1 row in set (1.69 sec) 1 row in set (1.77 sec) 1 row in set (1.61 sec) no ICP: 1 row in set (1.48 sec) 1 row in set (1.01 sec) 1 row in set (1.09 sec) 1 row in set (1.13 sec) 1 row in set (0.96 sec) Averages: no ICP: 1,118 ICP: 1,716
            Hide
            psergey Sergei Petrunia added a comment -

            Investigation:

            InnoDB (and XtraDB too) has a "prefetch cache" that is used from
            row_search_for_mysql().

            When an index scan has read MYSQL_FETCH_CACHE_THRESHOLD=4 rows, then
            prefetch cache is turned on (See row_sel_enqueue_cache_row_for_mysql() calls).
            InnoDB reads MYSQL_FETCH_CACHE_SIZE=8 rows into the cache, and then subsequent
            calls to row_search_for_mysql() read from the cache until it is empty.

            The the cache stores records in MySQL format.

            When ICP is not used, InnoDB will decode index tuple directly into the cache
            record.

            When ICP is used, row_search_for_mysql() needs to check ICP condition before
            putting the row into cache.
            ICP condition can only be checked when index fields are in table->record[0].
            The solution is to unpack into table->record[0], make ICP check, and then put
            the record into cache with memcpy:

            row_sel_enqueue_cache_row_for_mysql(
            /*================================*/
            	byte*		mysql_rec,	/*!< in/out: MySQL record */
            	row_prebuilt_t*	prebuilt)	/*!< in/out: prebuilt struct */
            {
            	/* For non ICP code path the row should already exist in the
            	next fetch cache slot. */
            
            	if (prebuilt->idx_cond != NULL) {
            		byte*	dest = row_sel_fetch_last_buf(prebuilt);
            
            		ut_memcpy(dest, mysql_rec, prebuilt->mysql_row_len);
            	}
            
            Show
            psergey Sergei Petrunia added a comment - Investigation: InnoDB (and XtraDB too) has a "prefetch cache" that is used from row_search_for_mysql(). When an index scan has read MYSQL_FETCH_CACHE_THRESHOLD=4 rows, then prefetch cache is turned on (See row_sel_enqueue_cache_row_for_mysql() calls). InnoDB reads MYSQL_FETCH_CACHE_SIZE=8 rows into the cache, and then subsequent calls to row_search_for_mysql() read from the cache until it is empty. The the cache stores records in MySQL format. When ICP is not used, InnoDB will decode index tuple directly into the cache record. When ICP is used, row_search_for_mysql() needs to check ICP condition before putting the row into cache. ICP condition can only be checked when index fields are in table->record [0] . The solution is to unpack into table->record [0] , make ICP check, and then put the record into cache with memcpy: row_sel_enqueue_cache_row_for_mysql( /*================================*/ byte* mysql_rec, /*!< in/out: MySQL record */ row_prebuilt_t* prebuilt) /*!< in/out: prebuilt struct */ { /* For non ICP code path the row should already exist in the next fetch cache slot. */ if (prebuilt->idx_cond != NULL) { byte* dest = row_sel_fetch_last_buf(prebuilt); ut_memcpy(dest, mysql_rec, prebuilt->mysql_row_len); }
            Hide
            psergey Sergei Petrunia added a comment -

            This works.

            However, suppose the table has a few VARCHAR(255) columns, but actual values are much shorter (typical).

            Let's assume the row we are unpacking has 10-byte values.
            Then, decoding InnoDB index tuples will need to write 10 bytes (+1 byte for length).
            However, ut_memcpy() call above will have to copy 255 bytes.
            This seems to be the reason for the slowdown.

            Show
            psergey Sergei Petrunia added a comment - This works. However, suppose the table has a few VARCHAR(255) columns, but actual values are much shorter (typical). Let's assume the row we are unpacking has 10-byte values. Then, decoding InnoDB index tuples will need to write 10 bytes (+1 byte for length). However, ut_memcpy() call above will have to copy 255 bytes. This seems to be the reason for the slowdown.
            Hide
            psergey Sergei Petrunia added a comment -

            Using the above as a guideline, I created an artificial dataset that would show the problem.. and it didn't work. It seems, I'm missing something here.

            Show
            psergey Sergei Petrunia added a comment - Using the above as a guideline, I created an artificial dataset that would show the problem.. and it didn't work. It seems, I'm missing something here.
            Hide
            psergey Sergei Petrunia added a comment - - edited

            Re-ran benchmark on the smaller query.

            Top CPU consumers when ICP=ON:

                18.45%   mysqld  mysqld              [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, mtr_t*)
                18.43%   mysqld  mysqld              [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
                 8.08%   mysqld  mysqld              [.] mtr_memo_slot_release_func(mtr_memo_slot_t*)
                 7.12%   mysqld  mysqld              [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
                 5.44%   mysqld  mysqld              [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
                 4.52%   mysqld  mysqld              [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)
            

            Top CPU consumers when ICP=OFF:

                29.21%   mysqld  mysqld              [.] row_sel_enqueue_cache_row_for_mysql(unsigned char*, row_prebuilt_t*)
                11.75%   mysqld  mysqld              [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
                 5.37%   mysqld  mysqld              [.] mtr_memo_slot_release_func(mtr_memo_slot_t*)
                 3.98%   mysqld  mysqld              [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
                 3.57%   mysqld  mysqld              [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
                 2.79%   mysqld  mysqld              [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)
            
            Show
            psergey Sergei Petrunia added a comment - - edited Re-ran benchmark on the smaller query. Top CPU consumers when ICP=ON: 18.45% mysqld mysqld [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, mtr_t*) 18.43% mysqld mysqld [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**) 8.08% mysqld mysqld [.] mtr_memo_slot_release_func(mtr_memo_slot_t*) 7.12% mysqld mysqld [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*) 5.44% mysqld mysqld [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*) 4.52% mysqld mysqld [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long) Top CPU consumers when ICP=OFF: 29.21% mysqld mysqld [.] row_sel_enqueue_cache_row_for_mysql(unsigned char*, row_prebuilt_t*) 11.75% mysqld mysqld [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**) 5.37% mysqld mysqld [.] mtr_memo_slot_release_func(mtr_memo_slot_t*) 3.98% mysqld mysqld [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*) 3.57% mysqld mysqld [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*) 2.79% mysqld mysqld [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)
            Hide
            psergey Sergei Petrunia added a comment - - edited

            Again, the same picture, but ICP=ON has the new biggest user: row_sel_enqueue_cache_row_for_mysql

            Show
            psergey Sergei Petrunia added a comment - - edited Again, the same picture, but ICP=ON has the new biggest user: row_sel_enqueue_cache_row_for_mysql
            Hide
            jplindst Jan Lindström added a comment -

            Is this really a bug? In my understanding there is no certainty that ICP would help on all cases, When ICP is used, row_search_for_mysql() needs to check ICP condition before putting the row into cache. ICP condition can only be checked when index fields are in table->record[0].
            The solution is to unpack into table->record[0], make ICP check, and then put the record into cache with memcpy. All this takes time.

            Show
            jplindst Jan Lindström added a comment - Is this really a bug? In my understanding there is no certainty that ICP would help on all cases, When ICP is used, row_search_for_mysql() needs to check ICP condition before putting the row into cache. ICP condition can only be checked when index fields are in table->record [0] . The solution is to unpack into table->record [0] , make ICP check, and then put the record into cache with memcpy. All this takes time.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                1 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated: