Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8687

CREATE TEMPORARY TABLE vs plain SELECT causes 280x perf drop: InnoDB takes unneeded locks

    Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.1, 10.0
    • Fix Version/s: 10.1, 10.0
    • Labels:
    • Environment:
      ppc64el

      Description

      With the attached SQL and psdoit script to run queries concurrently, you can see that the CREATE TEMPORARY TABLE part of the query causes concurrency to tank. The related issue of MDEV-8684 is for problems in the InnoDB mutex code, while this bug is for the fact that InnoDB is down in lock_rec_convert_impl_to_exp() when it probably shouldn't be.

      It appears that CREATE TEMPORARY TABLE is triggering InnoDB to go and do a whole bunch of locking that it doesn't do when you just run a plain SELECT.

      The addition of CREATE TEMPORARY TABLE makes performance 280x worse.

       # mysql -u root < sql
      
       # time ./psdoit 1
       0.022s
      
       # time ./psdoit 10
       0.112s
      
       # time ./psdoit 100
       7.150s
      
       # time ./psdoit 200
       30.422s
      
       # time ./psdoit 400
       190.378s
      

      Things get very bad, very fast. We spend almost all our time in a locking
      food fight:

           37.42%  mysqld         mysqld              [.] ut_delay(unsigned long)                                                                                     
      		    |
      		    ---ut_delay(unsigned long)
      		       |          
      		       |--99.74%-- mutex_spin_wait(void*, bool, char const*, unsigned long)
      		       |          |          
      		       |          |--74.70%-- pfs_mutex_enter_func(ib_mutex_t*, char const*, unsigned long) [clone .constprop.71]
      		       |          |          lock_rec_convert_impl_to_expl(buf_block_t const*, unsigned char const*, dict_index_t*, unsigned long const*)
      		       |          |          lock_clust_rec_read_check_and_lock(unsigned long, buf_block_t const*, unsigned char const*, dict_index_t*, unsigned long const
      		       |          |          sel_set_rec_lock(buf_block_t const*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, 
      		       |          |          row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)
      		       |          |          |          
      		       |          |          |--76.07%-- ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)
      		       |          |          |          handler::index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)
      		       |          |          |          handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)
      		       |          |          |          join_read_key2(THD*, st_join_table*, TABLE*, st_table_ref*)
      		       |          |          |          sub_select(JOIN*, st_join_table*, bool)
      		       |          |          |
      

      and:

           27.96%  mysqld         [kernel.kallsyms]   [k] _raw_spin_lock                                                                                   
      		    |
      		    ---_raw_spin_lock
      		       |          
      		       |--58.11%-- futex_wait_setup
      		       |          |          
      		       |          |--99.96%-- 0
      		       |          |          futex_wait
      		       |          |          do_futex
      		       |          |          sys_futex
      		       |          |          system_call
      		       |          |          |          
      		       |          |          |--64.91%-- __lll_lock_wait
      		       |          |          |          |          
      		       |          |          |          |--83.30%-- pthread_mutex_lock
      		       |          |          |          |          |          
      		       |          |          |          |          |--52.72%-- os_event_reset(os_event*)
      		       |          |          |          |          |          sync_array_reserve_cell(sync_array_t*, void*, unsigned long, char const*, unsigned long, unsi
      		       |          |          |          |          |          mutex_spin_wait(void*, bool, char const*, unsigned long)
      		       |          |          |          |          |          |          
      		       |          |          |          |          |          |--75.08%-- pfs_mutex_enter_func(ib_mutex_t*, char const*, unsigned long) [clone .constprop.7
      		       |          |          |          |          |          |          lock_rec_convert_impl_to_expl(buf_block_t const*, unsigned char const*, dict_index
      		       |          |          |          |          |          |          lock_clust_rec_read_check_and_lock(unsigned long, buf_block_t const*, unsigned cha
      		       |          |          |          |          |          |          sel_set_rec_lock(buf_block_t const*, unsigned char const*, dict_index_t*, unsigned
      		       |          |          |          |          |          |          row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long 
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              stewart-ibm Stewart Smith added a comment -

              One possibility as to what's going on is that InnoDB is taking locks as if statement based binary logging is enabled.

              I'd argue that in this day and age, it should do so based on mixed mode replication rather than statement based, although I haven't looked at what the default binlog-format is in modern MariaDB (although there's probably a strong case for mixed mode if it isn't already)

              Show
              stewart-ibm Stewart Smith added a comment - One possibility as to what's going on is that InnoDB is taking locks as if statement based binary logging is enabled. I'd argue that in this day and age, it should do so based on mixed mode replication rather than statement based, although I haven't looked at what the default binlog-format is in modern MariaDB (although there's probably a strong case for mixed mode if it isn't already)
              Hide
              elenst Elena Stepanova added a comment -
                1 10 100 200 400
              MariaDB 5.5.45 0.019 0.084 2.160 14.047 60.398
              MySQL 5.5.45 0.022 0.084 1.085 3.891 41.689
              MariaDB 10.0.21 0.026 0.097 1.658 20.697 163.562
              MySQL 5.6.26 0.043 0.167 1.975 7.036 82.006
              MariaDB 10.1.6 0.020 0.075 1.973 27.845 152.873
              MySQL 5.7.8 0.043 0.201 2.147 4.043 21.544

              Small values are too volatile for meaningful comparison. High values are better on MySQL everywhere, on 5.7 they are quite impressive (but even on 5.6 they are much better than 10.0/10.1).

              Show
              elenst Elena Stepanova added a comment -   1 10 100 200 400 MariaDB 5.5.45 0.019 0.084 2.160 14.047 60.398 MySQL 5.5.45 0.022 0.084 1.085 3.891 41.689 MariaDB 10.0.21 0.026 0.097 1.658 20.697 163.562 MySQL 5.6.26 0.043 0.167 1.975 7.036 82.006 MariaDB 10.1.6 0.020 0.075 1.973 27.845 152.873 MySQL 5.7.8 0.043 0.201 2.147 4.043 21.544 Small values are too volatile for meaningful comparison. High values are better on MySQL everywhere, on 5.7 they are quite impressive (but even on 5.6 they are much better than 10.0/10.1).
              Hide
              svoj Sergey Vojtovich added a comment -

              Jan Lindström, could you check this? Should we merge fix from MySQL?

              Show
              svoj Sergey Vojtovich added a comment - Jan Lindström , could you check this? Should we merge fix from MySQL?

                People

                • Assignee:
                  jplindst Jan Lindström
                  Reporter:
                  stewart-ibm Stewart Smith
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated: