Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7401

Query optimizer crashes while optimizing a join

    Details

      Description

      In our production database the following statement crashes the query optimizer:

      explain select b.info1
      from
      (
      select distinct info1
      from state s1
      where s1.status='FR' and s1.timestamp>'20141101' and s1.timestamp<'20141105'
      ) b
      left outer join state s
        on s.timestamp>='20141201' and s.timestamp < '20141201230000' and s.info1 = b.info1
      where s.`status`='FR'
      

      This statement is a reduced one, that is the real statement is much larger, but this seems to be the part that crashes the optimizer. When we select slightly different date values, the query can be optimized with the following execution plan:

      'id','select_type','table','type','possible_keys','key','key_len','ref','rows','Extra','1','PRIMARY','<derived2>','ALL',\N,\N,\N,\N,'5650','Using where','1','PRIMARY','s','ref','timestamp,iStatus,iStatusNext,info1','info1','63','b.info1','27','Using index condition; Using where','2','DERIVED','s1','range','timestamp,iStatus,iStatusNext','iStatus','312',\N,'5650','Using index condition; Using temporary',
      

      If we replace the left outer join by inner join, the statement works. If we drop the subselect and replace it with something static, it works, too. If we remove the "where" clause and write this condition into the "on" clause (which of course changes the statement), it works, too.

      In the log file of the server we find the following error report:

      150102 12:51:22 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
      
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
      
      Server version: 10.0.14-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=13
      max_threads=502
      thread_count=11
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1233792 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      
      Thread pointer: 0x0x7ffa59f0b008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7ffc93195d00 thread_stack 0x48000
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb6b85b]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x723f08]
      /lib64/libpthread.so.0[0x36d1a0f710]
      /usr/sbin/mysqld(_Z16best_access_pathP4JOINP13st_join_tableyjbdP11st_positionS4_+0x158b)[0x5ff87b]
      /usr/sbin/mysqld[0x6012a8]
      /usr/sbin/mysqld[0x601663]
      /usr/sbin/mysqld[0x60198b]
      /usr/sbin/mysqld(_Z11choose_planP4JOINy+0xf5)[0x601ef5]
      /usr/sbin/mysqld[0x52a437]
      /usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x6de)[0x61d3de]
      /usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xd8)[0x61fab8]
      /usr/sbin/mysqld(_Z19mysql_explain_unionP3THDP18st_select_lex_unitP13select_result+0x117)[0x6205c7]
      /usr/sbin/mysqld[0x5cad47]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x4c97)[0x5d58d7]
      /usr/sbin/mysqld[0x5d7402]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1b20)[0x5d95c0]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x453)[0x6946a3]
      /usr/sbin/mysqld(handle_one_connection+0x42)[0x694772]
      /usr/sbin/mysqld[0xa6137d]
      /lib64/libpthread.so.0[0x36d1a079d1]
      /lib64/libc.so.6(clone+0x6d)[0x36d16e89dd]
      
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7ffa41c22020): is an invalid pointer
      Connection ID (thread ID): 25
      Status: NOT_KILLED
      
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
      

      Now I hope, that this is enough information for you to find out, what is happening here. If not, please feel free to ask for anything missing.

      Best regards,

      Thomas Mischke

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              Sorry, I missed that comment. I will look at the data shortly.
              Regarding 10.0.16, it's too early to try anyway, the bugfix we are talking about was pushed into 5.5, but it hasn't yet been merged into 10.0 tree.

              Show
              elenst Elena Stepanova added a comment - Sorry, I missed that comment. I will look at the data shortly. Regarding 10.0.16, it's too early to try anyway, the bugfix we are talking about was pushed into 5.5, but it hasn't yet been merged into 10.0 tree.
              Hide
              elenst Elena Stepanova added a comment -

              Thanks again for the query and the data, I am now able to reproduce the crash.
              There is no point trying the latest builds, the bug is still present in the current tree. I applied MDEV-6830 bugfix to the tree manually, the crash went away; but I'll keep this report open until the proper merge 5.5->10.0 and will re-check it again then.

              As a workaround, you can try either
              SET optimizer_switch = 'derived_with_keys=off';
              or
              SET big_tables = 0;
              Hopefully at least one will help. In my test each of them helps, but it can depend on the data.

              Test case reduced farther (although it makes sense to also try the initial one after the bug is fixed, just to make sure. For the initial one, take the table from the comment and the query from the description, no need to populate the table; make sure that InnoDB is present and big_tables = ON).

              Test case
              
              --source include/have_innodb.inc
              
              CREATE TABLE `state` (
                `id` int(11) NOT NULL,
                `info1` varchar(1024) DEFAULT NULL,
                `status` varchar(100) DEFAULT NULL,
                PRIMARY KEY (id),
                KEY (status)
              ) ENGINE=InnoDB;
              
              INSERT INTO state VALUES (1,'foo','FR'),(2,'bar','NN');
              
              SET big_tables=1;
              
              select * from
              (
              select distinct info1
              from state s1
              ) b
              left join state s
                on s.info1 = b.info1
              where s.`status`='FR';
              
              Stack trace from 10.0 revno 4550
              #3  <signal handler called>
              #4  0x0000000000715f9b in Index_statistics::get_avg_frequency (this=0xa5a5a5a5a5a5a5a5, i=0) at 10.0/sql/sql_statistics.h:418
              #5  0x0000000000760688 in st_key::actual_rec_per_key (this=0x7f1b7442a858, i=0) at 10.0/sql/table.cc:7176
              #6  0x00000000006bf5c3 in best_access_path (join=0x7f1b7445b088, s=0x7f1b74629b60, remaining_tables=1, idx=1, disable_jbuf=false, record_count=1, pos=0x7f1b7462a4e0, loose_scan_pos=0x7f1b8957db30) at 10.0/sql/sql_select.cc:6001
              #7  0x00000000006c2f14 in best_extension_by_limited_search (join=0x7f1b7445b088, remaining_tables=1, idx=1, record_count=1, read_time=2.2000000000000002, search_depth=61, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:7701
              #8  0x00000000006c337a in best_extension_by_limited_search (join=0x7f1b7445b088, remaining_tables=3, idx=0, record_count=1, read_time=0, search_depth=62, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:7770
              #9  0x00000000006c1787 in greedy_search (join=0x7f1b7445b088, remaining_tables=3, search_depth=62, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:6942
              #10 0x00000000006c0d6f in choose_plan (join=0x7f1b7445b088, join_tables=3) at 10.0/sql/sql_select.cc:6519
              #11 0x00000000006ba73d in make_join_statistics (join=0x7f1b7445b088, tables_list=..., conds=0x7f1b74629420, keyuse_array=0x7f1b7445b390) at 10.0/sql/sql_select.cc:4038
              #12 0x00000000006b13a3 in JOIN::optimize_inner (this=0x7f1b7445b088) at 10.0/sql/sql_select.cc:1339
              #13 0x00000000006b033a in JOIN::optimize (this=0x7f1b7445b088) at 10.0/sql/sql_select.cc:1024
              #14 0x00000000006b7ebf in mysql_select (thd=0x7f1b80dee070, rref_pointer_array=0x7f1b80df26e0, tables=0x7f1b744196c8, wild_num=1, fields=..., conds=0x7f1b7441aa48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f1b74743020, unit=0x7f1b80df1d80, select_lex=0x7f1b80df2468) at 10.0/sql/sql_select.cc:3294
              #15 0x00000000006ae549 in handle_select (thd=0x7f1b80dee070, lex=0x7f1b80df1cb8, result=0x7f1b74743020, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
              #16 0x0000000000683244 in execute_sqlcom_select (thd=0x7f1b80dee070, all_tables=0x7f1b744196c8) at 10.0/sql/sql_parse.cc:5261
              #17 0x000000000067b542 in mysql_execute_command (thd=0x7f1b80dee070) at 10.0/sql/sql_parse.cc:2545
              #18 0x00000000006859bd in mysql_parse (thd=0x7f1b80dee070, rawbuf=0x7f1b74741088 "select * from\n(\nselect distinct info1\nfrom state s1\n) b\nleft join state s\non s.info1 = b.info1\nwhere s.`status`='FR'", length=116, parser_state=0x7f1b8957f610) at 10.0/sql/sql_parse.cc:6407
              #19 0x00000000006787b2 in dispatch_command (command=COM_QUERY, thd=0x7f1b80dee070, packet=0x7f1b7bab9071 "select * from\n(\nselect distinct info1\nfrom state s1\n) b\nleft join state s\non s.info1 = b.info1\nwhere s.`status`='FR'", packet_length=116) at 10.0/sql/sql_parse.cc:1299
              #20 0x0000000000677b57 in do_command (thd=0x7f1b80dee070) at 10.0/sql/sql_parse.cc:996
              #21 0x000000000079489b in do_handle_one_connection (thd_arg=0x7f1b80dee070) at 10.0/sql/sql_connect.cc:1375
              #22 0x00000000007945ee in handle_one_connection (arg=0x7f1b80dee070) at 10.0/sql/sql_connect.cc:1289
              #23 0x0000000000ccdc56 in pfs_spawn_thread (arg=0x7f1b7bab10f0) at 10.0/storage/perfschema/pfs.cc:1860
              #24 0x00007f1b891b9b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
              #25 0x00007f1b8747020d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
              
              Show
              elenst Elena Stepanova added a comment - Thanks again for the query and the data, I am now able to reproduce the crash. There is no point trying the latest builds, the bug is still present in the current tree. I applied MDEV-6830 bugfix to the tree manually, the crash went away; but I'll keep this report open until the proper merge 5.5->10.0 and will re-check it again then. As a workaround, you can try either SET optimizer_switch = 'derived_with_keys=off'; or SET big_tables = 0; Hopefully at least one will help. In my test each of them helps, but it can depend on the data. Test case reduced farther (although it makes sense to also try the initial one after the bug is fixed, just to make sure. For the initial one, take the table from the comment and the query from the description, no need to populate the table; make sure that InnoDB is present and big_tables = ON). Test case --source include/have_innodb.inc CREATE TABLE `state` ( `id` int(11) NOT NULL, `info1` varchar(1024) DEFAULT NULL, `status` varchar(100) DEFAULT NULL, PRIMARY KEY (id), KEY (status) ) ENGINE=InnoDB; INSERT INTO state VALUES (1,'foo','FR'),(2,'bar','NN'); SET big_tables=1; select * from ( select distinct info1 from state s1 ) b left join state s on s.info1 = b.info1 where s.`status`='FR'; Stack trace from 10.0 revno 4550 #3 <signal handler called> #4 0x0000000000715f9b in Index_statistics::get_avg_frequency (this=0xa5a5a5a5a5a5a5a5, i=0) at 10.0/sql/sql_statistics.h:418 #5 0x0000000000760688 in st_key::actual_rec_per_key (this=0x7f1b7442a858, i=0) at 10.0/sql/table.cc:7176 #6 0x00000000006bf5c3 in best_access_path (join=0x7f1b7445b088, s=0x7f1b74629b60, remaining_tables=1, idx=1, disable_jbuf=false, record_count=1, pos=0x7f1b7462a4e0, loose_scan_pos=0x7f1b8957db30) at 10.0/sql/sql_select.cc:6001 #7 0x00000000006c2f14 in best_extension_by_limited_search (join=0x7f1b7445b088, remaining_tables=1, idx=1, record_count=1, read_time=2.2000000000000002, search_depth=61, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:7701 #8 0x00000000006c337a in best_extension_by_limited_search (join=0x7f1b7445b088, remaining_tables=3, idx=0, record_count=1, read_time=0, search_depth=62, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:7770 #9 0x00000000006c1787 in greedy_search (join=0x7f1b7445b088, remaining_tables=3, search_depth=62, prune_level=1, use_cond_selectivity=1) at 10.0/sql/sql_select.cc:6942 #10 0x00000000006c0d6f in choose_plan (join=0x7f1b7445b088, join_tables=3) at 10.0/sql/sql_select.cc:6519 #11 0x00000000006ba73d in make_join_statistics (join=0x7f1b7445b088, tables_list=..., conds=0x7f1b74629420, keyuse_array=0x7f1b7445b390) at 10.0/sql/sql_select.cc:4038 #12 0x00000000006b13a3 in JOIN::optimize_inner (this=0x7f1b7445b088) at 10.0/sql/sql_select.cc:1339 #13 0x00000000006b033a in JOIN::optimize (this=0x7f1b7445b088) at 10.0/sql/sql_select.cc:1024 #14 0x00000000006b7ebf in mysql_select (thd=0x7f1b80dee070, rref_pointer_array=0x7f1b80df26e0, tables=0x7f1b744196c8, wild_num=1, fields=..., conds=0x7f1b7441aa48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f1b74743020, unit=0x7f1b80df1d80, select_lex=0x7f1b80df2468) at 10.0/sql/sql_select.cc:3294 #15 0x00000000006ae549 in handle_select (thd=0x7f1b80dee070, lex=0x7f1b80df1cb8, result=0x7f1b74743020, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373 #16 0x0000000000683244 in execute_sqlcom_select (thd=0x7f1b80dee070, all_tables=0x7f1b744196c8) at 10.0/sql/sql_parse.cc:5261 #17 0x000000000067b542 in mysql_execute_command (thd=0x7f1b80dee070) at 10.0/sql/sql_parse.cc:2545 #18 0x00000000006859bd in mysql_parse (thd=0x7f1b80dee070, rawbuf=0x7f1b74741088 "select * from\n(\nselect distinct info1\nfrom state s1\n) b\nleft join state s\non s.info1 = b.info1\nwhere s.`status`='FR'", length=116, parser_state=0x7f1b8957f610) at 10.0/sql/sql_parse.cc:6407 #19 0x00000000006787b2 in dispatch_command (command=COM_QUERY, thd=0x7f1b80dee070, packet=0x7f1b7bab9071 "select * from\n(\nselect distinct info1\nfrom state s1\n) b\nleft join state s\non s.info1 = b.info1\nwhere s.`status`='FR'", packet_length=116) at 10.0/sql/sql_parse.cc:1299 #20 0x0000000000677b57 in do_command (thd=0x7f1b80dee070) at 10.0/sql/sql_parse.cc:996 #21 0x000000000079489b in do_handle_one_connection (thd_arg=0x7f1b80dee070) at 10.0/sql/sql_connect.cc:1375 #22 0x00000000007945ee in handle_one_connection (arg=0x7f1b80dee070) at 10.0/sql/sql_connect.cc:1289 #23 0x0000000000ccdc56 in pfs_spawn_thread (arg=0x7f1b7bab10f0) at 10.0/storage/perfschema/pfs.cc:1860 #24 0x00007f1b891b9b50 in start_thread (arg=<optimized out>) at pthread_create.c:304 #25 0x00007f1b8747020d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
              Hide
              thomas.mischke Thomas Mischke added a comment -

              Both workarounds prevent out initial statement from crashing the server.

              Thanks for the workaround, and looking forward to the next release without this bug!

              Show
              thomas.mischke Thomas Mischke added a comment - Both workarounds prevent out initial statement from crashing the server. Thanks for the workaround, and looking forward to the next release without this bug!
              Hide
              elenst Elena Stepanova added a comment -

              Checked both testcases on 10.0 revno 4587 (after the merge), no crashes anymore. Closing as fixed.

              Show
              elenst Elena Stepanova added a comment - Checked both testcases on 10.0 revno 4587 (after the merge), no crashes anymore. Closing as fixed.
              Hide
              thomas.mischke Thomas Mischke added a comment -

              Today we updated to version 10.0.16. The above mentioned statements are now working without workaround. Thanks!

              Show
              thomas.mischke Thomas Mischke added a comment - Today we updated to version 10.0.16. The above mentioned statements are now working without workaround. Thanks!

                People

                • Assignee:
                  Unassigned
                  Reporter:
                  thomas.mischke Thomas Mischke
                • Votes:
                  2 Vote for this issue
                  Watchers:
                  6 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: