  1. MariaDB Server
  2. MDEV-8004

key_buffer related crashes in MyISAM table check, stacktrace in error log truncated

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.0.16
    • Fix Version/s: 10.0
    • Component/s: OTHER
    • Labels:
      None
    • Environment:
      CentOS 6.5 - 64bit

      Description

      There have been several cases now where MariaDB 10.0.16 crashed with a stack trace that shows only signal/crash handling stack frames, starting with the pthread_cond_signal() call, e.g.:

        Thread pointer: 0x0x7f482702d008
        Attempting backtrace. You can use the following information to find out
        where mysqld died. If you see no messages after this, something went
        terribly wrong...
        stack_bottom = 0x7f90cb141840 thread_stack 0x48000
        /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb73d3b]
        /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x726518]
        /lib64/libpthread.so.0(+0xf710)[0x7f90cae0a710]
        /lib64/libpthread.so.0(pthread_cond_signal+0x44)[0x7f90cae06e34]
      
        Trying to get some variables.
        Some pointers may be invalid and cause the dump to abort.
        Query (0x7f48306c5015): is an invalid pointer
        Connection ID (thread ID): 5
        Status: NOT_KILLED
      
      

      So there is no information whatsoever that would hint at what actually led up to this crash (and nothing suspicious in the error log before the crash either). The only suspicious pattern that I could identify so far is that the Connection ID is usually a very low one (so far seen: 5 twice, 7 once, and one very high number in the 60 000 range), so things may be related to replication threads.

      Any idea how to get a more useful stack trace, preferably without having to enable core dumps (as this happens on systems with huge memory / process size)?

              Activity

              serg Sergei Golubchik added a comment -

              Okay, I see. The complaint here is not that MariaDB crashes, but that there is no stack trace.

              I'll reopen it.

              serg Sergei Golubchik added a comment -

              Maybe the problem here is not the stack tracing code itself, but that the bug that corrupts the stack in this specific way is only possible in 10.0 (meaning that 5.5 and MySQL 5.6 and 5.7 would also have printed no stack trace for such a stack corruption).

              If this is the case, the real stack trace (with gdb, as above) could help us to locate the bug.
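
One possible way to capture such a trace without writing a full core dump is to keep gdb attached to the running server in batch mode, so that it prints every thread's backtrace when the process stops on a fatal signal. The following is only a sketch under that assumption and is not a procedure taken from this report: the helper names are hypothetical, while `-p`, `-batch` and `-ex` are standard gdb command-line options.

```python
import shlex


def gdb_attach_command(pid):
    """Build a gdb command line that attaches to a running process,
    resumes it, and dumps all thread backtraces once it stops again."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "set pagination off",   # never wait for a keypress
        "-ex", "continue",             # resume until a signal stops the process
        "-ex", "thread apply all bt",  # print the backtrace of every thread
    ]


def format_command(cmd):
    """Render the argument list as a shell-safe string for logging."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The list returned by `gdb_attach_command(pid)` can be passed to `subprocess.run` with output redirected to a file. Signals that the server uses during normal operation would also stop the process under gdb, so additional `handle <signal> nostop` commands may be needed; which signals those are is not stated in this report.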

              serg Sergei Golubchik added a comment -

              MDEV-8325 — another issue with a crash without a stack trace.

              hholzgra Hartmut Holzgraefe added a comment - edited

              Two crashes with the same stack trace have now been reported. Both happened after MariaDB 10.0.19 was restarted following a hard kill.

              Core dumps were available, and unmangled stack traces could be extracted from them.

              These two crashes happened during MyISAM table repair (triggered by the auto-repair-on-open feature)
              while accessing the MyISAM key buffer cache. key_buffer_size in these cases was 50G
              on systems with a total of 384GB RAM. Similar previous MariaDB 5.5 instances did not
              crash in this way at all, so this may be due to key buffer related changes in 10.0.x?

              The last function call before pthread_cond_signal() is simple_key_cache_read().
              I haven't found any direct call to pthread_cond_signal() or the mysql_cond_signal() convenience macro in there yet. There is probably at least one extra macro or
              inline function layer in between that doesn't show in the stack trace?

              #0  0x00007ffc3de858ac in pthread_kill () from /lib64/libpthread.so.0
              #1  0x000000000072b342 in handle_fatal_signal ()
              #2  <signal handler called>
              #3  0x00007ffc3de84e34 in pthread_cond_signal@@GLIBC_2.3.2 ()
                 from /lib64/libpthread.so.0
              #4  0x0000000000b6a8b3 in simple_key_cache_read ()
              #5  0x0000000000b2f975 in _mi_fetch_keypage ()
              #6  0x0000000000b0b94b in chk_index_down ()
              #7  0x0000000000b0bea3 in chk_index ()
              #8  0x0000000000b0b98b in chk_index_down ()
              #9  0x0000000000b0bea3 in chk_index ()
              #10 0x0000000000b0b98b in chk_index_down ()
              #11 0x0000000000b0bea3 in chk_index ()
              #12 0x0000000000b0b98b in chk_index_down ()
              #13 0x0000000000b0bea3 in chk_index ()
              #14 0x0000000000b0dcc2 in chk_key ()
              #15 0x0000000000b0872d in ha_myisam::check(THD*, st_ha_check_opt*) ()
              #16 0x0000000000b05844 in ha_myisam::check_and_repair(THD*) ()
              #17 0x0000000000592011 in Open_table_context::recover_from_failed_open() ()
              #18 0x0000000000596e3e in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
              #19 0x000000000063c699 in mysqld_show_create(THD*, TABLE_LIST*) ()
              #20 0x00000000005d80dc in mysql_execute_command(THD*) ()
              #21 0x00000000005da5e7 in mysql_parse ()
              #22 0x00000000005dc9cc in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
              #23 0x0000000000699b53 in do_handle_one_connection(THD*) ()
              #24 0x0000000000699c22 in handle_one_connection ()
              #25 0x0000000000a715fd in pfs_spawn_thread ()
              #26 0x00007ffc3de809d1 in start_thread () from /lib64/libpthread.so.0
              #27 0x00007ffc3c59a8fd in clone () from /lib64/libc.so.6
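
When comparing several such traces, it can help to extract the function names mechanically and pick out the two frames directly below the signal handler marker. The following is a minimal sketch, assuming gdb `bt` output in the format shown above; the helper names and the shortened sample are illustrative only.

```python
import re
from collections import Counter

# Matches "#N  0xADDR in name (...)" as well as "#N  <signal handler called>".
FRAME_RE = re.compile(
    r"^\s*#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(<[^>]+>|[^\s(]+)"
)


def parse_frames(text):
    """Return the frame names of a gdb backtrace, innermost frame first.
    Continuation lines such as 'from /lib64/...' are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return frames


def crash_site(frames):
    """Return (faulting function, its caller), i.e. the two frames that
    follow the signal handler marker, or None if they cannot be found."""
    marker = "<signal handler called>"
    if marker not in frames:
        return None
    start = frames.index(marker) + 1
    below = frames[start:start + 2]
    return tuple(below) if len(below) == 2 else None


# Shortened excerpt of the trace above, used only as sample input.
SAMPLE = """\
#0  0x00007ffc3de858ac in pthread_kill () from /lib64/libpthread.so.0
#1  0x000000000072b342 in handle_fatal_signal ()
#2  <signal handler called>
#3  0x00007ffc3de84e34 in pthread_cond_signal@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#4  0x0000000000b6a8b3 in simple_key_cache_read ()
#5  0x0000000000b2f975 in _mi_fetch_keypage ()
#6  0x0000000000b0b94b in chk_index_down ()
#7  0x0000000000b0bea3 in chk_index ()
#8  0x0000000000b0b98b in chk_index_down ()
#15 0x0000000000b0872d in ha_myisam::check(THD*, st_ha_check_opt*) ()
"""

frames = parse_frames(SAMPLE)
site = crash_site(frames)
depth = Counter(frames)["chk_index_down"]
```

Running `Counter` over the frame names also gives the recursion depth of the `chk_index()` / `chk_index_down()` pair, which makes it easy to check whether two reported crashes happened at the same level of the index tree.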
              
              hholzgra Hartmut Holzgraefe added a comment -

              It's also weird that gdb could produce a stack trace just fine while the internal backtrace printing code couldn't?


                People

                • Assignee:
                  monty Michael Widenius
                • Reporter:
                  hholzgra Hartmut Holzgraefe
                • Votes:
                  2
                • Watchers:
                  6
