MariaDB Server / MDEV-8066

Crash on unloading semisync_master plugin

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.15
    • Fix Version/s: 10.0
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      Linux

      Description

      This has so far been observed only once and could not be reproduced
      (even with several clients running transactions in parallel for days while the plugin was uninstalled and reinstalled every second):

      150331 9:19:22 [Note] Semi-sync replication switched OFF.
      150331 9:19:22 [Note] Semi-sync replication disabled on the master.
      150331 9:19:22 [ERROR] mysqld got signal 11 ;
      

      [...]

      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb70d4b]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x7257b8]
      /lib64/libpthread.so.0[0x377040f710]
      /usr/lib64/mysql/plugin/semisync_master.so(ActiveTranx::is_tranx_end_pos(char const*, unsigned long long)+0x24)[0x7fe91a5f9fe4]
      /usr/lib64/mysql/plugin/semisync_master.so(ReplSemiSyncMaster::commitTrx(char const*, unsigned long long)+0x19e)[0x7fe91a5fab3e]
      /usr/sbin/mysqld(Trans_delegate::after_commit(THD*, bool)+0xa2)[0x69b612]
      /usr/sbin/mysqld(ha_commit_trans(THD*, bool)+0x222)[0x728872]
      /usr/sbin/mysqld(trans_commit_stmt(THD*)+0x1b)[0x6a350b]
      /usr/sbin/mysqld(mysql_execute_command(THD*)+0x514)[0x5d16d4]
      /usr/sbin/mysqld[0x5d79d2]
      /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1b20)[0x5d9b90]
      /usr/sbin/mysqld(do_handle_one_connection(THD*)+0x453)[0x6956a3]
      /usr/sbin/mysqld(handle_one_connection+0x42)[0x695772]
      /lib64/libpthread.so.0[0x37704079d1]
      /lib64/libc.so.6(clone+0x6d)[0x37700e8b6d]
      

      My educated guess is a race condition between the plugin callbacks and the plugin teardown code. More specifically: I think ReplSemiSyncMaster::commitTrx() is still registered as an after_commit callback, but either the plugin's transaction hash table has already been freed by the time the callback is invoked, or it gets freed at just the "right" moment between invoking the callback and actually processing it.
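      To illustrate the suspected race, here is a minimal C++17 sketch of one way to close it, assuming a simplified model in which a delegate calls registered observers after each commit. All names (CommitDelegateModel, ActiveTranxModel, SemiSyncPluginModel) are hypothetical and do not reflect the server's actual Trans_delegate or semisync plugin code. The idea: callers hold a shared lock for the whole callback, and teardown takes the exclusive lock to unregister before it frees the transaction table, so the table cannot be freed while a callback is still using it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the plugin's ActiveTranx hash table.
struct ActiveTranxModel {
  std::unordered_set<std::string> positions;  // "binlog_file:pos" keys
  bool is_tranx_end_pos(const std::string& key) const {
    return positions.count(key) != 0;
  }
};

// Hypothetical, simplified delegate (NOT the server's Trans_delegate).
class CommitDelegateModel {
 public:
  using Observer = std::function<int(const std::string&)>;

  void add_observer(Observer obs) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    observers_.push_back(std::move(obs));
  }

  // Takes the exclusive lock, so it blocks until no after_commit()
  // call is still running an observer.
  void remove_all_observers() {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    observers_.clear();
  }

  // Holds the shared lock for the entire callback, not just while
  // looking the observer up. Returns the number of observers invoked.
  int after_commit(const std::string& key) {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    int calls = 0;
    for (auto& obs : observers_) {
      obs(key);
      ++calls;
    }
    return calls;
  }

 private:
  std::shared_mutex mutex_;
  std::vector<Observer> observers_;
};

// Hypothetical plugin model: unregister first, free state second.
struct SemiSyncPluginModel {
  CommitDelegateModel& delegate;
  std::unique_ptr<ActiveTranxModel> active =
      std::make_unique<ActiveTranxModel>();

  void install() {
    delegate.add_observer([this](const std::string& key) {
      return active->is_tranx_end_pos(key) ? 1 : 0;
    });
  }

  void uninstall() {
    delegate.remove_all_observers();  // waits for in-flight callbacks
    active.reset();                   // only now is freeing safe
  }
};
```

      Usage: construct a CommitDelegateModel, call install() on the plugin model, call after_commit() from any number of threads, then call uninstall(). Once uninstall() returns, no observer can still be touching the freed table. This only models the ordering; it does not establish whether the real plugin unload path already takes such a lock.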


            Activity

            hholzgra Hartmut Holzgraefe added a comment -

            Looking at the semisync code again, the "funny" part is that the above stack trace should never be seen with a production (a.k.a. non-debug) build of mysqld?

            The last two stack trace lines before the signal handler kicks in were:

            .../semisync_master.so(ActiveTranx::is_tranx_end_pos(...)
            .../semisync_master.so(ReplSemiSyncMaster::commitTrx(...)
            

            The only place where commitTrx() calls is_tranx_end_pos() is the following assert(), though:

            /*
              At this point, the binlog file and position of this transaction
              must have been removed from ActiveTranx.
            */
            assert(thd_killed(NULL) ||
                   !active_tranxs_->is_tranx_end_pos(trx_wait_binlog_name,
                                                     trx_wait_binlog_pos));
            

            So is_tranx_end_pos() should never be called in a non-debug build
            where assert() is just defined as empty?

            elenst Elena Stepanova added a comment -

            Kristian Nielsen fixed a similar issue a long time ago (MDEV-359), so maybe the information above will be enough to fix it again?

            Hide
            hholzgra Hartmut Holzgraefe added a comment -

            Forget my note regarding assert(): I somehow thought it was off by default and only enabled when compiling with DEBUG defined, but it is actually the opposite: it is enabled by default unless NDEBUG is defined to disable it.
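            The corrected understanding follows the standard C/C++ rule: assert(expr) evaluates expr unless NDEBUG is defined at the point where <cassert> (or <assert.h>) is included, in which case the macro expands to nothing and expr is never evaluated. A minimal sketch that makes this observable, using hypothetical helper names (tranx_end_pos_removed, check_after_commit) as stand-ins for the is_tranx_end_pos() check inside commitTrx():

```cpp
#include <cassert>

// Counts how often the asserted expression is actually evaluated.
static int evaluations = 0;

// Hypothetical stand-in for the is_tranx_end_pos() check; the side
// effect makes evaluation observable.
static bool tranx_end_pos_removed() {
  ++evaluations;
  return true;
}

// Same shape as the commitTrx() check: the call inside assert() is
// compiled and executed only when NDEBUG is NOT defined.
inline void check_after_commit() {
  assert(tranx_end_pos_removed());
}

// True when this translation unit was built with asserts enabled.
constexpr bool asserts_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

            Compiled normally, one call to check_after_commit() leaves evaluations at 1; compiled with -DNDEBUG it stays at 0. So whether commitTrx() reaches is_tranx_end_pos() depends on whether NDEBUG is defined when the plugin is compiled, not on DEBUG.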


              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                hholzgra Hartmut Holzgraefe
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: