Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8476

Race condition in slave SQL thread shutdown

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.1, 10.0
    • Fix Version/s: 10.1, 10.0
    • Component/s: None
    • Labels:
      None

      Description

      During server shutdown slave SQL thread broadcasts "stopped" too early:

        mysql_cond_broadcast(&rli->stop_cond);
        DBUG_EXECUTE_IF("simulate_slave_delay_at_terminate_bug38694", sleep(5););
        mysql_mutex_unlock(&rli->run_lock);  // tell the world we are done
      
        /*
          Deactivate the parallel replication thread pool, if there are now no more
          SQL threads running. Do this here, when we have released all locks, but
          while our THD (and current_thd) is still valid.
        */
        mysql_mutex_lock(&LOCK_active_mi);
        if (opt_slave_parallel_threads > 0 &&
            !master_info_index->any_slave_sql_running())
          rpl_parallel_inactivate_pool(&global_rpl_thread_pool);
        mysql_mutex_unlock(&LOCK_active_mi);
      
        mysql_mutex_lock(&LOCK_thread_count);
        THD_CHECK_SENTRY(thd);
        delete thd;
        mysql_mutex_unlock(&LOCK_thread_count);
      

      This causes enormous number of sporadic replication test failures with crashes, hangs and warnings like "Plugin '...' has ref_count=1 after shutdown".

      To reproduce this failure reliably, just add sleep(1) immediately after condition is signaled and run tests as usual:

      diff --git a/sql/slave.cc b/sql/slave.cc
      index 4129c4c..efc75f3 100644
      --- a/sql/slave.cc
      +++ b/sql/slave.cc
      @@ -4837,6 +4837,7 @@ log '%s' at position %s, relay log '%s' position: %s%s", RPL_LOG_NAME,
         mysql_cond_broadcast(&rli->stop_cond);
         DBUG_EXECUTE_IF("simulate_slave_delay_at_terminate_bug38694", sleep(5););
         mysql_mutex_unlock(&rli->run_lock);  // tell the world we are done
      +  sleep(1);
      
         /*
           Deactivate the parallel replication thread pool, if there are now no more
      

      This seem to be appeared after:

      commit 682ed005c57dabb4615052a79789e44e5c93525c
      Author: Kristian Nielsen <knielsen@knielsen-hq.org>
      Date:   Wed Jun 10 11:57:42 2015 +0200
      
          MDEV-8294: Inconsistent behavior of slave parallel threads at runtime
      
          There were some cases where the slave SQL thread could stop without
          the pool of parallel replication worker threads being correctly
          de-activated.
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              Sergey Vojtovich,

              Is it a 10.x-specific problem, as specified in Affected versions, or 5.5 can be affected as well?
              Apart from MDEV-7884, we've recently had another flavor of a similar failure in rpl.rpl_binlog_errors on 5.5:
              http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/672/steps/test/logs/stdio

              rpl.rpl_binlog_errors 'mix,xtradb'       w1 [ fail ]  Found warnings/errors in server log file!
                      Test ended at 2015-09-25 00:11:51
              line
              150925  0:10:46 [ERROR] mysqld: Table './mysql/plugin' is marked as crashed and should be repaired
              150925  0:10:46 [Warning] Checking table:   './mysql/plugin'
              150925  0:10:46 [ERROR] mysql.plugin: 1 client is using or hasn't closed the table properly
              150925  0:10:46 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
              150925  0:10:46 [Warning] Checking table:   './mysql/event'
              150925  0:10:46 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
              

              Which in fact looks like this in the slave's error log:

              150925  0:10:13 [Note] /var/lib/buildbot/maria-slave/power8-vlp04-bintar-debug/build/sql/mysqld: Normal shutdown
              
              150925  0:10:43 [Note] /var/lib/buildbot/maria-slave/power8-vlp04-bintar-debug/build/sql/mysqld (mysqld 5.5.45-MariaDB-debug-log) starti
              ng as process 7173 ...
              

              So, obviously the slave had hung on shutdown.
              Can it be the same problem? (and MDEV-7884 as well).

              Show
              elenst Elena Stepanova added a comment - Sergey Vojtovich , Is it a 10.x-specific problem, as specified in Affected versions, or 5.5 can be affected as well? Apart from MDEV-7884 , we've recently had another flavor of a similar failure in rpl.rpl_binlog_errors on 5.5: http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/672/steps/test/logs/stdio rpl.rpl_binlog_errors 'mix,xtradb' w1 [ fail ] Found warnings/errors in server log file! Test ended at 2015-09-25 00:11:51 line 150925 0:10:46 [ERROR] mysqld: Table './mysql/plugin' is marked as crashed and should be repaired 150925 0:10:46 [Warning] Checking table: './mysql/plugin' 150925 0:10:46 [ERROR] mysql.plugin: 1 client is using or hasn't closed the table properly 150925 0:10:46 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired 150925 0:10:46 [Warning] Checking table: './mysql/event' 150925 0:10:46 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly Which in fact looks like this in the slave's error log: 150925 0:10:13 [Note] /var/lib/buildbot/maria-slave/power8-vlp04-bintar-debug/build/sql/mysqld: Normal shutdown 150925 0:10:43 [Note] /var/lib/buildbot/maria-slave/power8-vlp04-bintar-debug/build/sql/mysqld (mysqld 5.5.45-MariaDB-debug-log) starti ng as process 7173 ... So, obviously the slave had hung on shutdown. Can it be the same problem? (and MDEV-7884 as well).
              Hide
              svoj Sergey Vojtovich added a comment -

              Elena Stepanova, this should be different bug. I see no way for this MDEV-8476 to affect 5.5.

              Show
              svoj Sergey Vojtovich added a comment - Elena Stepanova , this should be different bug. I see no way for this MDEV-8476 to affect 5.5.

                People

                • Assignee:
                  knielsen Kristian Nielsen
                  Reporter:
                  svoj Sergey Vojtovich
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated: