Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-201

LP:962540 - Assertion `!thd->spcont' failed in net_send_error on server shutdown

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.22, 10.0.6
    • Component/s: None
    • Labels:
      None

      Description

      https://bugs.launchpad.net/maria/+bug/962540

      120323 0:32:52 [Note] debug/sql/mysqld: ready for connections.
      Version: '5.5.21-MariaDB-debug-log' socket: 'debug/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
      120323 0:33:07 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 4)
      120323 0:33:08 [Note] Semi-sync replication initialized for transactions.
      120323 0:33:08 [Note] Semi-sync replication enabled on the master.
      120323 0:33:08 [Note] Stop asynchronous binlog_dump to slave (server_id: 2)
      120323 0:33:08 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 245)
      120323 0:33:08 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(master-bin.000001, 245)
      120323 0:33:09 [Note] debug/sql/mysqld: Normal shutdown
      
      120323 0:33:10 [Note] Event Scheduler: Purging the queue. 0 events
      120323 0:33:10 [Note] Stop semi-sync binlog_dump to slave (server_id: 2)
      120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 7 user: 'root'
      
      120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 6 user: 'root'
      
      mysqld: sql/protocol.cc:151: bool net_send_error(THD*, uint, const char*, const char*): Assertion `!thd->spcont' failed.
      120323 0:33:12 [ERROR] mysqld got signal 6 ;
      
      #7 0xb74f50f0 in abort () from /lib/libc.so.6
      #8 0xb74ec014 in __assert_fail () from /lib/libc.so.6
      #9 0x081a7bef in net_send_error (thd=0x97760e8, sql_errno=1053,
          err=0x95b0242 "Server shutdown in progress", sqlstate=0x0)
          at sql/protocol.cc:151
      #10 0x08197d83 in close_connection (thd=0x97760e8, sql_errno=1053)
          at sql/mysqld.cc:2358
      #11 0x081963bc in close_connections () at sql/mysqld.cc:1476
      #12 0x0819672f in kill_server (sig_ptr=0x0) at sql/mysqld.cc:1640
      #13 0x0819676f in kill_server_thread (arg=0xad1a4328) at sql/mysqld.cc:1668
      #14 0x0854933b in pfs_spawn_thread (arg=0x965d9c0)
          at storage/perfschema/pfs.cc:1015
      #15 0xb77cbb25 in start_thread () from /lib/libpthread.so.0
      

      bzr version-info

      revision-id: wlad@montyprogram.com-20120322192114-4hh3m2imbsx7r2vw
      date: 2012-03-22 20:21:14 +0100
      build-date: 2012-03-23 00:35:31 +0400
      revno: 3345
      

      Could not reproduce on MySQL 5.5 or MySQL trunk.

      Please note that the test case requires semisync plugins, so they should be built and placed in the plugin folder.
      On the same reason the test case is not applicable to versions below 5.5.

      The test case works pretty reliably for me, but a couple of times (out of dozens) it didn't cause the assertion, so I needed to re-run it.

      1. Test case:
      --source include/master-slave.inc
      --source include/have_binlog_format_mixed.inc
      --source include/have_semisync_plugin.inc
      
      eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
      SET GLOBAL rpl_semi_sync_master_enabled = 1;
      
      --connection slave
      eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
      SET GLOBAL rpl_semi_sync_slave_enabled = 1;
      STOP SLAVE;
      START SLAVE;
      
      --connection master
      CREATE TABLE t1 (
        pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        a DATETIME
      ) ENGINE = MyISAM;
      
      --delimiter |
      CREATE PROCEDURE pr ()
      BEGIN
        DECLARE done INT DEFAULT 10000;
        wl: WHILE done DO
          INSERT INTO t1 (a) VALUES ( NOW() );
          SET done = done - 1;
        END WHILE wl;
      END|
      --delimiter ;
      
      --send
      CALL pr();
      --sleep 1
      --let $rpl_server_number= 1
      --source include/rpl_stop_server.inc
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Modified test:

            --source include/master-slave.inc
            --source include/have_binlog_format_mixed.inc
            --source include/have_semisync_plugin.inc

            eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
            SET GLOBAL rpl_semi_sync_master_enabled = 1;

            --connection slave
            eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
            SET GLOBAL rpl_semi_sync_slave_enabled = 1;
            STOP SLAVE;
            START SLAVE;

            --connection master
            CREATE TABLE t1 (
            pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            a TEXT
            ) ENGINE = MyISAM;

            --delimiter |
            CREATE PROCEDURE pr ()
            BEGIN
            DECLARE done INT DEFAULT 10000;
            wl: WHILE done DO
            INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) );
            SET done = done - 1;
            END WHILE wl;
            END|
            --delimiter ;

            --send
            CALL pr();

            --let $rpl_server_number= 1
            --source include/rpl_stop_server.inc

            Show
            elenst Elena Stepanova added a comment - Modified test: --source include/master-slave.inc --source include/have_binlog_format_mixed.inc --source include/have_semisync_plugin.inc eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO'; SET GLOBAL rpl_semi_sync_master_enabled = 1; --connection slave eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO'; SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE; START SLAVE; --connection master CREATE TABLE t1 ( pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, a TEXT ) ENGINE = MyISAM; --delimiter | CREATE PROCEDURE pr () BEGIN DECLARE done INT DEFAULT 10000; wl: WHILE done DO INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) ); SET done = done - 1; END WHILE wl; END| --delimiter ; --send CALL pr(); --let $rpl_server_number= 1 --source include/rpl_stop_server.inc
            Hide
            elenst Elena Stepanova added a comment -

            Re-assigning back since it seems to be reproducible on perro at the moment.

            Show
            elenst Elena Stepanova added a comment - Re-assigning back since it seems to be reproducible on perro at the moment.
            Hide
            serg Sergei Golubchik added a comment - - edited

            The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?

            Show
            serg Sergei Golubchik added a comment - - edited The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?
            Hide
            elenst Elena Stepanova added a comment - - edited

            I couldn't reproduce the failure on 10.0 tree so far (still trying).
            But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.

            Show
            elenst Elena Stepanova added a comment - - edited I couldn't reproduce the failure on 10.0 tree so far (still trying). But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.
            Hide
            elenst Elena Stepanova added a comment -

            I found when it stopped happening on 10.0 tree. The following revision did it:

            ------------------------------------------------------------
            revno: 3748 [merge]
            revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq
            timestamp: Mon 2013-04-29 12:03:54 +0200
            message:
            Merge 10.0-base -> 10.0
            ------------------------------------------------------------

            More particularly, this one:

            ------------------------------------------------------------
            revno: 3427.1.194
            revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios
            timestamp: Wed 2013-04-24 13:05:40 +0200
            message:
            Add missing check for thd->killed in mysql_binlog_send().

            The slave dump thread running on the master only checked thd->killed whenever
            it reached the end of a binlog file, not between events. This could
            unnecessarily delay server shutdown.

            This was found by code inspection while tracking down some occasional "forcing
            close of thread..." errors in Buildbot. Hopefully this will fix the failures,
            but the fix is correct in any case.

            Also increase the wait during server shutdown, 2 seconds is a bit tight in
            case of heavy I/O stall, and it seems better to delay shutdown a bit than
            force-kill threads unnecessarily.

            Also fix some races in test cases that restart the mysqld server. The .expect
            file should be changed with --append_file, --remove_file + --write_file
            creates a short window where mysqld can error out due to .expect file missing.
            ------------------------------------------------------------

            Even more particularly, this change:

            === modified file 'sql/mysqld.cc'
            — sql/mysqld.cc 2013-04-16 11:43:28 +0000
            +++ sql/mysqld.cc 2013-04-24 11:05:40 +0000
            @@ -1494,8 +1494,21 @@
            Events::deinit();
            end_slave();

            • /* Give threads time to die. */
            • for (int i= 0; (volatile int32) &thread_count && i < 100; i++)
              + /*
              + Give threads time to die.
              +
              + In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
              + as 2 seconds, depending on thread scheduling.
              +
              + From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
              + that on a server with heavy I/O load, it is quite possible for eg. an
              + fsync() of the binlog or whatever to cause something like LOCK_log to be
              + held for more than 2 seconds. We do not want to force kill threads in
              + such cases, if it can be avoided. Note that normally, the wait will be
              + much smaller than even 2 seconds, this is only a safety fallback against
              + stuck threads so server shutdown is not held up forever.
              + */
              + for (int i= 0; (volatile int32) &thread_count && i < 1000; i++)
              my_sleep(20000);

            /*

            If I revert the change on the current 10.0 tree, the assertion failure starts happening again.
            The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all.

            I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb.
            To reproduce,
            cd mdev201/mysql-test
            perl ./mtr mdev201 -mysqld=plugin-rpl-semi-sync-master --mysqld=-plugin-rpl-semi-sync-slave

            Show
            elenst Elena Stepanova added a comment - I found when it stopped happening on 10.0 tree. The following revision did it: ------------------------------------------------------------ revno: 3748 [merge] revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq timestamp: Mon 2013-04-29 12:03:54 +0200 message: Merge 10.0-base -> 10.0 ------------------------------------------------------------ More particularly, this one: ------------------------------------------------------------ revno: 3427.1.194 revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios timestamp: Wed 2013-04-24 13:05:40 +0200 message: Add missing check for thd->killed in mysql_binlog_send(). The slave dump thread running on the master only checked thd->killed whenever it reached the end of a binlog file, not between events. This could unnecessarily delay server shutdown. This was found by code inspection while tracking down some occasional "forcing close of thread..." errors in Buildbot. Hopefully this will fix the failures, but the fix is correct in any case. Also increase the wait during server shutdown, 2 seconds is a bit tight in case of heavy I/O stall, and it seems better to delay shutdown a bit than force-kill threads unnecessarily. Also fix some races in test cases that restart the mysqld server. The .expect file should be changed with --append_file, --remove_file + --write_file creates a short window where mysqld can error out due to .expect file missing. ------------------------------------------------------------ Even more particularly, this change: === modified file 'sql/mysqld.cc' — sql/mysqld.cc 2013-04-16 11:43:28 +0000 +++ sql/mysqld.cc 2013-04-24 11:05:40 +0000 @@ -1494,8 +1494,21 @@ Events::deinit(); end_slave(); /* Give threads time to die. */ for (int i= 0; (volatile int32 ) &thread_count && i < 100; i++) + /* + Give threads time to die. + + In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little + as 2 seconds, depending on thread scheduling. + + From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is + that on a server with heavy I/O load, it is quite possible for eg. an + fsync() of the binlog or whatever to cause something like LOCK_log to be + held for more than 2 seconds. We do not want to force kill threads in + such cases, if it can be avoided. Note that normally, the wait will be + much smaller than even 2 seconds, this is only a safety fallback against + stuck threads so server shutdown is not held up forever. + */ + for (int i= 0; (volatile int32 ) &thread_count && i < 1000; i++) my_sleep(20000); /* If I revert the change on the current 10.0 tree, the assertion failure starts happening again. The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all. I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb. To reproduce, cd mdev201/mysql-test perl ./mtr mdev201 - mysqld= plugin-rpl-semi-sync-master --mysqld= -plugin-rpl-semi-sync-slave

              People

              • Assignee:
                serg Sergei Golubchik
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 0 minutes
                  0m
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 3 hours, 50 minutes
                  3h 50m