Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8154

rpl.show_status_stop_slave_race-7126 sporadically causes internal check failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0
    • Fix Version/s: 10.0.20, 5.5.46
    • Component/s: Tests
    • Labels:
      None

      Description

      Reported by AL13N on IRC.

      I am also getting the failure (sporadically, but often enough) on a release build, but not on a debug build.

      worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
      rpl.show_status_stop_slave_race-7126 'mix' [ pass ]   2121
      
      MTR's internal check of the test case 'rpl.show_status_stop_slave_race-7126' failed.
      This means that the test case does not preserve the state that existed
      before the test case was executed.  Most likely the test case did not
      do a proper clean-up. It could also be caused by the previous test run
      by this thread, if the server wasn't restarted.
      This is the diff of the states of the servers before and after the
      test case was executed:
      mysqltest: Logging to '/data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.log'.
      mysqltest: Results saved in '/data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.result'.
      mysqltest: Connecting to server localhost:16001 (socket /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/mysqld.2.sock) as 'root', connection 'default', attempt 0 ...
      mysqltest: ... Connected.
      mysqltest: Start processing test commands from './include/check-testcase.test' ...
      mysqltest: ... Done processing test commands.
      --- /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.result	2015-05-13 02:02:04.000000000 +0300
      +++ /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.reject	2015-05-13 02:02:06.000000000 +0300
      @@ -32,8 +32,8 @@
       Master_SSL_Key	
       Seconds_Behind_Master	NULL
       Master_SSL_Verify_Server_Cert	No
      -Last_IO_Errno	0
      -Last_IO_Error	
      +Last_IO_Errno	2003
      +Last_IO_Error	error connecting to master 'root@127.0.0.1:16000' - retry-time: 1  retries: 10  message: Can't connect to MySQL server on '127.0.0.1' (4 "Interrupted system call")
       Last_SQL_Errno	0
       Last_SQL_Error	
       Replicate_Ignore_Server_Ids	
      
      mysqltest: Result length mismatch
      

      Here is what I think is happening:

      The test runs "START SLAVE; STOP SLAVE" multiple times concurrently in two connections; so sometimes STOP from one connection interrupts starting from another connection, and the slave IO thread stops with the 'interrupted system call' error. If it happens to be the last state at the end of the concurrent flow, the test ends with this status because it doesn't have replication cleanup at the end.

      The change below seems to fix the problem, but I need a confirmation that the "Interrupted system call" status is really allowed here.

      diff --git a/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result b/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
      index e71bb2e..c2a0498 100644
      --- a/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
      +++ b/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
      @@ -1,2 +1,4 @@
       include/master-slave.inc
       [connection master]
      +start slave;
      +include/rpl_end.inc
      diff --git a/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test b/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
      index a79a188..38759c9 100644
      --- a/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
      +++ b/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
      @@ -10,3 +10,10 @@
       --exec $MYSQL_SLAP --silent --socket=$SLAVE_MYSOCK -q "START SLAVE; STOP SLAVE; SHOW GLOBAL STATUS" -c 2 --number-of-queries=100 --create-s
       
       # All done.
      +
      +--connection slave
      +start slave;
      +
      +--connection master
      +--source include/rpl_end.inc
      +
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Kristian Nielsen, could you please take a look to confirm (or not confirm) that it's a test problem only?

            Show
            elenst Elena Stepanova added a comment - Kristian Nielsen , could you please take a look to confirm (or not confirm) that it's a test problem only?
            Hide
            knielsen Kristian Nielsen added a comment -

            It should be ok. "Interrupted system call" is how STOP SLAVE is able to
            cause the IO thread to get out of waiting on the network connection to the
            master.

            Show
            knielsen Kristian Nielsen added a comment - It should be ok. "Interrupted system call" is how STOP SLAVE is able to cause the IO thread to get out of waiting on the network connection to the master.
            Show
            elenst Elena Stepanova added a comment - Pushed into 10.0: https://github.com/MariaDB/server/commit/0b4231e9f19308c6b8b57e07e52c1cf711691ebd
            Hide
            elenst Elena Stepanova added a comment -

            Also applied to 5.5

            Show
            elenst Elena Stepanova added a comment - Also applied to 5.5

              People

              • Assignee:
                elenst Elena Stepanova
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: