Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4694

Changes to mysql.gtid_slave_pos table become visible too early

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Bug
    • Affects Version/s: None
    • Fix Version/s: 10.0.4
    • Component/s: None
    • Labels:
      None

      Description

      The summary might be not quite accurate looking from inside, but it describes the effect I observe. Please feel free to modify it if needed.

      I have master=>slave setup with master_use_gtid=current_pos. The slave does not have log-slave-updates.
      A test executes DML flow (1 thread only) on master, simultaneously checking gtid* values on slave every 10 seconds, and crashing and restarting the slave every 30 seconds.
      To check the gtid* values, it uses the following query:

      SELECT GROUP_CONCAT(domain_id,"-",server_id,"-",seq_no separator ","), @@gtid_current_pos, @@gtid_slave_pos, @@gtid_binlog_pos, MAX(seq_no)
      FROM mysql.gtid_slave_pos 
      WHERE (domain_id, sub_id) IN (SELECT domain_id, MAX(sub_id) FROM mysql.gtid_slave_pos GROUP BY domain_id)
      

      and prints the values like this:

      table mysql.gtid_slave_pos: 0-1-1511
      gtid_current_pos: 0-1-1511
      gtid_slave_pos:   0-1-1511
      gtid_binlog_pos:  
      

      Normally, I see either 3 equal values, as above, or

      table mysql.gtid_slave_pos: 0-1-3197
      gtid_current_pos: 0-1-3196
      gtid_slave_pos:   0-1-3196
      gtid_binlog_pos:  
      

      which I attributed to a delay at updating the variables.
      However, every once in a while I see the following picture:

      Slave GTID status:
        table mysql.gtid_slave_pos: 0-1-14040
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039
        gtid_binlog_pos:  
      Sending SIGKILL to server with pid 42395 in order to force crash recovery
       Trying to restart the server ...
      Slave GTID status after server restart:
        table mysql.gtid_slave_pos: 0-1-14039
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039
        gtid_binlog_pos:  
      

      That is, mysql.gtid_slave_pos already had seq_no 14040 before server crash, but after restart it's back to 14039.
      At the same time, I see no indication of the event 14040 being re-executed. So, it looks like no harm is done.
      Transaction isolation level is default REPEATABLE-READ.
      I find it bizarre, because to my understanding the table is supposed to be updated in the same transaction as the change the new GTID belongs to, so if I see the change, the transaction should be already committed; and then I don't understand how it can be rolled back after restart...

      I can provide more information, including how to reproduce it with a stress test, or I can setup the test on perro, or modify the test in any manner, or do whatever it takes to debug it, but I'm also quite prepared to learn that it works as expected.

      bzr version-info

      revision-id: knielsen@knielsen-hq.org-20130621192324-544gs6nw2gj4h45x
      revno: 3664
      branch-nick: 10.0-base
      

      The effect is not related to two recent bugfixes, I observed it before as well.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              knielsen Kristian Nielsen added a comment - - edited

              This is actually InnoDB behaviour, as per this comment from the source:

              -----------------------------------------------------------------------
              /* The following assignment makes the transaction committed in memory
              and makes its changes to data visible to other transactions.
              NOTE that there is a small discrepancy from the strict formal
              visibility rules here: a human user of the database can see
              modifications made by another transaction T even before the necessary
              log segment has been flushed to the disk. If the database happens to
              crash before the flush, the user has seen modifications from T which
              will never be a committed transaction. However, any transaction T2
              which sees the modifications of the committing transaction T, and
              which also itself makes modifications to the database, will get an lsn
              larger than the committing transaction T. In the case where the log
              flush fails, and T never gets committed, also T2 will never get
              committed. */
              -----------------------------------------------------------------------

              I was able to repeat the same behaviour just with a standard InnoDB table, no
              replication or GTID. Test case as follows, though it uses sleep and is thus
              racy and may not behave the same on your machine:

              -----------------------------------------------------------------------

              1. rpl_mdev4694.test
                --source include/have_innodb.inc
                --source include/have_debug.inc

              --connect(con1,localhost,root,,)

              --connection default
              CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;

              INSERT INTO t1 VALUES (1, 1);
              INSERT INTO t1 VALUES (2, 2);

              --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
              wait
              EOF

              --connection con1
              BEGIN;
              INSERT INTO t1 VALUES (3,3);
              DELETE FROM t1 WHERE a=1;
              send COMMIT;

              --connection default
              SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
              BEGIN;
              SELECT * FROM t1 ORDER BY a;
              SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output";
              SET SESSION debug_dbug="+d,crash_dispatch_command_before";
              --error 2006,2013
              SELECT 1;
              --source include/wait_until_disconnected.inc

              --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
              restart
              EOF

              --enable_reconnect
              --source include/wait_until_connected_again.inc

              SELECT * FROM t1 ORDER BY a;

              --connection con1
              --enable_reconnect
              --error 0,2006,2013
              reap;

              --connection default
              DROP TABLE t1;
              -----------------------------------------------------------------------

              You will need this .opt file:

              -----------------------------------------------------------------------

              1. rpl_mdev4694-master.opt
                --skip-stack-trace --skip-core-file
                -----------------------------------------------------------------------

              And this patch:

              -----------------------------------------------------------------------
              === modified file 'storage/xtradb/trx/trx0trx.c'
              — storage/xtradb/trx/trx0trx.c 2013-06-06 15:51:28 +0000
              +++ storage/xtradb/trx/trx0trx.c 2013-06-22 09:20:43 +0000
              @@ -1200,7 +1200,7 @@ trx_commit_off_kernel(
              ulint flush_log_at_trx_commit;

              mutex_exit(&kernel_mutex);
              -
              + os_thread_sleep(3000000);
              if (trx->insert_undo != NULL) {

              trx_undo_insert_cleanup(trx);

              -----------------------------------------------------------------------

              I get the following output:

              -----------------------------------------------------------------------
              CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
              INSERT INTO t1 VALUES (1, 1);
              INSERT INTO t1 VALUES (2, 2);
              BEGIN;
              INSERT INTO t1 VALUES (3,3);
              DELETE FROM t1 WHERE a=1;
              COMMIT;
              SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
              BEGIN;
              SELECT * FROM t1 ORDER BY a;
              a b
              2 2
              3 3
              SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output";
              SET SESSION debug_dbug="+d,crash_dispatch_command_before";
              SELECT 1;
              Got one of the listed errors
              SELECT * FROM t1 ORDER BY a;
              a b
              1 1
              2 2
              DROP TABLE t1;
              main.rpl_mdev4694 'xtradb' [ pass ] 13829
              -----------------------------------------------------------------------

              So the transaction is visible before the crash, but gone after the crash.

              So this seems to be intended InnoDB behaviour. Transactions seen just before a
              crash may still be rolled back during crash recovery, if the crash happens in
              the middle of the commit of said transactions.

              I believe it will not cause any issues for GTID, as you also seem to
              observe. The change to mysql.gtid_slave_pos will be recovered or rolled back
              together with the replicated transaction during InnoDB crash recovery, so the
              result will be consistent and correct irrespectively of visibility before the
              crash.

              So I will mark this as "not a bug". But please re-open if you think there are
              more things related to this to consider or investigate. I am quite happy that
              you have reached this deep level of testing of the GTID stuff. So please
              continue reporting anything that appears strange, even if this is not a bug,
              the next one may well be.

              Show
              knielsen Kristian Nielsen added a comment - - edited This is actually InnoDB behaviour, as per this comment from the source: ----------------------------------------------------------------------- /* The following assignment makes the transaction committed in memory and makes its changes to data visible to other transactions. NOTE that there is a small discrepancy from the strict formal visibility rules here: a human user of the database can see modifications made by another transaction T even before the necessary log segment has been flushed to the disk. If the database happens to crash before the flush, the user has seen modifications from T which will never be a committed transaction. However, any transaction T2 which sees the modifications of the committing transaction T, and which also itself makes modifications to the database, will get an lsn larger than the committing transaction T. In the case where the log flush fails, and T never gets committed, also T2 will never get committed. */ ----------------------------------------------------------------------- I was able to repeat the same behaviour just with a standard InnoDB table, no replication or GTID. Test case as follows, though it uses sleep and is thus racy and may not behave the same on your machine: ----------------------------------------------------------------------- rpl_mdev4694.test --source include/have_innodb.inc --source include/have_debug.inc --connect(con1,localhost,root,,) --connection default CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t1 VALUES (1, 1); INSERT INTO t1 VALUES (2, 2); --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect wait EOF --connection con1 BEGIN; INSERT INTO t1 VALUES (3,3); DELETE FROM t1 WHERE a=1; send COMMIT; --connection default SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; SELECT * FROM t1 ORDER BY a; SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output"; SET SESSION debug_dbug="+d,crash_dispatch_command_before"; --error 2006,2013 SELECT 1; --source include/wait_until_disconnected.inc --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect restart EOF --enable_reconnect --source include/wait_until_connected_again.inc SELECT * FROM t1 ORDER BY a; --connection con1 --enable_reconnect --error 0,2006,2013 reap; --connection default DROP TABLE t1; ----------------------------------------------------------------------- You will need this .opt file: ----------------------------------------------------------------------- rpl_mdev4694-master.opt --skip-stack-trace --skip-core-file ----------------------------------------------------------------------- And this patch: ----------------------------------------------------------------------- === modified file 'storage/xtradb/trx/trx0trx.c' — storage/xtradb/trx/trx0trx.c 2013-06-06 15:51:28 +0000 +++ storage/xtradb/trx/trx0trx.c 2013-06-22 09:20:43 +0000 @@ -1200,7 +1200,7 @@ trx_commit_off_kernel( ulint flush_log_at_trx_commit; mutex_exit(&kernel_mutex); - + os_thread_sleep(3000000); if (trx->insert_undo != NULL) { trx_undo_insert_cleanup(trx); ----------------------------------------------------------------------- I get the following output: ----------------------------------------------------------------------- CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t1 VALUES (1, 1); INSERT INTO t1 VALUES (2, 2); BEGIN; INSERT INTO t1 VALUES (3,3); DELETE FROM t1 WHERE a=1; COMMIT; SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; SELECT * FROM t1 ORDER BY a; a b 2 2 3 3 SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output"; SET SESSION debug_dbug="+d,crash_dispatch_command_before"; SELECT 1; Got one of the listed errors SELECT * FROM t1 ORDER BY a; a b 1 1 2 2 DROP TABLE t1; main.rpl_mdev4694 'xtradb' [ pass ] 13829 ----------------------------------------------------------------------- So the transaction is visible before the crash, but gone after the crash. So this seems to be intended InnoDB behaviour. Transactions seen just before a crash may still be rolled back during crash recovery, if the crash happens in the middle of the commit of said transactions. I believe it will not cause any issues for GTID, as you also seem to observe. The change to mysql.gtid_slave_pos will be recovered or rolled back together with the replicated transaction during InnoDB crash recovery, so the result will be consistent and correct irrespectively of visibility before the crash. So I will mark this as "not a bug". But please re-open if you think there are more things related to this to consider or investigate. I am quite happy that you have reached this deep level of testing of the GTID stuff. So please continue reporting anything that appears strange, even if this is not a bug, the next one may well be.
              Hide
              elenst Elena Stepanova added a comment -

              Thank you, I suspected there was a catch in there somewhere, but it looked so weird that I just had to find out. I agree that it's not a bug, at least as long as we don't find a scenario where it actually causes any trouble.

              Show
              elenst Elena Stepanova added a comment - Thank you, I suspected there was a catch in there somewhere, but it looked so weird that I just had to find out. I agree that it's not a bug, at least as long as we don't find a scenario where it actually causes any trouble.

                People

                • Assignee:
                  knielsen Kristian Nielsen
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  1 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: