Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6276

Replicate_Do_DB option causes slave server to commit empty InnoDB transactions.

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.0.11
    • Fix Version/s: 10.0
    • Component/s: None
    • Environment:
      RHEL 6.4

      Description

      Hi,

      I'm using the Replicate_Do_DB option to replicate a DB with a very small amount of activity (less then one transaction per second). Another DB running on the master server runs many more transactions per second. In the output of 'show full processlist' on the slave, I can see the slave process constantly switching between COMMIT and 'closing tables' for the DB that should be excluded by the Replicate_Do_DB option. I did not have this issue when using MariaDB 5.5.27.

      I enabled the general query log and it shows BEGIN and COMMIT repeatedly but no queries that actually alter any tables:

      140527 12:13:35 1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      140527 12:13:36 1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT /* implicit, from Xid_log_event */
      1721 Query BEGIN
      1721 Query COMMIT /* implicit, from Xid_log_event */
      1721 Query BEGIN
      1721 Query COMMIT
      1721 Query BEGIN
      1721 Query COMMIT

      This generates a lot of I/O contention that I've had to mitigate on this development server by setting innodb_flush_log_at_trx_commit to 0. I wouldn't be comfortable with that setting on a production DB server though.

      Regards,
      Bill

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            knielsen Kristian Nielsen added a comment -

            I verified this bug with the following test case:

            --source include/have_innodb.inc
            --source include/have_binlog_format_statement.inc
            --source include/master-slave.inc
            
            --connection slave
            --source include/stop_slave.inc
            SET @old_do_db = @@GLOBAL.replicate_do_db;
            SET GLOBAL replicate_do_db = "test2,test3";
            --source include/start_slave.inc
            
            --connection master
            CREATE DATABASE test2;
            CREATE DATABASE test3;
            
            use test;
            CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (101);
            
            use test2;
            CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t2 VALUES (201);
            INSERT INTO t2 VALUES (202), (203);
            
            use test3;
            CREATE TABLE t3 (a INT PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t3 VALUES (301);
            BEGIN;
            INSERT INTO t3 VALUES (302);
            INSERT INTO t3 VALUES (303);
            COMMIT;
            
            use test;
            UPDATE t1 SET a = a + 10;
            INSERT INTO t1 VALUES (102);
            DELETE FROM t1 WHERE a = 111;
            
            use test2;
            UPDATE t2 SET a = a + 10 WHERE a MOD 2 = 0;
            
            use test;
            INSERT INTO t1 VALUES (103);
            SELECT @@gtid_binlog_pos;
            
            SHOW BINLOG EVENTS;
            --save_master_pos
            
            --connection slave
            --sync_with_master
            SHOW BINLOG EVENTS;
            SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id;
            SELECT @@gtid_binlog_pos;
            
            --error ER_NO_SUCH_TABLE
            SELECT * FROM test.t1;
            SELECT * FROM test2.t2;
            SELECT * FROM test3.t3;
            
            --connection master
            use test;
            DROP TABLE t1;
            use test2;
            DROP TABLE t2;
            use test3;
            DROP TABLE t3;
            use test;
            DROP DATABASE test2;
            DROP DATABASE test3;
            --save_master_pos
            
            --connection slave
            --sync_with_master
            --source include/stop_slave.inc
            SET GLOBAL replicate_do_db = @old_do_db;
            --source include/start_slave.inc
            
            --source include/rpl_end.inc
            

            In the output of this test case, we can see this:

            SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id;
            domain_id	sub_id	server_id	seq_no
            0	14	1	14
            0	15	1	15
            SELECT @@gtid_slave_pos;
            @@gtid_slave_pos
            0-1-15
            SELECT @@gtid_binlog_pos;
            @@gtid_binlog_pos
            0-1-14
            

            So the problem is that for the empty transaction, even though there are no
            changes (and nothing logged into the binlog), we still update the gtid slave
            pos, which requires an expensive commit of the mysql.gtid_slave_pos table.

            It might have been deliberate to update the gtid_slave_pos, the idea being
            that if the slave is stopped and then re-started, it should not try to
            re-execute events that were already skipped.

            But in any case, I think this bug clearly shows that such behaviour is
            wrong. If user filters out events, she will not expect transaction commit of
            such filtered events.

            So this needs to be fixed, I think.

            Show
            knielsen Kristian Nielsen added a comment - I verified this bug with the following test case: --source include/have_innodb.inc --source include/have_binlog_format_statement.inc --source include/master-slave.inc --connection slave --source include/stop_slave.inc SET @old_do_db = @@GLOBAL.replicate_do_db; SET GLOBAL replicate_do_db = "test2,test3"; --source include/start_slave.inc --connection master CREATE DATABASE test2; CREATE DATABASE test3; use test; CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB; INSERT INTO t1 VALUES (101); use test2; CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB; INSERT INTO t2 VALUES (201); INSERT INTO t2 VALUES (202), (203); use test3; CREATE TABLE t3 (a INT PRIMARY KEY) ENGINE=InnoDB; INSERT INTO t3 VALUES (301); BEGIN; INSERT INTO t3 VALUES (302); INSERT INTO t3 VALUES (303); COMMIT; use test; UPDATE t1 SET a = a + 10; INSERT INTO t1 VALUES (102); DELETE FROM t1 WHERE a = 111; use test2; UPDATE t2 SET a = a + 10 WHERE a MOD 2 = 0; use test; INSERT INTO t1 VALUES (103); SELECT @@gtid_binlog_pos; SHOW BINLOG EVENTS; --save_master_pos --connection slave --sync_with_master SHOW BINLOG EVENTS; SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id; SELECT @@gtid_binlog_pos; --error ER_NO_SUCH_TABLE SELECT * FROM test.t1; SELECT * FROM test2.t2; SELECT * FROM test3.t3; --connection master use test; DROP TABLE t1; use test2; DROP TABLE t2; use test3; DROP TABLE t3; use test; DROP DATABASE test2; DROP DATABASE test3; --save_master_pos --connection slave --sync_with_master --source include/stop_slave.inc SET GLOBAL replicate_do_db = @old_do_db; --source include/start_slave.inc --source include/rpl_end.inc In the output of this test case, we can see this: SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id; domain_id sub_id server_id seq_no 0 14 1 14 0 15 1 15 SELECT @@gtid_slave_pos; @@gtid_slave_pos 0-1-15 SELECT @@gtid_binlog_pos; @@gtid_binlog_pos 0-1-14 So the problem is that for the empty transaction, even though there are no changes (and nothing logged into the binlog), we still update the gtid slave pos, which requires an expensive commit of the mysql.gtid_slave_pos table. It might have been deliberate to update the gtid_slave_pos, the idea being that if the slave is stopped and then re-started, it should not try to re-execute events that were already skipped. But in any case, I think this bug clearly shows that such behaviour is wrong. If user filters out events, she will not expect transaction commit of such filtered events. So this needs to be fixed, I think.
            Hide
            knielsen Kristian Nielsen added a comment -

            Hm, there is a potential problem with not updating the GTID state for events
            that are ignored.

            If a lot of events are ignored (perhaps within a specific domain only), the
            slave's current GTID position (within that domain) may remain at an old GTID
            for a long time. This could eventually lead the slave reconnect to fail due to
            the master having purged old binlog files.

            The slave does not need any of the events that the master has purged, but the
            master cannot know this.

            One way to handle this might be to update the table only occasinally. For
            example once for each master binlog file, which could be conveniently done by
            updating when the GTID_LIST_EVENT is received.

            But it might be somewhat confusing to users that the current gtid position
            jumps ahead in big jumps, seemingly at random.

            Another approach might be to update the in-memory @@gtid_slave_pos for every
            event, but update the underlying table only occasionally. In case of crash,
            the updated in-memory position is not preserved, but that is not a problem as
            the events will just be re-fetch from the master and ignored again. But it is
            somewhat ugly to introduce such an inconsistency between the table and the
            variable - and it could confuse users that expect to be able to query the
            table to get the current position.

            Show
            knielsen Kristian Nielsen added a comment - Hm, there is a potential problem with not updating the GTID state for events that are ignored. If a lot of events are ignored (perhaps within a specific domain only), the slave's current GTID position (within that domain) may remain at an old GTID for a long time. This could eventually lead the slave reconnect to fail due to the master having purged old binlog files. The slave does not need any of the events that the master has purged, but the master cannot know this. One way to handle this might be to update the table only occasinally. For example once for each master binlog file, which could be conveniently done by updating when the GTID_LIST_EVENT is received. But it might be somewhat confusing to users that the current gtid position jumps ahead in big jumps, seemingly at random. Another approach might be to update the in-memory @@gtid_slave_pos for every event, but update the underlying table only occasionally. In case of crash, the updated in-memory position is not preserved, but that is not a problem as the events will just be re-fetch from the master and ignored again. But it is somewhat ugly to introduce such an inconsistency between the table and the variable - and it could confuse users that expect to be able to query the table to get the current position.
            Hide
            knielsen Kristian Nielsen added a comment -

            Maybe we could make a general facility to not update the mysql.gtid_slave_pos
            in a crash-safe way, for users that do not want or need it.

            Then it could be configured to to eg. update all event groups crash safe,
            update only non-ignored groups crash-safe, update no groups crash safe.

            Then when a GTID is not to be recorded crash-safe, we can record it only in
            the in-memory hash. And then periodically, perhaps everytime a GTID_LIST event
            is seen or every N GTIDs, we can update the table.

            And we will also update whenever a slave thread stops, so that in the no-crash
            case there would be no difference for the user.

            This could then be configurable by the DBA, which could also help reduce the
            overhead introduced by the GTID update, for users that do not use GTID...

            Show
            knielsen Kristian Nielsen added a comment - Maybe we could make a general facility to not update the mysql.gtid_slave_pos in a crash-safe way, for users that do not want or need it. Then it could be configured to to eg. update all event groups crash safe, update only non-ignored groups crash-safe, update no groups crash safe. Then when a GTID is not to be recorded crash-safe, we can record it only in the in-memory hash. And then periodically, perhaps everytime a GTID_LIST event is seen or every N GTIDs, we can update the table. And we will also update whenever a slave thread stops, so that in the no-crash case there would be no difference for the user. This could then be configurable by the DBA, which could also help reduce the overhead introduced by the GTID update, for users that do not use GTID...

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                billr Bill Rios
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: