Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4906

stop slave; start slave; increments the GTId on a slave

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.4
    • Fix Version/s: 10.0.5
    • Component/s: None
    • Labels:
    • Environment:

      Description

      With a simple test table, shown here on the slave:

      mysql 15:39:00 [PA DEV 002] [eis] [3202]> select * from mfp;
      +------+
      | c1   |
      +------+
      |   11 |
      |   22 |
      |   33 |
      |   44 |
      |   55 |
      +------+
      5 rows in set (0.00 sec)
      
      mysql 15:39:02 [PA DEV 002] [eis] [3202]> delete from mfp;
      Query OK, 5 rows affected (0.00 sec)
      

      I then also deleted all these rows on the master simply to explore the options for repairing replication using

      SET GLOBAL gtid_slave_pos = 'XXX'
      

      However, simply stopping and starting the slave ( sometimes twice ) skipped the transactions with no other actions required:

      mysql 15:39:09 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1412 |
      | gtid_current_pos | 3102-223202-1412 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-409  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:40:24 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:45:21 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-410  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:24 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.00 sec)
      
      mysql 15:45:28 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-410  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:30 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:45:32 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-411  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:34 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:45:36 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-411  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:37 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:45:39 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-412  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:41 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.00 sec)
      
      mysql 15:45:43 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-412  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:45:44 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.00 sec)
      
      mysql 15:45:47 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-413  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:46:23 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.00 sec)
      
      mysql 15:46:26 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-413  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:46:28 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:46:29 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-414  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:46:31 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
      
      Query OK, 0 rows affected (0.01 sec)
      
      mysql 15:46:39 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-416  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
      
      mysql 15:46:41 [PA DEV 002] [eis] [3202]>
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            mathnode Richard Bensley added a comment -

            Master

            MariaDB [test]> show variables like '%gtid%';
            +------------------+----------------+
            | Variable_name    | Value          |
            +------------------+----------------+
            | gtid_binlog_pos  | 0-1376396033-1 |
            | gtid_current_pos | 0-1376396033-1 |
            | gtid_domain_id   | 0              |
            | gtid_seq_no      | 0              |
            | gtid_slave_pos   |                |
            | gtid_strict_mode | OFF            |
            +------------------+----------------+
            6 rows in set (0.00 sec)
            
            MariaDB [test]> create table t (a int auto_increment primary key);
            Query OK, 0 rows affected (0.32 sec)
            
            rbensley@rbensley:~$ for i in {1..10}; do mysql -P3000 test -e "insert into t values ();"; done
            

            Slave

            MariaDB [test]> select count(*) from t;
            +----------+
            | count(*) |
            +----------+
            |       10 |
            +----------+
            1 row in set (0.00 sec)
            
            MariaDB [test]> delete from t where a < 4;
            Query OK, 3 rows affected (0.09 sec)
            
            MariaDB [test]> select count(*) from t;
            +----------+
            | count(*) |
            +----------+
            |        7 |
            +----------+
            1 row in set (0.00 sec)
            

            Master

            MariaDB [test]> delete from t;
            Query OK, 10 rows affected (0.10 sec)
            

            The Slave will now show a replication failure:

            Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log-bin.000003, end_log_pos 2070
            Replicate_Ignore_Server_Ids: 
            Master_Server_Id: 1376396033
            Master_SSL_Crl: 
            Master_SSL_Crlpath: 
            Using_Gtid: Slave_Pos
            

            From the slave, I run "stop slave; start slave;"
            And the GTID replication rolls forward!

            MariaDB [test]> show slave status\G
            *************************** 1. row ***************************
                           Slave_IO_State: Waiting for master to send event
                              Master_Host: 127.0.0.1
                              Master_User: root
                              Master_Port: 3000
                            Connect_Retry: 60
                          Master_Log_File: mysql-log-bin.000003
                      Read_Master_Log_Pos: 8641
                           Relay_Log_File: mysql-relay-bin.000002
                            Relay_Log_Pos: 781
                    Relay_Master_Log_File: mysql-log-bin.000003
                         Slave_IO_Running: Yes
                        Slave_SQL_Running: Yes
                          Replicate_Do_DB: 
                      Replicate_Ignore_DB: 
                       Replicate_Do_Table: 
                   Replicate_Ignore_Table: 
                  Replicate_Wild_Do_Table: 
              Replicate_Wild_Ignore_Table: 
                               Last_Errno: 0
                               Last_Error: 
                             Skip_Counter: 0
                      Exec_Master_Log_Pos: 8641
                          Relay_Log_Space: 1078
                          Until_Condition: None
                           Until_Log_File: 
                            Until_Log_Pos: 0
                       Master_SSL_Allowed: No
                       Master_SSL_CA_File: 
                       Master_SSL_CA_Path: 
                          Master_SSL_Cert: 
                        Master_SSL_Cipher: 
                           Master_SSL_Key: 
                    Seconds_Behind_Master: 0
            Master_SSL_Verify_Server_Cert: No
                            Last_IO_Errno: 0
                            Last_IO_Error: 
                           Last_SQL_Errno: 0
                           Last_SQL_Error: 
              Replicate_Ignore_Server_Ids: 
                         Master_Server_Id: 1376396033
                           Master_SSL_Crl: 
                       Master_SSL_Crlpath: 
                               Using_Gtid: Slave_Pos
            
            
            Show
            mathnode Richard Bensley added a comment - Master MariaDB [test]> show variables like '%gtid%'; +------------------+----------------+ | Variable_name | Value | +------------------+----------------+ | gtid_binlog_pos | 0-1376396033-1 | | gtid_current_pos | 0-1376396033-1 | | gtid_domain_id | 0 | | gtid_seq_no | 0 | | gtid_slave_pos | | | gtid_strict_mode | OFF | +------------------+----------------+ 6 rows in set (0.00 sec) MariaDB [test]> create table t (a int auto_increment primary key); Query OK, 0 rows affected (0.32 sec) rbensley@rbensley:~$ for i in {1..10}; do mysql -P3000 test -e "insert into t values ();" ; done Slave MariaDB [test]> select count(*) from t; +----------+ | count(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec) MariaDB [test]> delete from t where a < 4; Query OK, 3 rows affected (0.09 sec) MariaDB [test]> select count(*) from t; +----------+ | count(*) | +----------+ | 7 | +----------+ 1 row in set (0.00 sec) Master MariaDB [test]> delete from t; Query OK, 10 rows affected (0.10 sec) The Slave will now show a replication failure: Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log-bin.000003, end_log_pos 2070 Replicate_Ignore_Server_Ids: Master_Server_Id: 1376396033 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos From the slave, I run "stop slave; start slave;" And the GTID replication rolls forward! MariaDB [test]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: root Master_Port: 3000 Connect_Retry: 60 Master_Log_File: mysql-log-bin.000003 Read_Master_Log_Pos: 8641 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 781 Relay_Master_Log_File: mysql-log-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 8641 Relay_Log_Space: 1078 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1376396033 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos
            Hide
            elenst Elena Stepanova added a comment -

            Test case:

            --source include/have_innodb.inc
            --source include/master-slave.inc
            --source include/have_binlog_format_row.inc

            --connection slave
            --source include/stop_slave.inc
            set sql_log_bin = 0;
            alter table mysql.gtid_slave_pos engine = InnoDB;
            change master to master_use_gtid = current_pos;
            set sql_log_bin = 1;
            --source include/start_slave.inc

            --connection master
            --echo # [master]
            create table t1 (i int) engine=innodb;
            insert into t1 values (1),(2);

            --sync_slave_with_master
            --echo # [slave]
            -echo #-----------------------------
            --echo # Removing a row manually
            -echo #-----------------------------
            delete from t1 where i = 1;

            --connection master
            -echo #-----------------------------
            --echo # [master]
            --echo # Inserting something else, to raise the GTID
            --echo # and avoid the error on the next slave connection, caused
            --echo # by the non-existing seqno
            -echo #-----------------------------
            insert into t1 values (3),(4);
            insert into t1 values (5),(6);
            delete from t1;
            select @@gtid_binlog_pos;

            --connection slave
            --echo # [slave]
            -echo #-----------------------------
            --echo # Waiting for the last INSERT (it has GTID 0-1-4)
            --echo # and checking that replication aborted due to the missing row
            -echo #-----------------------------
            --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-4'
            --source include/wait_condition.inc

            show variables like 'gtid%';
            query_vertical show slave status;

            -echo #-----------------------------
            --echo # Stopping and starting the slave,
            --echo # waiting till it advanced to the next gtid_slave_pos
            --echo # and checking that it aborted again due to the missing row
            -echo #-----------------------------

            --source include/stop_slave.inc
            start slave;
            --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5'
            --source include/wait_condition.inc
            show variables like 'gtid%';
            query_vertical show slave status;

            -echo #-----------------------------
            --echo # Stopping and starting the slave again,
            --echo # and checking that now it did NOT abort
            -echo #-----------------------------

            --source include/stop_slave.inc
            --source include/start_slave.inc

            --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5'
            --source include/wait_condition.inc

            show variables like 'gtid%';
            query_vertical show slave status;

            --connection master
            --echo # [master]
            drop table t1;

            --sync_slave_with_master
            --echo # [slave]
            -echo #-----------------------------
            --echo # And now the slave is willing to replicate further
            -echo #-----------------------------
            --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-6'
            --source include/wait_condition.inc
            show variables like 'gtid%';
            query_vertical show slave status;

            --connection master
            --source include/rpl_end.inc

            Show
            elenst Elena Stepanova added a comment - Test case: --source include/have_innodb.inc --source include/master-slave.inc --source include/have_binlog_format_row.inc --connection slave --source include/stop_slave.inc set sql_log_bin = 0; alter table mysql.gtid_slave_pos engine = InnoDB; change master to master_use_gtid = current_pos; set sql_log_bin = 1; --source include/start_slave.inc --connection master --echo # [master] create table t1 (i int) engine=innodb; insert into t1 values (1),(2); --sync_slave_with_master --echo # [slave] - echo # ----------------------------- --echo # Removing a row manually - echo # ----------------------------- delete from t1 where i = 1; --connection master - echo # ----------------------------- --echo # [master] --echo # Inserting something else, to raise the GTID --echo # and avoid the error on the next slave connection, caused --echo # by the non-existing seqno - echo # ----------------------------- insert into t1 values (3),(4); insert into t1 values (5),(6); delete from t1; select @@gtid_binlog_pos; --connection slave --echo # [slave] - echo # ----------------------------- --echo # Waiting for the last INSERT (it has GTID 0-1-4) --echo # and checking that replication aborted due to the missing row - echo # ----------------------------- --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-4' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; - echo # ----------------------------- --echo # Stopping and starting the slave, --echo # waiting till it advanced to the next gtid_slave_pos --echo # and checking that it aborted again due to the missing row - echo # ----------------------------- --source include/stop_slave.inc start slave; --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; - echo # ----------------------------- --echo # Stopping and starting the slave again, --echo # and checking that now it did NOT abort - echo # ----------------------------- --source include/stop_slave.inc --source include/start_slave.inc --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; --connection master --echo # [master] drop table t1; --sync_slave_with_master --echo # [slave] - echo # ----------------------------- --echo # And now the slave is willing to replicate further - echo # ----------------------------- --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-6' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; --connection master --source include/rpl_end.inc
            Hide
            knielsen Kristian Nielsen added a comment -

            Pushed to 10.0-base:

            MDEV-4906: When event apply fails, next SQL thread start errorneously commits the failing GTID to gtid_slave_pos

            When a GTID event is executed, we remember the contained GTID position so that
            when we have applied the entire event group we can commit it to
            gtid_slave_pos.

            However, if the event group fails to apply due to some error and the SQL
            thread aborts, the code did not correctly clear the remembered GTID. Thus,
            when SQL thread was restarted, the old GTID of the failing event group was
            incorrectly updated to gtid_slave_pos when the initial rotate event was
            executed, corrupting the GTID position.

            Show
            knielsen Kristian Nielsen added a comment - Pushed to 10.0-base: MDEV-4906 : When event apply fails, next SQL thread start errorneously commits the failing GTID to gtid_slave_pos When a GTID event is executed, we remember the contained GTID position so that when we have applied the entire event group we can commit it to gtid_slave_pos. However, if the event group fails to apply due to some error and the SQL thread aborts, the code did not correctly clear the remembered GTID. Thus, when SQL thread was restarted, the old GTID of the failing event group was incorrectly updated to gtid_slave_pos when the initial rotate event was executed, corrupting the GTID position.

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                mpattiso Michael Pattison
              • Votes:
                1 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: