Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4470

the replication broken when use the xa transcation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not a Bug
    • Affects Version/s: 10.0.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      the replication broken when use the xa transaction
      master

      mysql> create table t(id int auto_increment primary key, a int) engine=innodb;
      mysql> xa start '111';
      mysql> insert into t(a) values(1);
      mysql> xa end '111';
      mysql> xa prepare '111';
      kill -9 master_pid
      

      restart the master

      mysql> xa recover;
      +----------+--------------+--------------+------+
      | formatID | gtrid_length | bqual_length | data |
      +----------+--------------+--------------+------+
      |        1 |            3 |            0 |  111 |
      +----------+--------------+--------------+------+
      mysql> xa commit '111';
      mysql> select * from t;
      Empty set (0.00 sec)
      

      why? I did the xa commit '111', but no result return, maybe another bug?
      do the test continue

      mysql> xa start '222';
      mysql> insert into t(a) values(2);
      mysql> xa end '222';
      mysql> prepare '222';
      mysql> xa commit '111';
      mysql> select * from t;
      +----+------+
      | id |    a |
      +----+------+
      |  1 |    1 |
      |  2 |    2 |
      +----+------+
      

      value(1,1) return at this time, why?

      what's happen on the slave?

      mysql> select * from t;
      +----+------+
      | id |    a |
      +----+------+
      |  2 |    2 |
      +----+------+
      

      the replication is broken now!!

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            boyce boyce added a comment -

            the replication broken when use the xa transaction
            master
            mysql> create table t(id int auto_increment primary key, a int) engine=innodb;
            mysql> xa start '111';
            mysql> insert into t(a) values(1);
            mysql> xa end '111';
            mysql> xa prepare '111';
            kill -9 master_pid
            restart the master
            mysql> xa recover;
            --------------------------------------+

            formatID gtrid_length bqual_length data

            --------------------------------------+

            1 3 0 111

            --------------------------------------+
            mysql> xa commit '111';
            mysql> select * from t;
            Empty set (0.00 sec)
            why? I did the xa commit '111', but no result return, maybe another bug?
            do the test continue
            mysql> xa start '222';
            mysql> insert into t(a) values(2);
            mysql> xa end '222';
            mysql> prepare '222';
            mysql> xa commit '111';
            mysql> select * from t;
            --------+

            id a

            --------+

            1 1
            2 2

            --------+
            value(1,1) return at this time, why?

            what's hanppen on the slave?
            mysql> select * from t;
            --------+

            id a

            --------+

            2 2

            --------+
            the replication is broken now!!

            Show
            boyce boyce added a comment - the replication broken when use the xa transaction master mysql> create table t(id int auto_increment primary key, a int) engine=innodb; mysql> xa start '111'; mysql> insert into t(a) values(1); mysql> xa end '111'; mysql> xa prepare '111'; kill -9 master_pid restart the master mysql> xa recover; --------- ------------ ------------ -----+ formatID gtrid_length bqual_length data --------- ------------ ------------ -----+ 1 3 0 111 --------- ------------ ------------ -----+ mysql> xa commit '111'; mysql> select * from t; Empty set (0.00 sec) why? I did the xa commit '111', but no result return, maybe another bug? do the test continue mysql> xa start '222'; mysql> insert into t(a) values(2); mysql> xa end '222'; mysql> prepare '222'; mysql> xa commit '111'; mysql> select * from t; --- -----+ id a --- -----+ 1 1 2 2 --- -----+ value(1,1) return at this time, why? what's hanppen on the slave? mysql> select * from t; --- -----+ id a --- -----+ 2 2 --- -----+ the replication is broken now!!
            Hide
            elenst Elena Stepanova added a comment -

            Hi,

            So, there were two questions in here:

            • why the record (1,1) didn't make it to the slave;
            • why the record (1,1) didn't show up on the first SELECT.

            The answer to the first question is that it's a limitation of MySQL XA transactions, see MySQL manual (http://dev.mysql.com/doc/refman/5.6/en/xa-restrictions.html):
            " If an XA transaction has reached the PREPARED state and the MySQL server is killed (for example, with kill -9 on Unix) or shuts down abnormally, the transaction can be continued after the server restarts. However, if the client reconnects and commits the transaction, the transaction will be absent from the binary log even though it has been committed. This means the data and the binary log have gone out of synchrony. An implication is that XA cannot be used safely together with replication. "

            This is exactly your scenario.

            The answer to the second question is less obvious. From all I see in your scenario, it's not an exact quote from your MySQL client, but rather a manual compilation of different bits and pieces. There are several indications of it:

            a) prepare '222';
            would have caused a syntax error (XA is missing)

            b) the second "xa commit '111';" (after "prepare '222'") would have caused "Unknown XID" error or such, since you had already committed '111' before;

            c) since you had never issued "xa commit '222'", there is no way (2,2) would have appeared on slave.

            So, something is obviously missing here, and something is wrong. My best guess is that by the time of your first "select * from t" you hadn't actually issued "xa commit '111'" yet, in this case you would have received an empty set indeed, even although the transaction is in prepared state.

            Please let us know whether the above answers your questions.

            Thanks.

            Show
            elenst Elena Stepanova added a comment - Hi, So, there were two questions in here: why the record (1,1) didn't make it to the slave; why the record (1,1) didn't show up on the first SELECT. The answer to the first question is that it's a limitation of MySQL XA transactions, see MySQL manual ( http://dev.mysql.com/doc/refman/5.6/en/xa-restrictions.html): " If an XA transaction has reached the PREPARED state and the MySQL server is killed (for example, with kill -9 on Unix) or shuts down abnormally, the transaction can be continued after the server restarts. However, if the client reconnects and commits the transaction, the transaction will be absent from the binary log even though it has been committed. This means the data and the binary log have gone out of synchrony. An implication is that XA cannot be used safely together with replication. " This is exactly your scenario. The answer to the second question is less obvious. From all I see in your scenario, it's not an exact quote from your MySQL client, but rather a manual compilation of different bits and pieces. There are several indications of it: a) prepare '222'; would have caused a syntax error (XA is missing) b) the second "xa commit '111';" (after "prepare '222'") would have caused "Unknown XID" error or such, since you had already committed '111' before; c) since you had never issued "xa commit '222'", there is no way (2,2) would have appeared on slave. So, something is obviously missing here, and something is wrong. My best guess is that by the time of your first "select * from t" you hadn't actually issued "xa commit '111'" yet, in this case you would have received an empty set indeed, even although the transaction is in prepared state. Please let us know whether the above answers your questions. Thanks.
            Hide
            boyce boyce added a comment -
            • why the record (1,1) didn't make it to the slave
              Thanks for your answer.
            • why the record (1,1) didn't show up on the first SELECT
              sorry , It's my fault, there maybe some input errors. so I test again and copy the result blew:

            mysql> truncate table t;
            Query OK, 0 rows affected (0.00 sec)

            mysql> xa start '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> insert into t(a) values(1);
            Query OK, 1 row affected (0.00 sec)

            mysql> xa end '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> xa prepare '111';
            Query OK, 0 rows affected (0.00 sec)

            kill -9 pid at this time

            mysql> xa recover;
            ERROR 2006 (HY000): MySQL server has gone away
            No connection. Trying to reconnect...
            Connection id: 2
            Current database: mydb

            --------------------------------------+

            formatID gtrid_length bqual_length data

            --------------------------------------+

            1 3 0 111

            --------------------------------------+
            1 row in set (0.00 sec)

            mysql> select * from t;
            Empty set (0.00 sec)

            mysql> xa commit '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> select * from t;
            Empty set (0.00 sec)

            why? the value(1,1) not return!

            mysql> xa start '222';
            Query OK, 0 rows affected (0.00 sec)

            mysql> insert into t(a) values(2);
            Query OK, 1 row affected (0.00 sec)

            mysql> xa end '222';
            Query OK, 0 rows affected (0.00 sec)

            mysql> xa prepare '222';
            Query OK, 0 rows affected (0.01 sec)

            mysql> xa commit '222';
            Query OK, 0 rows affected (0.00 sec)

            mysql> select * from t;
            --------+

            id a

            --------+

            1 1
            2 2

            --------+
            2 rows in set (0.00 sec)

            why? return the value(1,1),(2,2)?

            Show
            boyce boyce added a comment - why the record (1,1) didn't make it to the slave Thanks for your answer. why the record (1,1) didn't show up on the first SELECT sorry , It's my fault, there maybe some input errors. so I test again and copy the result blew: mysql> truncate table t; Query OK, 0 rows affected (0.00 sec) mysql> xa start '111'; Query OK, 0 rows affected (0.00 sec) mysql> insert into t(a) values(1); Query OK, 1 row affected (0.00 sec) mysql> xa end '111'; Query OK, 0 rows affected (0.00 sec) mysql> xa prepare '111'; Query OK, 0 rows affected (0.00 sec) kill -9 pid at this time mysql> xa recover; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 2 Current database: mydb --------- ------------ ------------ -----+ formatID gtrid_length bqual_length data --------- ------------ ------------ -----+ 1 3 0 111 --------- ------------ ------------ -----+ 1 row in set (0.00 sec) mysql> select * from t; Empty set (0.00 sec) mysql> xa commit '111'; Query OK, 0 rows affected (0.00 sec) mysql> select * from t; Empty set (0.00 sec) why? the value(1,1) not return! mysql> xa start '222'; Query OK, 0 rows affected (0.00 sec) mysql> insert into t(a) values(2); Query OK, 1 row affected (0.00 sec) mysql> xa end '222'; Query OK, 0 rows affected (0.00 sec) mysql> xa prepare '222'; Query OK, 0 rows affected (0.01 sec) mysql> xa commit '222'; Query OK, 0 rows affected (0.00 sec) mysql> select * from t; --- -----+ id a --- -----+ 1 1 2 2 --- -----+ 2 rows in set (0.00 sec) why? return the value(1,1),(2,2)?
            Hide
            elenst Elena Stepanova added a comment -

            Thanks, it makes sense now. The important difference is yet another SELECT which you ran between XA RECOVER '111' and XA COMMIT '111'.
            I assume you have query cache enabled (you can check the value of query_cache_size in your config file or in SHOW VARIABLES).
            With the query cache, I'm getting the problem as well:

            MariaDB [test]> create table t(id int auto_increment primary key, a int) engine=innodb;
            Query OK, 0 rows affected (0.24 sec)

            MariaDB [test]> xa start '111';
            Query OK, 0 rows affected (0.00 sec)

            MariaDB [test]> insert into t(a) values(1);
            Query OK, 1 row affected (0.00 sec)

            MariaDB [test]> xa end '111';
            Query OK, 0 rows affected (0.00 sec)

            MariaDB [test]> xa prepare '111';
            Query OK, 0 rows affected (0.27 sec)

            MariaDB [test]> system killall -s 9 mysqld
            MariaDB [test]> system start_server.sh --query-cache-size=1048576

            MariaDB [test]> xa recover;
            ERROR 2013 (HY000): Lost connection to MySQL server during query
            MariaDB [test]> xa recover;
            ERROR 2006 (HY000): MySQL server has gone away
            No connection. Trying to reconnect...
            Connection id: 2
            Current database: test

            --------------------------------------+

            formatID gtrid_length bqual_length data

            --------------------------------------+

            1 3 0 111

            --------------------------------------+
            1 row in set (0.01 sec)

            MariaDB [test]> select * from t;
            Empty set (0.00 sec)

            MariaDB [test]> xa commit '111';
            Query OK, 0 rows affected (0.10 sec)

            MariaDB [test]> select * from t;
            Empty set (0.00 sec)

            MariaDB [test]> select sql_no_cache * from t;
            --------+

            id a

            --------+

            1 1

            --------+
            1 row in set (0.00 sec)

            MariaDB [test]> show status like 'Qcache_hits';
            --------------------+

            Variable_name Value

            --------------------+

            Qcache_hits 1

            --------------------+
            1 row in set (0.00 sec)

            As you can see above, the SELECT hits the query cache. I'd say it's a bug, but I suggest to file it separately.

            Show
            elenst Elena Stepanova added a comment - Thanks, it makes sense now. The important difference is yet another SELECT which you ran between XA RECOVER '111' and XA COMMIT '111'. I assume you have query cache enabled (you can check the value of query_cache_size in your config file or in SHOW VARIABLES). With the query cache, I'm getting the problem as well: MariaDB [test] > create table t(id int auto_increment primary key, a int) engine=innodb; Query OK, 0 rows affected (0.24 sec) MariaDB [test] > xa start '111'; Query OK, 0 rows affected (0.00 sec) MariaDB [test] > insert into t(a) values(1); Query OK, 1 row affected (0.00 sec) MariaDB [test] > xa end '111'; Query OK, 0 rows affected (0.00 sec) MariaDB [test] > xa prepare '111'; Query OK, 0 rows affected (0.27 sec) MariaDB [test] > system killall -s 9 mysqld MariaDB [test] > system start_server.sh --query-cache-size=1048576 MariaDB [test] > xa recover; ERROR 2013 (HY000): Lost connection to MySQL server during query MariaDB [test] > xa recover; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 2 Current database: test --------- ------------ ------------ -----+ formatID gtrid_length bqual_length data --------- ------------ ------------ -----+ 1 3 0 111 --------- ------------ ------------ -----+ 1 row in set (0.01 sec) MariaDB [test] > select * from t; Empty set (0.00 sec) MariaDB [test] > xa commit '111'; Query OK, 0 rows affected (0.10 sec) MariaDB [test] > select * from t; Empty set (0.00 sec) MariaDB [test] > select sql_no_cache * from t; --- -----+ id a --- -----+ 1 1 --- -----+ 1 row in set (0.00 sec) MariaDB [test] > show status like 'Qcache_hits'; -------------- ------+ Variable_name Value -------------- ------+ Qcache_hits 1 -------------- ------+ 1 row in set (0.00 sec) As you can see above, the SELECT hits the query cache. I'd say it's a bug, but I suggest to file it separately.
            Hide
            boyce boyce added a comment -

            Thanks
            I want to check all the data is ok, so do the select between 'xa recover' and 'xa commit' , to my surprise, I hit this bug, I'll create another issue for it.
            do test without the select, it's right, thanks again.

            mysql> truncate table t;
            Query OK, 0 rows affected (0.00 sec)

            mysql>
            mysql>
            mysql> select * from t;
            Empty set (0.00 sec)

            mysql> xa start '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> insert into t(a) values(1);
            Query OK, 1 row affected (0.00 sec)

            mysql> xa end '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> xa prepare '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> xa recover;
            ERROR 2006 (HY000): MySQL server has gone away
            No connection. Trying to reconnect...
            Connection id: 2
            Current database: mydb

            --------------------------------------+

            formatID gtrid_length bqual_length data

            --------------------------------------+

            1 3 0 111

            --------------------------------------+
            1 row in set (0.00 sec)

            mysql> xa commit '111';
            Query OK, 0 rows affected (0.00 sec)

            mysql> select * from t;
            --------+

            id a

            --------+

            1 1

            --------+
            1 row in set (0.00 sec)

            Show
            boyce boyce added a comment - Thanks I want to check all the data is ok, so do the select between 'xa recover' and 'xa commit' , to my surprise, I hit this bug, I'll create another issue for it. do test without the select, it's right, thanks again. mysql> truncate table t; Query OK, 0 rows affected (0.00 sec) mysql> mysql> mysql> select * from t; Empty set (0.00 sec) mysql> xa start '111'; Query OK, 0 rows affected (0.00 sec) mysql> insert into t(a) values(1); Query OK, 1 row affected (0.00 sec) mysql> xa end '111'; Query OK, 0 rows affected (0.00 sec) mysql> xa prepare '111'; Query OK, 0 rows affected (0.00 sec) mysql> xa recover; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 2 Current database: mydb --------- ------------ ------------ -----+ formatID gtrid_length bqual_length data --------- ------------ ------------ -----+ 1 3 0 111 --------- ------------ ------------ -----+ 1 row in set (0.00 sec) mysql> xa commit '111'; Query OK, 0 rows affected (0.00 sec) mysql> select * from t; --- -----+ id a --- -----+ 1 1 --- -----+ 1 row in set (0.00 sec)
            Hide
            elenst Elena Stepanova added a comment -

            As discussed in the comments, the issue with replication is a known MySQL XA limitation, so I am closing it as 'Not a bug'.
            A problem with SELECT has been filed separately as MDEV-4471.

            Show
            elenst Elena Stepanova added a comment - As discussed in the comments, the issue with replication is a known MySQL XA limitation, so I am closing it as 'Not a bug'. A problem with SELECT has been filed separately as MDEV-4471 .
            Hide
            jeremycole Jeremy Cole added a comment -

            For tracking purposes maybe it makes sense to create a new resolution of "Known bug with upstream" and use that for such cases. This is very definitely a bug, just not a bug with MariaDB exclusively.

            Show
            jeremycole Jeremy Cole added a comment - For tracking purposes maybe it makes sense to create a new resolution of "Known bug with upstream" and use that for such cases. This is very definitely a bug, just not a bug with MariaDB exclusively.
            Hide
            elenst Elena Stepanova added a comment -

            For tracking purposes we have that, it's label 'upstream'.

            But which part do you mean saying "definitely a bug" – replication issue or the SELECT with query cache?
            For the replication issue, let me re-quote again the manual:
            (http://dev.mysql.com/doc/refman/5.6/en/xa-restrictions.html):
            " If an XA transaction has reached the PREPARED state and the MySQL server is killed (for example, with kill -9 on Unix) or shuts down abnormally, the transaction can be continued after the server restarts. However, if the client reconnects and commits the transaction, the transaction will be absent from the binary log even though it has been committed. This means the data and the binary log have gone out of synchrony. An implication is that XA cannot be used safely together with replication. "

            I wouldn't even bother to file it as a bug at bugs.mysql.com, it will be closed immediately as a documented limitation.

            SELECT is a bug, it's been filed separately, as said before.

            Show
            elenst Elena Stepanova added a comment - For tracking purposes we have that, it's label 'upstream'. But which part do you mean saying "definitely a bug" – replication issue or the SELECT with query cache? For the replication issue, let me re-quote again the manual: ( http://dev.mysql.com/doc/refman/5.6/en/xa-restrictions.html): " If an XA transaction has reached the PREPARED state and the MySQL server is killed (for example, with kill -9 on Unix) or shuts down abnormally, the transaction can be continued after the server restarts. However, if the client reconnects and commits the transaction, the transaction will be absent from the binary log even though it has been committed. This means the data and the binary log have gone out of synchrony. An implication is that XA cannot be used safely together with replication. " I wouldn't even bother to file it as a bug at bugs.mysql.com, it will be closed immediately as a documented limitation. SELECT is a bug, it's been filed separately, as said before.

              People

              • Assignee:
                elenst Elena Stepanova
                Reporter:
                boyce boyce
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: