Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4106

ALTER TABLE races in replication on slave side

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.0.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Linux RHEL6

      Description

      Setup:

      Master: Percona 5.5.24-rel26.0.256
      Middle: Percona 5.5.20-rel24.1.217
      Slave: MariaDB-10.0.0

      On Master:
      Say you have the following table:

      CREATE TABLE `audit_publication_publication_rtb` (
      `audit_publication_publication_rtb_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `publication_id` int(10) unsigned NOT NULL,
      `old_publisher_id` int(10) unsigned DEFAULT NULL,
      `old_rtb_id` varchar(255) DEFAULT NULL,
      `new_publisher_id` int(10) unsigned DEFAULT NULL,
      `new_rtb_id` varchar(255) DEFAULT NULL,
      `record_action` int(1) DEFAULT NULL,
      `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`audit_publication_publication_rtb_id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

      This table is being inserted into constantly.

      I then run the following statement:
      ALTER TABLE adfonic.audit_publication_publication_rtb
      ADD old_status VARCHAR(32) AFTER old_rtb_id,
      ADD new_status VARCHAR(32) AFTER new_rtb_id;

      This propagates fine to the Middle server.

      However, on the Slave server running MariaDB 10.0.0, this seems to cause replication to break:

      Slave_SQL_Running is No
      Last_Error is Column 4 of table 'db.audit_publication_publication_rtb' cannot be converted from type 'varchar(96)' to type 'int(10) unsigned', Last_SQL_Error is Column 4 of table 'db.audit_publication_publication_rtb' cannot be converted from type 'varchar(96)' to type 'int(10) unsigned'

      At a glance, it looks like the inserts in new format are somehow overtaking the alter table statement in the replication stream.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Hi Gordan,

            Can you locate and paste the fragments from the binary log files (master and middle servers) related to guilty statement/event and preceding it?

            Finding them in the middle server binlog is simple, since the slave's error log provides the exact binlog name and position where the problem occurred. Finding them in the master log might be trickier, but should also be possible.

            Show
            elenst Elena Stepanova added a comment - Hi Gordan, Can you locate and paste the fragments from the binary log files (master and middle servers) related to guilty statement/event and preceding it? Finding them in the middle server binlog is simple, since the slave's error log provides the exact binlog name and position where the problem occurred. Finding them in the master log might be trickier, but should also be possible.
            Hide
            gordan Gordan Bobic added a comment -

            Additional info:

            This replication failure also affects MySQL 5.6.6-m9.
            Could it be that a commit in the 5.6.x branch of MySQL is responsible for this problem and it got pulled into MariaDB 10.0.0?

            Show
            gordan Gordan Bobic added a comment - Additional info: This replication failure also affects MySQL 5.6.6-m9. Could it be that a commit in the 5.6.x branch of MySQL is responsible for this problem and it got pulled into MariaDB 10.0.0?
            Hide
            gordan Gordan Bobic added a comment -

            I have uploaded a processed binog extract, with 100 preceeding and 100 following statements to ftp://ftp.askmonty.org/private/. The file name is prefixed with the ticket number. The slave reported the error at binlog position 495830995.

            Show
            gordan Gordan Bobic added a comment - I have uploaded a processed binog extract, with 100 preceeding and 100 following statements to ftp://ftp.askmonty.org/private/ . The file name is prefixed with the ticket number. The slave reported the error at binlog position 495830995.
            Hide
            elenst Elena Stepanova added a comment -

            There has always been a fair share of race conditions in binary logging involving DDL, so generally it's likely to be a legacy bug. What makes your case unusual though, is that if your middle server only works as a slave for 'master' and as a master for 'slave', without any direct inflow, there should be no concurrency at all. If it executes the statements in the correct order, it should write them the same way. That's why it's interesting to see the binary logs from both servers.

            Do I understand correctly that you have the simple replication setup master => middle => slave, nothing fancy with slave using multi-master replication and getting updates from both master and middle?

            Show
            elenst Elena Stepanova added a comment - There has always been a fair share of race conditions in binary logging involving DDL, so generally it's likely to be a legacy bug. What makes your case unusual though, is that if your middle server only works as a slave for 'master' and as a master for 'slave', without any direct inflow, there should be no concurrency at all. If it executes the statements in the correct order, it should write them the same way. That's why it's interesting to see the binary logs from both servers. Do I understand correctly that you have the simple replication setup master => middle => slave, nothing fancy with slave using multi-master replication and getting updates from both master and middle?
            Hide
            elenst Elena Stepanova added a comment -

            Thank you, I will check the logs shortly.

            Show
            elenst Elena Stepanova added a comment - Thank you, I will check the logs shortly.
            Hide
            gordan Gordan Bobic added a comment -

            The MySQL 5.6 server only had one master, but the MariaDB 10.0 server had extra masters (this was the main reason why we deployed them). None of the other masters, however, would have had that table on them.

            Show
            gordan Gordan Bobic added a comment - The MySQL 5.6 server only had one master, but the MariaDB 10.0 server had extra masters (this was the main reason why we deployed them). None of the other masters, however, would have had that table on them.
            Hide
            elenst Elena Stepanova added a comment -

            Gordan,

            For the binary log file, this fragment does not contain the ALTER statement. Could you please extract the part where both ALTER and the guilty event (and some more after it) are included?
            While doing so, please run mysqlbinlog with "--base64-output=DECODE-ROWS --verbose" options, it will make the result readable.

            Also, if you have access to the middle and slave servers, could you run SHOW CREATE TABLE for the table in question on each of them and provide the output?

            Finally, if possible, please attach your middle and slave cnf files.

            Thanks!

            Show
            elenst Elena Stepanova added a comment - Gordan, For the binary log file, this fragment does not contain the ALTER statement. Could you please extract the part where both ALTER and the guilty event (and some more after it) are included? While doing so, please run mysqlbinlog with "--base64-output=DECODE-ROWS --verbose" options, it will make the result readable. Also, if you have access to the middle and slave servers, could you run SHOW CREATE TABLE for the table in question on each of them and provide the output? Finally, if possible, please attach your middle and slave cnf files. Thanks!
            Hide
            elenst Elena Stepanova added a comment -

            Hi Gordan,

            Are you still having the problem?

            Looking at the description again, I got a question. Are using any replicate-do-db, replicate-ignore-db, or replicate-rewrite-db options?
            Your ALTER TABLE works on adfonic.audit_publication_publication_rtb, while the replication error is about db.audit_publication_publication_rtb, so I assume you have (or expect to have) adfonic => db mapping?
            But then again, replicate-rewrite-db only works if the source database is a default database on masteer, which might well not to be true in your setup. It's possible that ALTER ends up being ignored.

            So, if you are still interested in this report, please provide config files from all involved servers, and if possible the fragments of binary logs which contain the ALTER statement.

            Thanks.

            Show
            elenst Elena Stepanova added a comment - Hi Gordan, Are you still having the problem? Looking at the description again, I got a question. Are using any replicate-do-db, replicate-ignore-db, or replicate-rewrite-db options? Your ALTER TABLE works on adfonic.audit_publication_publication_rtb, while the replication error is about db.audit_publication_publication_rtb, so I assume you have (or expect to have) adfonic => db mapping? But then again, replicate-rewrite-db only works if the source database is a default database on masteer, which might well not to be true in your setup. It's possible that ALTER ends up being ignored. So, if you are still interested in this report, please provide config files from all involved servers, and if possible the fragments of binary logs which contain the ALTER statement. Thanks.
            Hide
            elenst Elena Stepanova added a comment -

            Closing it as incomplete for now, if you have any new information, please add it to re-open the report.

            Show
            elenst Elena Stepanova added a comment - Closing it as incomplete for now, if you have any new information, please add it to re-open the report.

              People

              • Assignee:
                elenst Elena Stepanova
                Reporter:
                gordan Gordan Bobic
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: