MariaDB Server / MDEV-6451

"Error 'Table 't1' already exists' on query" with slave_ddl_exec_mode=IDEMPOTENT

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.12
    • Fix Version/s: 10.0.13
    • Component/s: None
    • Labels: None

      Description

      Hi!

      I set up a simple replication topology with 3 hosts, all running MariaDB 10.0.12.

      node1 <--> node2
        \         /
         \       /
          \     /
           node3
      

      node1 and node2 are in active master-master with each other.

      node3 is set up in one-way multi-source from node1 and node2.

      Replication is set up using old-style binlog positions (GTIDs are not used for replication).
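
      For reference, the node3 connections are configured roughly along these lines (connection names, host names, ports, user, and binlog coordinates are placeholders, not the actual values used):

        -- On node3: one named connection per master, positioned by binlog
        -- coordinates rather than GTIDs (all values here are placeholders).
        CHANGE MASTER 'node1' TO
          MASTER_HOST='node1.example.com', MASTER_PORT=3306,
          MASTER_USER='repl', MASTER_PASSWORD='...',
          MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=4;
        CHANGE MASTER 'node2' TO
          MASTER_HOST='node2.example.com', MASTER_PORT=3306,
          MASTER_USER='repl', MASTER_PASSWORD='...',
          MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=4;
        START ALL SLAVES;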

      I execute a CREATE TABLE statement on node1, which is replicated to node2 as expected and logged to its binary log due to log-slave-updates.

      node3 receives this CREATE TABLE event from both streams. node3 is running with gtid_ignore_duplicates=OFF, so it tries executing the CREATE TABLE event both times. I expect this.

      However, I am surprised to see this:

                     Last_SQL_Error: Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'create table t1 (id int unsigned not null auto_increment primary key, server_id int) comment='created on node1''
      

      If I do START ALL SLAVES on node3, the CREATE TABLE statement executes without any errors. I then see two copies of this event in node3's binary log (node3 has log-slave-updates enabled).
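
      One way to see this is to restart the stopped connections and inspect node3's own binary log (the binlog file name below is a placeholder):

        START ALL SLAVES;
        SHOW ALL SLAVES STATUS\G
        -- Both copies of the replicated CREATE TABLE event then appear here:
        SHOW BINLOG EVENTS IN 'node3-bin.000001';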

      Apparently, if I execute subsequent CREATE TABLE statements on node1 or node2, they are executed twice on the slaves without any intervention.

      So, it seems that the first CREATE TABLE fails, but subsequent CREATE TABLE statements are executed with the slave_ddl_exec_mode=IDEMPOTENT behavior?
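
      (Per the slave_ddl_exec_mode variable description, IDEMPOTENT, the 10.0 default, makes the slave SQL thread treat a replicated CREATE TABLE as CREATE OR REPLACE TABLE and a DROP TABLE as DROP TABLE IF EXISTS, so the duplicate CREATE arriving over the second stream should be absorbed. A quick check on node3:)

        SHOW GLOBAL VARIABLES LIKE 'slave_ddl_exec_mode';
        -- Expected value: IDEMPOTENT (the default); it can be set explicitly with
        -- SET GLOBAL slave_ddl_exec_mode = 'IDEMPOTENT';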


            Activity

            elenst Elena Stepanova added a comment -

            I could reproduce it reliably by hand, but when it's put into an MTR test, it turns out to be weirdly sporadic. I need to look into it more.

            elenst Elena Stepanova added a comment - edited

            So, when the scenario is run manually, the problem is sporadic too.
            Apparently, there is a race condition between the two slave SQL threads executing the CREATE nearly simultaneously, caused by the specific (possibly erroneous) locking algorithm used when converting CREATE into CREATE OR REPLACE inside the slave with IDEMPOTENT DDL.

            I wasn't able to produce a deterministic test case, because I cannot trigger sync points inside slave SQL threads, and running regular CREATE OR REPLACE statements doesn't seem to cause the same problem since they are processed differently.

            Below is a crude MTR test case to reproduce the problem (not to be included in the suite!).

            
            --let $rpl_topology=1->2->3
            --source include/rpl_init.inc
            --source include/have_binlog_format_statement.inc
            
            --enable_connect_log
            
            connection server_3;
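            # On server_3: replace the default slave setup with two named multi-source connections, 'm1' and 'm2'.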
            --source include/stop_slave.inc
            reset slave all;
            eval change master 'm1' to master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root';
            eval change master 'm2' to master_host='127.0.0.1', master_port=$SERVER_MYPORT_2, master_user='root';
            start all slaves;
            
            connection server_1;
            
            let $count = 100;
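            # Repeatedly create tables on server_1 and sync each one to server_2 and then to server_3 over both connections, racing the two slave SQL threads.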
            while ($count)
            {
            
            	connection server_1;
            	eval create table t$count (i int);
            	save_master_pos;
            
            	connection server_2;
            	sync_with_master;
            
            	connection server_3;
            	sync_with_master 0, 'm1';
            
            	connection server_2;
            	save_master_pos;
            
            	connection server_3;
            	sync_with_master 0, 'm2';
            
            	dec $count;
            }
            
            connection server_1;
            drop database test;
            create database test;
            
            --source include/rpl_end.inc
            
            monty Michael Widenius added a comment -

            Hi!

            I was not able to repeat the problem.
            How should the above test fail?

            I was trying to test this in 10.0, but noticed that the required test.cnf file was missing.

            I created the following file, mysql-test/suite/rpl/t/skr.cnf:

            !include suite/rpl/my.cnf

            [mysqld.1]
            log-slave-updates
            loose-innodb

            [mysqld.2]
            log-slave-updates
            loose-innodb
            replicate-ignore-table=db.t2

            [mysqld.3]
            log-slave-updates
            loose-innodb

            [ENV]
            SERVER_MYPORT_3= @mysqld.3.port
            SERVER_MYSOCK_3= @mysqld.3.socket

            When I run the test case as:

            mysql-test-run rpl.skr

            This crashed in rpl_end with:
            At line 149: sync_slave_with_master failed: 'select master_pos_wait('slave-bin.000001', 13044, 300, '')' returned NULL indicating slave SQL thread failure

            However, I don't have any problems with create table failing (as far as I can see)

            elenst Elena Stepanova added a comment -

            Here is my cnf file:

            !include suite/rpl/my.cnf
            
            [mysqld.1]
            log-slave-updates
            loose-innodb
            
            [mysqld.2]
            log-slave-updates
            loose-innodb
            
            [mysqld.3]
            log-slave-updates
            loose-innodb
            
            [ENV]
            SERVER_MYPORT_3=                @mysqld.3.port
            SERVER_MYSOCK_3=                @mysqld.3.socket
            

            For me, the test also fails like this:

            analyze: sync_with_master
            mysqltest: At line 37: sync_with_master failed: 'select master_pos_wait('master-bin.000001', 689, 300, 'm1')' returned NULL indicating slave SQL thread failure
            

            And the error log shows this:

            140730 17:59:22 [ERROR] Master 'm1': Slave SQL: Error 'Table 't98' already exists' on query. Default database: 'test'. Query: 'create table t98 (i int)', Gtid 0-1-3, Internal MariaDB error code: 1050
            140730 17:59:22 [Warning] Master 'm1': Slave: Table 't98' already exists Error_code: 1050
            140730 17:59:22 [ERROR] Master 'm1': Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 564
            
            monty Michael Widenius added a comment -

            The race condition was in lock_table_names(), which failed if we tried to do a CREATE TABLE within the replication thread while the table was in use.

            Fixed by propagating the REPLACE flag to lock_table_names().

            monty Michael Widenius added a comment -

            Pushed into 10.0 tree

            ondohotola Dr Eberhard W Lisse added a comment -

            Hi

            I am encountering the same issue but with a VIEW.

            Master is Server version: 10.0.13-MariaDB-log Homebrew
            on OS X 10.10.3

            Slave is Server version: 10.0.19-MariaDB-1~precise-log mariadb.org binary distribution
            on Ubuntu 14.04.1 LTS

            Jun 13 16:54:49 whktest mysqld: 150613 16:54:49 [Warning] Slave: Table 'listview1' already exists Error_code: 1050
            Jun 13 16:54:49 whktest mysqld: 150613 16:54:49 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000002' position 55904540

            The statement causing this is

            CREATE ALGORITHM = UNDEFINED
            DEFINER = `xxxxxxxx`@`localhost`
            SQL SECURITY DEFINER
            VIEW `listview1`
            AS
            SELECT id ,
            from_fld ,
            to_fld ,
            subj_fld ,
            date_fld ,
            mailbox ,
            body_fld ,
            numAttach ,
            attachNames,
            attachText ,
            headings
            FROM email
            WHERE id IN(197854,
            197853,
            197852)

            Stepping over the statement does not work.

            greetings, el

            elenst Elena Stepanova added a comment -

            Dr Eberhard W Lisse,

            I am encountering the same issue but with a VIEW

            By "the same issue" do you mean that you also have two masters executing the statement, and observe the error despite slave_ddl_exec_mode=IDEMPOTENT?
            Or do you just mean that you are getting replication failure with a similar error text?

            This bug report was about a very specific situation. The conflict was expected to happen in the given setup, but it was supposed to be suppressed by the slave configuration.

            If you mean the same, that you indeed create a duplicate view but don't expect it to cause a problem due to IDEMPOTENT mode, then I think the expectation is not valid: according to the variable description, the mode does not affect views.

            If you just mean that you are getting the error when you think there should be no conflict, we will need much more information to find out why it might be happening, and it's not related to this bug report. Please create another JIRA issue, attach your binary logs (or their fragments around this view creation, the wider the better), cnf files and error logs.


              People

              • Assignee: monty Michael Widenius
              • Reporter: kolbe Kolbe Kegel
              • Votes: 1
              • Watchers: 6
