Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8075

NEED REVIEW: DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1
    • Fix Version/s: 10.1
    • Component/s: Replication
    • Labels:
      None

      Description

      Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances.

      Stack trace from 10.1 commit 46816996
      10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))' failed.
      150428 21:01:39 [ERROR] mysqld got signal 6 ;
      
      #6  0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
      #7  0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
      #8  0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
      #9  0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
      #10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
      #11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
      #12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
      #13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
      #14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
      #15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
      #16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
      #17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
      #18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      

      To reproduce, I

      • start master on a clean datadir, with --server-id=1 --log-bin=mysql-bin, with the attached mysql-bin.* files placed into the datadir;
      • start slave on a clean datadir with --server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8;
      • run change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave on the slave;
      • wait.

      Takes a few minutes, crashes every time for me.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            knielsen Kristian Nielsen added a comment - - edited

            This is the binlog around where it fails (the assertion occurs during
            replication of GTID 0-1-3605):

            # at 770490
            #150427 20:41:36 server id 1  end_log_pos 770528        GTID 0-1-3604
            DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`
            
            # at 770714
            #150427 20:41:36 server id 1  end_log_pos 770752        GTID 0-1-3605 trans
            BEGIN
            UPDATE table_1 SET `col_int_key` = 6
            CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb`
            ROLLBACK
            

            I think what happens is that in GTID 0-1-3604 we drop a temporary table
            `table_1` that shadows the global `table_1`.

            So when we speculatively execute 0-1-3605, we get the wrong table (the
            temporary table that is supposed to be dropped in 0-1-3604), not the global
            one.

            (And even worse, we then wait for 0-1-3604 to drop the temporary table,
            after which we try to use it - causing the assertion seen).

            So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl"
            in the binlog, as speculative apply is not safe after it - just like any
            other DDL statement.

            A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are
            two more cases that do not. One is a DROP TEMPORARY inside a
            transaction. The other is the implicit DROP TEMPORARY TABLE that is done
            when a client connection is closed.

            Show
            knielsen Kristian Nielsen added a comment - - edited This is the binlog around where it fails (the assertion occurs during replication of GTID 0-1-3605): # at 770490 #150427 20:41:36 server id 1 end_log_pos 770528 GTID 0-1-3604 DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1` # at 770714 #150427 20:41:36 server id 1 end_log_pos 770752 GTID 0-1-3605 trans BEGIN UPDATE table_1 SET `col_int_key` = 6 CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb` ROLLBACK I think what happens is that in GTID 0-1-3604 we drop a temporary table `table_1` that shadows the global `table_1`. So when we speculatively execute 0-1-3605, we get the wrong table (the temporary table that is supposed to be dropped in 0-1-3604), not the global one. (And even worse, we then wait for 0-1-3604 to drop the temporary table, after which we try to use it - causing the assertion seen). So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl" in the binlog, as speculative apply is not safe after it - just like any other DDL statement. A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are two more cases that do not. One is a DROP TEMPORARY inside a transaction. The other is the implicit DROP TEMPORARY TABLE that is done when a client connection is closed.
            Show
            knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2015-May/007819.html
            Hide
            knielsen Kristian Nielsen added a comment -

            I discovered a problem with the patch. Consider queries like these:

            begin; insert into t1 values (8); drop table t4;
            

            A transaction that is implicitly committed by a following DDL. With the
            patch, the transaction is errorneously marked as ddl, even though it is not.

            I will come up with a new patch that fixes this ...

            Show
            knielsen Kristian Nielsen added a comment - I discovered a problem with the patch. Consider queries like these: begin; insert into t1 values (8); drop table t4; A transaction that is implicitly committed by a following DDL. With the patch, the transaction is errorneously marked as ddl, even though it is not. I will come up with a new patch that fixes this ...
            Show
            knielsen Kristian Nielsen added a comment - https://lists.launchpad.net/maria-developers/msg08571.html

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: