Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8147

Assertion `m_lock_type == 2' failed in handler::ha_close() during parallel replication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1, 10.0
    • Fix Version/s: 10.0.20, 10.1.5
    • Component/s: Replication
    • Labels:
      None

      Description

      Note: Unfortunately I can't check if the problem is reproducible on 10.0, because the binlog uses syntax introduced in 10.1 with table encryption.

      Stack trace is from 10.1 commit 8c54182e + patch http://lists.askmonty.org/pipermail/commits/2015-May/007819.html ; the binlog was produced on 10.1 commit 2f25c653 + the same patch + patch for MDEV-8045.

      sql/handler.cc:2579: int handler::ha_close(): Assertion `m_lock_type == 2' failed.
      150512 17:14:41 [ERROR] mysqld got signal 6 ;
      
      #6  0x00007f0d3fb15311 in *__GI___assert_fail (assertion=0x7f0d42e723ce "m_lock_type == 2", file=<optimized out>, line=2579, function=0x7f0d42e76170 "int handler::ha_close()") at assert.c:81
      #7  0x00007f0d4267c967 in handler::ha_close (this=0x7f0d14cae888) at 10.1-patched/sql/handler.cc:2579
      #8  0x00007f0d425344e4 in closefrm (table=0x7f0d0e82c070, free_share=false) at 10.1-patched/sql/table.cc:3012
      #9  0x00007f0d423e08b0 in close_temporary (table=0x7f0d0e82c070, free_share=true, delete_table=false) at 10.1-patched/sql/sql_base.cc:1812
      #10 0x00007f0d42566527 in Relay_log_info::close_temporary_tables (this=0x7f0d2d488c40) at 10.1-patched/sql/rpl_rli.cc:1069
      #11 0x00007f0d4276d5fa in Start_log_event_v3::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:4728
      #12 0x00007f0d4276e217 in Format_description_log_event::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:5103
      #13 0x00007f0d423b3a3a in Log_event::apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.h:1347
      #14 0x00007f0d423a95af in apply_event_and_update_pos (ev=0x7f0d1409c1b0, thd=0x7f0d14049070, rgi=0x7f0d14020800, rpt=0x0) at 10.1-patched/sql/slave.cc:3274
      #15 0x00007f0d425e0843 in rpt_handle_event (qev=0x7f0d146d6570, rpt=0x0) at 10.1-patched/sql/rpl_parallel.cc:49
      #16 0x00007f0d425e635c in rpl_parallel::do_event (this=0x7f0d2d48bf40, serial_rgi=0x7f0d14020800, ev=0x7f0d1409c1b0, event_size=244) at 10.1-patched/sql/rpl_parallel.cc:2357
      #17 0x00007f0d423a9bbf in exec_relay_log_event (thd=0x7f0d14049070, rli=0x7f0d2d488c40, serial_rgi=0x7f0d14020800) at 10.1-patched/sql/slave.cc:3525
      #18 0x00007f0d423ad0d0 in handle_slave_sql (arg=0x7f0d2d487000) at 10.1-patched/sql/slave.cc:4677
      #19 0x00007f0d41b30b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #20 0x00007f0d3fbc595d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      

      To reproduce:

      • start master with all default options + --log-bin=mysql-bin --server-id=1, with the attached mysql-bin.* files in the datadir;
      • start slave with all default options + --server-id=2 --plugin-load-add=file_key_management.so --file_key_management_filename=<your basedir>/mysql-test/std_data/keys.txt --slave_parallel_threads=8;
      • set up replication (either old-fashioned or GTID, doesn't matter);
      • wait.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            knielsen Kristian Nielsen added a comment - - edited

            Hm, unfortunately it did not reproduce for me in like 5 or 10 runs (the slave just replicated to completion without errors).
            EDIT: Ok, it reproduces fine for me. I was doing START SLAVE UNTIL <end of binlog>, but the assertion happens only on the first event in the following binlog

            Show
            knielsen Kristian Nielsen added a comment - - edited Hm, unfortunately it did not reproduce for me in like 5 or 10 runs (the slave just replicated to completion without errors). EDIT: Ok, it reproduces fine for me. I was doing START SLAVE UNTIL <end of binlog>, but the assertion happens only on the first event in the following binlog
            Hide
            knielsen Kristian Nielsen added a comment -

            Here is an MTR test case. It triggers the assertion reliably for me by
            adding the below sleep in the code.

            It looks like the wait_for_workers_idle() function returns too early. It
            does wait_for_prior_commit(), at which point all transactions are committed
            (at least written into binlog), but table locks may not yet be
            released. This allows the close_temporary_tables() for a restart FD event to
            run ahead and touch not yet closed temporary tables.

            --source include/have_partition.inc
            --source include/have_innodb.inc
            --source include/master-slave.inc
            
            --connection server_2
            SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
            --source include/stop_slave.inc
            SET GLOBAL slave_parallel_threads=8;
            
            --connection server_1
            CREATE TABLE E (
            			pk INTEGER AUTO_INCREMENT,
            			col_int_nokey INTEGER /*! NULL */,
            			col_int_key INTEGER /*! NULL */,
            
            			col_date_key DATE /*! NULL */,
            			col_date_nokey DATE /*! NULL */,
            
            			col_time_key TIME /*! NULL */,
            			col_time_nokey TIME /*! NULL */,
            
            			col_datetime_key DATETIME /*! NULL */,
            			col_datetime_nokey DATETIME /*! NULL */,
            
            			col_varchar_key VARCHAR(1) /*! NULL */,
            			col_varchar_nokey VARCHAR(1) /*! NULL */,
            
            			PRIMARY KEY (pk),
            			KEY (col_int_key),
            			KEY (col_date_key),
            			KEY (col_time_key),
            			KEY (col_datetime_key),
            			KEY (col_varchar_key, col_int_key)
            		)  ENGINE=InnoDB;
            
            ALTER TABLE `E` PARTITION BY KEY() PARTITIONS 5;
            ALTER TABLE `E` REMOVE PARTITIONING;
            --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            wait
            EOF
            --shutdown_server 30
            --source include/wait_until_disconnected.inc
            --connection default
            --source include/wait_until_disconnected.inc
            --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            restart:
            EOF
            --enable_reconnect
            --source include/wait_until_connected_again.inc
            --connection server_1
            --enable_reconnect
            --source include/wait_until_connected_again.inc
            CREATE TABLE t1 (a INT PRIMARY KEY);
            --save_master_pos
            
            --connection server_2
            --source include/start_slave.inc
            --sync_with_master
            
            
            # Clean up.
            --connection server_2
            --source include/stop_slave.inc
            SET GLOBAL slave_parallel_threads=@old_parallel_threads;
            --source include/start_slave.inc
            
            --connection server_1
            DROP TABLE `E`;
            DROP TABLE t1;
            --source include/rpl_end.inc
            
            diff --git a/sql/sql_class.cc b/sql/sql_class.cc
            index 7de565d..fbb2716 100644
            --- a/sql/sql_class.cc
            +++ b/sql/sql_class.cc
            @@ -6994,6 +6994,7 @@ wait_for_commit::wakeup_subsequent_commits2(int wakeup_error)
                 a mutex), so no extra explicit barrier is needed here.
               */
               wakeup_subsequent_commits_running= false;
            +my_sleep(21000);
             }
             
             
            
            Show
            knielsen Kristian Nielsen added a comment - Here is an MTR test case. It triggers the assertion reliably for me by adding the below sleep in the code. It looks like the wait_for_workers_idle() function returns too early. It does wait_for_prior_commit(), at which point all transactions are committed (at least written into binlog), but table locks may not yet be released. This allows the close_temporary_tables() for a restart FD event to run ahead and touch not yet closed temporary tables. --source include/have_partition.inc --source include/have_innodb.inc --source include/master-slave.inc --connection server_2 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads; --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=8; --connection server_1 CREATE TABLE E ( pk INTEGER AUTO_INCREMENT, col_int_nokey INTEGER /*! NULL */, col_int_key INTEGER /*! NULL */, col_date_key DATE /*! NULL */, col_date_nokey DATE /*! NULL */, col_time_key TIME /*! NULL */, col_time_nokey TIME /*! NULL */, col_datetime_key DATETIME /*! NULL */, col_datetime_nokey DATETIME /*! NULL */, col_varchar_key VARCHAR(1) /*! NULL */, col_varchar_nokey VARCHAR(1) /*! NULL */, PRIMARY KEY (pk), KEY (col_int_key), KEY (col_date_key), KEY (col_time_key), KEY (col_datetime_key), KEY (col_varchar_key, col_int_key) ) ENGINE=InnoDB; ALTER TABLE `E` PARTITION BY KEY() PARTITIONS 5; ALTER TABLE `E` REMOVE PARTITIONING; --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect wait EOF --shutdown_server 30 --source include/wait_until_disconnected.inc --connection default --source include/wait_until_disconnected.inc --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect restart: EOF --enable_reconnect --source include/wait_until_connected_again.inc --connection server_1 --enable_reconnect --source include/wait_until_connected_again.inc CREATE TABLE t1 (a INT PRIMARY KEY); --save_master_pos --connection server_2 --source include/start_slave.inc --sync_with_master # Clean up. --connection server_2 --source include/stop_slave.inc SET GLOBAL slave_parallel_threads=@old_parallel_threads; --source include/start_slave.inc --connection server_1 DROP TABLE `E`; DROP TABLE t1; --source include/rpl_end.inc diff --git a/sql/sql_class.cc b/sql/sql_class.cc index 7de565d..fbb2716 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -6994,6 +6994,7 @@ wait_for_commit::wakeup_subsequent_commits2(int wakeup_error) a mutex), so no extra explicit barrier is needed here. */ wakeup_subsequent_commits_running= false; +my_sleep(21000); }
            Show
            knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2015-May/007945.html
            Hide
            knielsen Kristian Nielsen added a comment -

            Pushed to 10.0 and 10.1

            Show
            knielsen Kristian Nielsen added a comment - Pushed to 10.0 and 10.1

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: