Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4182

Galera: safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_wsrep_thd', deadlock on DDL

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.33a-galera
    • Component/s: None
    • Labels:

      Description

      It might well be some other Galera limitation, but I haven't found which particular one it could be. The flow is mainly HANDLER flow, and also some DML and ALTER TABLE.

      I execute it on two nodes in a 3-node cluster, 1 thread per node. Soon after start, the following message appears in the server error log(s):

      safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_wsrep_thd'
      Mutex currently locked (in reverse order):
      LOCK_wsrep_thd                    maria-5.5-galera/sql/sql_class.cc  line 858
      LOCK_thd_data                     maria-5.5-galera/sql/sql_base.cc  line 9348
      rwlock->lock                      maria-5.5-galera/mysys/thr_rwlock.c  line 392
      

      Either immediately or soon afterwards, both nodes stop, at least one on ALTER table, and another one on ALTER or on a DML statement.

      ALTER from a client on one of nodes is shown as Killed, but doesn't disappear from the processlist. ALTER from a replication thread is waiting for a lock:

      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+
      | Id | User        | Host            | db   | Command | Time | State                           | Info                                                                                         | Progress |
      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+
      |  1 | system user |                 | test | Sleep   |  484 | Waiting for table metadata lock | ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler                                |    0.000 |
      |  2 | system user |                 | NULL | Sleep   |  566 | wsrep aborter idle              | NULL                                                                                         |    0.000 |
      |  6 | root        | localhost:43415 | test | Killed  |  484 | checking permissions            | ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler ' |    0.000 |
      | 11 | root        | localhost:43428 | test | Query   |    0 | sleeping                        | show full processlist                                                                        |    0.000 |
      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+
      

      Another node is also waiting for something:

      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+
      | Id | User        | Host            | db   | Command | Time | State              | Info                                                               | Progress |
      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+
      |  1 | system user |                 | NULL | Sleep   |  681 | wsrep aborter idle | NULL                                                               |    0.000 |
      |  2 | system user |                 | NULL | Sleep   |  557 | committed 91       | NULL                                                               |    0.000 |
      | 10 | root        | localhost:41929 | test | Query   |  543 | query end          | UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0 |    0.000 |
      | 18 | root        | localhost:41947 | test | Query   |    0 | sleeping           | show full processlist                                              |    0.000 |
      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+
      

      Selected threads:

      Thread 2 (Thread 0x7f8abc061700 (LWP 20603)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
      #1  0x00007f8ac7c2f7de in _gcs_sm_enqueue_common (cond=0x7f8abc05e980, sm=0x7f8ac9c0e010) at gcs/src/gcs_sm.h:160
      #2  gcs_sm_enter (scheduled=<optimized out>, cond=0x7f8abc05e980, sm=0x7f8ac9c0e010) at gcs/src/gcs_sm.h:238
      #3  gcs_repl (conn=0x3c29490, act=0x7f8abc05ecf0, scheduled=<optimized out>) at gcs/src/gcs.c:1490
      #4  0x00007f8ac7c81b1c in repl (scheduled=true, act=..., this=0x3c23a00) at galera/src/gcs.hpp:106
      #5  galera::ReplicatorSMM::replicate (this=0x3c23510, trx=0x7f8a6c01c800) at galera/src/replicator_smm.cpp:552
      #6  0x00007f8ac7c993ff in galera_pre_commit (gh=<optimized out>, conn_id=<optimized out>, trx_handle=<optimized out>, rbr_data=0x7f8a6c1ce620, rbr_data_len=11293, flags=1, global_seqno=0x504a1a8) at galera/src/wsrep_provider.cpp:388
      #7  0x0000000000771833 in wsrep_run_wsrep_commit (thd=0x5047dc0, hton=0x3c49160, all=false) at maria-5.5-galera/sql/wsrep_hton.cc:338
      #8  0x0000000000770685 in wsrep_prepare (hton=0x3c49160, thd=0x5047dc0, all=false) at maria-5.5-galera/sql/wsrep_hton.cc:119
      #9  0x00000000007e7988 in ha_commit_trans (thd=0x5047dc0, all=false) at maria-5.5-galera/sql/handler.cc:1294
      #10 0x0000000000730221 in trans_commit_stmt (thd=0x5047dc0) at maria-5.5-galera/sql/transaction.cc:325
      #11 0x0000000000621031 in mysql_execute_command (thd=0x5047dc0) at maria-5.5-galera/sql/sql_parse.cc:4903
      #12 0x0000000000624da9 in mysql_parse (thd=0x5047dc0, rawbuf=0x7f8a6c00e078 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0", length=66, parser_state=0x7f8abc060550) at maria-5.5-galera/sql/sql_parse.cc:6304
      #13 0x0000000000623e70 in wsrep_mysql_parse (thd=0x5047dc0, rawbuf=0x7f8a6c00e078 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0", length=66, parser_state=0x7f8abc060550) at maria-5.5-galera/sql/sql_parse.cc:6069
      #14 0x0000000000616335 in dispatch_command (command=COM_QUERY, thd=0x5047dc0, packet=0x4fed231 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0 ", packet_length=67) at maria-5.5-galera/sql/sql_parse.cc:1245
      #15 0x0000000000615114 in do_command (thd=0x5047dc0) at maria-5.5-galera/sql/sql_parse.cc:891
      #16 0x000000000071ddd7 in do_handle_one_connection (thd_arg=0x5047dc0) at maria-5.5-galera/sql/sql_connect.cc:1291
      #17 0x000000000071d7af in handle_one_connection (arg=0x5047dc0) at maria-5.5-galera/sql/sql_connect.cc:1199
      #18 0x00007f8ac92cce9a in start_thread (arg=0x7f8abc061700) at pthread_create.c:308
      #19 0x00007f8ac89fdcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #20 0x0000000000000000 in ?? ()
      
      Thread 21 (Thread 0x7f37490d6700 (LWP 20387)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
      #1  0x0000000000cfcb90 in safe_cond_timedwait (cond=0x7f3740000c88, mp=0x7f3740000be0, abstime=0x7f37490d03c0, file=0xd9a3c8 "maria-5.5-galera/sql/mdl.cc", line=1202) at maria-5.5-galera/mysys/thr_mutex.c:547
      #2  0x0000000000725fc1 in inline_mysql_cond_timedwait (that=0x7f3740000c88, mutex=0x7f3740000be0, abstime=0x7f37490d03c0, src_file=0xd9a3c8 "maria-5.5-galera/sql/mdl.cc", src_line=1202) at maria-5.5-galera/include/mysql/psi/mysql_thread.h:1018
      #3  0x0000000000727095 in MDL_wait::timed_wait (this=0x7f3740000be0, thd=0x7f3740000af0, abs_timeout=0x7f37490d03c0, set_status_on_timeout=false, wait_state_name=0xd9a2e8 "Waiting for table metadata lock") at maria-5.5-galera/sql/mdl.cc:1202
      #4  0x0000000000728bc0 in MDL_context::acquire_lock (this=0x7f3740000be0, mdl_request=0x7f37490d0480, lock_wait_timeout=31536000) at maria-5.5-galera/sql/mdl.cc:2209
      #5  0x000000000072929a in MDL_context::upgrade_shared_lock_to_exclusive (this=0x7f3740000be0, mdl_ticket=0x7f3740085840, lock_wait_timeout=31536000) at maria-5.5-galera/sql/mdl.cc:2389
      #6  0x00000000005b7f1a in wait_while_table_is_used (thd=0x7f3740000af0, table=0x7f37400435a0, function=HA_EXTRA_PREPARE_FOR_RENAME) at maria-5.5-galera/sql/sql_base.cc:2347
      #7  0x00000000006bf67f in mysql_alter_table (thd=0x7f3740000af0, new_db=0x7f374000ced0 "test", new_name=0x7f374000c868 "table1000_innodb_int_autoinc", create_info=0x7f37490d1f90, table_list=0x7f374000c908, alter_info=0x7f37490d2090, order_num=0, order=0x0, ignore=false, require_online=false) at maria-5.5-galera/sql/sql_table.cc:7008
      #8  0x000000000095e20e in Alter_table_statement::execute (this=0x7f374000cf00, thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_alter.cc:122
      #9  0x0000000000620c93 in mysql_execute_command (thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_parse.cc:4844
      #10 0x0000000000624da9 in mysql_parse (thd=0x7f3740000af0, rawbuf=0x7f374004ec39 "ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler", length=61, parser_state=0x7f37490d2e80) at maria-5.5-galera/sql/sql_parse.cc:6304
      #11 0x00000000008c4b01 in Query_log_event::do_apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0, query_arg=0x7f374004ec39 "ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler", q_len_arg=61) at maria-5.5-galera/sql/log_event.cc:3792
      #12 0x00000000008c404f in Query_log_event::do_apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0) at maria-5.5-galera/sql/log_event.cc:3531
      #13 0x0000000000593196 in Log_event::apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0) at maria-5.5-galera/sql/log_event.h:1230
      #14 0x000000000062948d in wsrep_apply_rbr (thd=0x7f3740000af0, rbr_buf=0x7f374001f170 "\350\\\"Q\002", buf_len=0) at maria-5.5-galera/sql/sql_parse.cc:8131
      #15 0x0000000000629a79 in wsrep_apply_cb (ctx=0x7f3740000af0, buf=0x7f374001f170, buf_len=129, global_seqno=90) at maria-5.5-galera/sql/sql_parse.cc:8210
      #16 0x00007f375431eabf in apply_wscoll (trx=..., apply_cb=0x6299cb <wsrep_apply_cb(void*, void const*, unsigned long, long)>, recv_ctx=0x7f3740000af0) at galera/src/replicator_smm.cpp:37
      #17 apply_trx_ws (recv_ctx=0x7f3740000af0, apply_cb=0x6299cb <wsrep_apply_cb(void*, void const*, unsigned long, long)>, commit_cb=0x629cd6 <wsrep_commit_cb(void*, long, bool)>, trx=...) at galera/src/replicator_smm.cpp:81
      #18 0x00007f3754324030 in galera::ReplicatorSMM::apply_trx (this=<optimized out>, recv_ctx=0x7f3740000af0, trx=0x7f3740022ff0) at galera/src/replicator_smm.cpp:470
      #19 0x00007f37543247c0 in galera::ReplicatorSMM::process_trx (this=0x367f540, recv_ctx=0x7f3740000af0, trx=0x7f3740022ff0) at galera/src/replicator_smm.cpp:1056
      #20 0x00007f3754301d34 in galera::GcsActionSource::dispatch (this=0x367fac8, recv_ctx=0x7f3740000af0, act=...) at galera/src/gcs_action_source.cpp:110
      #21 0x00007f37543029b3 in galera::GcsActionSource::process (this=0x367fac8, recv_ctx=0x7f3740000af0) at galera/src/gcs_action_source.cpp:168
      #22 0x00007f375432637d in async_recv (recv_ctx=0x7f3740000af0, this=0x367f540) at galera/src/replicator_smm.cpp:375
      #23 galera::ReplicatorSMM::async_recv (this=0x367f540, recv_ctx=0x7f3740000af0) at galera/src/replicator_smm.cpp:356
      #24 0x00007f3754339cae in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at galera/src/wsrep_provider.cpp:201
      #25 0x0000000000629ff0 in wsrep_replication_process (thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_parse.cc:8363
      #26 0x00000000005693ff in start_wsrep_THD (arg=0x629f6d) at maria-5.5-galera/sql/mysqld.cc:4832
      #27 0x00007f375596ee9a in start_thread (arg=0x7f37490d6700) at pthread_create.c:308
      #28 0x00007f375509fcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #29 0x0000000000000000 in ?? ()
      
      Thread 2 (Thread 0x7f374810e700 (LWP 20641)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
      #1  0x00007f3754320985 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
      #2  galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=0x367ffd8, obj=...) at galera/src/monitor.hpp:126
      #3  0x00007f3754322584 in galera::ReplicatorSMM::to_isolation_begin (this=0x367f540, trx=0x7f3708263200) at galera/src/replicator_smm.cpp:961
      #4  0x00007f375433c627 in galera_to_execute_start (gh=<optimized out>, conn_id=6, key=<optimized out>, key_len=1, query=0x7f370825f090, query_len=160, global_seqno=0x4b0c958) at galera/src/wsrep_provider.cpp:579
      #5  0x0000000000775c25 in wsrep_TOI_begin (thd=0x4b0a570, db_=0x0, table_=0x0, table_list=0x7f37082f8328) at maria-5.5-galera/sql/wsrep_mysqld.cc:1091
      #6  0x0000000000776a0a in wsrep_to_isolation_begin (thd=0x4b0a570, db_=0x0, table_=0x0, table_list=0x7f37082f8328) at maria-5.5-galera/sql/wsrep_mysqld.cc:1227
      #7  0x000000000095e0f0 in Alter_table_statement::execute (this=0x7f37082f8ab8, thd=0x4b0a570) at maria-5.5-galera/sql/sql_alter.cc:110
      #8  0x0000000000620c93 in mysql_execute_command (thd=0x4b0a570) at maria-5.5-galera/sql/sql_parse.cc:4844
      #9  0x0000000000624da9 in mysql_parse (thd=0x4b0a570, rawbuf=0x7f37082f8138 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", length=92, parser_state=0x7f374810d550) at maria-5.5-galera/sql/sql_parse.cc:6304
      #10 0x0000000000623e70 in wsrep_mysql_parse (thd=0x4b0a570, rawbuf=0x7f37082f8138 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", length=92, parser_state=0x7f374810d550) at maria-5.5-galera/sql/sql_parse.cc:6069
      #11 0x0000000000616335 in dispatch_command (command=COM_QUERY, thd=0x4b0a570, packet=0x4b0e251 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", packet_length=92) at maria-5.5-galera/sql/sql_parse.cc:1245
      #12 0x0000000000615114 in do_command (thd=0x4b0a570) at maria-5.5-galera/sql/sql_parse.cc:891
      #13 0x000000000071ddd7 in do_handle_one_connection (thd_arg=0x4b0a570) at maria-5.5-galera/sql/sql_connect.cc:1291
      #14 0x000000000071d7af in handle_one_connection (arg=0x4b0a570) at maria-5.5-galera/sql/sql_connect.cc:1199
      #15 0x00007f375596ee9a in start_thread (arg=0x7f374810e700) at pthread_create.c:308
      #16 0x00007f375509fcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #17 0x0000000000000000 in ?? ()
      
      

      Server command lines:

      mysqld --no-defaults --basedir=/home/elenst/maria-5.5-galera --lc-messages-dir=/home/elenst/maria-5.5-galera/sql/share/ --core --datadir=/home/elenst/maria-5.5-galera/data1 --tmpdir=/home/elenst/maria-5.5-galera/data1/tmp --port=8306 --socket=/home/elenst/maria-5.5-galera/data1/tmp/node1.sock --wsrep-provider=/home/elenst/galera/libgalera_smm.so --wsrep-cluster-address=gcomm:// --binlog-format=row --wsrep-sst-method=rsync --log-error=/home/elenst/maria-5.5-galera/data1/log.err --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --general-log=1
      
      mysqld --no-defaults --basedir=/home/elenst/maria-5.5-galera --lc-messages-dir=/home/elenst/maria-5.5-galera/sql/share/ --core --datadir=/home/elenst/maria-5.5-galera/data2 --tmpdir=/home/elenst/maria-5.5-galera/data2/tmp --port=8307 --socket=/home/elenst/maria-5.5-galera/data2/tmp/node2.sock --wsrep-provider=/home/elenst/galera/libgalera_smm.so --wsrep-cluster-address=gcomm://127.0.0.1:4567?gmcast.listen_addr=tcp://127.0.0.1:4566 --binlog-format=row --wsrep-sst-method=rsync --log-error=/home/elenst/maria-5.5-galera/data2/log.err --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --general-log=1
      

      To reproduce using RQG:

      # On one node:
      perl ./gentest.pl --gendata=conf/engines/handler.zz --threads=1 --queries=100M --duration=600 --dsn=dbi:mysql:host=127.0.0.1:port=8306:user=root:database=test --grammar=conf/engines/handler.yy --engine=innodb
      
      # On another node, after the first node creates the table and starts the flow:
      perl ./gentest.pl  --threads=1 --queries=100M --duration=600 --dsn=dbi:mysql:host=127.0.0.1:port=8307:user=root:database=test --grammar=conf/engines/handler.yy
      
      revision-id: seppo.jaakola@codership.com-20130216222240-syypg8kc355qcpsp
      revno: 3380
      branch-nick: maria-5.5-galera
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            seppo Seppo Jaakola added a comment -

            The safe_mutex issue has been fixed with earlier releases (cannot pinpoint which one exactly). However, this randgen handler.yy load causes unresolved MDL conflict due to aborted handler session.

            This deadlock is avoided in fixes for: https://bugs.launchpad.net/codership-mysql/+bug/1233353

            Show
            seppo Seppo Jaakola added a comment - The safe_mutex issue has been fixed with earlier releases (cannot pinpoint which one exactly). However, this randgen handler.yy load causes unresolved MDL conflict due to aborted handler session. This deadlock is avoided in fixes for: https://bugs.launchpad.net/codership-mysql/+bug/1233353
            Show
            seppo Seppo Jaakola added a comment - Fix was pushed in revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3429

              People

              • Assignee:
                seppo Seppo Jaakola
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 4 hours
                  4h