Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-517

Galera: Server crashes with "no such a transition REPLICATING -> REPLICATING" on concurrent CREATE TABLE .. AS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.25-galera
    • Fix Version/s: 5.5.28a-galera
    • Component/s: None
    • Labels:
      None

      Description

      WSREP: FSM: no such a transition REPLICATING -> REPLICATING
      mysqld got signal 6 ;
      
      #4  0x00007f870932ab0b in __GI_abort () at abort.c:92
      #5  0x00007f8708673b21 in galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to (this=0x7f86d0018ca0, state=galera::TrxHandle::S_REPLICATING) at galera/src/fsm.hpp:81
      #6  0x00007f87086697f6 in set_state (state=galera::TrxHandle::S_REPLICATING, this=0x7f86d0018c10) at galera/src/trx_handle.hpp:196
      #7  galera::ReplicatorSMM::replicate (this=0x3276fc0, trx=0x7f86d0018c10) at galera/src/replicator_smm.cpp:513
      #8  0x00007f870868090f in galera_pre_commit (gh=<optimized out>, conn_id=<optimized out>, trx_handle=<optimized out>, rbr_data=0x42669e0, rbr_data_len=172, flags=0, global_seqno=0x7f86d0012e40) at galera/src/wsrep_provider.cpp:386
      #9  0x00000000006659bc in wsrep_run_wsrep_commit (thd=0x7f86d0010d10, hton=<optimized out>, all=<optimized out>) at sql/wsrep_hton.cc:332
      #10 0x00000000006665a3 in wsrep_prepare (hton=0x330df60, thd=0x7f86d0010d10, all=false) at sql/wsrep_hton.cc:118
      #11 0x00000000006b0a40 in ha_commit_trans (thd=0x7f86d0010d10, all=<optimized out>) at sql/handler.cc:1276
      #12 0x00000000006443f4 in trans_commit_stmt (thd=0x7f86d0010d10) at sql/transaction.cc:325
      #13 0x0000000000579a59 in select_create::send_eof (this=0x7f86c8002268) at sql/sql_insert.cc:4157
      #14 0x00000000005d3195 in return_zero_rows (all_fields=..., having=0x0, info=<optimized out>, select_options=<optimized out>, send_row=false, fields=..., tables=<optimized out>, result=0x7f86c8002268, join=0x7f86c8002340) at sql/sql_select.cc:11036
      #15 JOIN::exec (this=0x7f86c8002340) at sql/sql_select.cc:2277
      #16 0x00000000005d53b4 in mysql_select (thd=0x7f86d0010d10, rref_pointer_array=<optimized out>, tables=0x7f86c8001190, wild_num=0, fields=<optimized out>, conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2953054976, result=0x7f86c8002268, unit=0x7f86d0013080, select_lex=0x7f86d0013758) at sql/sql_select.cc:3044
      #17 0x00000000005d9534 in handle_select (thd=0x7f86d0010d10, lex=0x7f86d0012fd0, result=0x7f86c8002268, setup_tables_done_option=0) at sql/sql_select.cc:313
      #18 0x0000000000598b14 in mysql_execute_command (thd=0x7f86d0010d10) at sql/sql_parse.cc:2972
      #19 0x0000000000599db9 in mysql_parse (parser_state=0x7f86fc30d340, thd=0x7f86d0010d10, rawbuf=<optimized out>, length=<optimized out>) at sql/sql_parse.cc:6235
      #20 mysql_parse (thd=0x7f86d0010d10, rawbuf=<optimized out>, length=103, parser_state=0x7f86fc30d340) at sql/sql_parse.cc:6160
      #21 0x000000000059bcfc in dispatch_command (command=COM_QUERY, thd=0x7f86d0010d10, packet=0x7f86d002fec1 "CREATE TABLE IF NOT EXISTS test.ts_3 AS SELECT SCHEMA_NAME FROM information_schema.schemata WHERE 1 = 0", packet_length=<optimized out>) at sql/sql_parse.cc:1240
      #22 0x000000000059c412 in do_command (thd=0x7f86d0010d10) at sql/sql_parse.cc:899
      #23 0x0000000000638394 in do_handle_one_connection (thd_arg=<optimized out>) at sql/sql_connect.cc:1278
      #24 0x000000000063843a in handle_one_connection (arg=0x7f86d0010d10) at sql/sql_connect.cc:1186
      #25 0x00007f8709c24efc in start_thread (arg=0x7f86fc30e700) at pthread_create.c:304
      #26 0x00007f87093d459d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      

      All threads stack trace is attached.

      bzr version-info

      revision-id: sergii@pisem.net-20120908075116-u2v35aaixa3nnhrl
      date: 2012-09-08 09:51:16 +0200
      build-date: 2012-09-09 02:36:37 +0400
      revno: 3346
      

      Also reproducible on 5.5.25-galera release binaries (x64_64 binary tarball).

      To reproduce, start the cluster and run the following script on one of nodes. Please note that you need either to have mysqlslap 5.5 on the path, or provide full path to it in slap option. If you have a non-default server port (not 3306), provide it in port option. Default number of concurrent instances is 10; if you are getting the crash fast, you can try to decrease it via clients option. If you are not getting the crash, try to increase it and maybe the number of cycles (cycles, default 50). On my machine, 2 clients is enough to produce the crash, but with 10 it happens almost momentarily.

      #!/bin/bash
      
      slap=`which mysqlslap`
      port=3306
      num_of_clients=10
      num_of_cycles=50
      options="--create-schema=test --concurrency=1 --detach=4 --protocol=tcp"
      
      for arg do
        val=`echo "$arg" | sed -e 's;^--[^=]*=;;'`
        case "$arg" in
          --port=*) port="$val" ;;
          --slap=*) slap="$val" ;;
          --clients=*) num_of_clients="$val" ;;
          --cycles=*) num_of_cycles="$val" ;;
          *) options="$options $val" ;;
        esac
      done
      
      
      if [ -z "$slap" ]
      then
        echo ERROR: mysqlslap is not found.
        echo Either provide it as --slap option,
        echo or make sure it is on the path
        exit 1
      fi
      
      c=1
      while [ $c -le $num_of_clients ] ; do
        echo CREATE TABLE IF NOT EXISTS test.ts_$c AS SELECT SCHEMA_NAME FROM information_schema.schemata WHERE 1 = 0 > slap_client_$c
        echo CREATE TABLE IF NOT EXISTS test.tr_$c  AS SELECT ROUTINE_SCHEMA FROM information_schema.routines WHERE 1 = 0 >> slap_client_$c
        echo DROP TABLE IF EXISTS test.ts_$c >> slap_client_$c
        echo DROP TABLE IF EXISTS test.tr_$c >> slap_client_$c
      
        (( c += 1 ))
      done
      
      c=1
      (( num_of_queries = num_of_cycles * 4 ))
      while [ $c -lt $num_of_clients ] ; do
        $slap --silent --query=slap_client_$c --number_of_queries=$num_of_queries --port=$port $options &
        (( c += 1 ))
      done
      $slap --silent --query=slap_client_$c --number_of_queries=$num_of_queries --port=$port $options
      
      

      My server command line:

      maria-5.5-galera/sql/mysqld --no-defaults --datadir=maria-5.5-galera/data1 --wsrep_provider=galera-23.2.1-src/libgalera_smm.so --wsrep_sst_method=rsync --core --binlog-format=row --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --wsrep_sst_auth=galera:galera --basedir=maria-5.5-galera/ --port=8306 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/galera-1.sock --tmpdir=maria-5.5-galera/data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --core
      

        Gliffy Diagrams

          Attachments

          1. log.err
            15 kB
          2. threads1
            72 kB
          3. ubuntu11.log
            5 kB

            Activity

            Hide
            seppo Seppo Jaakola added a comment -

            The problem has happened in the crashing thread during the processing of previous SQL statement. Do you have mysql error log saved from the test run? It would be good to analyze.
            If you will reproduce, set wsrep_debug=1 for the test run. It may log something useful for the troubleshooting.

            Show
            seppo Seppo Jaakola added a comment - The problem has happened in the crashing thread during the processing of previous SQL statement. Do you have mysql error log saved from the test run? It would be good to analyze. If you will reproduce, set wsrep_debug=1 for the test run. It may log something useful for the troubleshooting.
            Hide
            elenst Elena Stepanova added a comment - - edited

            I've put the test script in the bug description, does it not work for you?

            Show
            elenst Elena Stepanova added a comment - - edited I've put the test script in the bug description, does it not work for you?
            Hide
            elenst Elena Stepanova added a comment -

            I ran the server with wsrep_debug=1, but I don't see and debug records in there. Anyway, I've attached the error and general log (log.err and ubuntu11.log, correspondingly); but please do try the script from the description, it will be much more efficient if you're able to reproduce it locally.

            Show
            elenst Elena Stepanova added a comment - I ran the server with wsrep_debug=1, but I don't see and debug records in there. Anyway, I've attached the error and general log (log.err and ubuntu11.log, correspondingly); but please do try the script from the description, it will be much more efficient if you're able to reproduce it locally.
            Hide
            seppo Seppo Jaakola added a comment -

            CREATE TABLE AS SELECT.. is not processed as atomic operation, but is cut in two halves:
            1. The table create is replicated and applied as TO isolated operation, up front in all nodes
            2. Inserts are replicated at commit time through write set replication

            Show
            seppo Seppo Jaakola added a comment - CREATE TABLE AS SELECT.. is not processed as atomic operation, but is cut in two halves: 1. The table create is replicated and applied as TO isolated operation, up front in all nodes 2. Inserts are replicated at commit time through write set replication
            Hide
            seppo Seppo Jaakola added a comment -

            fix merged in from upstream

            Show
            seppo Seppo Jaakola added a comment - fix merged in from upstream

              People

              • Assignee:
                seppo Seppo Jaakola
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 day, 1 hour
                  1d 1h