Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8247

Error 22 when trying to lock mutex info->append_buffer_lock on START SLAVE SQL_THREAD

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1
    • Fix Version/s: 10.1
    • Component/s: Replication
    • Labels:
      None

      Description

      I have hit this problem 3 times by now, but it's very poorly reproducible. In the last run, I got it once in 15 attempts, and i consider it lucky. I don't have much hope you'll get it reproduced (although I'll describe later how to run the test), and I don't know which path to choose trying to narrow it down – since it's so badly reproducible to start with, it can take very long time. So, I'm filing it as is, maybe you'll get something out of the stack traces and logs – if not the root cause then maybe at least ideas what to do to trigger it more reliably.

      Got error 22 when trying to lock mutex info->append_buffer_lock at /home/elenst/git/10.1/mysys/mf_iocache.c, line 1738
      150529  1:57:54 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      
      Stack trace from 10.1 commit bb15b9e29b33
      #5  0x00007f0964115b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #6  0x00007f0966edd1bf in safe_mutex_lock (mp=0x7f0963791510, my_flags=0, file=0x7f096716fa58 "10.1/mysys/mf_iocache.c", line=1738) at 10.1/mysys/thr_mutex.c:301
      #7  0x00007f0966eb29ef in inline_mysql_mutex_lock (that=0x7f0963791510, src_file=0x7f096716fa58 "10.1/mysys/mf_iocache.c", src_line=1738) at 10.1/include/mysql/psi/mysql_thread.h:689
      #8  0x00007f0966eb5b96 in my_b_flush_io_cache (info=0x7f09637914b0, need_append_buffer_lock=1) at 10.1/mysys/mf_iocache.c:1738
      #9  0x00007f0966eb67e3 in my_b_seek (info=0x7f09637914b0, pos=0) at 10.1/mysys/mf_iocache2.c:146
      #10 0x00007f0966786d7d in init_master_info (mi=0x7f096378f000, master_info_fname=0x7f094ccb6190 "master.info", slave_info_fname=0x7f094ccb6390 "relay-log.info", abort_if_no_master_info_file=false, thread_mask=2) at 10.1/sql/rpl_mi.cc:327
      #11 0x00007f096669aaaf in start_slave (thd=0x7f093cac3070, mi=0x7f096378f000, net_report=true) at 10.1/sql/sql_repl.cc:3023
      #12 0x00007f09666684f4 in mysql_execute_command (thd=0x7f093cac3070) at 10.1/sql/sql_parse.cc:3493
      #13 0x00007f0966673dbd in mysql_parse (thd=0x7f093cac3070, rawbuf=0x7f0934dff088 "START SLAVE SQL_THREAD", length=22, parser_state=0x7f094ccb71e0) at 10.1/sql/sql_parse.cc:7161
      #14 0x00007f096666300a in dispatch_command (command=COM_QUERY, thd=0x7f093cac3070, packet=0x7f093ca7f071 "START SLAVE SQL_THREAD", packet_length=22) at 10.1/sql/sql_parse.cc:1462
      #15 0x00007f0966661de5 in do_command (thd=0x7f093cac3070) at 10.1/sql/sql_parse.cc:1090
      #16 0x00007f0966792abd in do_handle_one_connection (thd_arg=0x7f093cac3070) at 10.1/sql/sql_connect.cc:1347
      #17 0x00007f0966792815 in handle_one_connection (arg=0x7f093cac3070) at 10.1/sql/sql_connect.cc:1258
      #18 0x00007f0964aa2e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #19 0x00007f09641cfcbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
      
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f0934dff088): START SLAVE SQL_THREAD
      Connection ID (thread ID): 222
      Status: NOT_KILLED
      

      Attached:
      master.log - master's general log
      master.err - master's error log
      slave.log - slave's general log
      slave.err - slave's error log
      mysql-bin.tar.gz - master's binary files (slave was running without binlog).

      RQG command line (I use lp:~elenst/randgen/mariadb-patches revno 1019):

      perl ./runall-trials.pl --force --trials=15 --no-mask --seed=1432597535 --threads=16 --duration=400 --queries=100M --reporters=QueryTimeout,Backtrace,ErrorLog,Deadlock,ReplicationThreadRestarter --redefine=conf/mariadb/general-workarounds.yy --redefine=conf/mariadb/10.0-features-redefine.yy --mysqld=--log_output=FILE --mysqld=--slow_query_log --mysqld=--log_bin_trust_function_creators=1 --mysqld=--query_cache_size=64M --views --grammar=conf/runtime/metadata_stability.yy --gendata=conf/runtime/metadata_stability.zz --rpl_mode=row --mysqld=--slave-skip-errors=1049,1305,1539,1505 --mysqld=--slave-parallel-mode=optimistic --mysqld=--binlog_commit_wait_count=20 --mysqld=--binlog_commit_wait_usec=1000000 --mysqld=--slave-parallel-threads=20 --use-gtid=current_pos --mysqld=--max-binlog-size=4096 --basedir1=<your basedir> --vardir1=<your vardir>
      

      The test runs DDL/DML flow on InnoDB tables, and also starts/stops SQL threads every now and then. DML should be (mostly or only) row-based.

      The command above will run the test 15 times no matter what, and will store all failures in <your vardir>_trialX and <your vardir>_trialX_slave. If you want to stop after the first failure, remove --force (but then it's possible you'll hit some other one).

        Gliffy Diagrams

          Attachments

          1. master.err
            51 kB
          2. master.log
            1.65 MB
          3. mysql-bin.tar.gz
            54 kB
          4. slave.err
            12 kB
          5. slave.log
            306 kB

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                elenst Elena Stepanova
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: