MariaDB Server / MDEV-4972

MariaDB crash due to a long semaphore wait on btr_search_latch_part[i] (adaptive hash partitions related). Server hung

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 5.5.32
    • Fix Version/s: N/A
    • Labels:
    • Environment:
      MariaDB 5.5.32
      Innodb version: 5.5.32-MariaDB-30.2

      Server:
      Amazon EC2 m1.medium instance
      Ubuntu 12.04.2 LTS
      Kernel: 3.2.0-48-virtual
      1 CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
      4GB RAM

      Description

      A "long semaphore wait" issue is happening several times at random moments, and after 15 minutes aprox (few more seconds than 900sec) the server crash itself and forces a restart.

      I have not been able to reproduce the event that triggers the bug, but I am seeing it very often.

      This is the AHI-related info from my cnf:

      innodb_adaptive_hash_index ON
      innodb_adaptive_hash_index_partitions 64

      A workaround is to reduce the AHI partitions to 1, but I really need as many partitions as possible due to concurrency issues. I would also turn off the adaptive hash index, but I benefit too much from using it. (Both options are sketched below.)
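
      For reference, a minimal cnf sketch of both options (the values and comments here are illustrative, not a tuning recommendation):

      [mysqld]
      # Option 1 (the workaround mentioned above): keep the AHI enabled but
      # run it with a single partition. Changing this requires a server restart.
      innodb_adaptive_hash_index            = ON
      innodb_adaptive_hash_index_partitions = 1

      # Option 2: disable the adaptive hash index entirely.
      # innodb_adaptive_hash_index          = OFF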

      Details as follows:

      InnoDB: ###### Diagnostic info printed to the standard error stream
      InnoDB: Error: semaphore wait has lasted > 600 seconds
      InnoDB: We intentionally crash the server, because it appears to be hung.
      130829 18:55:02 InnoDB: Assertion failure in thread 140411943253760 in file srv0srv.c line 2945
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
      InnoDB: about forcing recovery.
      130829 18:55:02 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 5.5.32-MariaDB-1~precise
      key_buffer_size=536870912
      read_buffer_size=2097152
      max_used_connections=8
      max_threads=102
      thread_count=6
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 943883 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x48000
      (my_addr_resolve failure: fork)
      /usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7fb4e3e8b22b]
      /usr/sbin/mysqld(handle_fatal_signal+0x471) [0x7fb4e3ab3561]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fb4e2d9ecb0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fb4e1994425]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7fb4e1997b8b]
      /usr/sbin/mysqld(+0x75b337) [0x7fb4e3d4b337]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fb4e2d96e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fb4e1a51ccd]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      130829 18:55:03 mysqld_safe Number of processes running now: 0
      130829 18:55:03 mysqld_safe mysqld restarted

      And the error.log info from while the semaphore wait was in progress:

      InnoDB: ###### Diagnostic info printed to the standard error stream
      InnoDB: Warning: a long semaphore wait:
      --Thread 140411809036032 has waited at btr0sea.c line 1203 for 924.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      InnoDB: Warning: a long semaphore wait:
      --Thread 140412288718592 has waited at btr0sea.c line 1986 for 922.00 seconds the semaphore:
      X-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      InnoDB: Warning: a long semaphore wait:
      --Thread 140414884079360 has waited at row0sel.c line 3827 for 752.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      InnoDB: Warning: a long semaphore wait:
      --Thread 140414883780352 has waited at btr0sea.c line 1105 for 546.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      InnoDB: Warning: a long semaphore wait:
      --Thread 140411918075648 has waited at buf0flu.c line 1353 for 525.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb44fc032e0 '&block->lock'
      a writer (thread id 140412288718592) has reserved it in mode exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time read locked in file buf0flu.c line 1336
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/row/row0ins.c line 2038
      InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
      InnoDB: Pending preads 0, pwrites 0

      =====================================
      130829 18:54:50 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 0 seconds
      ----------
      SEMAPHORES
      ----------
      OS WAIT ARRAY INFO: reservation count 98926, signal count 88545
      --Thread 140411809036032 has waited at btr0sea.c line 1203 for 942.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      --Thread 140412288718592 has waited at btr0sea.c line 1986 for 940.00 seconds the semaphore:
      X-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      --Thread 140414884079360 has waited at row0sel.c line 3827 for 770.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      --Thread 140414883780352 has waited at btr0sea.c line 1105 for 564.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb4e9e26748 'btr_search_latch_part[i]'
      a writer (thread id 140411809036032) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
      Last time read locked in file btr0sea.c line 1105
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/btr/btr0sea.c line 675
      --Thread 140411918075648 has waited at buf0flu.c line 1353 for 543.00 seconds the semaphore:
      S-lock on RW-latch at 0x7fb44fc032e0 '&block->lock'
      a writer (thread id 140412288718592) has reserved it in mode exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time read locked in file buf0flu.c line 1336
      Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.32/storage/xtradb/row/row0ins.c line 2038
      Mutex spin waits 39976, rounds 1265451, OS waits 40438
      RW-shared spins 32034, rounds 969000, OS waits 31054
      RW-excl spins 15976, rounds 823475, OS waits 25738
      Spin rounds per wait: 31.66 mutex, 30.25 RW-shared, 51.54 RW-excl

      -------------------------------------
      INSERT BUFFER AND ADAPTIVE HASH INDEX
      -------------------------------------
      Ibuf: size 1, free list len 5, seg size 7, 520058 merges
      merged operations:
      insert 821217, delete mark 182704, delete 35291
      discarded operations:
      insert 0, delete mark 0, delete 0
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2089 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 13 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 4 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 5 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 8 buffer(s)
      Hash table size 34679, node heap has 6091 buffer(s)
      Hash table size 34679, node heap has 4 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 5 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 4 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 3 buffer(s)
      Hash table size 34679, node heap has 6 buffer(s)
      Hash table size 34679, node heap has 7 buffer(s)
      Hash table size 34679, node heap has 5 buffer(s)
      Hash table size 34679, node heap has 6 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 4 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 4 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 38 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 5 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 1 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      Hash table size 34679, node heap has 2 buffer(s)
      0.00 hash searches/s, 0.00 non-hash searches/s

      ---
      LOG
      ---

      Log sequence number 258253154540
      Log flushed up to 258253154540
      Last checkpoint at 258246643046
      Max checkpoint age 434154333
      Checkpoint age target 420587011
      Modified age 6187681
      Checkpoint age 6511494
      0 pending log writes, 0 pending chkp writes
      8315740 log i/o's done, 0.00 log i/o's/second
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total memory allocated 1105723392; in additional pool allocated 0
      Total memory allocated by read views 456
      Internal hash tables (constant factor + variable factor)
      Adaptive hash index 19074048 (17755648 + 1318400)
      Page hash 1107208 (buffer pool 0 only)
      Dictionary cache 167603811 (4427312 + 163176499)
      File system 83536 (82672 + 864)
      Lock system 2662544 (2657176 + 5368)
      Recovery system 0 (0 + 0)
      Dictionary memory allocated 163176499
      Buffer pool size 65535
      Buffer pool size, bytes 1073725440
      Free buffers 0
      Database pages 57150
      Old database pages 21076
      Modified db pages 2790
      Pending reads 0
      Pending writes: LRU 0, flush list 2, single page 0
      Pages made young 1474128, not young 0
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 1606609, created 54891, written 2880442
      0.00 reads/s, 0.00 creates/s, 0.00 writes/s
      No buffer pool page gets since the last printout
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 57150, unzip_LRU len: 0
      I/O sum[3472]:cur[0], unzip sum[0]:cur[0]
      --------------
      ROW OPERATIONS
      --------------
      0 queries inside InnoDB, 0 queries in queue
      3 read views open inside InnoDB
      4 transactions active inside InnoDB
      4 out of 1000 descriptors used
      --OLDEST VIEW--
      Normal read view
      Read view low limit trx n:o 9F00B98
      Read view up limit trx id 9F00B97
      Read view low limit trx id 9F00B98
      Read view individually stored trx ids:
      Read view trx id 9F00B97
      -----------------
      Main thread process no. 26237, id 140411918075648, state: flushing log
      Number of rows inserted 5268202, updated 2815261, deleted 1057257, read 847095697
      0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
      ..
      ...
      ...
      ....
      END OF INNODB MONITOR OUTPUT
      ============================

              Activity

              elenst Elena Stepanova added a comment -

              Hi,

              Is it a production environment or a test setup?
              Can you provide any information about what kind of workload you have on the server and how much data there is, and also attach the complete cnf file?

              Thanks.

              nethalo Daniel Guzman Burgos added a comment -

              Hi!
              It's a production environment.

              The workload is mostly reads. A lot (A LOT!) of complex queries with lots of joins. Very light I/O. Sometimes somewhat CPU-bound.

              cnf file attached

              Thanks!

              pomyk Patryk Pomykalski added a comment -

              Looks like: https://bugs.launchpad.net/percona-server/+bug/1199213

              elenst Elena Stepanova added a comment -

              Daniel, thanks.

              Patryk, yes, that's exactly the problem – it looks like one of those random InnoDB failures that end in a long semaphore wait and are never reliably reproducible... As I understand it, the Percona bug is in the same state.

              nethalo Daniel Guzman Burgos added a comment -

              Well, thank you. It seems we have to wait for the Percona guys to come out with a fix.
              In the meantime, I'll go with the workaround of turning off the adaptive hash index (a sketch is below).
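
              For anyone else hitting this, a minimal sketch of that workaround (on 5.5, innodb_adaptive_hash_index can be changed at runtime, while innodb_adaptive_hash_index_partitions can only be changed in the cnf file followed by a restart):

              -- Check the current AHI settings first.
              SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index%';

              -- Disable the adaptive hash index at runtime; existing hash
              -- entries are dropped and searches fall back to the B-tree.
              SET GLOBAL innodb_adaptive_hash_index = OFF;

              -- To keep the change across restarts, also set it in my.cnf:
              --   [mysqld]
              --   innodb_adaptive_hash_index = OFF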

              elenst Elena Stepanova added a comment -

              Percona eventually closed the bug as Incomplete; I'm afraid we have to do the same.
              That said, there have been many fixes for InnoDB hangs since then, so it's possible that this issue has been fixed as well.


                People

                • Assignee: Unassigned
                • Reporter: nethalo Daniel Guzman Burgos
                • Votes: 0
                • Watchers: 5
