Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8098

when sysbench stress test, mariadb server hung

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.0.16, 10.0.17
    • Fix Version/s: None
    • Labels:
    • Environment:

      Description

      We used sysbench test performance, test scripts is:
      sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-tables-count=8 --mysql-host=localhost --mysql-port=3306 --mysql-socket=/ssd/mysql3306/data/mysql.sock --mysql-user=root --mysql-password=123456 --mysql-db=test prepare

      sysbench --db-driver=mysql --test=/usr/share/doc/sysbench/tests/db/insert.lua --num-threads=8 --max-requests=0 --mysql-host=localhost --mysql-port=3306 --mysql-socket=/ssd/mysql3306/data/mysql.sock --mysql-user=root --mysql-password=123456 --mysql-db=test --max-time=0 --report-interval=1 --oltp-tables-count=8 run

      after run about 50 seconds, mysql stop write and hunged, See below:
      Running the test with following options:
      Number of threads: 8
      Report intermediate results every 1 second(s)
      Random number generator seed is 0 and will be ignored

      Threads started!
      [ 1s] threads: 8, tps: 0.00, reads: 0.00, writes: 47510.95, response time: 0.24ms (95%), errors: 0.00, reconnects: 0.00
      [ 2s] threads: 8, tps: 0.00, reads: 0.00, writes: 49085.07, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 3s] threads: 8, tps: 0.00, reads: 0.00, writes: 49759.05, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 4s] threads: 8, tps: 0.00, reads: 0.00, writes: 49685.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 5s] threads: 8, tps: 0.00, reads: 0.00, writes: 49274.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 6s] threads: 8, tps: 0.00, reads: 0.00, writes: 49619.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 7s] threads: 8, tps: 0.00, reads: 0.00, writes: 49430.88, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 8s] threads: 8, tps: 0.00, reads: 0.00, writes: 49578.98, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 9s] threads: 8, tps: 0.00, reads: 0.00, writes: 49544.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 10s] threads: 8, tps: 0.00, reads: 0.00, writes: 49387.11, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 11s] threads: 8, tps: 0.00, reads: 0.00, writes: 49451.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 12s] threads: 8, tps: 0.00, reads: 0.00, writes: 49237.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 13s] threads: 8, tps: 0.00, reads: 0.00, writes: 49800.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 14s] threads: 8, tps: 0.00, reads: 0.00, writes: 49296.99, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 15s] threads: 8, tps: 0.00, reads: 0.00, writes: 49360.08, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 16s] threads: 8, tps: 0.00, reads: 0.00, writes: 49480.71, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 17s] threads: 8, tps: 0.00, reads: 0.00, writes: 49598.06, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 18s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.20, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 19s] threads: 8, tps: 0.00, reads: 0.00, writes: 49658.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 20s] threads: 8, tps: 0.00, reads: 0.00, writes: 49320.93, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 21s] threads: 8, tps: 0.00, reads: 0.00, writes: 49569.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 22s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 23s] threads: 8, tps: 0.00, reads: 0.00, writes: 49448.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 24s] threads: 8, tps: 0.00, reads: 0.00, writes: 49090.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 25s] threads: 8, tps: 0.00, reads: 0.00, writes: 49602.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 26s] threads: 8, tps: 0.00, reads: 0.00, writes: 49435.01, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 27s] threads: 8, tps: 0.00, reads: 0.00, writes: 49374.13, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 28s] threads: 8, tps: 0.00, reads: 0.00, writes: 49537.91, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 29s] threads: 8, tps: 0.00, reads: 0.00, writes: 49417.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 30s] threads: 8, tps: 0.00, reads: 0.00, writes: 49680.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 31s] threads: 8, tps: 0.00, reads: 0.00, writes: 49535.99, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 32s] threads: 8, tps: 0.00, reads: 0.00, writes: 49341.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 33s] threads: 8, tps: 0.00, reads: 0.00, writes: 49661.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 34s] threads: 8, tps: 0.00, reads: 0.00, writes: 49278.02, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 35s] threads: 8, tps: 0.00, reads: 0.00, writes: 49508.91, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 36s] threads: 8, tps: 0.00, reads: 0.00, writes: 49985.01, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 37s] threads: 8, tps: 0.00, reads: 0.00, writes: 49132.97, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 38s] threads: 8, tps: 0.00, reads: 0.00, writes: 49532.03, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 39s] threads: 8, tps: 0.00, reads: 0.00, writes: 49514.00, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 40s] threads: 8, tps: 0.00, reads: 0.00, writes: 49173.08, response time: 0.23ms (95%), errors: 0.00, reconnects: 0.00
      [ 41s] threads: 8, tps: 0.00, reads: 0.00, writes: 49718.98, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 42s] threads: 8, tps: 0.00, reads: 0.00, writes: 49538.95, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 43s] threads: 8, tps: 0.00, reads: 0.00, writes: 49319.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 44s] threads: 8, tps: 0.00, reads: 0.00, writes: 49652.04, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 45s] threads: 8, tps: 0.00, reads: 0.00, writes: 49211.96, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 46s] threads: 8, tps: 0.00, reads: 0.00, writes: 49638.06, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 47s] threads: 8, tps: 0.00, reads: 0.00, writes: 46919.54, response time: 0.22ms (95%), errors: 0.00, reconnects: 0.00
      [ 48s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
      [ 49s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
      [ 50s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
      [ 51s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
      ...............................................................

      *show engine innodb status contents at below :*

      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 49 srv_active, 0 srv_shutdown, 1084 srv_idle
      srv_master_thread log flush and writes: 1132
      ----------
      SEMAPHORES
      ----------
      OS WAIT ARRAY INFO: reservation count 7022
      --Thread 139710954862336 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
      Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
      waiters flag 1
      --Thread 139710954563328 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
      Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
      waiters flag 1
      --Thread 139710804371200 has waited at log0log.ic line 358 for 282.00 seconds the semaphore:
      Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
      waiters flag 1
      --Thread 139690453731072 has waited at log0log.ic line 461 for 281.00 seconds the semaphore:
      Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
      waiters flag 1
      --Thread 139690506180352 has waited at log0log.cc line 1803 for 281.00 seconds the semaphore:
      Mutex at 0x7f10fe0308b8 '&log_sys->mutex', lock var 1
      waiters flag 1
      OS WAIT ARRAY INFO: signal count 171294
      Mutex spin waits 3491278, rounds 22606058, OS waits 3668
      RW-shared spins 48111, rounds 141880, OS waits 1827
      RW-excl spins 180519, rounds 780043, OS waits 657
      Spin rounds per wait: 6.48 mutex, 2.95 RW-shared, 4.32 RW-excl

      ------------
      TRANSACTIONS
      ------------
      Trx id counter 306500602
      Purge done for trx's n:o < 304158823 undo n:o < 0 state: running but idle
      History list length 469
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 306500597, not started flushing log
      mysql tables in use 1, locked 1
      MySQL thread id 11, OS thread handle 0x7f10f4cff700, query id 2320632 localhost root query end
      INSERT INTO sbtest3 (id, k, c, pad) VALUES (0, 5024, '59822754713-21504320006-70402977880-52533779811-55443850956-75663501138-32952798122-52984185374-40063414595-30990348181', '00149832870-23643902826-53824533577-44914074721-61304247725')
      ---TRANSACTION 306500595, not started flushing log
      mysql tables in use 1, locked 1
      MySQL thread id 10, OS thread handle 0x7f10f4d48700, query id 2320630 localhost root query end
      INSERT INTO sbtest3 (id, k, c, pad) VALUES (0, 5240, '76543084107-59746395204-79243311540-11836470234-37353949646-01770990785-75455199727-65275840224-15238662556-35807951565', '50044502986-25226433799-47797793636-03022954115-60278688538')
      ---TRANSACTION 306500594, not started flushing log
      mysql tables in use 1, locked 1
      MySQL thread id 9, OS thread handle 0x7f10f4d91700, query id 2320629 localhost root query end
      INSERT INTO sbtest5 (id, k, c, pad) VALUES (0, 5058, '37756273936-19470060108-05348482238-36788386100-65707124278-00778599700-58513907809-00382830729-81260231488-24197577440', '96968918924-48962654427-86527086044-70524670701-49888977143')
      ---TRANSACTION 306500596, not started flushing log
      mysql tables in use 1, locked 1
      MySQL thread id 6, OS thread handle 0x7f10fdccd700, query id 2320631 localhost root query end
      INSERT INTO sbtest6 (id, k, c, pad) VALUES (0, 4958, '42595431418-54160031986-75414205416-45839511846-81285571692-56084379278-23293939469-87089281598-19233348060-94919714254', '30515557351-39440033873-59801946865-60824112594-14420176186')
      ---TRANSACTION 306500598, not started flushing log
      mysql tables in use 1, locked 1
      MySQL thread id 7, OS thread handle 0x7f10fdc84700, query id 2320633 localhost root query end
      INSERT INTO sbtest2 (id, k, c, pad) VALUES (0, 5010, '62845186706-10817751210-53424088184-51434127394-72460917740-75469062308-83337747236-08014652804-91497207912-11117706272', '86108388613-12443483580-78147342437-57233103465-54986820553')
      ---TRANSACTION 306500601, ACTIVE 282 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 360, 0 row lock(s)
      MySQL thread id 8, OS thread handle 0x7f10f4dda700, query id 2320636 localhost root update
      INSERT INTO sbtest7 (id, k, c, pad) VALUES (0, 5012, '91772122081-60473838408-64691507610-84945227497-11920568393-31710259244-24773254948-34705940136-89996219460-36245238979', '08842506806-76703742979-33613751407-86512328179-69715058019')
      Trx #rec lock waits 0 #table lock waits 0
      Trx total rec lock wait time 0 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table `test`.`sbtest7` trx id 306500601 lock mode IX lock hold time 282 wait time before grant 0
      ---TRANSACTION 306500600, ACTIVE 282 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 360, 0 row lock(s)
      MySQL thread id 4, OS thread handle 0x7f10fdd5f700, query id 2320635 localhost root update
      INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 4991, '20573335662-30783472294-91392812538-64279544640-03952693010-49516267500-37153587475-29426794802-75715869054-36500529893', '64279858973-94356763024-24019976507-71917495604-46993976375')
      Trx #rec lock waits 0 #table lock waits 0
      Trx total rec lock wait time 0 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table `test`.`sbtest1` trx id 306500600 lock mode IX lock hold time 282 wait time before grant 0
      ---TRANSACTION 306500599, ACTIVE 282 sec inserting
      mysql tables in use 1, locked 1
      1 lock struct(s), heap size 360, 0 row lock(s)
      MySQL thread id 5, OS thread handle 0x7f10fdd16700, query id 2320634 localhost root update
      INSERT INTO sbtest2 (id, k, c, pad) VALUES (0, 5049, '14034900987-16912674470-19094580238-83057733456-69854491333-00792552128-12159215864-22282265446-84407740021-51783026756', '72866547564-65646294243-67915491544-80988475271-52586539055')
      Trx #rec lock waits 0 #table lock waits 0
      Trx total rec lock wait time 0 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table `test`.`sbtest2` trx id 306500599 lock mode IX lock hold time 282 wait time before grant 0

      MySQL error log at below:
      InnoDB: ###### Diagnostic info printed to the standard error stream
      InnoDB: Error: semaphore wait has lasted > 600 seconds
      InnoDB: We intentionally crash the server, because it appears to be hung.
      2015-05-04 19:22:23 7fbe627fd700 InnoDB: Assertion failure in thread 140455673059072 in file srv0srv.cc line 2196
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
      InnoDB: about forcing recovery.
      150504 19:22:24 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.0.17-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=18
      max_threads=4098
      thread_count=18
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 9133140 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x48000
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb743db]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x726a38]
      /lib64/libpthread.so.0[0x368960f710]
      /lib64/libc.so.6(gsignal+0x35)[0x3689232625]
      /lib64/libc.so.6(abort+0x175)[0x3689233e05]
      /usr/sbin/mysqld[0x9376fc]
      /lib64/libpthread.so.0[0x36896079d1]
      /lib64/libc.so.6(clone+0x6d)[0x36892e89dd]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      150504 19:22:24 mysqld_safe Number of processes running now: 0
      150504 19:22:24 mysqld_safe mysqld restarted

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Hi Axel,

            Could you please check it out since you're running sysbench anyway?

            Thanks.

            Show
            elenst Elena Stepanova added a comment - Hi Axel, Could you please check it out since you're running sysbench anyway? Thanks.
            Hide
            axel Axel Schwenke added a comment -

            Hi,

            I could not reproduce this issue. I used our current mariaDB-10.0.21 and sysbench-0.5 like so:

            sysbench-0.5 --test=lua/oltp.lua --oltp-tables-count=8 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test prepare
            sysbench-0.5 --test=lua/insert.lua --num-threads=8 --max-requests=0 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test --max-time=0 --report-interval=1 --oltp-tables-count=8 run
            

            This was running for more than 10 minutes without showing any anomalies.

            Questions:

            1. could you reproduce this problem?
            2. what my.cnf did you use?
            you seem to use some kind of flash storage card
            3. what product is it? What driver is it using? What file system?

            The reason of the crash is a long semaphore wait. More specifically the semaphore is the one protecting the redo log. So it seems a write request to the redo log was stuck for >600 seconds before MariaDB crashed. This could well be just a symptom of a problem with either hardware or drivers.

            BR, Axel

            Show
            axel Axel Schwenke added a comment - Hi, I could not reproduce this issue. I used our current mariaDB-10.0.21 and sysbench-0.5 like so: sysbench-0.5 --test=lua/oltp.lua --oltp-tables-count=8 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test prepare sysbench-0.5 --test=lua/insert.lua --num-threads=8 --max-requests=0 --mysql-socket=/tmp/mysql.sock --mysql-user=axel --mysql-db=test --max-time=0 --report-interval=1 --oltp-tables-count=8 run This was running for more than 10 minutes without showing any anomalies. Questions: 1. could you reproduce this problem? 2. what my.cnf did you use? you seem to use some kind of flash storage card 3. what product is it? What driver is it using? What file system? The reason of the crash is a long semaphore wait. More specifically the semaphore is the one protecting the redo log. So it seems a write request to the redo log was stuck for >600 seconds before MariaDB crashed. This could well be just a symptom of a problem with either hardware or drivers. BR, Axel

              People

              • Assignee:
                axel Axel Schwenke
                Reporter:
                sysdljr yjj
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 35 minutes
                  35m