Race in InnoDB/XtraDB mutex implementation can stall or hang the server

Description

This failure is seen in Buildbot. It is a race that occurs somewhat rarely,
but it can be consistently reproduced manually by running tests in a loop.

The failure can occur in different tests, for example like this:

http://buildbot.askmonty.org/buildbot/builders/bintar-trusty-p8-debug/builds/173

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 worker[2] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 31323, winpid: 31323] to create a pid file. rpl.rpl_incident 'stmt' w2 [ fail ] Test ended at 2014-10-31 12:04:27 CURRENT_TEST: rpl.rpl_incident Failed to start mysqld.2 mysqltest failed but provided no output The result from queries just before the failure was: < snip > call mtr.add_suppression("Slave SQL.*The incident LOST_EVENTS occured on the master.* 1590"); include/wait_for_slave_sql_error.inc [errno=1590] Last_SQL_Error = 'The incident LOST_EVENTS occured on the master. Message: <none>' **** On Slave **** SELECT * FROM t1; a 1 2 3 SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; SELECT * FROM t1; a 1 2 3 4 include/check_slave_is_running.inc DROP TABLE t1; include/rpl_end.inc

More failures are seen in a cross-reference search:

http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=-p8&dt=&bbnum=&typ=&info=&fail_name=&fail_variant=&fail_info_short=&fail_info_full=Failed+to+start+mysqld&limit=100

It looks like it only fails in debug builds on Power8, but this is not 100%
confirmed.

Looking into the error log, we consistently see that it hangs during creation
of the Doublewrite buffer:

1 InnoDB: Doublewrite buffer not found: creating new

Normally there should be a message after this that the doublewrite buffer was
created successfully, but when the error occurs, instead the server hangs at
this point.

Here is a GDB stacktrace from when it hangs:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Thread 1 (Thread 0x3fff82fb5d90 (LWP 74233)): #0 0x00003fff82f22190 in __pthread_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30c8) at pthread_cond_wait.c:187 #1 0x0000000010b4f0ac in safe_cond_wait (cond=0x10029bf3150, mp=0x10029bf30a0, file=0x10d1a638 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c", line=207) at /home/skysql/bzr/knielsen/5.5/mysys/thr_mutex.c:493 #2 0x0000000010a7eae4 in os_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30a0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:207 #3 0x0000000010a7f6d8 in os_event_wait_low (event=0x10029bf30a0, reset_sig_count=2) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:609 #4 0x00000000108f544c in sync_array_wait_event (arr=0x100292584d0, index=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0arr.c:459 #5 0x00000000108f9a20 in mutex_spin_wait (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0sync.c:636 #6 0x00000000109e48d4 in mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:222 #7 0x00000000109e4a18 in pfs_mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:251 #8 0x00000000109ee980 in fil_mutex_enter_and_prepare_for_io (space_id=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:985 #9 0x00000000109f97c8 in _fil_io (type=11, sync=0, space_id=0, zip_size=0, block_offset=157, byte_offset=0, len=16384, buf=0x3fff7b538000, message=0x3fff7af542c0, trx=0x0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:5446 #10 0x00000000109933fc in buf_flush_write_block_low (bpage=0x3fff7af542c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1192 #11 0x0000000010993c64 in buf_flush_page (buf_pool=0x10029c07798, bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1400 #12 0x0000000010994010 in buf_flush_try_neighbors (space=0, offset=157, flush_type=BUF_FLUSH_LIST, n_flushed=48, n_to_flush=18446744073709551614) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1541 #13 0x00000000109944c8 in buf_flush_page_and_try_neighbors (bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x3fffd3831ed0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1638 #14 0x00000000109949dc in buf_flush_flush_list_batch (buf_pool=0x10029c07798, min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1768 #15 0x0000000010994b48 in buf_flush_batch (buf_pool=0x10029c07798, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1848 #16 0x00000000109952a8 in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:2065 #17 0x0000000010a53a1c in log_preflush_pool_modified_pages (new_oldest=18446744073709551615, sync=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:1874 #18 0x0000000010a548c8 in log_make_checkpoint_at (lsn=18446744073709551615, write_always=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:2313 #19 0x000000001091f0e8 in trx_sys_create_doublewrite_buf () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/trx/trx0sys.c:407 #20 0x00000000108f2fbc in innobase_start_or_create_for_mysql () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/srv/srv0start.c:2080 #21 0x0000000010867b90 in innobase_init (p=0x100292581c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/handler/ha_innodb.cc:3225 #22 0x000000001047364c in ha_initialize_handlerton (plugin=0x100291a3d68) at /home/skysql/bzr/knielsen/5.5/sql/handler.cc:472 #23 0x0000000010255b68 in plugin_initialize (tmp_root=0x3fffd3833690, plugin=0x100291a3d68, argc=0x11126e70 <remaining_argc>, argv=0x10029128880, options_only=false) at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1372 #24 0x0000000010256718 in plugin_init (argc=0x11126e70 <remaining_argc>, argv=0x10029128880, flags=0) at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1613 #25 0x000000001015a080 in init_server_components () at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4337 #26 0x000000001015b808 in mysqld_main (argc=100, argv=0x10029128880) at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4934 #27 0x00000000101509d8 in main (argc=15, argv=0x3fffd3833f98) at /home/skysql/bzr/knielsen/5.5/sql/main.cc:25

Here is how I reproduced it, running the test in a loop and waiting for it to
hang:

1 2 3 4 5 6 bzr branch lp:maria/5.5 cd 5.5 export PATH=/opt/at7.0/bin:$PATH EXTRA_FLAGS="-O2 -Wuninitialized -DFORCE_INIT_OF_VARS" EXTRA_CONFIGS="--with-embedded-privilege-control" AM_EXTRA_MAKEFLAGS="VERBOSE=1" cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON time make -j4 package (cd mysql-test && for i in `seq 40000` ; do echo XXX $i XXX ; time ./mtr --parallel=4 rpl.last_insert_id rpl.rpl_row_blob_innodb --verbose-restart || exit 1 ; done)

Environment

None

Status

Assignee

Sergey Vojtovich

Reporter

Kristian Nielsen

Labels

External issue ID

None

External issue ID

None

Fix versions

Affects versions

5.5.40
10.0.14

Priority

Blocker
Configure