Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Blocker
-
Resolution: Fixed
-
Affects Version/s: 5.5.40, 10.0.14
-
Component/s: Platform Power, Storage Engine - InnoDB, Storage Engine - XtraDB
-
Labels:
Description
This failure is seen in Buildbot. It is a race that occurs somewhat rarely,
but it can be consistently reproduced manually by running tests in a loop.
The failure can occur in different tests, for example like this:
http://buildbot.askmonty.org/buildbot/builders/bintar-trusty-p8-debug/builds/173
worker[2] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 31323, winpid: 31323] to create a pid file.
rpl.rpl_incident 'stmt' w2 [ fail ]
Test ended at 2014-10-31 12:04:27
CURRENT_TEST: rpl.rpl_incident
Failed to start mysqld.2
mysqltest failed but provided no output
The result from queries just before the failure was:
< snip >
call mtr.add_suppression("Slave SQL.*The incident LOST_EVENTS occured on the master.* 1590");
include/wait_for_slave_sql_error.inc [errno=1590]
Last_SQL_Error = 'The incident LOST_EVENTS occured on the master. Message: <none>'
**** On Slave ****
SELECT * FROM t1;
a
1
2
3
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
START SLAVE;
SELECT * FROM t1;
a
1
2
3
4
include/check_slave_is_running.inc
DROP TABLE t1;
include/rpl_end.inc
More failures are seen in a cross-reference search:
It looks like it only fails in debug builds on Power8, but this is not 100%
confirmed.
Looking into the error log, we consistently see that it hangs during creation
of the Doublewrite buffer:
InnoDB: Doublewrite buffer not found: creating new
Normally there should be a message after this that the doublewrite buffer was
created successfully, but when the error occurs, instead the server hangs at
this point.
Here is a GDB stacktrace from when it hangs:
Thread 1 (Thread 0x3fff82fb5d90 (LWP 74233)):
#0 0x00003fff82f22190 in __pthread_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30c8) at pthread_cond_wait.c:187
#1 0x0000000010b4f0ac in safe_cond_wait (cond=0x10029bf3150, mp=0x10029bf30a0, file=0x10d1a638 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c", line=207)
at /home/skysql/bzr/knielsen/5.5/mysys/thr_mutex.c:493
#2 0x0000000010a7eae4 in os_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30a0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:207
#3 0x0000000010a7f6d8 in os_event_wait_low (event=0x10029bf30a0, reset_sig_count=2) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:609
#4 0x00000000108f544c in sync_array_wait_event (arr=0x100292584d0, index=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0arr.c:459
#5 0x00000000108f9a20 in mutex_spin_wait (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0sync.c:636
#6 0x00000000109e48d4 in mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:222
#7 0x00000000109e4a18 in pfs_mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:251
#8 0x00000000109ee980 in fil_mutex_enter_and_prepare_for_io (space_id=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:985
#9 0x00000000109f97c8 in _fil_io (type=11, sync=0, space_id=0, zip_size=0, block_offset=157, byte_offset=0, len=16384, buf=0x3fff7b538000, message=0x3fff7af542c0, trx=0x0)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:5446
#10 0x00000000109933fc in buf_flush_write_block_low (bpage=0x3fff7af542c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1192
#11 0x0000000010993c64 in buf_flush_page (buf_pool=0x10029c07798, bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1400
#12 0x0000000010994010 in buf_flush_try_neighbors (space=0, offset=157, flush_type=BUF_FLUSH_LIST, n_flushed=48, n_to_flush=18446744073709551614)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1541
#13 0x00000000109944c8 in buf_flush_page_and_try_neighbors (bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x3fffd3831ed0)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1638
#14 0x00000000109949dc in buf_flush_flush_list_batch (buf_pool=0x10029c07798, min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1768
#15 0x0000000010994b48 in buf_flush_batch (buf_pool=0x10029c07798, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615)
at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1848
#16 0x00000000109952a8 in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:2065
#17 0x0000000010a53a1c in log_preflush_pool_modified_pages (new_oldest=18446744073709551615, sync=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:1874
#18 0x0000000010a548c8 in log_make_checkpoint_at (lsn=18446744073709551615, write_always=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:2313
#19 0x000000001091f0e8 in trx_sys_create_doublewrite_buf () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/trx/trx0sys.c:407
#20 0x00000000108f2fbc in innobase_start_or_create_for_mysql () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/srv/srv0start.c:2080
#21 0x0000000010867b90 in innobase_init (p=0x100292581c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/handler/ha_innodb.cc:3225
#22 0x000000001047364c in ha_initialize_handlerton (plugin=0x100291a3d68) at /home/skysql/bzr/knielsen/5.5/sql/handler.cc:472
#23 0x0000000010255b68 in plugin_initialize (tmp_root=0x3fffd3833690, plugin=0x100291a3d68, argc=0x11126e70 <remaining_argc>, argv=0x10029128880, options_only=false)
at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1372
#24 0x0000000010256718 in plugin_init (argc=0x11126e70 <remaining_argc>, argv=0x10029128880, flags=0) at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1613
#25 0x000000001015a080 in init_server_components () at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4337
#26 0x000000001015b808 in mysqld_main (argc=100, argv=0x10029128880) at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4934
#27 0x00000000101509d8 in main (argc=15, argv=0x3fffd3833f98) at /home/skysql/bzr/knielsen/5.5/sql/main.cc:25
Here is how I reproduced it, running the test in a loop and waiting for it to
hang:
bzr branch lp:maria/5.5 cd 5.5 export PATH=/opt/at7.0/bin:$PATH EXTRA_FLAGS="-O2 -Wuninitialized -DFORCE_INIT_OF_VARS" EXTRA_CONFIGS="--with-embedded-privilege-control" AM_EXTRA_MAKEFLAGS="VERBOSE=1" cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON time make -j4 package (cd mysql-test && for i in `seq 40000` ; do echo XXX $i XXX ; time ./mtr --parallel=4 rpl.last_insert_id rpl.rpl_row_blob_innodb --verbose-restart || exit 1 ; done)
Gliffy Diagrams
Attachments
Issue Links
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
The problem is apparently that a waiting thread is not woken up when an InnoDB mutes is released:
The wakeup condition is not true...
(gdb) p *mutex $8 = {event = 0x10029bf30a0, lock_word = 0 '\000', waiters = 1, list = {prev = 0x10029bf2f70, next = 0x11953928 <srv_misc_tmpfile_mutex>}, cfile_name = 0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", cline = 1673, thread_id = 18446744073709551615, magic_n = 979585, count_os_wait = 1, count_using = 1611, count_spin_loop = 6, count_spin_rounds = 175, count_os_yield = 3, lspent_time = 0, lmax_spent_time = 0, mutex_type = 0, cmutex_name = 0x10d02228 "&fil_system->mutex", pfs_psi = 0x3fff81ff2330} (gdb) p mutex->lock_word $9 = 0 '\000'But the mutex is not locked (lock_word == 0).
Looks like some issue with the InnoDB mutex implementation...