Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: None
-
Component/s: None
-
Labels:None
Description
show_explain.test fails, times out or crashes. The issue is not easily repeatable. It shows up in buildbot every once in a while.
Gliffy Diagrams
Attachments
- mdev-3892-try1.diff
- 0.9 kB
Issue Links
- relates to
-
MDEV-4393 show_explain.test times out randomly
-
- Closed
-
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
... The above output is followed by:
Server [mysqld.1 - pid: 30135, winpid: 30135, exit: 256] failed during test run
Server log from this test:
---------SERVER LOG START----------
InnoDB: DEBUG: update_statistics for test/t1.
InnoDB: DEBUG: update_statistics for test/t1.
...
InnoDB: DEBUG: update_statistics for test/t1.
==30136== Thread 9:
==30136== Invalid read of size 8
==30136== at 0x6DB0D2: sql_alloc(unsigned long) (thr_malloc.cc:74)
==30136== by 0x56F5F7: Item::operator new(unsigned long) (item.h:572)
==30136== by 0x655ECB: JOIN::print_explain(select_result_sink*, unsigned char, bool, bool, bool, bool, char const*) (sql_select.cc:21692)
==30136== by 0x5FA78E: st_select_lex::print_explain(select_result_sink*, unsigned char, bool*) (sql_lex.cc:4199)
==30136== by 0x5FA99D: st_select_lex_unit::print_explain(select_result_sink*, unsigned char, bool*) (sql_lex.cc:4261)
==30136== by 0x68C303: Show_explain_request::call_in_target_thread() (sql_show.cc:2336)
==30136== by 0x75E364: Apc_target::process_apc_requests() (my_apc.cc:260)
==30136== by 0x5CFE3F: thd_killed (sql_class.cc:3869)
==30136== by 0xB022AE: trx_is_interrupted (ha_innodb.cc:2385)
==30136== by 0xB4C93C: srv_lock_timeout_thread (srv0srv.c:2718)
==30136== by 0x4E3506F: start_thread (in /lib64/libpthread-2.9.so)
==30136== by 0x62F713C: clone (in /lib64/libc-2.9.so)
==30136== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==30136==
121224 13:41:05 [ERROR] mysqld got signal 11 ;
It seems, it is trying to get EXPLAIN for srv_lock_timeout_thread... And that fails, because that thread doesn't have a THD.
Need to find out, how did SHOW EXPLAIN request end up being delivered to thread that runs srv_lock_timeout_thread().
I've figured it out.
Consider this example:
Put a breakpoint on trx_is_interrupted() calls in srv_lock_timeout_thread().
Then run:
client1> connect
client2> connect
client1> set autocommit=0;
client1> select * from t1 where pk between 10 and 20 for update;
client2> set autocommit=0;
client2> select * from t1 where pk between 10 and 20 for update;
The server will stop in a state where its 'thread apply all where' will show
output like below.
Things to note:
- Thread 15 runs Apc_target::have_apc_requests for Apc_target belonging to THD=0x9e3ecad0.
- THD=0x9e3ecad0 actually belongs to Thread 24 (which is currently running a
select command and so has EXPLAIN enabled)
SHOW EXPLAIN system doesn't really support.
== thread apply all where ==
Thread 15 (Thread 0x97a38b90 (LWP 8000)):
#0 Apc_target::have_apc_requests (this=0x9e3ee18c) at /home/psergey/dev2/10.0-base/sql/my_apc.h:66
#1 0x081f201f in thd_killed (thd=0x9e3ecad0) at /home/psergey/dev2/10.0-base/sql/sql_class.cc:3869
#2 0x086eeca6 in trx_is_interrupted (trx=0x977c120) at /home/psergey/dev2/10.0-base/storage/xtradb/handler/ha_innodb.cc:2385
#3 0x0873bb21 in srv_lock_timeout_thread (arg=0x0) at /home/psergey/dev2/10.0-base/storage/xtradb/srv/srv0srv.c:2718
#4 0xb7fb44ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#5 0xb7db149e in clone () from /lib/tls/i686/cmov/libc.so.6
...
Thread 24 (Thread 0x98745b90 (LWP 8063)):
#0 0xb7fe1430 in __kernel_vsyscall ()
#1 0xb7fb80e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x0890d76a in safe_cond_wait (cond=0xa71d351c, mp=0xa71d34b0, file=0x8a905ac "/home/psergey/dev2/10.0-base/storage/xtradb/os/os0sync.c", line=207) at /home/psergey/dev2/10.0-base/mysys/thr_mutex.c:493
#3 0x08812321 in os_cond_wait (cond=0xa71d351c, mutex=0xa71d34b0) at /home/psergey/dev2/10.0-base/storage/xtradb/os/os0sync.c:207
#4 0x08812be1 in os_event_wait_low (event=0xa71d34b0, reset_sig_count=1) at /home/psergey/dev2/10.0-base/storage/xtradb/os/os0sync.c:609
#5 0x08739dc0 in srv_suspend_mysql_thread (thr=0x9779f90) at /home/psergey/dev2/10.0-base/storage/xtradb/srv/srv0srv.c:1854
#6 0x0871fa3d in row_mysql_handle_errors (new_err=0x987436c8, trx=0x977c120, thr=0x9779f90, savept=0x0) at /home/psergey/dev2/10.0-base/storage/xtradb/row/row0mysql.c:599
#7 0x0873140d in row_search_for_mysql (buf=0x9778f50 "\377", mode=2, prebuilt=0x9779be0, match_mode=0, direction=0) at /home/psergey/dev2/10.0-base/storage/xtradb/row/row0sel.c:4941
#8 0x086fe819 in ha_innobase::index_read (this=0x97771f8, buf=0x9778f50 "\377", key_ptr=0x977d988 "\n", key_len=4, find_flag=HA_READ_KEY_OR_NEXT) at /home/psergey/dev2/10.0-base/storage/xtradb/handler/ha_innodb.cc:7036
#9 0x08402b0b in handler::index_read_map (this=0x97771f8, buf=0x9778f50 "\377", key=0x977d988 "\n", keypart_map=1, find_flag=HA_READ_KEY_OR_NEXT) at /home/psergey/dev2/10.0-base/sql/handler.h:2149
#10 0x081c4956 in handler::ha_index_read_map (this=0x97771f8, buf=0x9778f50 "\377", key=0x977d988 "\n", keypart_map=1, find_flag=HA_READ_KEY_OR_NEXT) at /home/psergey/dev2/10.0-base/sql/sql_class.h:4295
#11 0x083facb1 in handler::read_range_first (this=0x97771f8, start_key=0x977729c, end_key=0x97772ac, eq_range_arg=false, sorted=true) at /home/psergey/dev2/10.0-base/sql/handler.cc:4608
#12 0x083751b7 in handler::multi_range_read_next (this=0x97771f8, range_info=0x98743dd0) at /home/psergey/dev2/10.0-base/sql/multi_range_read.cc:296
#13 0x083752ee in Mrr_simple_index_reader::get_next (this=0x9778e64, range_info=0x98743dd0) at /home/psergey/dev2/10.0-base/sql/multi_range_read.cc:328
#14 0x08374d13 in DsMrr_impl::dsmrr_next (this=0x9778dc0, range_info=0x98743dd0) at /home/psergey/dev2/10.0-base/sql/multi_range_read.cc:1391
#15 0x086eee5d in ha_innobase::multi_range_read_next (this=0x97771f8, range_info=0x98743dd0) at /home/psergey/dev2/10.0-base/storage/xtradb/handler/ha_innodb.cc:13647
#16 0x0850d05e in QUICK_RANGE_SELECT::get_next (this=0x977b598) at /home/psergey/dev2/10.0-base/sql/opt_range.cc:11030
#17 0x08530fb8 in rr_quick (info=0x9772df0) at /home/psergey/dev2/10.0-base/sql/records.cc:339
#18 0x0829226f in join_init_read_record (tab=0x9772d98) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:17162
#19 0x082924ed in sub_select (join=0x9772088, join_tab=0x9772d98, end_of_records=false) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:16281
#20 0x08292e7a in do_select (join=0x9772088, fields=0x9e3ee758, table=0x0, procedure=0x0) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:15954
#21 0x082965c8 in JOIN::exec_inner (this=0x9772088) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:2966
#22 0x08293182 in JOIN::exec (this=0x9772088) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:2258
#23 0x082935c9 in mysql_select (thd=0x9e3ecad0, rref_pointer_array=0x9e3ee804, tables=0x9771948, wild_num=1, fields=..., conds=0x9771f08, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2148272640, result=0x9772078, unit=0x9e3ee210, select_lex=0x9e3ee6c0) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:3194
#24 0x082969a8 in handle_select (thd=0x9e3ecad0, lex=0x9e3ee1ac, result=0x9772078, setup_tables_done_option=0) at /home/psergey/dev2/10.0-base/sql/sql_select.cc:368
#25 0x0822d451 in execute_sqlcom_select (thd=0x9e3ecad0, all_tables=0x9771948) at /home/psergey/dev2/10.0-base/sql/sql_parse.cc:4759
#26 0x0822f84c in mysql_execute_command (thd=0x9e3ecad0) at /home/psergey/dev2/10.0-base/sql/sql_parse.cc:2246
#27 0x08237b80 in mysql_parse (thd=0x9e3ecad0, rawbuf=0x97717b0 "select * from t1 where pk between 10 and 20 for update", length=54, parser_state=0x98744d70) at /home/psergey/dev2/10.0-base/sql/sql_parse.cc:5874
#28 0x08238833 in dispatch_command (command=COM_QUERY, thd=0x9e3ecad0, packet=0x9e3f0209 "", packet_length=54) at /home/psergey/dev2/10.0-base/sql/sql_parse.cc:1072
#29 0x08239fda in do_command (thd=0x9e3ecad0) at /home/psergey/dev2/10.0-base/sql/sql_parse.cc:811
#30 0x083388b8 in do_handle_one_connection (thd_arg=0x9e3ecad0) at /home/psergey/dev2/10.0-base/sql/sql_connect.cc:1253
#31 0x083389b6 in handle_one_connection (arg=0x9e3ecad0) at /home/psergey/dev2/10.0-base/sql/sql_connect.cc:1168
#32 0xb7fb44ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#33 0xb7db149e in clone () from /lib/tls/i686/cmov/libc.so.6
Experimental fix.
Comment correction:
> Things to note:
> - Thread 15 runs Apc_target::have_apc_requests for Apc_target belonging to THD=0x9e3ecad0.
> - THD=0x9e3ecad0 actually belongs to Thread 24 (which is currently running a
> select command and so has EXPLAIN enabled)
> SHOW EXPLAIN system doesn't really support.
^^ I've meant:
SHOW EXPLAIN assumes that apc_target->process_apc_requests() is called by the target SQL thread. That thread is expected to own the target THD, memory arena, and thread-local variables.
Instead, InnoDB calls thd_killed() in some internal InnoDB thread. That thread has current_thd == NULL, for instance. The patch attached to this issue adds a check that current_thd!=NULL, and prevents process_apc_requests() from being called out of internal InnoDB threads.
What I would like to discuss
- Is this solution complete? Perhaps there are other scenarios where thd_killed() is called from something that's not an SQL thread.
- If we block process_apc_requests() in certain cases, does it mean SHOW EXPLAIN will not be instant for certain kinds of queries?
It seems to be fixed in 10.0 tree.
I must've done it during 10.0-base merge and forgot about it.
Sorry about it.
now, the answers:
- perhaps. there's no way to know. the engine can call thd_killed() in
any context, the server does not control that.
- it's never instant anyway. but yes the response time will be larger in
these cases.
For example, on work.askmonty.org:
./mysql-test-run --valgrind --repeat=10 t/show_explain.test
main.show_explain 'xtradb' [ pass ] 16629
main.show_explain 'xtradb' [ pass ] 17134
main.show_explain 'xtradb' [ fail ]
Test ended at 2012-12-24 13:41:08
CURRENT_TEST: main.show_explain
mysqltest: At line 857: query 'reap' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1317...
The result from queries just before the failure was:
< snip >
insert into t1 select A.a + 10 * B.a + 100 * C.a, 'data1' from t0 A, t0 B, t0 C;
set autocommit=0;
select * from t1 where pk between 10 and 20 for update;
pk data
10 data1
11 data1
12 data1
13 data1
14 data1
15 data1
16 data1
17 data1
18 data1
19 data1
20 data1
set autocommit=0;
select * from t1 where pk between 10 and 20 for update;
kill query $thr_default;