Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 5.5.37, 5.5.39
-
Fix Version/s: 5.5.43
-
Component/s: Replication, Storage Engine - InnoDB
-
Labels:
-
Environment:Linux 2.6.32-5-amd64 #1 SMP Fri May 10 08:43:19 UTC 2013 x86_64 GNU/Linux
> show global variables like '%version%';
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| innodb_version | 5.5.38-MariaDB-35.2 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.39-MariaDB-log |
| version_comment | MariaDB Server |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
Linux 2.6.32-5-amd64 #1 SMP Fri May 10 08:43:19 UTC 2013 x86_64 GNU/Linux > show global variables like '%version%'; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | innodb_version | 5.5.38-MariaDB-35.2 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.5.39-MariaDB-log | | version_comment | MariaDB Server | | version_compile_machine | x86_64 | | version_compile_os | Linux |
Description
Hello,
This problem I noticed after replacing MySQL 5.5.x with MariaDB 5.5.
It occured on both MariaDB 5.5.39, 5.5.37 as well as some older MariaDB 5.5 versions.
It is possible to render MariaDB instance unusable by putting it into some kind of internal "deadlock state":
How to reproduce:
1. Use master-slave setup with statement based replication and InnoDB as storage engine.
2. Wait until slave server replicates and executes a DELETE query replicated from master.
3. Issue "STOP SLAVE" command on the slave server before execution of the DELETE query is completed by the replication sql_thread.
Effect:
MariaDB on slave server will enter "deadlock state" (kind of):
- STOP SLAVE stalls on State: "Killing slave"
- DELETE query stalls on State: "Freeing items"
- CPU load generated by mysqld drops to nearly 0,
- IO load generated by mysqld drops to 0,
Last entry in mysq.err:
140909 3:05:02 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.034913' at position 53766638
Output head from "SHOW PROCESSLIST" (after "deadlock state" is in place for a few hours here, normally our DELETE queries take much less time to complete):
Id User Host db Command Time State Info Progress 24611 system user NULL Connect 89203 Waiting for master to send event NULL 0.000 24612 system user NULL Connect 17161 freeing items NULL 0.000 35399 [user1-namechanged] [ip-removed]:43814 NULL Query 15421 Killing slave stop slave 0.000 35400 [user2-namechanged] [ip-removed]:50631 NULL Query 15420 executing SHOW /*!50002 GLOBAL */ STATUS 0.000 35401 [user2-namechanged] [ip-removed]:51717 NULL Sleep 15396 NULL 0.000 35402 [user2-namechanged] [ip-removed]:51718 NULL Sleep 15371 NULL 0.000 35403 [user3-namechanged] [ip-removed]:52953 NULL Query 15347 executing show global status 0.000 35404 [user2-namechanged] [ip-removed]:51719 NULL Sleep 15346 NULL 0.000 35405 [user2-namechanged] [ip-removed]:51720 NULL Sleep 15321 NULL 0.000
(full processlist attached)
Output head from running Poor Man's Profiler on mysqld in the "deadlock state":
855 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,at,at,one_thread_per_connection_end,do_handle_one_connection,handle_one_connection,start_thread,clone,??
2 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
1 read,vio_read,vio_read_buff,my_real_read,my_net_read,cli_safe_read,handle_slave_io,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_cond_wait,reset_sig_count=5),srv_purge_thread,start_thread,clone,??
1 pthread_cond_wait@@GLIBC_2.3.2,os_cond_wait,reset_sig_count=43),srv_master_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_monitor_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_LRU_dump_restore_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_lock_timeout_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_error_monitor_thread,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,term_lock=0x7f7a1cb26d48,,terminate_slave_threads,stop_slave,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,sleep_time=<value,ma_checkpoint_background,start_thread,clone,??
1 poll,handle_connections_sockets,mysqld_main,__libc_start_main,_start
1 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,tables=0x7f8e36dd2338,,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,execute_show_status,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
1 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,tables=0x7f8e36cd5338,,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,execute_show_status,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
(full output attached)
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Hi,
Could you please also attach your cnf files from the master and slave or output of SHOW GLOBAL VARIABLES if you so prefer?
Please also indicate with which binlog format the DELETE is written with, if you can.
If it happens on a particular table and/or with a particular DELETE query, the table structure and an idea of how many rows are being deleted could also help to reproduce the problem.
Thanks.