Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Major
-
Resolution: Incomplete
-
Affects Version/s: 10.0.12
-
Fix Version/s: N/A
-
Component/s: Replication
-
Labels:
-
Environment:redhat 6.5
Description
When execute show slave status, It seems every thing is OK, but actually slave's sql thread was stuck! The Exec_Master_Log_Pos(mysql-bin.000160 2140880879) didn't change for several days.
MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.113
Master_User: repl
Master_Port: 5518
Connect_Retry: 60
Master_Log_File: mysql-bin.000282
Read_Master_Log_Pos: 531076432
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 691
Relay_Master_Log_File: mysql-bin.000160
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2140880879
Relay_Log_Space: 124435203348
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 283562
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1895939662
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 0-1895939662-13106380
1 row in set (0.00 sec)
I use parallel replication, and set SLAVE_PARALLEL_THREADS=16, I found one work thread is always 'Waiting for prior transaction to start commit before starting next transaction', other 15 work threads is always 'Waiting for work from SQL thread',and show engine innodb status, there's no locks.
Then check error log, find a restart in log 'mysql-bin.000160' at position 2140880879 :
150606 10:32:34 [Note] /home/mariadb/bin/mysqld: Normal shutdown recording 60385,1 86% 150605 18:42:13 [Warning] IP address '192.168.1.117' could not be resolved: Temporary failure in name resolution 150606 10:32:34 [Note] /home/mariadb/bin/mysqld: Normal shutdown 150606 10:32:34 [Note] Event Scheduler: Purging the queue. 0 events 150606 10:36:51 [Note] Slave: received end packet from server, apparent master shutdown: 150606 10:36:51 [Note] Slave I/O thread killed while reading event 150606 10:36:51 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000260', position 353423150; GTID position 0-1895939662-1585424 150606 10:44:24 [Note] InnoDB: Using mutexes to ref count buffer pool pages 150606 10:44:24 [Note] InnoDB: The InnoDB memory heap is disabled 150606 10:44:24 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 150606 10:44:24 [Note] InnoDB: Compressed tables use zlib 1.2.3 150606 10:44:24 [Note] InnoDB: Using Linux native AIO 150606 10:44:24 [Note] InnoDB: Using CPU crc32 instructions 150606 10:44:24 [Note] InnoDB: Initializing buffer pool, size = 78.1G 150606 10:44:27 [Note] InnoDB: Completed initialization of buffer pool 150606 10:44:28 [Note] InnoDB: Highest supported file format is Barracuda. 150606 10:44:28 [Note] InnoDB: The log sequence numbers 70561972035 and 70561972035 in ibdata files do not match the log sequence number 121831407098 in the ib_logfiles! 150606 10:44:28 [Note] InnoDB: Database was not shutdown normally! 150606 10:44:28 [Note] InnoDB: Starting crash recovery. 150606 10:44:28 [Note] InnoDB: Reading tablespace information from the .ibd files... 150606 10:44:28 [Note] InnoDB: Restoring possible half-written data pages 150606 10:44:28 [Note] InnoDB: from the doublewrite buffer... InnoDB: 1 transaction(s) which must be rolled back or cleaned up InnoDB: in total 4554 row operations to undo InnoDB: Trx id counter is 28084736 InnoDB: Last MySQL binlog file position 0 708607516, file name ../binlog/mysql-bin.000179 150606 10:44:29 [Note] InnoDB: 128 rollback segment(s) are active. InnoDB: Starting in background the rollback of uncommitted transactions 2015-06-06 10:44:29 7f1c2d7ff700 InnoDB: Rolling back trx with id 28084351, 4554 rows to undo InnoDB: Progress in percents: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100150606 10:44:30 [Note] InnoDB: Rollback of trx with id 28084351 completed 2015-06-06 10:44:30 7f1c2d7ff700 InnoDB: Rollback of non-prepared transactions completed 150606 10:44:34 [Note] InnoDB: Waiting for purge to start 150606 10:44:34 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.17-65.0 started; log sequence number 121831407098 150606 10:44:34 [Note] Semi-sync replication initialized for transactions. 150606 10:44:34 [Note] Semi-sync replication enabled on the master. 150606 10:44:34 server_audit: MariaDB Audit Plugin version 1.1.7 STARTED. 150606 10:44:34 [Note] Recovering after a crash using ../binlog/mysql-bin 150606 10:44:34 [Note] Starting crash recovery... 150606 10:44:34 [Note] Crash recovery finished. 150606 10:44:34 [Note] Server socket created on IP: '192.168.1.114'. 150606 10:44:34 [Note] Event Scheduler: Loaded 0 events 150606 10:44:46 [Note] /home/mariadb/bin/mysqld: Normal shutdown 150606 10:44:50 [Note] /home/mariadb/bin/mysqld: ready for connections. Version: '10.0.12-MariaDB-log' socket: '/home/mariadb/bin/mysql1.sock' port: 5518 Source distribution 150606 10:44:50 [Note] Event Scheduler: Purging the queue. 0 events 150606 10:44:50 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.1.113:5518' in log 'mysql-bin.000160' at position 2140880879 150606 10:44:50 [Note] Slave I/O thread killed while connecting to master 150606 10:44:50 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000160', position 2140880879; GTID position 0-1895939662-1128069 150606 10:44:50 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000160' at position 2140880879, relay log '../relaylog/relay-bin.000001' position: 4; GTID position '0-1895939662-1128069' 150606 10:44:50 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.000160' at position 2140880879; GTID position '0-1895939662-1128069' 150606 10:44:50 server_audit: STOPPED 150606 10:44:50 [Note] unregister_replicator OK 150606 10:44:50 [Note] InnoDB: FTS optimize thread exiting. 150606 10:44:50 [Note] InnoDB: Starting shutdown... 150606 10:44:55 [Note] InnoDB: Shutdown completed; log sequence number 121833466167 150606 10:44:55 [Note] /home/mariadb/bin/mysqld: Shutdown complete 150606 10:45:12 [Note] InnoDB: Using mutexes to ref count buffer pool pages 150606 10:45:12 [Note] InnoDB: The InnoDB memory heap is disabled 150606 10:45:12 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 150606 10:45:12 [Note] InnoDB: Compressed tables use zlib 1.2.3 150606 10:45:12 [Note] InnoDB: Using Linux native AIO 150606 10:45:12 [Note] InnoDB: Using CPU crc32 instructions 150606 10:45:12 [Note] InnoDB: Initializing buffer pool, size = 78.1G 150606 10:45:15 [Note] InnoDB: Completed initialization of buffer pool 150606 10:45:15 [Note] InnoDB: Highest supported file format is Barracuda. 150606 10:45:16 [Note] InnoDB: 128 rollback segment(s) are active. 150606 10:45:16 [Note] InnoDB: Waiting for purge to start 150606 10:45:16 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.17-65.0 started; log sequence number 121833466167 150606 10:45:16 [Note] Semi-sync replication initialized for transactions. 150606 10:45:16 [Note] Semi-sync replication enabled on the master. 150606 10:45:16 server_audit: MariaDB Audit Plugin version 1.1.7 STARTED. 150606 10:45:16 [Note] Server socket created on IP: '192.168.1.114'. 150606 10:45:16 [Note] Event Scheduler: Loaded 0 events 150606 10:45:16 [Note] /home/mariadb/bin/mysqld: ready for connections. Version: '10.0.12-MariaDB-log' socket: '/home/mariadb/bin/mysql1.sock' port: 5518 Source distribution 150606 10:45:16 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.1.113:5518' in log 'mysql-bin.000160' at position 2140880879 150606 10:45:16 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000160' at position 2140880879, relay log '../relaylog/relay-bin.000001' position: 4; GTID position '0-1895939662-1128069' 150606 10:45:16 [Note] Slave I/O thread: connected to master 'repl@192.168.1.113:5518',replication starts at GTID position '0-1895939662-1128069'
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Hi,
Thanks for the report.
10.0.12 is pretty old (1 year old), especially in terms of parallel replication which had a lot of fixes since then. At least these two seem to fit, but there might be others:
MDEV-7326MDEV-7929I recommend you to consider upgrading to the recent 10.0 version (currently 10.0.19, and there will be 10.0.20 released soon). Until then, because different failures in this area often have identical representation, it is going to be very difficult, if not impossible, to understand whether you are hitting an old bug or a new one.