Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:
Description
I run CHANGE MASTER ... master_gtid_pos=auto;
start slave;
execute some statements on master;
wait till slave catches up with the master;
stop slave (both threads or IO only);
start slave again
=> the slave attempts to re-execute previous statements.
revision-id: knielsen@knielsen-hq.org-20130311151655-yc1i3z72v6c00pfz
revno: 3468
branch-nick: 10.0-mdev26
Test case:
--source include/master-slave.inc
--connection slave
STOP SLAVE;
--source include/wait_for_slave_to_stop.inc
RESET SLAVE ALL;
--connection master
RESET MASTER;
--connection slave
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
START SLAVE;
--source include/wait_for_slave_to_start.inc
--connection master
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES (1);
--sync_slave_with_master
STOP SLAVE IO_THREAD;
--source include/wait_for_slave_io_to_stop.inc
START SLAVE IO_THREAD;
--source include/wait_for_slave_io_to_start.inc
--sync_with_master
Result:
=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	311
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	599
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno	1050
Last_Error	Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE t1 (i INT)'
Skip_Counter	0
Exec_Master_Log_Pos	311
Relay_Log_Space	1863
Until_Condition	None
Until_Log_File
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error
Last_SQL_Errno	1050
Last_SQL_Error	Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE t1 (i INT)'
Replicate_Ignore_Server_Ids
Master_Server_Id	1
Using_Gtid	1
=========================
Issue Links
- relates to MDEV-26 Global transaction ID (Closed)
Right, this is an important issue, thanks for catching.
The underlying issue here is that when the IO thread connects (or reconnects), it needs to request a position by GTID, and that position is tied to what the SQL thread has last executed, not to what the IO thread last fetched.
So there are several possibilities for fetching again something that the SQL thread is in the middle of executing, or similar races. My current code does not handle this at all. It can be especially tricky because the SQL thread may be running while the IO thread loses the connection to the master and needs to reconnect automatically.
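To make the race concrete, here is a minimal toy model in Python. It is not server code: events are reduced to bare sequence numbers, and the names `fetch_from_master`, `relay_log` and `last_executed` are hypothetical, chosen only for illustration. It assumes the reconnecting IO thread asks the master for everything after the SQL thread's last fully executed event.

```python
def fetch_from_master(master_binlog, after_seq_no):
    """Return the master's events that come after the requested position."""
    return [seq for seq in master_binlog if seq > after_seq_no]

# Toy master binlog: three events identified only by sequence number.
master_binlog = [1, 2, 3]

# The IO thread has already fetched events 1 and 2 into the relay log.
relay_log = fetch_from_master(master_binlog, 0)[:2]

# The SQL thread has finished event 1 and is in the middle of event 2
# when the IO thread loses its connection.
last_executed = 1

# On reconnect the request is based on the SQL thread's position,
# so event 2 is fetched a second time.
relay_log += fetch_from_master(master_binlog, last_executed)

# Events that now appear more than once in the relay log.
duplicates = sorted({seq for seq in relay_log if relay_log.count(seq) > 1})
```

In this model the relay log ends up holding event 2 twice, which is the kind of duplicate fetch that leads to an error like the 'Table 't1' already exists' failure in the report.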
I think I need to make the SQL thread remember what it has executed, so that it can skip events that get fetched a second time into the relay logs. This is not too hard, since it only needs to be done in memory: whenever the slave server is restarted or CHANGE MASTER is executed, we can just drop the existing relay logs (which we need to do anyway).
Still, needs to be done carefully to handle all cases properly.
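A rough sketch of the in-memory bookkeeping idea, written in Python for brevity rather than as server code. Everything here is an assumption for illustration: the `Gtid` triple, the `ExecutedGtidFilter` class and the `apply_relay_log` helper are hypothetical names, and the sketch assumes sequence numbers only ever increase within a replication domain.

```python
from typing import Dict, List, NamedTuple


class Gtid(NamedTuple):
    """Hypothetical GTID layout, assumed for this sketch only."""
    domain_id: int
    server_id: int
    seq_no: int


class ExecutedGtidFilter:
    """In-memory record of the last sequence number applied per domain."""

    def __init__(self) -> None:
        self._last_seq: Dict[int, int] = {}

    def should_apply(self, gtid: Gtid) -> bool:
        # Apply only events newer than anything already executed in the domain.
        return gtid.seq_no > self._last_seq.get(gtid.domain_id, 0)

    def mark_applied(self, gtid: Gtid) -> None:
        self._last_seq[gtid.domain_id] = gtid.seq_no

    def reset(self) -> None:
        # Models a server restart or CHANGE MASTER: the relay logs are
        # dropped, so the in-memory state can be discarded with them.
        self._last_seq.clear()


def apply_relay_log(relay_log: List[Gtid], flt: ExecutedGtidFilter) -> List[Gtid]:
    """Return the events that would be executed, skipping duplicate fetches."""
    applied = []
    for gtid in relay_log:
        if flt.should_apply(gtid):
            applied.append(gtid)
            flt.mark_applied(gtid)
    return applied
```

With a relay log that holds sequence numbers 1, 2, 1, 2, 3 for a single domain, `apply_relay_log` would return only the events numbered 1, 2 and 3. The sketch ignores the harder cases mentioned above, such as an event that is partially executed when the duplicate arrives.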