Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 10.0.5
- Component/s: None
- Labels: None
Description
The test case runs a number ($n) of INSERTs on the master, then flushes logs and tries to synchronize the slave with the master.
When it is executed with slave_parallel_threads > 0, the slave status shows that the exec master position equals the master's position right away, although the number of inserted rows differs.
After a while, the position is still the same, but the count increases (eventually reaching the expected value).
The test case shows the problem reliably for me with $n = 100 (the first count on the slave is around 20), but if it does not for you, please try increasing $n; I suppose that should increase the probability.
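Since Exec_Master_Log_Pos cannot be trusted here, one way to observe the catch-up is to poll the row count itself until it reaches the expected value. The following is a minimal sketch (not part of the test case); `wait_for_count` and its stubbed poll function are hypothetical names for illustration:

```python
import time

def wait_for_count(get_count, expected, timeout=30.0, interval=0.5):
    """Poll get_count() until it reaches `expected` or the timeout expires.

    get_count is any callable returning the slave's current COUNT(*);
    here it is stubbed, but in practice it would query the slave.
    """
    deadline = time.monotonic() + timeout
    n = 0
    while time.monotonic() < deadline:
        n = get_count()
        if n >= expected:
            return n
        time.sleep(interval)
    raise TimeoutError(f"slave stuck at {n} of {expected} rows")

# Stub simulating a slave that catches up over a few polls,
# like the 29 -> 100 progression seen in the output below.
counts = iter([20, 29, 64, 100])
print(wait_for_count(lambda: next(counts), 100, interval=0))  # 100
```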
--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc

--enable_connect_log

--connection slave
--source include/stop_slave.inc

--connection master
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;

let $n = 100;
--disable_query_log
--echo
--echo # Running $n single inserts on master
--echo
while ($n)
{
  INSERT INTO t1 VALUES();
  dec $n;
}
--enable_query_log

FLUSH LOGS;
SHOW MASTER STATUS;
--save_master_pos

--connection slave
--source include/start_slave.inc
--sync_with_master

--echo
--echo # The slave thinks it's synchronized, but the count is off
--echo
query_vertical SHOW SLAVE STATUS;
select count(*) from t1;

--echo
--echo # After waiting a bit, the position is the same, but the count increased
--echo
sleep 2;
query_vertical SHOW SLAVE STATUS;
select count(*) from t1;

connection master;
DROP TABLE t1;
sync_slave_with_master;
--disable_connect_log
--source include/rpl_end.inc
Output:
CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;
# Running 100 single inserts on master
FLUSH LOGS;
SHOW MASTER STATUS;
File	Position	Binlog_Do_DB	Binlog_Ignore_DB
master-bin.000002	367
connection slave;
include/start_slave.inc
connection slave;
# The slave thinks it's synchronized, but the count is off
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000002
Read_Master_Log_Pos	367
Relay_Log_File	slave-relay-bin.000005
Relay_Log_Pos	7051
Relay_Master_Log_File	master-bin.000002
Slave_IO_Running	Yes
Slave_SQL_Running	Yes
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno	0
Last_Error
Skip_Counter	0
Exec_Master_Log_Pos	367
Relay_Log_Space	996
Until_Condition	None
Until_Log_File
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master	0
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error
Last_SQL_Errno	0
Last_SQL_Error
Replicate_Ignore_Server_Ids
Master_Server_Id	1
Using_Gtid	No
select count(*) from t1;
count(*)	29
# After waiting a bit, the position is the same, but the count increased
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000002
Read_Master_Log_Pos	367
Relay_Log_File	slave-relay-bin.000005
Relay_Log_Pos	22963
Relay_Master_Log_File	master-bin.000002
Slave_IO_Running	Yes
Slave_SQL_Running	Yes
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno	0
Last_Error
Skip_Counter	0
Exec_Master_Log_Pos	367
Relay_Log_Space	996
Until_Condition	None
Until_Log_File
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master	2
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error
Last_SQL_Errno	0
Last_SQL_Error
Replicate_Ignore_Server_Ids
Master_Server_Id	1
Using_Gtid	No
select count(*) from t1;
count(*)	100
connection master;
DROP TABLE t1;
connection slave;
include/rpl_end.inc
revision-id: knielsen@knielsen-hq.org-20131030065230-kp8dykgyeth6ma55
revno: 3690
branch-nick: 10.0-knielsen
BUILD/compile-pentium-debug-max-no-ndb
Issue Links
- relates to MDEV-4506 MWL#184: Parallel replication of group-committed transactions (Closed)
Activity
In parallel replication, there are two kinds of events, which are
executed in different ways.
Normal events that are part of event groups/transactions are executed
asynchronously by being queued for a worker thread.
Other events, such as format description and rotate events, are executed
directly in the driver SQL thread.
If the direct execution of these other events updated the old-style
position, the position would be advanced too far, before the normal
events already queued for a worker thread had been executed.
So I have now pushed a patch that adds some special cases to prevent such
premature position updates, and instead queues dummy events for the worker
threads, so that they perform the position updates at the appropriate time.
With this patch, the count is the correct one, 100, in both cases in the
reported test case.
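The mechanism can be illustrated with a toy, single-threaded sketch (this is not MariaDB code; the event kinds, positions, and function names are invented for illustration). Group events go to a worker queue; a direct event either updates the position immediately (the bug) or queues a dummy event so the update happens only after all earlier queued work (the fix):

```python
from collections import deque

def dispatch(events, fixed):
    """Driver SQL thread: queue group events for workers; handle a
    direct event (e.g. rotate) according to the chosen strategy."""
    work_q, exec_pos = deque(), 0
    for pos, kind in events:
        if kind == "group":
            work_q.append((pos, "group"))
        elif fixed:
            # Fix: queue a dummy event, deferring the position update
            # until everything queued before it has been executed.
            work_q.append((pos, "dummy"))
        else:
            exec_pos = pos  # Bug: position jumps ahead of queued work.
    return work_q, exec_pos

def drain(work_q, exec_pos):
    """Worker: apply queued events in order, advancing the position."""
    applied = 0
    while work_q:
        pos, kind = work_q.popleft()
        if kind == "group":
            applied += 1
        exec_pos = pos
    return exec_pos, applied

# 100 group events, then a rotate at (hypothetical) log position 367.
events = [(i, "group") for i in range(1, 101)] + [(367, "rotate")]

q, pos = dispatch(events, fixed=False)
print(pos)            # 367: position already at the rotate, nothing applied yet

q, pos = dispatch(events, fixed=True)
print(pos)            # 0: position untouched until the workers run
print(drain(q, pos))  # (367, 100): position and row count agree
```

In the buggy variant, an observer reading the position right after dispatch sees 367 while no rows have been applied, which matches the reported symptom of Exec_Master_Log_Pos = 367 with count(*) = 29.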