
Multi-source: Semisync replication is not fully supported for multiple masters and can cause replication failure and relay log corruption

Description

Semisync replication doesn't properly distinguish between multiple master connections, which causes various problems. For example, if one master has the semisync plugin and another one doesn't, enabling semisync on the slave makes replication from both masters abort. The actual errors vary. With the test case below, I most often get the following.

On the connection with the master which does not have the semisync plugin:

Last_IO_Errno 1593
Last_IO_Error Fatal error: Failed to run 'after_read_event' hook

On the connection with the master which has the semisync plugin:
either

Last_SQL_Errno 1594
Last_SQL_Error Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

or

Last_IO_Errno 1595
Last_IO_Error Relay log write failure: could not queue event from master

If we decide to support it, the corresponding variables Rpl_semi_sync_slave_status and rpl_semi_sync_slave_enabled should probably be made session-aware.
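As a rough illustration of why this matters, the queries below (run on the slave connection from the test case) show that each of the two names resolves to a single server-wide value, with nothing that identifies a particular master connection. This is only a sketch for inspecting the current state; it does not show a fix.

```
--connection slave
# One global switch, shared by every master connection on this slave.
show global variables like 'rpl_semi_sync_slave_enabled';
# One global status value, not one per master connection.
show global status like 'Rpl_semi_sync_slave_status';
```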

The test case is a draft and should not be added to the suite as is. It contains sleeps that are unreliable and slow; once the problem is fixed, they should be replaced with proper waits and syncs.
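A minimal sketch of what those waits might look like once the problem is fixed, using the connection names from the test case in this report. It assumes that include/wait_for_slave_to_start.inc honours default_master_connection and that sync_with_master accepts a connection name as its second argument, as in the existing multi-source tests; neither assumption has been verified against this tree. Until the bug is fixed, these waits would be expected to hang or time out rather than pass.

```
# Sketch only: replaces the "start slave ...; --sleep N" pairs.
--connection slave
start all slaves;

# Wait for each named connection to start instead of sleeping.
set default_master_connection = 'master1';
--source include/wait_for_slave_to_start.inc

set default_master_connection = 'master2';
--source include/wait_for_slave_to_start.inc

# Synchronize the slave with master1 (assumed multi-source syntax).
--connection master1
--save_master_pos
--connection slave
--sync_with_master 0,'master1'
```

The same save/sync pair would be repeated for master2, which requires opening a test connection to that server first.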

Please make sure you have at least revno 3438 from 10.0-base, since the test uses the reset_master_slave.inc include file, which was added there.

If you don't get the error on the first attempt, give it another try. The test sometimes gets lucky and passes, so there is apparently some kind of race condition.

Test case:
cat semisync.test

# TODO: when the problem is fixed,
# instead of the sleeps below there should be proper
# waits for slaves to start, and also synchronization
# with each master. For now, it will just make the test
# hang for long time, so I won't put it here.
# Also, an log error suppression will need to be added.

--connect (master1,127.0.0.1,root,,,$SERVER_MYPORT_1)
install soname 'semisync_master.so';

--connect (slave,127.0.0.1,root,,,$SERVER_MYPORT_3)
install soname 'semisync_slave.so';
set global rpl_semi_sync_slave_enabled = 1;

--replace_result $SERVER_MYPORT_1 MYPORT_1
eval change master 'master1' to
  master_port=$SERVER_MYPORT_1,
  master_host='127.0.0.1',
  master_user='root';

start slave 'master1';
--sleep 2

--replace_result $SERVER_MYPORT_2 MYPORT_2
eval change master 'master2' to
  master_port=$SERVER_MYPORT_2,
  master_host='127.0.0.1',
  master_user='root';

start slave 'master2';
--sleep 2

stop all slaves;
--sleep 2

start all slaves;
--sleep 3

--replace_result $SERVER_MYPORT_1 MYPORT_1 $SERVER_MYPORT_2 MYPORT_2
query_vertical show all slaves status;

# Cleanup
--source reset_master_slave.inc
uninstall plugin rpl_semi_sync_slave;
--disconnect slave

--connection master1
--source reset_master_slave.inc
uninstall plugin rpl_semi_sync_master;
--disconnect master1

Environment

None

Status

Assignee

Michael Widenius

Reporter

Elena Stepanova

Time estimate

20h

Fix versions

Priority

Minor