MariaDB Server: MDEV-3793 (sub-task of MDEV-253 Multi-source replication)

Multi-source: Semisync replication is not fully supported for multiple masters and can cause replication failure and relay log corruption

    Details

    • Type: Technical task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 10.1
    • Component/s: None
    • Labels:
      None

      Description

      Semisync replication doesn't properly distinguish between multiple master connections, which causes various problems. For example, if one master has the semisync plugin and another one doesn't, trying to enable semisync on the slave causes replication from both masters to abort. The actual errors vary. With the test case below, I most often get the following.

      On the connection to the master that does not have the semisync plugin:

      Last_IO_Errno   1593
      Last_IO_Error   Fatal error: Failed to run 'after_read_event' hook
      

      On the connection to the master that has the semisync plugin, either:

      Last_SQL_Errno  1594
      Last_SQL_Error  Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
      

      or

      Last_IO_Errno   1595
      Last_IO_Error   Relay log write failure: could not queue event from master
      

      If we decide to support it, the corresponding variables Rpl_semi_sync_slave_status and rpl_semi_sync_slave_enabled should probably be made session-aware.

      The test case is a draft and should not be added to the suite as is. It contains sleeps, which are unreliable and slow; when the problem is fixed, they should be replaced with proper waits and synchronization.

      Please make sure you have at least revno 3438 from 10.0-base, since the test uses the reset_master_slave.inc include file, which was added there.

      If you don't get the error on the first attempt, try again: sometimes the test passes by chance, so there is apparently some kind of race condition.

      Test case (semisync.test):

      # TODO: when the problem is fixed,
      # instead of the sleeps below there should be proper
      # waits for the slaves to start, and also synchronization
      # with each master. For now, that would just make the test
      # hang for a long time, so I won't put it here.
      # Also, log error suppressions will need to be added.
      
      
      --connect (master1,127.0.0.1,root,,,$SERVER_MYPORT_1)
      install soname 'semisync_master.so';
      
      --connect (slave,127.0.0.1,root,,,$SERVER_MYPORT_3)
      
      install soname 'semisync_slave.so';
      set global rpl_semi_sync_slave_enabled = 1;
      
      --replace_result $SERVER_MYPORT_1 MYPORT_1
      eval change master 'master1' to
      master_port=$SERVER_MYPORT_1,
      master_host='127.0.0.1',
      master_user='root';
      
      start slave 'master1';
      --sleep 2
      
      --replace_result $SERVER_MYPORT_2 MYPORT_2
      eval change master 'master2' to
      master_port=$SERVER_MYPORT_2,
      master_host='127.0.0.1',
      master_user='root';
      
      start slave 'master2';
      --sleep 2
      
      stop all slaves;
      --sleep 2
      start all slaves;
      --sleep 3
      --replace_result $SERVER_MYPORT_1 MYPORT_1 $SERVER_MYPORT_2 MYPORT_2
      query_vertical show all slaves status;
      
      # Cleanup
      
      --source reset_master_slave.inc
      uninstall plugin rpl_semi_sync_slave;
      --disconnect slave
      
      --connection master1
      --source reset_master_slave.inc
      uninstall plugin rpl_semi_sync_master;
      --disconnect master1
      
      

              Activity
              monty Michael Widenius added a comment:

              Because semi-sync is a plugin, this is not a trivial task to fix.

              What would need to be done:

              • Change all global variables in semisync_slave.h and semsync_slave.h to be dynamically allocated, hashed by connection name. The easiest way is probably to just move these to the ReplSemiSyncSlave structure.
              • Change the initial allocation so that, when semisync starts, it copies the initial values to all running multi-source connections as their default values.
              • Change the variable repl_semisync to be a dynamically allocated variable, based on connection name.
              • Change the "semi_sync_slave_system_vars" and "semi_sync_slave_status_vars" variables to connection variables. This is the hard part, as we don't have support for this kind of dynamic connection variable from a plugin.
              • Change fix_rpl_semi_sync_slave_enabled() to work with the current connection name.
              • Add the connection name to Binlog_relay_IO_param. This is needed to be able to look up the correct value of rpl_semisync for the current master.
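A minimal C++ sketch of the data-structure change proposed above, for illustration only. It assumes nothing about the real plugin internals: SlaveSemiSyncState, RelayIoParamSketch and SemiSyncSlaveRegistry are hypothetical names standing in for the current globals, for Binlog_relay_IO_param with an added connection name, and for the per-connection storage. It shows state hashed by connection name, seeded from the initial values when semisync starts, and looked up per master from the hook parameter.

```cpp
// Hypothetical sketch only: none of these names exist in the semisync plugin.
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Values that are currently process-wide globals (illustrative subset).
struct SlaveSemiSyncState {
  bool enabled = false;  // per-connection analogue of rpl_semi_sync_slave_enabled
  bool status = false;   // per-connection analogue of Rpl_semi_sync_slave_status
};

// Stand-in for Binlog_relay_IO_param extended with the connection name.
struct RelayIoParamSketch {
  std::string connection_name;  // "" is the default (unnamed) connection
};

// Per-connection semisync state, hashed by connection name.
class SemiSyncSlaveRegistry {
 public:
  explicit SemiSyncSlaveRegistry(SlaveSemiSyncState defaults)
      : defaults_(defaults) {}

  // When semisync starts, give every running connection the initial values.
  void seed(const std::vector<std::string>& running_connections) {
    std::lock_guard<std::mutex> lock(mu_);
    for (const auto& name : running_connections) state_for(name);
  }

  // Changing the setting affects only the named connection.
  void set_enabled(const std::string& connection_name, bool on) {
    std::lock_guard<std::mutex> lock(mu_);
    state_for(connection_name).enabled = on;
  }

  // What a relay-IO hook would do: read the state of the master it serves.
  SlaveSemiSyncState lookup(const RelayIoParamSketch& param) {
    std::lock_guard<std::mutex> lock(mu_);
    return state_for(param.connection_name);
  }

 private:
  // Caller must hold mu_. Unknown names start from a copy of the defaults.
  SlaveSemiSyncState& state_for(const std::string& name) {
    return states_.emplace(name, defaults_).first->second;
  }

  std::mutex mu_;
  SlaveSemiSyncState defaults_;
  std::unordered_map<std::string, SlaveSemiSyncState> states_;
};
```

With this layout, enabling semisync for 'master1' leaves 'master2' untouched, which is the situation the test case exercises. The sketch compares connection names exactly; a real implementation would have to match them the same way the server does, and it still needs the plugin-side support for connection variables that the list above calls the hard part.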

              elenst Elena Stepanova added a comment:

              MDEV-4920 was marked a duplicate of this report.

              knielsen Kristian Nielsen added a comment:

              I believe MySQL 5.7 has multi-source replication?
              If so, the necessary changes to the plugin should be available there, I suppose, unless they also didn't bother to make it work.


                People

                • Assignee:
                  monty Michael Widenius
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  1 Vote for this issue
                  Watchers:
                  4 Start watching this issue


                    Time Tracking

                    Original Estimate: 2d 4h
                    Remaining Estimate: 2d 4h
                    Time Spent: 45m