Details

    • Type: Task
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The patch has been ported to 10.1.
      You can use it under the new (also called "3-clause") BSD license.
      I have done internal benchmarks, but you are most welcome to run some too.

      DESCRIPTION:
      no slave left behind

      This patch implements master throttling based on slave lag,
      aka "no slave left behind". The core feature works as follows:
      1) The semi-sync reply is amended to also report back the SQL-thread
      position (aka exec position).
      2) Transactions are not removed from the "active-transaction-list"
      in the semi-sync master plugin until at least one slave has reported
      that it has executed the transaction. The slave lag can then
      be estimated by calculating how long the oldest transaction has been
      lingering in the active-transaction-list.
      3) Client threads are forced to wait before commit until the slave lag
      has decreased to an acceptable value.
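The bookkeeping in steps 1) and 2) can be sketched roughly as follows. This is a simplified Python model, not the plugin's C++ code: the `ActiveTrxList` class, its method names, the injectable clock and the `mysql-bin.000001` file name are all hypothetical. It keys transactions by binlog coordinates in commit order, drops every entry at or before an exec position reported by any slave, and estimates the slave lag as the age of the oldest remaining entry.

```python
import time
from collections import OrderedDict


class ActiveTrxList:
    """Hypothetical model of the master's active-transaction-list.

    Entries are keyed by binlog coordinates (log_name, log_pos) and kept in
    commit order. An entry is removed only once some slave reports that it
    has *executed* that transaction, not merely received it.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._trx = OrderedDict()  # (log_name, log_pos) -> commit timestamp

    def add(self, log_name, log_pos):
        # Called when a transaction commits on the master (in commit order).
        self._trx[(log_name, log_pos)] = self._clock()

    def report_exec_pos(self, log_name, log_pos):
        # Called when a semi-sync reply carries a slave's exec position:
        # drop every transaction at or before that position. Tuple comparison
        # assumes zero-padded binlog file names, which sort correctly as text.
        exec_pos = (log_name, log_pos)
        while self._trx and next(iter(self._trx)) <= exec_pos:
            self._trx.popitem(last=False)

    def estimated_slave_lag(self):
        # Lag = how long the oldest not-yet-executed transaction has lingered.
        if not self._trx:
            return 0.0
        oldest_commit_ts = next(iter(self._trx.values()))
        return self._clock() - oldest_commit_ts


# Usage with a fake clock so the numbers are deterministic.
now = [0.0]
trx_list = ActiveTrxList(clock=lambda: now[0])
trx_list.add("mysql-bin.000001", 100)   # committed at t=0
now[0] = 1.0
trx_list.add("mysql-bin.000001", 200)   # committed at t=1
now[0] = 3.0
print(trx_list.estimated_slave_lag())   # 3.0 (pos 100 not executed yet)
trx_list.report_exec_pos("mysql-bin.000001", 100)
print(trx_list.estimated_slave_lag())   # 2.0 (oldest is now pos 200)
```

Because any slave's report advances the list, the estimate follows the fastest slave, matching the "at least one slave" rule above.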

      the following variables are introduced on master:

      • rpl_semi_sync_master_max_slave_lag (global)
      • rpl_semi_sync_master_slave_lag_wait_timeout (session)

      the following status variables are introduced on master:

      • rpl_semi_sync_master_slave_lag_wait_sessions
      • rpl_semi_sync_master_estimated_slave_lag
      • rpl_semi_sync_master_trx_slave_lag_wait_time
      • rpl_semi_sync_master_trx_slave_lag_wait_num
      • rpl_semi_sync_master_avg_trx_slave_lag_wait_time
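Step 3), together with the new variables and status counters, might look roughly like the sketch below. It is an illustration under assumptions rather than the patch's implementation: `SlaveLagThrottle` is a hypothetical name, lag is taken to be in seconds, the session simply polls instead of waiting on a condition variable, and both the behaviour after a timeout (the method just returns False) and the exact way the status counters are updated are guesses based on the variable names.

```python
import time


class SlaveLagThrottle:
    """Hypothetical sketch of the commit-time wait and its status counters.

    max_slave_lag stands in for rpl_semi_sync_master_max_slave_lag and the
    timeout argument for rpl_semi_sync_master_slave_lag_wait_timeout; the
    units (seconds) and the polling loop are assumptions of this sketch.
    """

    def __init__(self, lag_source, max_slave_lag, clock=time.monotonic,
                 sleep=time.sleep, poll_interval=0.01):
        self.lag_source = lag_source        # callable -> estimated slave lag
        self.max_slave_lag = max_slave_lag
        self._clock = clock
        self._sleep = sleep
        self._poll_interval = poll_interval
        # Counters modelled on the new status variables.
        self.slave_lag_wait_sessions = 0    # sessions currently waiting
        self.trx_slave_lag_wait_time = 0.0  # total time spent waiting
        self.trx_slave_lag_wait_num = 0     # number of transactions that waited

    @property
    def avg_trx_slave_lag_wait_time(self):
        if self.trx_slave_lag_wait_num == 0:
            return 0.0
        return self.trx_slave_lag_wait_time / self.trx_slave_lag_wait_num

    def wait_before_commit(self, timeout):
        """Return True once lag <= max_slave_lag, False if timeout expires."""
        if self.lag_source() <= self.max_slave_lag:
            return True                     # fast path: no wait recorded
        start = self._clock()
        self.slave_lag_wait_sessions += 1
        try:
            while self.lag_source() > self.max_slave_lag:
                if self._clock() - start >= timeout:
                    return False
                self._sleep(self._poll_interval)
            return True
        finally:
            self.slave_lag_wait_sessions -= 1
            self.trx_slave_lag_wait_num += 1
            self.trx_slave_lag_wait_time += self._clock() - start


# Deterministic demo: sleeping advances a fake clock, and the slave
# catches up by one second of lag per second of elapsed time.
now, lag = [0.0], [5.0]


def fake_sleep(dt):
    now[0] += dt
    lag[0] -= dt


throttle = SlaveLagThrottle(lambda: lag[0], max_slave_lag=2.0,
                            clock=lambda: now[0], sleep=fake_sleep,
                            poll_interval=1.0)
print(throttle.wait_before_commit(timeout=10.0))   # True, after 3 s
print(throttle.avg_trx_slave_lag_wait_time)        # 3.0
```

Making the timeout a per-session argument mirrors the fact that the wait timeout is a session variable while the lag limit is global.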

      the following variables are introduced on slave:

      • rpl_semi_sync_slave_lag_enabled (global)

      In addition to this, two optimizations that decrease the overhead of
      semi-sync are introduced.
      1) When a transaction is about to be sent to a slave, the master checks
      whether it should be semi-synced. But rather than semi-syncing each
      transaction (which is what is done currently), the code will skip
      semi-syncing a transaction if newer transactions have already been
      committed. Since this could delay semi-syncing indefinitely, a cap is
      set using two new master variables:

      • rpl_semi_sync_master_max_unacked_event_bytes (global)
      • rpl_semi_sync_master_max_unacked_event_count (global)

      2) rpl_semi_sync_master_group_commit, which makes the semi-sync
      plugin semi-sync only the last transaction in a group commit.
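Both optimizations can be sketched as two small decision functions. This is a hypothetical Python model: `UnackedCap`, `need_ack` and `group_commit_ack_flags` are illustrative names, treating a cap as reached when the counter is >= the limit is an assumption, and so is the premise that an ack for a later binlog position also covers all earlier transactions.

```python
from dataclasses import dataclass


@dataclass
class UnackedCap:
    """Hypothetical sketch of optimization 1: skip acks while newer
    transactions are already committed, but never let more than the
    configured bytes/events go by without requesting an ack."""
    max_unacked_event_bytes: int  # stands in for ..._max_unacked_event_bytes
    max_unacked_event_count: int  # stands in for ..._max_unacked_event_count
    unacked_bytes: int = 0
    unacked_count: int = 0

    def need_ack(self, event_bytes, event_count, newer_trx_committed):
        """Decide whether the transaction being sent should request an ack."""
        self.unacked_bytes += event_bytes
        self.unacked_count += event_count
        over_cap = (self.unacked_bytes >= self.max_unacked_event_bytes or
                    self.unacked_count >= self.max_unacked_event_count)
        if newer_trx_committed and not over_cap:
            return False  # a later transaction's ack will cover this one
        self.unacked_bytes = 0
        self.unacked_count = 0
        return True


def group_commit_ack_flags(group):
    """Optimization 2: only the last transaction of a group commit asks for
    an ack (assuming an ack of a later position covers earlier ones)."""
    return [i == len(group) - 1 for i in range(len(group))]


cap = UnackedCap(max_unacked_event_bytes=1000, max_unacked_event_count=10)
print([cap.need_ack(300, 3, newer) for newer in (True, True, True, False)])
# [False, False, False, True]
print(group_commit_ack_flags(["trx1", "trx2", "trx3"]))
# [False, False, True]
```

The cap matters when a steady stream of commits keeps `newer_trx_committed` true: without it, no transaction would ever request an ack.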

            Activity

            jonaso Jonas Oreland added a comment -

            A comment: I have not tested/considered the parallel slave applier,
            but if the get_master_log_pos function that I wrote works with the
            parallel slave applier, it should work. Feedback welcome.

            knielsen Kristian Nielsen added a comment -

            > a comment is that I have not tested/considered parallel slave applier. but
            > if the get_master_log_pos-function that I wrote works with parallel slave
            > applier, it should work.

            I think it should work. The rli->group_master_log_name and
            rli->group_master_log_pos fields are also updated in parallel replication.

            The update happens out-of-order, though. Especially when using multiple
            replication domains and GTID, one domain can be quite a bit ahead of
            another. So "no slave left behind" will use the position of the
            most-ahead worker thread to tell how far the slave has progressed, not the
            position of the most-behind worker. That seems fine, I think.
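A minimal sketch, not taken from the patch or the review, of what this means for the lag estimate. The `estimated_lag` helper, the binlog offsets, the commit times and the two-domain setup are all hypothetical; the sketch just compares the estimate obtained from the most-ahead worker's position with the one from the most-behind worker's.

```python
def estimated_lag(committed, exec_pos, now):
    """Age of the oldest committed transaction past the reported exec_pos.

    committed: list of (binlog_offset, commit_time) pairs.
    """
    pending = [ts for pos, ts in committed if pos > exec_pos]
    return now - min(pending) if pending else 0.0


# Hypothetical example: two replication domains applied by separate parallel
# workers; each entry is the binlog offset that worker has executed up to.
committed = [(300, 1.0), (600, 2.0), (900, 3.0), (1200, 4.0)]
worker_exec_pos = {"domain_0": 1200, "domain_1": 300}
now = 5.0

most_ahead = max(worker_exec_pos.values())
most_behind = min(worker_exec_pos.values())
print(estimated_lag(committed, most_ahead, now))   # 0.0
print(estimated_lag(committed, most_behind, now))  # 3.0
```

Reporting the most-ahead position makes the estimate optimistic: a lagging domain does not throttle the master.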

            knielsen Kristian Nielsen added a comment -

            Review sent on maria-developers@: https://lists.launchpad.net/maria-developers/msg08575.html

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                jonaso Jonas Oreland
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated: