Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5114

seconds_behind_master flips to 0 & spikes back, when running show slaves status

    Details

      Description

      we setup maria 10.0.4 for multisource replication.

      it was replicating from a percona 5.5 master. When we started replication, maria was a day behind its master.

      its working fine but we noticed this phenomenon:

      when we execute 'show all slaves status' consecutively the 'seconds_behind_master' flips from 0 to XXX and XXX to 0. XXX keeps decreasing though, which is good as its getting closer to the state of its master.

      But wanted to know why the 'seconds_behind_master' comes down to 0 then again sets to a positive number.

      Note: XXX is some positive number.

      Elena Stepanova asked to file a bug for the same.

      bugs related to this issue
      --------------------------------
      http://bugs.mysql.com/bug.php?id=52166
      http://bugs.mysql.com/bug.php?id=53167

      these were fixed in 5.6. We need to either merge the bugfixes or fix them on our own

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              I cannot reproduce Bug#53167 on the current 10.0 tree, but Bug#52166 is definitely there.

              Show
              elenst Elena Stepanova added a comment - I cannot reproduce Bug#53167 on the current 10.0 tree, but Bug#52166 is definitely there.
              Hide
              serg Sergei Golubchik added a comment -

              I cannot repeat it using the test case from bug#52166 fix.
              The (adapted to MariaDB) test is attached.

              Show
              serg Sergei Golubchik added a comment - I cannot repeat it using the test case from bug#52166 fix. The (adapted to MariaDB) test is attached.
              Hide
              elenst Elena Stepanova added a comment -

              I cannot repeat it using the test case from bug#52166 fix.

              That's because the test tests something different from what the fix fixes.
              Here is what they say in the commit comment:

              Introduction: seconds behind master is calculated by checking the
              time difference between the current time on the slave (time(0))
              and the creation timestamp, at the master, of the last executed
              event by the SQL thread. The last event timestamp is saved by the
              SQL thread in rli->last_master_timestamp. Different timezones for
              master and slave are taken into account when showing this
              delta. The rli->last_master_timestamp is updated when the SQL
              thread finishes the event execution at the slave.

              Problem: when the SQL thread has no events to replay in the relay
              log, it waits. Right before waiting, it saves the last
              event (lets call it e1) timestamp (rli->last_master_timestamp)
              set at the master (in save_last_master_timestamp), and sets
              rli->last_master_timestamp to zero. This makes 'SHOW SLAVE
              STATUS' show zero while the SQL thread is waiting. However, when
              the SQL thread awakes, most likely because of the fact that a new
              event was enqueued in the relay log, it restores the
              save_last_master_timestamp to rli->last_master_timestamp. Then it
              goes on to process the new event (lets name it e2). Given this
              and the fact that the SQL thread only updates the
              rli->last_master_timestamp at the end of execution, there is a
              time window that the user can notice a sudden spike on
              Seconds_Behind_Master, especially if e2 arrives a long time after
              e1 finished. The spike is actually enforced - value increases -
              during e2 execution time, which means that if e2 execution
              latency increases, then the likelihood that the user sees this
              also increases.

              That is (and the original complaint was really about it),
              actual behavior before the fix: when slave has started executing e2,
              Seconds_Behind_Master = Slave_time - e1_timestamp

              Expected result: when slave has started executing e2,
              Seconds_Behind_Master = Slave_time - e2_timestamp

              What they test instead:

              e1_timestamp: 10:00:00
              e2_timestamp: 10:00:00
              Slave_time: 11:00:00
              
              Slave has started executing e2: Seconds_Behind_Master > 3500
              

              But of course it is, either with the pre-fix behavior or with the fixed one, it will always be 3600 (or 1-2 seconds more).

              What they should have done instead is:
              1) reset debug_dbug before INSERT on master, so that the interval between two events is 1 hour;
              2) check that Seconds_Behind_Master is LESS than 3500, not greater

              Then it would have tested what they described.

              But the actual bug is really fixed in 5.6, while still exists in 10.0. Here is a low-tech dirty test which positively confirms it and is simple enough to read:

              --source include/master-slave.inc
              --source include/have_binlog_format_statement.inc
              
              CREATE TABLE t (a int);
              --sleep 10
              send INSERT INTO t VALUES(SLEEP(5));
              
              --connection slave
              let $run = 60;
              while ($run) 
              {
              	query_vertical show slave status ;
              	sleep 1;
              	dec $run;
              }
              

              With MySQL 5.6, Seconds_Behind_Master printed by multiple show slave status calls, are like this:

              Seconds_Behind_Master	0
              ...
              Seconds_Behind_Master	0
              Seconds_Behind_Master	1
              Seconds_Behind_Master	2
              Seconds_Behind_Master	3
              Seconds_Behind_Master	4
              Seconds_Behind_Master	0
              ...
              Seconds_Behind_Master	0
              

              While with MariaDB it's this:

              Seconds_Behind_Master	0
              ...
              Seconds_Behind_Master	0
              Seconds_Behind_Master	15
              Seconds_Behind_Master	16
              Seconds_Behind_Master	17
              Seconds_Behind_Master	18
              Seconds_Behind_Master	19
              Seconds_Behind_Master	0
              ...
              Seconds_Behind_Master	0
              
              Show
              elenst Elena Stepanova added a comment - I cannot repeat it using the test case from bug#52166 fix. That's because the test tests something different from what the fix fixes. Here is what they say in the commit comment: Introduction: seconds behind master is calculated by checking the time difference between the current time on the slave (time(0)) and the creation timestamp, at the master, of the last executed event by the SQL thread. The last event timestamp is saved by the SQL thread in rli->last_master_timestamp. Different timezones for master and slave are taken into account when showing this delta. The rli->last_master_timestamp is updated when the SQL thread finishes the event execution at the slave. Problem: when the SQL thread has no events to replay in the relay log, it waits. Right before waiting, it saves the last event (lets call it e1) timestamp (rli->last_master_timestamp) set at the master (in save_last_master_timestamp), and sets rli->last_master_timestamp to zero. This makes 'SHOW SLAVE STATUS' show zero while the SQL thread is waiting. However, when the SQL thread awakes, most likely because of the fact that a new event was enqueued in the relay log, it restores the save_last_master_timestamp to rli->last_master_timestamp. Then it goes on to process the new event (lets name it e2). Given this and the fact that the SQL thread only updates the rli->last_master_timestamp at the end of execution, there is a time window that the user can notice a sudden spike on Seconds_Behind_Master, especially if e2 arrives a long time after e1 finished. The spike is actually enforced - value increases - during e2 execution time, which means that if e2 execution latency increases, then the likelihood that the user sees this also increases. That is (and the original complaint was really about it), actual behavior before the fix: when slave has started executing e2, Seconds_Behind_Master = Slave_time - e1_timestamp Expected result: when slave has started executing e2, Seconds_Behind_Master = Slave_time - e2_timestamp What they test instead: e1_timestamp: 10:00:00 e2_timestamp: 10:00:00 Slave_time: 11:00:00 Slave has started executing e2: Seconds_Behind_Master > 3500 But of course it is, either with the pre-fix behavior or with the fixed one, it will always be 3600 (or 1-2 seconds more). What they should have done instead is: 1) reset debug_dbug before INSERT on master, so that the interval between two events is 1 hour; 2) check that Seconds_Behind_Master is LESS than 3500, not greater Then it would have tested what they described. But the actual bug is really fixed in 5.6, while still exists in 10.0. Here is a low-tech dirty test which positively confirms it and is simple enough to read: --source include/master-slave.inc --source include/have_binlog_format_statement.inc CREATE TABLE t (a int); --sleep 10 send INSERT INTO t VALUES(SLEEP(5)); --connection slave let $run = 60; while ($run) { query_vertical show slave status ; sleep 1; dec $run; } With MySQL 5.6, Seconds_Behind_Master printed by multiple show slave status calls, are like this: Seconds_Behind_Master 0 ... Seconds_Behind_Master 0 Seconds_Behind_Master 1 Seconds_Behind_Master 2 Seconds_Behind_Master 3 Seconds_Behind_Master 4 Seconds_Behind_Master 0 ... Seconds_Behind_Master 0 While with MariaDB it's this: Seconds_Behind_Master 0 ... Seconds_Behind_Master 0 Seconds_Behind_Master 15 Seconds_Behind_Master 16 Seconds_Behind_Master 17 Seconds_Behind_Master 18 Seconds_Behind_Master 19 Seconds_Behind_Master 0 ... Seconds_Behind_Master 0

                People

                • Assignee:
                  serg Sergei Golubchik
                  Reporter:
                  jaihind213 vishnu rao
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  6 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 4 hours
                    4h