Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7837

Seconds behind Master reports incorrect value when Parallel replication is used

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0
    • Fix Version/s: N/A
    • Component/s: Replication
    • Labels:
      None

      Description

      I am not sure how parallel replication impacts SBM calculation but the computation is obviously wrong when using parallel threads.
      SBM should be (imho) calculated by comparing the timestamp of the currently executed event in the binary log to the current datetime. I understand that there might be issues when executing events in parallel however that should work for most cases.

      simple reproducer: 
      <pre>
      #stop perfectly working slave
      STOP SLAVE;
      SELECT SLEEP(10);
      START SLAVE;
      SHOW PROCESSLIST\G
      MariaDB(db-02)[(none)]> SHOW PROCESSLIST\G
      *************************** 1. row ***************************
            Id: 3
          User: system user
          Host: 
            db: NULL
       Command: Connect
          Time: 10
         State: Waiting for work from SQL thread
          Info: NULL
      Progress: 0.000
      *************************** 2. row ***************************
            Id: 4
          User: system user
          Host: 
            db: NULL
       Command: Connect
          Time: 10
         State: Waiting for work from SQL thread
          Info: NULL
      Progress: 0.000
      *************************** 3. row ***************************
      SHOW SLAVE STATUS\G
      
              Seconds_Behind_Master: 0
      

      SBM first reports 0 then will increase monotonically from this point until it reaches the actual delay value (whatever is reported in "Time" for a worker thread in the processlist, for example).

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Hi,

            From the description it sounds like regular behavior with somewhat slow-ish IO thread: the slave starts, it takes time to read an event from master during which periods the SQL thread(s) are idle, and thus SBM is 0, then an SQL thread gets the even and SBM gets updated and starts increasing.
            But I might have missed something in your description.
            Could you please paste, attach or upload the PROCESSLIST output for all replication-related threads, and also full SHOW SLAVE STATUS output?
            Thanks.

            Show
            elenst Elena Stepanova added a comment - Hi, From the description it sounds like regular behavior with somewhat slow-ish IO thread: the slave starts, it takes time to read an event from master during which periods the SQL thread(s) are idle, and thus SBM is 0, then an SQL thread gets the even and SBM gets updated and starts increasing. But I might have missed something in your description. Could you please paste, attach or upload the PROCESSLIST output for all replication-related threads, and also full SHOW SLAVE STATUS output? Thanks.
            Hide
            tanj Guillaume Lefranc added a comment -

            Yes, that is the case indeed. I am complaining of this behaviour because I happen to stop a slave fully, including the IO thread, then restart it a few seconds later, and it does not report the correct number of seconds behind master. As you said, that is more or less expected because it's based on events gotten from the IO thread.
            I am basically asking if there be would a way to make this indication more reliable, without relying on external tools (e.g. an heartbeat table being updated).

            Show
            tanj Guillaume Lefranc added a comment - Yes, that is the case indeed. I am complaining of this behaviour because I happen to stop a slave fully, including the IO thread, then restart it a few seconds later, and it does not report the correct number of seconds behind master. As you said, that is more or less expected because it's based on events gotten from the IO thread. I am basically asking if there be would a way to make this indication more reliable, without relying on external tools (e.g. an heartbeat table being updated).
            Hide
            elenst Elena Stepanova added a comment -

            I doubt that – not with the current design, anyway.
            But I'll assign it to Kristian Nielsen, not for fixing, but for the second opinion, maybe he can suggest some neat trick.

            Show
            elenst Elena Stepanova added a comment - I doubt that – not with the current design, anyway. But I'll assign it to Kristian Nielsen , not for fixing, but for the second opinion, maybe he can suggest some neat trick.
            Hide
            knielsen Kristian Nielsen added a comment -

            Hm, I'm always confused about the seconds-behind-master value.

            From how I read the comments, it seems the seconds-behind-master value is
            the difference between the last event executed by the SQL thread and the
            last event received by the IO thread?

            I suppose it was made that way to be able to report "0 seconds" for an
            up-to-date slave, even if there is clock skew between master and slave?

            Though I also vaguely remember reading some code that tries to detect and
            adjust for time skew between master and slave. As I said, I've always been
            confused with this part of the code.

            I guess it would be possible to send master's current time in the heartbeat
            messages (if enabled), or even in every event sent from master, but it's not
            something that I think anyone have planned...

            Another option to get an idea of how far behind the slave is, is to compare
            @@gtid_slave_pos on the slave with @@gtid_binlog_pos on the master...

            Show
            knielsen Kristian Nielsen added a comment - Hm, I'm always confused about the seconds-behind-master value. From how I read the comments, it seems the seconds-behind-master value is the difference between the last event executed by the SQL thread and the last event received by the IO thread? I suppose it was made that way to be able to report "0 seconds" for an up-to-date slave, even if there is clock skew between master and slave? Though I also vaguely remember reading some code that tries to detect and adjust for time skew between master and slave. As I said, I've always been confused with this part of the code. I guess it would be possible to send master's current time in the heartbeat messages (if enabled), or even in every event sent from master, but it's not something that I think anyone have planned... Another option to get an idea of how far behind the slave is, is to compare @@gtid_slave_pos on the slave with @@gtid_binlog_pos on the master...
            Hide
            elenst Elena Stepanova added a comment - - edited

            From how I read the comments, it seems the seconds-behind-master value is the difference between the last event executed by the SQL thread and the last event received by the IO thread?

            I think more accurate would be not "last event executed" but "current event that is being executed". Otherwise it sounds right. Here is a partial description from MySQL manual:

            When the slave is actively processing updates, this field shows the difference between the current timestamp on the slave and the original timestamp logged on the master for the event currently being processed on the slave.

            When no event is currently being processed on the slave, this value is 0.

            In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. If the network connection between master and slave is fast, the slave I/O thread is very close to the master, so this field is a good approximation of how late the slave SQL thread is compared to the master. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.

            This time difference computation works even if the master and slave do not have identical clock times, provided that the difference, computed when the slave I/O thread starts, remains constant from then on. Any changes—including NTP updates—can lead to clock skews that can make calculation of Seconds_Behind_Master less reliable.
            ...

            Another option to get an idea of how far behind the slave is, is to compare @@gtid_slave_pos on the slave with @@gtid_binlog_pos on the master...

            I'm afraid it will be even less informative in terms of time delay, because there might be 1mln tiny little events which will execute in a minute, or there might be a single event that will take hours...

            Anyway, Guillaume Lefranc, if you want to submit a feature request for a new delay indicator, please do so, but as Kristian said above, it's not something that's currently planned.

            Show
            elenst Elena Stepanova added a comment - - edited From how I read the comments, it seems the seconds-behind-master value is the difference between the last event executed by the SQL thread and the last event received by the IO thread? I think more accurate would be not "last event executed" but "current event that is being executed". Otherwise it sounds right. Here is a partial description from MySQL manual : When the slave is actively processing updates, this field shows the difference between the current timestamp on the slave and the original timestamp logged on the master for the event currently being processed on the slave. When no event is currently being processed on the slave, this value is 0. In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. If the network connection between master and slave is fast, the slave I/O thread is very close to the master, so this field is a good approximation of how late the slave SQL thread is compared to the master. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks. This time difference computation works even if the master and slave do not have identical clock times, provided that the difference, computed when the slave I/O thread starts, remains constant from then on. Any changes—including NTP updates—can lead to clock skews that can make calculation of Seconds_Behind_Master less reliable. ... Another option to get an idea of how far behind the slave is, is to compare @@gtid_slave_pos on the slave with @@gtid_binlog_pos on the master... I'm afraid it will be even less informative in terms of time delay, because there might be 1mln tiny little events which will execute in a minute, or there might be a single event that will take hours... Anyway, Guillaume Lefranc , if you want to submit a feature request for a new delay indicator, please do so, but as Kristian said above, it's not something that's currently planned.
            Hide
            stephane@skysql.com VAROQUI Stephane added a comment - - edited

            Using 10.0.16 GTID SBR we stop a slave for 3 days .

            We start the replication that get a correct value for second_behind_master 160K

            stop slave;
            set global slave-parallel-threads=10;
            start slave;

            second_behind_master report 10 and slowly increase to 50 .
            The difference is for sure a regression.

            Show
            stephane@skysql.com VAROQUI Stephane added a comment - - edited Using 10.0.16 GTID SBR we stop a slave for 3 days . We start the replication that get a correct value for second_behind_master 160K stop slave; set global slave-parallel-threads=10; start slave; second_behind_master report 10 and slowly increase to 50 . The difference is for sure a regression.
            Hide
            stephane@skysql.com VAROQUI Stephane added a comment -

            All events have UTC timestamp , so sql thread can get local UTC timestamp and compute a diff with the event one.
            Can you point me what is wrong with such computation of course it does not take care of the time to process events.
            but report delay between the executions of that event , this delay will potentially change for each event , making it hard using it to predict a catch up time.

            Show
            stephane@skysql.com VAROQUI Stephane added a comment - All events have UTC timestamp , so sql thread can get local UTC timestamp and compute a diff with the event one. Can you point me what is wrong with such computation of course it does not take care of the time to process events. but report delay between the executions of that event , this delay will potentially change for each event , making it hard using it to predict a catch up time.

              People

              • Assignee:
                Unassigned
                Reporter:
                tanj Guillaume Lefranc
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: