MariaDB Server / MDEV-6981

[PATCH] feature request MASTER_GTID_WAIT status variables

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 10.1.4
    • Component/s: Replication
    • Labels:
      None

      Description

      With MASTER_GTID_WAIT I wouldn't mind a global (and optionally session) indication of:

      • total time spent in this wait state, including timeouts
      • total time spent in the function excluding timeouts
      • total number of timeouts occurred in this function

      I'd find this quite useful for general health monitoring of this function, particularly when graphed over time.

      Thanks for your consideration

            Activity

            danblack Daniel Black added a comment -

            Attached patch implements three status variables:

            master_gtid_wait_count (# of times called)
            master_gtid_wait_timeouts (# of times it was called and timed out)
            master_gtid_wait_time (time in microseconds waiting for results)
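
For graphing over time, cumulative counters like these are usually turned into per-interval figures by subtracting two consecutive readings. The following is a minimal sketch of that idea, not part of the patch: the `GtidWaitSample` container, its field names, and `interval_metrics` are hypothetical, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtidWaitSample:
    """One reading of the three status variables (hypothetical container)."""
    count: int      # master_gtid_wait_count: calls so far
    timeouts: int   # master_gtid_wait_timeouts: calls that timed out
    time_us: int    # master_gtid_wait_time: cumulative wait, microseconds


def interval_metrics(prev: GtidWaitSample, curr: GtidWaitSample) -> dict:
    """Derive per-interval figures from two cumulative readings."""
    calls = curr.count - prev.count
    timeouts = curr.timeouts - prev.timeouts
    waited_us = curr.time_us - prev.time_us
    return {
        "calls": calls,
        "timeouts": timeouts,
        # Guard against division by zero when nothing was called.
        "timeout_ratio": timeouts / calls if calls else 0.0,
        "avg_wait_ms": waited_us / calls / 1000 if calls else 0.0,
    }


# Made-up readings taken at two points in time.
before = GtidWaitSample(count=10, timeouts=1, time_us=2_000_000)
after = GtidWaitSample(count=14, timeouts=2, time_us=2_600_000)
metrics = interval_metrics(before, after)
```

Working from differences also means the result does not depend on whatever the counters held before the first reading.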

            danblack Daniel Black added a comment -

            Made the output more readable by replacing only $wait_time with MASTER_GTID_WAIT_TIME, leaving the expression intact, and using more accurate labels on the output.

            knielsen Kristian Nielsen added a comment -

            Looks good.
            One thing: I think the variables need to be declared ulonglong, because they are used with SHOW_LONGLONG_STATUS? (Otherwise the time counter, which is in microseconds, would wrap after about 4000 seconds.)
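
The "4000 seconds" figure can be checked with a little arithmetic. This sketch assumes the original declaration was 32 bits wide on the platform in question, which the comment implies but does not state; `seconds_until_wrap` is a hypothetical helper name.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def seconds_until_wrap(bits: int) -> float:
    """Seconds of accumulated wait time before an unsigned counter of the
    given width, counting microseconds, overflows back to zero."""
    return (2 ** bits) / MICROSECONDS_PER_SECOND


wrap_32 = seconds_until_wrap(32)  # roughly 4295 seconds, a bit over an hour
wrap_64 = seconds_until_wrap(64)  # many thousands of years
```

With a 64-bit counter the wrap point is far enough away that it no longer matters for monitoring.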

            danblack Daniel Black added a comment -

            fixed

            knielsen Kristian Nielsen added a comment -

            Bummer, I don't know how this has stalled for so long.

            I've pushed it to a feature tree for buildbot testing; once that has run I will push it to 10.1.

            knielsen Kristian Nielsen added a comment - - edited

            Ehm, this really does not work:

            # This one completes immediately ( < 1 ms).
            SELECT master_gtid_wait('1-1-1');
            let $wait_time = query_get_value(SHOW STATUS LIKE 'Master_gtid_wait_time', Value, 1);
            eval SELECT floor($wait_time / 1000) AS Master_gtid_wait_time_milliseconds;
            
            SELECT master_gtid_wait('2-1-2', 0.5);
            # (0.5-0.6 seconds)
            eval SELECT floor($wait_time / 100000) AS Master_gtid_wait_time_tenths_of_a_second;
            

            Those times will fluctuate a lot depending on load on the test machine (as
            seen in buildbot).

            And did you test that the counters are reset to 0 when the test starts, so
            that the values do not depend on what happened in earlier tests that ran
            before this test?

            Also, main.max_statement_time fails (but that's easy to fix).

            danblack Daniel Black added a comment -

            https://github.com/MariaDB/server/pull/21

            > Those times will fluctuate a lot depending on load on the test machine (as seen in buildbot).

            Increased the time windows a lot. The immediate case is where an event is pushed to the master before master_gtid_wait is called on the slave. The 0.5 case is where the timeout occurs in master_gtid_wait on the slave. Is there a way to write these sorts of test cases?
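
One generic way to make a timing check less sensitive to machine load is to assert a firm lower bound and a deliberately generous upper bound, rather than an exact bucket. The sketch below illustrates that idea only; it is not the approach taken in the patch or the pull request, and `wait_time_plausible` and its slack parameter are hypothetical.

```python
def wait_time_plausible(measured_us: int, expected_s: float, slack_s: float) -> bool:
    """Return True if a measured wait (microseconds) is no shorter than the
    expected wait and no longer than the expected wait plus a slack allowance."""
    lower_us = expected_s * 1_000_000
    upper_us = (expected_s + slack_s) * 1_000_000
    return lower_us <= measured_us <= upper_us


# Immediate case: the GTID is already there, so any small wait is acceptable.
immediate_ok = wait_time_plausible(800, expected_s=0.0, slack_s=5.0)

# Timeout case: a 0.5 second timeout should wait at least 0.5 seconds,
# but a loaded machine may take noticeably longer.
timeout_ok = wait_time_plausible(730_000, expected_s=0.5, slack_s=5.0)
returned_too_early = not wait_time_plausible(100_000, expected_s=0.5, slack_s=5.0)
```

The lower bound still catches a wait that returns too early, while the wide upper bound absorbs scheduling delays on a busy test machine.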

            > And did you test that the counters are reset to 0

            I'm only looking at the session status variables on a new connection.

            > Also, main.max_statement_time fails (but that's easy to fix).

            I don't see that in my patch.

            knielsen Kristian Nielsen added a comment -

            I've merged it into 10.1 and pushed to bb-10.1-knielsen for a buildbot run.
            If everything looks ok, I'll push to main 10.1 tomorrow.

            knielsen Kristian Nielsen added a comment -

            Pushed to 10.1.4, thanks Daniel!


              People

              • Assignee: knielsen Kristian Nielsen
              • Reporter: danblack Daniel Black