  MariaDB Server / MDEV-4473

GTID doesn't work in face of master failovers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.2
    • Fix Version/s: 10.0.3
    • Component/s: None
    • Labels: None

      Description

      I've attached the test exposing the problem.

      What the test does: it sets up the replication topology 1->2, 1->3, executes some statements, shuts down server 3, performs a failover to make the topology 2->1, executes some more statements, rotates binlogs, and connects server 3 as a slave to server 2. At this point server 2 hits an ASSERT at sql/sql_repl.cc:1001.

      The problem is that the GTID implementation has an inconsistent view of what a replication domain is and what data constitutes the replication domain state. For example, contains_all_slave_gtid() assumes that Gtid_list contains only one gtid per domain, yet it compares seq_no only if the gtid has the same server_id as the one requested by the slave, which is wrong. That "wrong" is what I wanted to catch with my test, but I discovered that Gtid_list actually contains one gtid for each domain-server pair, which seems even more wrong: for how long will a non-existent old server_id be stored and passed from one Gtid_list to another on each binlog rotation?

      mysqld hits the ASSERT in the test because gtid_find_binlog_file() makes the same assumptions as contains_all_slave_gtid(), but it is written in a way that prohibits any work if a domain_id is repeated in the Gtid_list.

      I think the fix should be to remove the server_id comparisons from places like gtid_find_binlog_file() and contains_all_slave_gtid(), and to make sure that Gtid_list contains only one gtid per domain. After all, server_id exists only for de-duplication of events with the same seq_no belonging to alternate futures.
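
      A minimal sketch of what I mean (hypothetical names and containers, not the actual server code): key the state by domain_id alone so there is exactly one gtid per domain, with server_id carried along but never used as part of the key:

      #include <cstdint>
      #include <unordered_map>

      struct rpl_gtid_sketch { uint32_t domain_id; uint32_t server_id; uint64_t seq_no; };

      class domain_gtid_list
      {
        // One entry per replication domain; a retired master's server_id cannot
        // survive as a separate record because it is not part of the key.
        std::unordered_map<uint32_t, rpl_gtid_sketch> by_domain;
      public:
        void update(const rpl_gtid_sketch &g)
        {
          auto it= by_domain.find(g.domain_id);
          // Keep only the newest position for the domain; server_id is stored
          // for de-duplication but never compared when looking up the state.
          if (it == by_domain.end() || g.seq_no > it->second.seq_no)
            by_domain[g.domain_id]= g;
        }
        const rpl_gtid_sketch *find(uint32_t domain_id) const
        {
          auto it= by_domain.find(domain_id);
          return it == by_domain.end() ? nullptr : &it->second;
        }
      };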


          Attachments

            Activity

            knielsen Kristian Nielsen added a comment -

            Hm, that assert is wrong:

            const rpl_gtid *gtid= state->find(glev->list[i].domain_id);
            if (!gtid)
            {
              /* contains_all_slave_gtid() would have returned false if so. */
              DBUG_ASSERT(0);
              continue;
            }

            It is correct in the first iteration of the loop, but since we delete
            from the `state' hash inside the loop, the condition may no longer
            hold in subsequent iterations. So the assert should just be removed.
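
            For illustration only (a standalone toy, not the server code),
            deleting from a hash inside the loop is exactly what makes a later
            find() legitimately fail when a domain occurs twice in the Gtid_list:

            #include <cstdint>
            #include <unordered_map>
            #include <vector>

            int main()
            {
              // Stand-in for the slave state hash, keyed by domain_id.
              std::unordered_map<uint32_t, uint64_t> state{{0, 3}, {1, 7}};
              // A Gtid_list in which the same domain appears more than once.
              std::vector<uint32_t> glev_domains{0, 0, 1};

              for (uint32_t domain : glev_domains)
              {
                auto it= state.find(domain);
                if (it == state.end())
                {
                  // The second occurrence of domain 0 lands here: the first
                  // iteration already erased the entry, so an assert on a
                  // failed find() would fire although nothing is wrong.
                  continue;
                }
                state.erase(it);  // mimics deleting from the hash in the loop
              }
              return 0;
            }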

            I will check that everything else is ok with the supplied test case (thanks
            for supplying that, made it very easy to reproduce the problem).

            pivanof Pavel Ivanov added a comment -

            Note, though, that contains_all_slave_gtid() returns true for slave_log.00003 when server 3 tries to connect to server 2, which is wrong because some events get skipped in slave_log.00002.

            knielsen Kristian Nielsen added a comment -

            Yes, I've noticed, thanks.

            I'll fix.

            knielsen Kristian Nielsen added a comment -

            BTW, I noticed that your test has --sync_with_master in server_3, but not in server_2 (for the initial events on server 1). I assume this is just an oversight, because without it the test becomes non-deterministic (it will fail if server 2's slave threads happen to stop before they have replicated everything from server 1).

            knielsen Kristian Nielsen added a comment -

            Ok, so there was a deeper issue here.

            Suppose binlog file X has in its Gtid_list_event: 0-1-3,0-2-5, and suppose the
            slave requests to start replicating after 0-1-3.

            In this case the bug was that the master would start sending events from
            the start of X. This is wrong, because 0-2-4 and 0-2-5 are contained in
            X-1 and are needed by the slave, so these events were lost.

            On the other hand, if the slave requested 0-2-5, then it is correct to start
            sending from the beginning of binlog file X, because 0-2-5 is the last GTID
            logged in earlier binlogs. The difference is that 0-2-5 is the last of the
            GTIDs in the Gtid_list_event. The problem was that the code did not check that
            the matched GTID was the last one in the list.

            Fixed by checking whether the gtid requested by the slave, when it
            matches a gtid in the Gtid_list_event, is the last gtid for that domain
            in the list. If not, go back to a prior binlog to ensure all needed
            events are sent to the slave.
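
            A rough sketch of that check (simplified standalone types, not the
            actual gtid_find_binlog_file() code): the slave's requested GTID must
            be the last entry for its domain in the file's Gtid_list, otherwise
            the dump thread has to fall back to an earlier binlog:

            #include <cstdint>
            #include <vector>

            struct gtid_sketch { uint32_t domain_id; uint32_t server_id; uint64_t seq_no; };

            static bool can_start_from_this_binlog(const std::vector<gtid_sketch> &gtid_list,
                                                   const gtid_sketch &requested)
            {
              for (size_t i= 0; i < gtid_list.size(); i++)
              {
                const gtid_sketch &g= gtid_list[i];
                if (g.domain_id != requested.domain_id ||
                    g.server_id != requested.server_id ||
                    g.seq_no != requested.seq_no)
                  continue;
                // The requested position is in this file's Gtid_list; it is only
                // usable if no later entry belongs to the same domain. With the
                // list 0-1-3,0-2-5, a request for 0-1-3 fails this check (0-2-5
                // follows in domain 0), while a request for 0-2-5 passes.
                for (size_t j= i + 1; j < gtid_list.size(); j++)
                  if (gtid_list[j].domain_id == requested.domain_id)
                    return false;   // needed events still live in a prior binlog
                return true;        // last for its domain: start from this file
              }
              return false;         // not listed here; scan earlier binlogs
            }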

            I pushed the fix to 10.0-base.
            Patch: revid:knielsen@knielsen-hq.org-20130503092729-gedp152b19k5wdnj

            pivanof Pavel Ivanov added a comment -

            Kristian,
            So why do you think it is necessary to store a gtid for each server_id in the gtid_list? Why not store one gtid per domain?

            And am I right that if some server was a master, then a failover happened to another master, and then the server ceased to exist altogether, a gtid with its server_id will remain indefinitely in the gtid_list in the binlogs of every server? Don't you think that is a problem and a way to accumulate a lot of unnecessary garbage? How will everything behave if the gtid_list contains 1,000 or 10,000 different gtid records?

            pivanof Pavel Ivanov added a comment -

            OK, I see your response on the maria-developers list. I'll move the discussion there.


              People

              • Assignee: knielsen Kristian Nielsen
              • Reporter: pivanof Pavel Ivanov
              • Votes: 0
              • Watchers: 3
