MariaDB Server / MDEV-7121

Parallel slave may hang if master crashes in the middle of writing transaction to binlog

      Description

      This bug happens on the slave when a binlog from the master ends with a
      partially written event group (for example, one that has a BEGIN but is
      missing its COMMIT). Such a partial event group occurs if the master
      crashes in the middle of writing to the binlog.

      The slave detects this when the restart format description event in the
      following binlog file is received. A worker thread that is in the middle of
      replicating the partial event group must be notified so that it can roll back
      the transaction.
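
      The detection rule described above can be sketched as a small state
      machine. This is a simplified model only, not the server's code: the
      event names ("BEGIN", "ROW", "COMMIT", "FORMAT_DESCRIPTION") and the
      function find_partial_groups are hypothetical stand-ins for the real
      binlog event types.

```python
from typing import Iterable, List


def find_partial_groups(events: Iterable[str]) -> List[int]:
    """Return the positions of format description events that arrive while
    an event group is still open (BEGIN seen, but no COMMIT yet)."""
    in_group = False
    partial_at: List[int] = []
    for i, ev in enumerate(events):
        if ev == "BEGIN":
            in_group = True
        elif ev == "COMMIT":
            in_group = False
        elif ev == "FORMAT_DESCRIPTION":
            if in_group:
                # The previous binlog ended mid-transaction: the open
                # group can never complete and must be rolled back.
                partial_at.append(i)
                in_group = False
    return partial_at


# Hypothetical event stream: the second group is cut off by a master
# restart, then a complete group follows in the new binlog file.
stream = [
    "FORMAT_DESCRIPTION", "BEGIN", "ROW", "COMMIT",
    "BEGIN", "ROW",
    "FORMAT_DESCRIPTION", "BEGIN", "ROW", "COMMIT",
]
partial = find_partial_groups(stream)
```

      In this model, partial holds the position of the restart format
      description event, which is the point where the worker would need to
      be told to roll back.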

      The bug was that this notification could be lost, depending on thread
      scheduling. If lost, the worker thread would then wait indefinitely for the
      rest of the transaction to arrive, and the SQL thread in turn would wait for
      the worker thread to complete the rollback, deadlocking the slave.
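
      The report does not say exactly how the notification was lost, so the
      following is only a sketch of the general "lost wakeup" pattern and of
      one common way to avoid it, not the actual fix. The Worker class and
      its method names are hypothetical. The idea is that the rollback
      request is recorded in shared state under the same lock the worker
      uses, and the worker re-checks that state before every wait, so the
      request is seen regardless of thread scheduling.

```python
import threading
from collections import deque


class Worker:
    """Toy model of a replication worker applying one event group."""

    def __init__(self):
        self.cond = threading.Condition()
        self.events = deque()
        self.abort_group = False   # set when the group is known to be partial
        self.result = None

    def queue_event(self, ev):
        with self.cond:
            self.events.append(ev)
            self.cond.notify()

    def signal_partial_group(self):
        # Record the request as a flag while holding the lock. A bare
        # notify() with no flag would be lost if the worker were not
        # already waiting at that moment.
        with self.cond:
            self.abort_group = True
            self.cond.notify()

    def run_group(self):
        applied = []
        while True:
            with self.cond:
                # Re-check the predicate before every wait.
                while not self.events and not self.abort_group:
                    self.cond.wait()
                if self.events:
                    ev = self.events.popleft()
                else:
                    # No more events will come for this group: roll back.
                    self.result = ("rollback", applied)
                    return
            applied.append(ev)
            if ev == "COMMIT":
                self.result = ("commit", applied)
                return


w = Worker()
w.queue_event("BEGIN")
w.queue_event("ROW")
w.signal_partial_group()          # sent before the worker even starts
t = threading.Thread(target=w.run_group)
t.start()
t.join(timeout=5)
```

      Even though the signal is sent before the worker thread runs, the
      worker drains the queued events, sees the flag, and rolls back instead
      of waiting for a COMMIT that will never arrive.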

      This bug is likely what was seen by a user in a hard-to-reproduce hang.

      It is also the cause of the sporadic failure in Buildbot in MDEV-7079.

              Activity

              knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2014-November/007034.html

                People

                • Assignee:
                  knielsen Kristian Nielsen
                  Reporter:
                  knielsen Kristian Nielsen