MariaDB Server / MDEV-6321

close_temporary_tables() in format description event not serialised correctly


      Description

      In Start_log_event_v3::do_apply_event(), there is code to detect when the
      format description event is the first one logged after a master restart. In
      this case, it calls rli->close_temporary_tables() and
      cleanup_load_tmpdir(&rli->mi->cmp_connection_name) to clean up any temporary
      tables left dangling.

      This is a problem in parallel replication, as the code currently does not have
      any serialisation for this case.

      The code needs to handle this (a rough sketch follows below):

      • Before processing the master-restart event, wait until all prior events
        have completed, in case they are referencing temporary tables.
      • Delay processing of all later events, in case they will open new temporary
        tables.
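
      A minimal standalone model of the required serialisation, assuming nothing
      about the actual server internals (the Scheduler type, in_flight counter and
      the names below are purely illustrative; standard C++ threads only, not the
      actual patch):

      // Toy model (not MariaDB code): the driver blocks on the restart FD event
      // until every in-flight event group has finished, then runs the temporary
      // table cleanup, and only after that may later events be scheduled.
      #include <condition_variable>
      #include <cstdio>
      #include <mutex>
      #include <thread>
      #include <vector>

      struct Scheduler
      {
        std::mutex m;
        std::condition_variable cv;
        int in_flight= 0;                       // event groups currently executing

        void start_group()  { std::lock_guard<std::mutex> lk(m); ++in_flight; }
        void finish_group()
        {
          std::lock_guard<std::mutex> lk(m);
          if (--in_flight == 0)
            cv.notify_all();
        }
        // Driver side: wait until all previously scheduled groups are done.
        void wait_for_workers_idle()
        {
          std::unique_lock<std::mutex> lk(m);
          cv.wait(lk, [this] { return in_flight == 0; });
        }
      };

      int main()
      {
        Scheduler sched;
        std::vector<std::thread> workers;
        for (int i= 0; i < 4; ++i)              // groups queued before the FD event
        {
          sched.start_group();
          workers.emplace_back([&sched, i] {
            std::printf("worker: applying group %d (may use temp tables)\n", i);
            sched.finish_group();
          });
        }
        sched.wait_for_workers_idle();          // 1) wait for all prior groups
        std::printf("driver: prior groups done, safe to drop temp tables\n");
        // 2) only now may later events be handed out (they may open new temp tables)
        for (auto &t : workers)
          t.join();
        return 0;
      }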

            Activity

            knielsen Kristian Nielsen added a comment -

            Patch for review: http://lists.askmonty.org/pipermail/commits/2014-July/006275.html
            knielsen Kristian Nielsen added a comment -

            Hm, I think that patch is not enough.

            We also need to handle the case where the master crashes after writing only part of a transaction into the binlog. In this case, the format description event that follows is what signals that the partial event group must be rolled back.

            So I think when the FD event is encountered, we need to queue some marker event that tells the worker thread holding the partial event group to roll back; otherwise we will deadlock, with the FD event waiting for the group to complete and the group waiting for the FD event to learn that it must roll back... A rough sketch of the idea is below.
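
            Very roughly, something like this (illustrative types and names only, not a proposal for the actual implementation):

            // Sketch: on a restart FD event, first queue a rollback marker for any worker
            // that holds a partial event group, and only then wait for the workers to go
            // idle. Waiting first would deadlock, since the worker can never finish the
            // half-written group on its own.
            #include <cstdio>
            #include <queue>

            enum event_kind { EVENT_DML, EVENT_ROLLBACK_MARKER };

            struct worker
            {
              std::queue<event_kind> queue;
              bool has_partial_group= false;
              void post(event_kind e) { queue.push(e); }
            };

            void handle_restart_fd_event(worker &w)
            {
              if (w.has_partial_group)
              {
                w.post(EVENT_ROLLBACK_MARKER);  // lets the worker abort the partial group
                w.has_partial_group= false;
              }
              // ... only now is it safe to wait for the workers to go idle and then
              // drop temporary tables, as in the first patch ...
              std::printf("driver: marker queued, waiting for idle workers is now safe\n");
            }

            int main()
            {
              worker w;
              w.has_partial_group= true;        // master crashed mid-transaction
              handle_restart_fd_event(w);
              return 0;
            }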

            knielsen Kristian Nielsen added a comment -

            Patches for review:
            http://lists.askmonty.org/pipermail/commits/2014-August/006421.html
            http://lists.askmonty.org/pipermail/commits/2014-August/006423.html
            knielsen Kristian Nielsen added a comment -

            Assigning to Monty for review, as agreed on IRC.

            monty Michael Widenius added a comment -

            Review for patch 1

            void
            +rpl_parallel::wait_for_workers_idle(THD *thd)
            +{
            +  uint32 i, max_i;
            +

            Add a comment explaining why we don't need a mutex while we loop over
            domain_hash. (I noticed we already do this a lot in the code, but
            didn't find any comment on why it is safe.)

            I assume this is because this is called by the driver thread and it's
            only the driver thread that can cause things to be inserted into
            domain_hash?

            +  max_i= domain_hash.records;
            +  for (i= 0; i < max_i; ++i)
            +  {
            +    bool active;
            +    wait_for_commit my_orderer;
            +    struct rpl_parallel_entry *e;
            +
            +    e= (struct rpl_parallel_entry *)my_hash_element(&domain_hash, i);
            +    mysql_mutex_lock(&e->LOCK_parallel_entry);
            +    if ((active= (e->current_sub_id > e->last_committed_sub_id)))
            +    {
            +      wait_for_commit *waitee= &e->current_group_info->commit_orderer;
            +      my_orderer.register_wait_for_prior_commit(waitee);
            +      thd->wait_for_commit_ptr= &my_orderer;
            +    }
            +    mysql_mutex_unlock(&e->LOCK_parallel_entry);
            +    if (active)
            +    {
            +      my_orderer.wait_for_prior_commit(thd);
            +      thd->wait_for_commit_ptr= NULL;
            +    }
            +  }
            +}

            <cut>

            +  if (typ == FORMAT_DESCRIPTION_EVENT)
            +  {
            +    Format_description_log_event *fdev=
            +      static_cast<Format_description_log_event *>(ev);
            +    if (fdev->created)
            +    {
            +      /*
            +        This format description event marks a new binlog after a master server
            +        restart. We are going to close all temporary tables to clean up any
            +        possible left-overs after a prior master crash.
            +
            +        Thus we need to wait for all prior events to execute to completion,
            +        in case they need access to any of the temporary tables.
            +      */
            +      wait_for_workers_idle(rli->sql_driver_thd);
            +    }
            +  }

            What happens if one of the threads we are waiting for fails and has to
            roll back, or if wait_for_workers_idle() is interrupted by a kill while we are
            in wait_for_prior_commit()?

            Will rli->abort_slave or sql_thread_stopping be set in this case, so
            that we don't drop the temporary tables while the other thread is
            retrying the statement?

            Apart from the above questions, I don't have any other comments.

            Regards,
            Monty

            monty Michael Widenius added a comment -

            Done. See comments

            knielsen Kristian Nielsen added a comment (edited) -

            > What happens if one of the threads we are waiting for fails and has to
            > roll back, or if wait_for_workers_idle() is interrupted by a kill while we are
            > in wait_for_prior_commit()?

            Good catch: if the SQL driver thread gets killed while in this wait, the code
            would drop the temporary tables without completing the wait. I've fixed it to
            catch the kill and return an error in this case (normally the SQL driver
            thread should not be killed, but it is better to handle this correctly in any case).
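
            For reference, the shape of that fix as a standalone sketch (standard C++,
            none of these names are from the actual patch):

            // Sketch: the idle-wait must be abortable. If a kill arrives while the driver
            // is waiting, the function reports an error so the caller skips the temporary
            // table cleanup instead of dropping tables that prior groups may still need.
            #include <atomic>
            #include <condition_variable>
            #include <mutex>

            struct driver_state
            {
              std::mutex m;
              std::condition_variable cv;
              int in_flight= 0;
              std::atomic<bool> killed{false};  // the kill path must also notify cv

              // Returns true if the wait was aborted by a kill (caller returns an error).
              bool wait_for_workers_idle_or_kill()
              {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [this] { return in_flight == 0 || killed.load(); });
                return killed.load();
              }
            };

            int main()
            {
              driver_state d;
              d.killed= true;                   // simulate a KILL before the wait
              return d.wait_for_workers_idle_or_kill() ? 1 : 0;  // 1: cleanup skipped
            }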

            If one of the threads we are waiting for returns failure from
            wait_for_prior_commit(), then it means that this thread has received a fatal
            error and will no longer execute any events, so that is fine.

            (If a thread gets a temporary error and needs to retry a transaction, it will
            not wake up any wait_for_prior_commit() until the retry has succeeded or a
            fatal error has occurred.)

            Thanks for review!

            knielsen Kristian Nielsen added a comment -

            Pushed to 10.0


              People

              Assignee: knielsen Kristian Nielsen
              Reporter: knielsen Kristian Nielsen