  1. MariaDB Server
  2. MDEV-6489

rpl.rpl_insert, rpl.rpl_insert_delayed and main.mysqlslap fail on PPC64

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.12
    • Fix Version/s: 10.0.13
    • Component/s: None
    • Labels:
      None

      Description

      All of these tests execute mysqlslap, which deadlocks. Below is simplified code from mysqlslap which also deadlocks on PPC64:

      #include <pthread.h>
      
      pthread_mutex_t mutex;
      pthread_cond_t cond;
      int master_wakeup;
      
      static void *thread_start(void *arg)
      {
        pthread_mutex_lock(&mutex);
        while (master_wakeup)
          pthread_cond_wait(&cond, &mutex);
        pthread_mutex_unlock(&mutex);
      
        return 0;
      }
      
      int main(void)
      {
        int i, t;
        pthread_t thread_id[5];
      
        pthread_mutex_init(&mutex, 0);
        pthread_cond_init(&cond, 0);
      
        for (i= 0; i < 1000; i++)
        {
          master_wakeup= 1;
      
          for (t= 0; t < 5; t++)
            if (pthread_create(&thread_id[t], 0, thread_start, 0))
              return 1;
      
          pthread_mutex_lock(&mutex);
          master_wakeup= 0;
          pthread_mutex_unlock(&mutex);
          pthread_cond_broadcast(&cond);
      
          for (t= 0; t < 5; t++)
            pthread_join(thread_id[t], 0);
        }
      
        pthread_mutex_destroy(&mutex);
        pthread_cond_destroy(&cond);
      
        return 0;
      }
      

      If we move the broadcast call up one line so that it is protected by the mutex, the program does not deadlock. I believe it should make no difference where we call broadcast, because the manual for pthread_cond_wait() says:

        These functions atomically release mutex and cause the calling thread to block
        on the condition variable cond; atomically here means "atomically with respect
        to access by another thread to the mutex and then the condition variable".
        That is, if another thread is able to acquire the mutex after the
        about-to-block thread has released it, then a subsequent call to
        pthread_cond_broadcast() or pthread_cond_signal() in that thread shall behave
        as if it were issued after the about-to-block thread has blocked.
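
For comparison, here is a minimal sketch of the variant that does not deadlock, with the broadcast issued while the mutex is still held. The helper name run_rounds and the NUM_THREADS macro are assumptions made for illustration and do not appear in mysqlslap; the thread body is the same as in the reproducer above.

```c
#include <pthread.h>

#define NUM_THREADS 5

static pthread_mutex_t mutex= PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond= PTHREAD_COND_INITIALIZER;
static int master_wakeup;

/* Same waiter as in the reproducer: block until master_wakeup is cleared. */
static void *thread_start(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&mutex);
  while (master_wakeup)
    pthread_cond_wait(&cond, &mutex);
  pthread_mutex_unlock(&mutex);
  return 0;
}

/* Hypothetical helper (not part of mysqlslap): runs `rounds` iterations
   and returns 0 on success, 1 if a thread could not be created. */
int run_rounds(int rounds)
{
  int i, t;
  pthread_t thread_id[NUM_THREADS];

  for (i= 0; i < rounds; i++)
  {
    int created= 0;

    /* No waiter threads exist at this point, so no lock is needed. */
    master_wakeup= 1;

    for (t= 0; t < NUM_THREADS; t++)
    {
      if (pthread_create(&thread_id[t], 0, thread_start, 0))
        break;
      created++;
    }

    /* The only change from the reproducer: broadcast before unlocking. */
    pthread_mutex_lock(&mutex);
    master_wakeup= 0;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);

    /* Join whatever was started, even if thread creation failed. */
    for (t= 0; t < created; t++)
      pthread_join(thread_id[t], 0);

    if (created < NUM_THREADS)
      return 1;
  }

  return 0;
}
```

Calling run_rounds(1000) from main() mirrors the loop in the original program. Unlike the original, this sketch releases and joins already started threads when pthread_create() fails, so that it can be used as a library function.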
      

              Activity

              svoj Sergey Vojtovich added a comment -

              Sergei, please review the fix for this bug.

              A patch has been pushed to 10.0.13:

              revno: 4306
              revision-id: svoj@mariadb.org-20140725130247-cl64fv8g6g2ydbq7
              parent: jplindst@mariadb.org-20140725073016-8y0e2u8zxd0x4z7t
              committer: Sergey Vojtovich <svoj@mariadb.org>
              branch nick: 10.0
              timestamp: Fri 2014-07-25 17:02:47 +0400
              message:
                MDEV-6489 - rpl.rpl_insert, rpl.rpl_insert_delayed and
                            main.mysqlslap fail on PPC64
                
                There seem to be a bug on Power8 which doesn't guarantee
                a signal to be delivered to waiting thread if broadcast
                is called outside of mutex.
                
                For now workaround it by calling broadcast while mutex is
                still held.
              
              serg Sergei Golubchik added a comment -

              Also, the manpage for pthread_cond_broadcast() is very explicit about it:

              The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
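
One way to follow that recommendation consistently is to keep the flag, the mutex and the condition variable together and to expose only functions that signal under the lock. The sketch below is an assumption made for illustration: gate_t and the gate_* functions are hypothetical names, not part of POSIX, MariaDB or mysqlslap.

```c
#include <pthread.h>

/* Hypothetical "gate": waiters block until some thread opens it. */
typedef struct
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int closed;
} gate_t;

/* Returns 0 on success, 1 if the mutex or condition variable
   could not be initialized. The gate starts closed. */
int gate_init(gate_t *g)
{
  if (pthread_mutex_init(&g->mutex, 0))
    return 1;
  if (pthread_cond_init(&g->cond, 0))
  {
    pthread_mutex_destroy(&g->mutex);
    return 1;
  }
  g->closed= 1;
  return 0;
}

/* Blocks the caller until the gate has been opened. */
void gate_wait(gate_t *g)
{
  pthread_mutex_lock(&g->mutex);
  while (g->closed)
    pthread_cond_wait(&g->cond, &g->mutex);
  pthread_mutex_unlock(&g->mutex);
}

/* Opens the gate; the broadcast is issued while the mutex is held. */
void gate_open(gate_t *g)
{
  pthread_mutex_lock(&g->mutex);
  g->closed= 0;
  pthread_cond_broadcast(&g->cond);
  pthread_mutex_unlock(&g->mutex);
}

/* Must only be called once no thread is waiting on the gate. */
void gate_destroy(gate_t *g)
{
  pthread_cond_destroy(&g->cond);
  pthread_mutex_destroy(&g->mutex);
}

/* Thread entry point for pthread_create(); arg is a gate_t pointer. */
void *gate_waiter(void *arg)
{
  gate_wait((gate_t *) arg);
  return 0;
}
```

With this shape, a caller cannot accidentally broadcast outside the mutex, because gate_open() is the only place where the flag changes.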

              serg Sergei Golubchik added a comment -

              ok to push


                People

                • Assignee:
                  svoj Sergey Vojtovich
                  Reporter:
                  svoj Sergey Vojtovich
                • Votes:
                  0
                  Watchers:
                  2

                    Time Tracking

                    Estimated:
                    Not Specified
                    Remaining:
                    0m
                    Logged:
                    15m