  MariaDB Server / MDEV-8251

MariaDB server hang with all threads idle

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.17
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:
      FreeBSD 10.1-RELEASE-p7 / x86-64 / MariaDB 10.0.17

      Description

      I have a production MariaDB server that occasionally hangs: it refuses new connections, does not create any more threads, and stays at 0% CPU until it is forcibly restarted.

      I'm using threadpool:
      thread_handling=pool-of-threads
      thread_pool_size=48
      thread_pool_max_threads=128
      thread_pool_idle_timeout=30

      and xtrabackup. The issue always occurs around the same time of day (once every week or two), so I suspect it may be related to xtrabackup pausing the server to take the backup.

      I captured a gdb backtrace in this state, but worked around the problem by restarting the process rather than debugging further, since the server is in production use.

            Activity

            elenst Elena Stepanova added a comment -

            Hi,

            Did you, by any chance, happen to capture two consecutive backtraces in this state, so we could be sure it was totally frozen and wasn't doing anything at all?

            In the provided stack trace, at least thread 88 does not appear to be waiting on anything, so it would be useful to understand whether it was really stuck in this strange place or was still doing something, albeit so slowly that the CPU usage appeared to be 0.
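
One way to answer this question is to diff two backtraces taken a short time apart. The following is only a rough sketch: it assumes both snapshots come from `thread apply all bt` in gdb, that thread numbers are stable between the two captures, and that a change in the list of function names is a good enough sign of progress. The function names `parse_backtrace` and `changed_threads` are made up for this illustration.

```python
import re

# "Thread 88 (Thread 0x... (LWP ...)):" header lines printed by gdb.
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Frame lines such as "#0  0x0000... in func () from lib" or "#1  func (args) at file:line".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)")


def parse_backtrace(text):
    """Map gdb thread number -> list of function names, innermost frame first."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def changed_threads(first, second):
    """Return the thread numbers whose call stack differs between two snapshots."""
    a = parse_backtrace(first)
    b = parse_backtrace(second)
    return sorted(t for t in a.keys() | b.keys() if a.get(t) != b.get(t))
```

If `changed_threads` returns an empty list for snapshots taken a minute apart, that is consistent with a real freeze. A thread that shows up in the list was still moving. A thread looping inside a single function would not be detected, so this check can only suggest a hang and cannot prove one.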

            elenst Elena Stepanova added a comment -

            Jan Lindström,

            Could you please take a look at the stack trace – does it look like a hang to you? And if it does, maybe you've seen it before?

            jplindst Jan Lindström added a comment -

            Hi,

            Firstly, I suggest that you migrate to the latest MariaDB version, 10.0.20. If the problem recurs, I would need the output of the following:

            (1) show processlist; (run this several times, e.g. once a minute)
            (2) show engine innodb status; (also several times, e.g. once a minute)
            (3) wait at least 600s
            (4) provide the full, unedited error log (make sure it is enabled)
            (5) capture several gdb stack traces, e.g. 3, waiting 1 minute between them
            (6) run top or similar several times, e.g. once a minute
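
The collection steps above could be scripted so that they are ready to run the next time the server hangs. This is a minimal sketch under several assumptions: the `mysql` client can log in without prompting (for example via `~/.my.cnf`), `gdb` is installed and allowed to attach to the server process, and `ps aux` stands in for "top or similar". The helper names `build_commands`, `run_command` and `collect` are hypothetical. Note that attaching gdb pauses the server briefly each time.

```python
import subprocess
import time


def build_commands(pid):
    """Commands to sample on each round; pid is the server's process ID."""
    return {
        "processlist": ["mysql", "-e", "SHOW FULL PROCESSLIST"],
        "innodb_status": ["mysql", "-e", "SHOW ENGINE INNODB STATUS"],
        "gdb_backtrace": ["gdb", "-p", str(pid), "-batch",
                          "-ex", "thread apply all bt"],
        "ps": ["ps", "aux"],
    }


def run_command(argv):
    """Run one command and return its combined output, or a failure note."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A hung server may refuse connections, so a client timeout is itself a data point.
        return "FAILED: %r" % (exc,)
    return result.stdout + result.stderr


def collect(pid, rounds=11, interval=60, runner=run_command, sleep=time.sleep):
    """Sample every command once per round, sleeping `interval` seconds between rounds.

    The defaults give 11 samples spread over 10 one-minute waits (about 600 s).
    """
    samples = []
    for i in range(rounds):
        samples.append({name: runner(argv)
                        for name, argv in build_commands(pid).items()})
        if i < rounds - 1:
            sleep(interval)
    return samples
```

Calling `collect(pid)` with the server's process ID returns one dictionary per round, which can then be written to files and attached to the report together with the error log. The `runner` and `sleep` parameters exist only so the loop can be exercised without a live server.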

            From the current stack trace, it does not look like a hang.

            R: Jan

            elenst Elena Stepanova added a comment -

            Alex Strange,

            Did you have a chance to upgrade? Are you still experiencing the problem?

            astrange Alex Strange added a comment -

            We're running the latest 10.0 version, but we have a workaround in place that restarts the server on a schedule, which prevents the hang from recurring. I'll disable that.

            elenst Elena Stepanova added a comment -

            Alex Strange,

            Did you disable the workaround? What was the result?


              People

              • Assignee: Unassigned
              • Reporter: Alex Strange (astrange)
              • Votes: 0
              • Watchers: 5
