Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 5.5.36
    • Fix Version/s: N/A
    • Component/s: None
    • Labels:
      None
    • Environment:
      Linux FC11

Description

      Hi guys, I have a query that takes about 2-5 seconds to execute, but sometimes it causes a deadlock...

      One thread is stuck in "Sending data" indefinitely (>3 hours) and everything else is waiting in "Waiting for table metadata lock". This query should take only a couple of seconds; it is issued e.g. 5000 times in a short period of time (with a very short wait between queries).

      From the spec, I'm seeing this:
      >To ensure transaction serializability, the server must not permit one session to perform a data definition language (DDL) statement on a table that is used in an uncompleted transaction in another session.

      The question is why all threads are waiting for a "DDL" (metadata) lock while no DDL operations are running (does LOCK TABLES ... WRITE require a metadata lock?).
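
      (For diagnosing this, later MariaDB versions can show who holds which metadata lock through the metadata_lock_info plugin; a sketch, assuming MariaDB 10.0+ where that plugin is available - it is not on 5.5:)

          -- Assumes MariaDB 10.0 or later with the metadata_lock_info plugin:
          INSTALL SONAME 'metadata_lock_info';
          -- List the currently granted metadata locks per thread:
          SELECT THREAD_ID, LOCK_MODE, LOCK_TYPE, TABLE_SCHEMA, TABLE_NAME
          FROM information_schema.METADATA_LOCK_INFO;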

      (A) Here is the offending query; after killing it, everything gets processed.
      http://screencast.com/t/Pg7fbNXZo

      (B) There is another query like this, issued after the first one...
      http://screencast.com/t/0QqRi8fZMWXQ

      General states:
      http://screencast.com/t/XOvtlqp4egB

      Everything is waiting for metadata locks, except one replication thread and one thread in "Sending data".

      The questions are:
      1. Is this a bug?
      2. Could the second query (B), issued after (A), cause a deadlock because it accesses similar tables?
      3. Does CREATE TEMPORARY ... SELECT ... FROM A, B, C grab a DDL WRITE (not read) lock not only on the temp table but also on A, B, C? (See the sketch after this list.)
      4. If the server picked only one thread to "run", why isn't it able to get any data? (If it's sending, all locks should be acquired already.)
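
      A sketch of the pattern from question 3, with hypothetical table and column names; the expectation (not confirmed here) is that the source tables take shared read locks only:

          -- Hypothetical shape of the statement from question 3:
          CREATE TEMPORARY TABLE tmp_batch AS
          SELECT a.id, b.val, c.ts
          FROM A a
          JOIN B b ON b.a_id = a.id
          JOIN C c ON c.a_id = a.id
          LIMIT 2000;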

      For reproducing, I can only send a SHOW PROCESSLIST dump, because this happens about once a month, so I have no idea how to extract more info about the issue...

      Thanks as always...

Activity

            pslawek83 Slawomir Pryczek added a comment - edited

            I was able to fix it by issuing
            SET GLOBAL table_open_cache=64 (lowering it from 400).
            There were about 900 queries, all waiting for 1200 seconds; after issuing that, all the queries were executed instantly (after 2-3 seconds there were no waiting queries).
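
            For reference, the cache pressure around that change can be read from standard status counters; a minimal sketch:

                -- Current table cache size:
                SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
                -- How many tables are open, and how often one had to be (re)opened:
                SHOW GLOBAL STATUS LIKE 'Open%tables';
                -- Lower the cache at runtime, as above:
                SET GLOBAL table_open_cache = 64;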

            When it was at 400, just one thread out of 32 was at 100% CPU usage, with vmstat showing no reads/writes... Is it possible that some OS limit prevented files from being opened and MySQL isn't reporting it in the error log / dmesg?

            Or could it be some problem with the table cache (as changing it made the problem disappear)? Is there a possibility to add some timeout for opening files?

            pslawek83 Slawomir Pryczek added a comment

            Actually it's worse; after increasing the number of files it seems that there is a deadlock:

            77.93% mysqld [.] MDL_lock::visit_subgraph(MDL_ticket*, MDL_wait_for_graph_visitor*)
            6.59% mysqld [.] flush_pagecache_blocks_int
            6.10% mysqld [.]

            Nothing gets processed... any advice on that?

            elenst Elena Stepanova added a comment

            Hi Slawomir,

            Regarding the last comment:
            What is this last deadlock?
            After increasing which number of files?
            Where does the info come from?
            What query is it happening on?

            For the previous occasions: does the error log say anything?
            Did you try to execute this big SELECT (the offending one) without the CREATE TEMPORARY TABLE part, just the SELECT? How fast is it, and how big is the result set?
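
            (A sketch of that comparison, with hypothetical names, since the actual query is only visible in the screenshots:)

                -- Time the SELECT alone (hypothetical stand-in for the offending query):
                SELECT SQL_NO_CACHE a.id, b.val
                FROM A a JOIN B b ON b.a_id = a.id
                LIMIT 2000;
                -- versus the original CREATE TEMPORARY TABLE ... SELECT form:
                CREATE TEMPORARY TABLE tmp_check AS
                SELECT a.id, b.val
                FROM A a JOIN B b ON b.a_id = a.id
                LIMIT 2000;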

            Thanks.

            pslawek83 Slawomir Pryczek added a comment

            Hi Elena... thanks for the response

            I'm using Aria engine tables. There is no deadlock per se, like an InnoDB deadlock... it's just that all threads wait in the "Opening tables" state, then they all go into the "Waiting for table metadata lock" state; after about an hour everything gets back to normal if we're lucky, and if not, it stays like that for longer.

            What I realized: I killed all Aria queries, one by one... and some processes just lingered in the "Killed" state for like half an hour. New processes went straight to the lock-wait state, as if opening tables was somehow locked globally.

            I realized InnoDB queries seem to be processed during that time, so it's just Aria that is locked up...

            The query takes about 1-2 seconds, with a rather small result set: always 2k rows (we process in 2k batches).

            The error is very hard to catch... the server works fine for 1-2 days and then it suddenly locks up for a couple of hours. Nothing in the error log / dmesg. We simplified that CREATE TABLE and it helped a little, so the error doesn't appear that often. Is it possible that something in the Aria engine is broken, like it's unable to handle many threads waiting for locks?

            77.93% mysqld [.] MDL_lock::visit_subgraph(MDL_ticket*, MDL_wait_for_graph_visitor*)

            Thanks...

            pslawek83 Slawomir Pryczek added a comment

            Is it possible that InnoDB took some shared resources (like a mutex or file handles) and that made Aria unable to process requests?

            pslawek83 Slawomir Pryczek added a comment

            There are some warnings in the error log, but they seem unrelated:

            [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE on a table with more than one UNIQUE KEY is unsafe Statement
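
            (Unrelated to the hang, but for completeness: this class of warning goes away when such statements are not logged in statement format; a sketch, assuming mixed-format logging is acceptable for the replication setup:)

                -- Log unsafe statements row-based, everything else statement-based:
                SET GLOBAL binlog_format = 'MIXED';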

            elenst Elena Stepanova added a comment - edited

            Hi Slawomir,

            For reproducing, I can only send a SHOW PROCESSLIST dump

            Please do; you can attach it to this report, or upload it to the private section of our ftp (ftp.askmonty.org/private), whichever you prefer.
            Are there any hanging (waiting for the lock) queries involving tables from the `stats` schema, i.e. tables that are also involved in the offending query?

            Do you happen to also have SHOW STATUS and/or SHOW GLOBAL STATUS in your monitoring? If you do, please attach it as well.

            A generalized answer to your questions of the "is it possible" kind is that what you observe shouldn't be happening, but everything is possible if there is a bug, so it would be irresponsible to give a definite "no" without deeper investigation.

            I was able to fix it by issuing
            SET GLOBAL table_open_cache=64

            Have you kept the new value? Did the problem ever re-appear with this new value?

            after increasing the number of files it seems that there is a deadlock

            77.93% mysqld [.] MDL_lock::visit_subgraph(MDL_ticket*, MDL_wait_for_graph_visitor*)
            6.59% mysqld [.] flush_pagecache_blocks_int
            6.10% mysqld [.]

            Which number of files did you increase?
            Which query did you get these percentages from?

            any advice on that?

            You can try to decrease the value of lock_wait_timeout (by default it's 1 year). Set it to, let's say, 1 hour, or 10 minutes, or 1 minute, or whatever you think is reasonable for your workload. It means that whatever is waiting for a table lock will hit the timeout and fail with a timeout error, so the deadlock, unless it's a true permanent deadlock, should resolve itself. If having some queries fail with the timeout suits your workflow, you can use this as a workaround.
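
            A minimal sketch of that workaround; 600 seconds is an arbitrary example value:

                -- Fail lock waits after 10 minutes instead of the 1-year default:
                SET GLOBAL lock_wait_timeout = 600;
                -- Existing sessions keep their old value; set it per-session too if needed:
                SET SESSION lock_wait_timeout = 600;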

            elenst Elena Stepanova added a comment

            Closing as incomplete for now; if you have additional information, please comment to re-open.


People

    • Assignee: elenst Elena Stepanova
    • Reporter: pslawek83 Slawomir Pryczek
    • Votes: 0
    • Watchers: 3
