Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-3891

Deadlock with statistics tables and ANALYZE

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      (Initially found by Elena)

      I'll attach a testcase, which eventually gets three threads to be stuck in "Waiting for table level lock" state.

      The testcase must be run like this: ./mysql-test-run t/t45.test --repeat=100 (can also add --testcase-timeout=600 to ease debugging). Execution will get stuck after a minute. On my machine (pylon-fedora15), only release build gets stuck, and only when the server is started by MTR.

      Being stuck looks like this:

      +----+------+-----------------+------+---------+------+------------------------------+------------------------------------------------+----------+
      | Id | User | Host            | db   | Command | Time | State                        | Info                                           | Progress |
      +----+------+-----------------+------+---------+------+------------------------------+------------------------------------------------+----------+
      | 23 | root | localhost       | test | Sleep   |   17 |                              | NULL                                           |    0.000 |
      | 24 | root | localhost       | test | Query   |   17 | Waiting for table level lock | ANALYZE TABLE E                                |    0.000 |
      | 25 | root | localhost       | test | Query   |   17 | Waiting for table level lock | ANALYZE TABLE A                                |    0.000 |
      | 26 | root | localhost       | test | Query   |   17 | Waiting for table level lock | SELECT * FROM mysql.index_stat, test.E LIMIT 1 |    0.000 |
      | 27 | root | localhost:52086 | NULL | Query   |    0 | NULL                         | show processlist                               |    0.000 |
      +----+------+-----------------+------+---------+------+------------------------------+------------------------------------------------+----------+
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              psergey Sergei Petrunia added a comment -

              Debugging results #1

              *****************************************************************************

                • Thread 4
                  *****************************************************************************
                  "ANALYZE TABLE A"
              • Has a table lock on table A << How do we kow that??
                Tries to get three locks: { table_stat, column_stat, index_stat }, all for
                TL_WRITE..

                We're hung on getting the 3rd one.. it's a TL_WRITE lock on index_stat.

                #2 wait_for_lock (wait=0x7f868c038580, data=0x7f86800371a0, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569
                (gdb) p data->lock->name
                $374 = 0x7f868c02a580 "index_stat"

                We are:
                (gdb) p thd->lock_info
                $405 = {thread = 140216257513216, thread_id = 25}

                We're waiting on: index_stat,:

                (gdb) p * data->lock->read->data->owner
                $415 = {thread = 140216257214208, thread_id = 26}

                *****************************************************************************
                ** Thread 3
                *****************************************************************************
                "SELECT * FROM mysql.index_stat, test.E LIMIT 1"

                #3 in thr_lock ()
                #4 thr_multi_lock ()
                #5 in mysql_lock_tables ()
                #6 in mysql_lock_tables ()
                #7 in lock_tables ()
                #8 in open_and_lock_tables ()
                #9 in open_and_lock_tables ()
                #10 open_system_tables_for_read ()
                #11 in read_statistics_for_tables_if_needed ()

                Here:
                lock_count=3, { table_stat, column_stat, index_stat }


                type=TL_READ on every one.

              We're stuck when locking table #0, table_stat.
              #2 wait_for_lock (wait=0x7f868c02d620, data=0x7f868002d510, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569
              (gdb) p data->lock->name
              $376 = 0x7f868c02a510 "table_stat"

              We are:
              (gdb) p thd->lock_info
              $421 =

              {thread = 140216257214208, thread_id = 26}

              We're waiting on: table_stat, READ lock.

              (gdb) p * data->lock->write->data->owner
              $435 =

              {thread = 140216257513216, thread_id = 25}


              *****************************************************************************
              ** Thread 5
              *****************************************************************************
              "ANALYZE TABLE E"

              lock_count=3, all TL_WRITE, { table_stat, column_stat, index_stat },
              we're locking 0th table.

              (gdb) up
              #2 wait_for_lock (wait=0x7f868c02d640, data=0x7f868c023ce0, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569
              (gdb) p data->lock->name
              $378 = 0x7f868c02a510 "table_stat"

              (gdb) p thd->lock_info
              $400 = {thread = 140216257812224, thread_id = 24}

              (gdb) p *data->lock->write->data->owner
              $397 = {thread = 140216257513216, thread_id = 25}

              Ok, we're thread 24 and we're waiting for "table_stat" because thread 25 has it.

              Show
              psergey Sergei Petrunia added a comment - Debugging results #1 ***************************************************************************** Thread 4 ***************************************************************************** "ANALYZE TABLE A" Has a table lock on table A << How do we kow that?? Tries to get three locks: { table_stat, column_stat, index_stat }, all for TL_WRITE.. We're hung on getting the 3rd one.. it's a TL_WRITE lock on index_stat. #2 wait_for_lock (wait=0x7f868c038580, data=0x7f86800371a0, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569 (gdb) p data->lock->name $374 = 0x7f868c02a580 "index_stat" We are: (gdb) p thd->lock_info $405 = {thread = 140216257513216, thread_id = 25} We're waiting on: index_stat,: (gdb) p * data->lock->read->data->owner $415 = {thread = 140216257214208, thread_id = 26} ***************************************************************************** ** Thread 3 ***************************************************************************** "SELECT * FROM mysql.index_stat, test.E LIMIT 1" #3 in thr_lock () #4 thr_multi_lock () #5 in mysql_lock_tables () #6 in mysql_lock_tables () #7 in lock_tables () #8 in open_and_lock_tables () #9 in open_and_lock_tables () #10 open_system_tables_for_read () #11 in read_statistics_for_tables_if_needed () Here: lock_count=3, { table_stat, column_stat, index_stat } type=TL_READ on every one. We're stuck when locking table #0, table_stat. #2 wait_for_lock (wait=0x7f868c02d620, data=0x7f868002d510, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569 (gdb) p data->lock->name $376 = 0x7f868c02a510 "table_stat" We are: (gdb) p thd->lock_info $421 = {thread = 140216257214208, thread_id = 26} We're waiting on: table_stat, READ lock. (gdb) p * data->lock->write->data->owner $435 = {thread = 140216257513216, thread_id = 25} ***************************************************************************** ** Thread 5 ***************************************************************************** "ANALYZE TABLE E" lock_count=3, all TL_WRITE, { table_stat, column_stat, index_stat }, we're locking 0th table. (gdb) up #2 wait_for_lock (wait=0x7f868c02d640, data=0x7f868c023ce0, in_wait_list=0 '\000', lock_wait_timeout=<optimized out>) at /home/psergey/dev2/maria-5.5-mwl248/mysys/thr_lock.c:569 (gdb) p data->lock->name $378 = 0x7f868c02a510 "table_stat" (gdb) p thd->lock_info $400 = {thread = 140216257812224, thread_id = 24} (gdb) p *data->lock->write->data->owner $397 = {thread = 140216257513216, thread_id = 25} Ok, we're thread 24 and we're waiting for "table_stat" because thread 25 has it.
              Hide
              psergey Sergei Petrunia added a comment -

              A hypothesis:

              Deadlock is caused by this sequence of events:

              SELECT takes a read lock on

              {mysql.index_stat, E}

              ANALYZE TABLE A tries to take write locks on

              {table_stat, column_stat, index_stat}

              it gets locks on table_stat and column_stat
              it doesn't get lock on index_stat, because it is locked.
              It ends up waiting.

              SELECT tries to load statistics data for the tables it is using.
              table_stat is locked by the ANALYZE command, so it is waiting.

              (The other ANALYZE command is also waiting, but it not actually pariticipating
              in the deadlock. Perhaps, it is necessary so that SELECT will try to load
              statistics data)

              Show
              psergey Sergei Petrunia added a comment - A hypothesis: Deadlock is caused by this sequence of events: SELECT takes a read lock on {mysql.index_stat, E} ANALYZE TABLE A tries to take write locks on {table_stat, column_stat, index_stat} it gets locks on table_stat and column_stat it doesn't get lock on index_stat, because it is locked. It ends up waiting. SELECT tries to load statistics data for the tables it is using. table_stat is locked by the ANALYZE command, so it is waiting. (The other ANALYZE command is also waiting, but it not actually pariticipating in the deadlock. Perhaps, it is necessary so that SELECT will try to load statistics data)
              Hide
              elenst Elena Stepanova added a comment -

              Info requested by Igor: I observed the problem on maria-5.5-mwl248 revno 3569 (and AFAIR also double-checked later on 3570).

              Show
              elenst Elena Stepanova added a comment - Info requested by Igor: I observed the problem on maria-5.5-mwl248 revno 3569 (and AFAIR also double-checked later on 3570).
              Hide
              elenst Elena Stepanova added a comment -

              The bug was fixed in maria-5.5-mwl248 by the following revision:

              revno: 3577
              revision-id: igor@askmonty.org-20121213071654-k1ghi7u5s10nvnno
              parent: igor@askmonty.org-20121210053308-o0qpwt4upw7e9740
              committer: Igor Babaev <igor@askmonty.org>
              branch nick: maria-5.5-mwl248
              timestamp: Wed 2012-12-12 23:16:54 -0800
              message:
                Fixed bug mdev-3891.
                If a query referenced some system statistical tables, but not all of them,
                then executing an ANALYZE command simultaneously with this query could
                lead to a deadlock.
                The fix prohibited reading statistics from system statistical tables
                for such queries.
                
                Removed the function unlock_tables_n_open_system_tables_for_write()
                as not used anymore.
                Performed some minor refactoring of the code in sql_statistics.cc. 
              
              Show
              elenst Elena Stepanova added a comment - The bug was fixed in maria-5.5-mwl248 by the following revision: revno: 3577 revision-id: igor@askmonty.org-20121213071654-k1ghi7u5s10nvnno parent: igor@askmonty.org-20121210053308-o0qpwt4upw7e9740 committer: Igor Babaev <igor@askmonty.org> branch nick: maria-5.5-mwl248 timestamp: Wed 2012-12-12 23:16:54 -0800 message: Fixed bug mdev-3891. If a query referenced some system statistical tables, but not all of them, then executing an ANALYZE command simultaneously with this query could lead to a deadlock. The fix prohibited reading statistics from system statistical tables for such queries. Removed the function unlock_tables_n_open_system_tables_for_write() as not used anymore. Performed some minor refactoring of the code in sql_statistics.cc.

                People

                • Assignee:
                  psergey Sergei Petrunia
                  Reporter:
                  psergey Sergei Petrunia
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Original Estimate - Not Specified
                    Not Specified
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 4 hours, 30 minutes
                    4h 30m