Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8325

Server crash in pagecache_fwrite after killing query consuming a lot of tmp disk space

    Details

      Description

      killing a long running query caused MySQL to crash:

      | 7552649 | ttahon          | 10.10.0.55:49241  | PRODUCTION | Killed  |   56220 | Copying to tmp table on disk | SELECT
                 PROD_PRODUITS.LIBELLE
      
                AS libelle,
             PROD_COMMANDES.D |    0.000 |
      

      The above query created a very large temporary file and hence filled disk:

      150616 20:13:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue afte
      r freeing disk space)
      150616 20:13:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:14:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue afte
      r freeing disk space)
      150616 20:14:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:15:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue afte
      r freeing disk space)
      150616 20:15:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:16:26 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_1.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue afte
      r freeing disk space)
      150616 20:16:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:25:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:26:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:35:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:36:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      150616 20:45:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
      

      When we killed long running query:

      MariaDB [PRODUCTION]> kill 7552649;
      Query OK, 0 rows affected (0.00 sec)
      
      MariaDB [PRODUCTION]> show processlist;
      +---------+-----------------+-------------------+------------+---------+---------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id      | User            | Host              | db         | Command | Time    | State                        | Info                                                                                                 | Progress |
      +---------+-----------------+-------------------+------------+---------+---------+------------------------------+------------------------------------------------------------------------------------------------------+----------+
      |       1 | system user     |                   | NULL       | Sleep   | 3698081 | wsrep aborter idle           | NULL                                                                                                 |    0.000 |
      |       2 | system user     |                   | NULL       | Sleep   |       0 | closing tables               | NULL                                                                                                 |    0.000 |
      |       4 | event_scheduler | localhost         | NULL       | Daemon  | 3698078 | Waiting on empty queue       | NULL                                                                                                 |    0.000 |
      | 7552649 | ttahon          | 10.10.0.55:49241  | PRODUCTION | Killed  |   56220 | Copying to tmp table on disk | SELECT
                 PROD_PRODUITS.LIBELLE
      
                AS libelle,
             PROD_COMMANDES.D |    0.000 |
      | 7562009 | user4qlickview  | 10.10.0.55:42983  | PRODUCTION | Query   |   50335 | converting HEAP to Aria      | SELECT distinct
              ID_PROD_ITEM as 'ID PROD ITEM',
              PAGES
      

      Few minutes later MySQL server crashed:

      150617 10:15:42 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
      
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
      
      Server version: 10.0.14-MariaDB-1~precise-wsrep-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=15
      max_threads=502
      thread_count=10
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5346411 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      
      Thread pointer: 0x0x7f0adfa93008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f11c1557da0 thread_stack 0x48000
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb9025b]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x741088]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f11c12dfcb0]
      
      Trying to get some variables.
      
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f0ac2c8b020): is an invalid pointer
      Connection ID (thread ID): 7552649
      Status: KILL_CONNECTION
      
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
      
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      150617 10:15:43 mysqld_safe Number of processes running now: 0
      150617 10:15:43 mysqld_safe WSREP: not restarting wsrep node automatically
      150617 10:15:43 mysqld_safe mysqld from pid file /data/mysql/data/ifactory-sart-db-node-03.pid ended
      150617 10:20:54 mysqld_safe Starting mysqld daemon with databases from /data/mysql/data
      150617 10:20:54 mysqld_safe WSREP: Running position recovery with --log_error='/data/mysql/data/wsrep_recovery.l0XA1P' --pid-file='/data/mysql/data/ifactory-sart-db-node-03-recover.pid'
      150617 10:21:14 mysqld_safe WSREP: Recovered position 10f20163-668b-11e4-bfda-7ef6da2a3a6b:408209687
      150617 10:21:14 [Note] WSREP: wsrep_start_position var submitted: '10f20163-668b-11e4-bfda-7ef6da2a3a6b:408209687'
      150617 10:21:14 [Note] WSREP: Read nil XID from storage engines, skipping position init
      150617 10:21:14 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
      150617 10:21:14 [Note] WSREP: wsrep_load(): Galera 25.3.5-wheezy(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
      150617 10:21:14 [Note] WSREP: CRC-32C: using hardware acceleration.
      150617 10:21:14 [Note] WSREP: Found saved state: 10f20163-668b-11e4-bfda-7ef6da2a3a6b:-1
      150617 10:21:14 [Note] WSREP: Passing config to GCS: base_host = 10.10.19.3; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 10G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; proton
      150617 10:21:14 [Note] WSREP: Service thread queue flushed.
      
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              Regarding the query, you don't need to execute it, just paste it – as you can see, it's truncated in the processlist output.
              EXPLAIN also shouldn't be heavy or long, but if you are concerned about it, lets look at the query and structures first.

              Show
              elenst Elena Stepanova added a comment - Regarding the query, you don't need to execute it, just paste it – as you can see, it's truncated in the processlist output. EXPLAIN also shouldn't be heavy or long, but if you are concerned about it, lets look at the query and structures first.
              Hide
              aftab.khan aftab khan added a comment -

              Complete query (and EXPLAIN output) is appended to the attached file
              already.
              On 17 Jun 2015 17:01, "Elena Stepanova (JIRA)" <jira@mariadb.atlassian.net>

              Show
              aftab.khan aftab khan added a comment - Complete query (and EXPLAIN output) is appended to the attached file already. On 17 Jun 2015 17:01, "Elena Stepanova (JIRA)" <jira@mariadb.atlassian.net>
              Hide
              elenst Elena Stepanova added a comment -

              aftab khan,

              Thanks for the info, I was able to reproduce the problem with your table structures and query, and my artificially generated data.

              While looking into it, I noticed something suspicious about your query. It's not an excuse for the crash, but a side note which you might want to consider.
              The WHERE clause looks like this (I obfuscated it in case it's confidential):
              WHERE cond1 AND cond2 ... AND cond9 AND colX LIKE '%foo%' OR colX LIKE '%bar%'
              Now, I would expect that the last (and only) OR applies to the last two conditions on colX, i.e. WHERE cond1 AND cond2 ... AND cond9 AND (colX LIKE '%foo%' OR colX LIKE '%bar%'). Only there are no brackets in the query, so in fact it's WHERE (cond1 AND cond2 ... AND cond9 AND colX LIKE '%foo%') OR colX LIKE '%bar%' – which, of course, technically correct, but I'm not sure that's what you meant.

              Show
              elenst Elena Stepanova added a comment - aftab khan , Thanks for the info, I was able to reproduce the problem with your table structures and query, and my artificially generated data. While looking into it, I noticed something suspicious about your query. It's not an excuse for the crash, but a side note which you might want to consider. The WHERE clause looks like this (I obfuscated it in case it's confidential): WHERE cond1 AND cond2 ... AND cond9 AND colX LIKE '%foo%' OR colX LIKE '%bar%' Now, I would expect that the last (and only) OR applies to the last two conditions on colX, i.e. WHERE cond1 AND cond2 ... AND cond9 AND (colX LIKE '%foo%' OR colX LIKE '%bar%') . Only there are no brackets in the query, so in fact it's WHERE (cond1 AND cond2 ... AND cond9 AND colX LIKE '%foo%') OR colX LIKE '%bar%' – which, of course, technically correct, but I'm not sure that's what you meant.
              Hide
              elenst Elena Stepanova added a comment -

              I've uploaded the artificial data dump and the query to ftp.askmonty.org/private.

              To reproduce the problem:

              • start the server (all default options are enough, just make sure that tmpdir can be safely filled up);
              • feed the data dump to it (MDEV-8325-test-data.dump);
              • run the query (MDEV-8325-query.sql);
              • observe the tmp table growing;
              • after the tmp disk space is exhausted, kill the query;
              • wait for a while.
              MariaDB [test]> select @@version;
              +-----------------------+
              | @@version             |
              +-----------------------+
              | 10.0.20-MariaDB-debug |
              +-----------------------+
              1 row in set (0.00 sec)
              
              MariaDB [test]> show processlist;
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              | Id | User | Host            | db   | Command | Time | State                        | Info                                                        | Progress |
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              |  3 | root | localhost:34462 | test | Query   | 7173 | Copying to tmp table on disk | SELECT PROD_PRODUITS.LIBELLE AS libelle,   PROD_COMMANDES.D |    0.000 |
              |  4 | root | localhost:34463 | test | Query   |    0 | init                         | show processlist                                            |    0.000 |
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              2 rows in set (0.00 sec)
              
              MariaDB [test]> select @@tmpdir;
              +----------------+
              | @@tmpdir       |
              +----------------+
              | /data/logs/tmp |
              +----------------+
              1 row in set (0.00 sec)
              
              MariaDB [test]> system df -k /data/logs/tmp/
              Filesystem     1K-blocks     Used Available Use% Mounted on
              /dev/sdb        61927420 58781912         0 100% /data/logs
              MariaDB [test]> system ls -ltr /data/logs/tmp/\#*
              -rw-rw---- 1 elenst disk        8192 Jun 19 01:23 /data/logs/tmp/#sql_1fdf_0.MAI
              -rw-rw---- 1 elenst disk 15826214912 Jun 19 03:22 /data/logs/tmp/#sql_1fdf_0.MAD
              
              MariaDB [test]> select @@log_error;
              +------------------------------------+
              | @@log_error                        |
              +------------------------------------+
              | /home/elenst/git/10.0/data/log.err |
              +------------------------------------+
              1 row in set (0.00 sec)
              
              MariaDB [test]> system tail /home/elenst/git/10.0/data/log.err
              150619  1:21:06 [Note] InnoDB: Highest supported file format is Barracuda.
              150619  1:21:07 [Note] InnoDB: 128 rollback segment(s) are active.
              150619  1:21:07 [Note] InnoDB: Waiting for purge to start
              150619  1:21:07 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.24-72.2 started; log sequence number 762743680
              150619  1:21:07 [Note] Server socket created on IP: '::'.
              150619  1:21:08 [Note] Event Scheduler: Loaded 0 events
              150619  1:21:08 [Note] /home/elenst/git/10.0/sql/mysqld: ready for connections.
              Version: '10.0.20-MariaDB-debug'  socket: '/data/logs/tmp/mysql.sock'  port: 3306  Source distribution
              150619  3:21:55 [Warning] mysqld: Disk is full writing '/data/logs/tmp/#sql_1fdf_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
              150619  3:21:55 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
              
              MariaDB [test]> kill query 3;
              Query OK, 0 rows affected (0.00 sec)
              
              MariaDB [test]> show processlist;
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              | Id | User | Host            | db   | Command | Time | State                        | Info                                                        | Progress |
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              |  3 | root | localhost:34462 | test | Killed  | 7209 | Copying to tmp table on disk | SELECT PROD_PRODUITS.LIBELLE AS libelle,   PROD_COMMANDES.D |    0.000 |
              |  4 | root | localhost:34463 | test | Query   |    0 | init                         | show processlist                                            |    0.000 |
              +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+
              2 rows in set (0.00 sec)
              
              -- wait a few minutes...
              
              MariaDB [test]> show processlist;
              ERROR 2013 (HY000): Lost connection to MySQL server during query
              

              So, apparently there are two problems here.

              First, there is a query which creates an enormous on-disk tmp table, while the actual size of the data involved is relatively small.
              Second, the server crashes after this query gets killed, when the tmp disk is filled up.

              Stack trace from 10.0 commit a6087e7dc1ef3561d8189c8db15e9591d0f9b520
              #3  <signal handler called>
              #4  0x0000000000000000 in ?? ()
              #5  0x0000000000d1d6b4 in pagecache_fwrite (pagecache=0x20d6540, filedesc=0x7fd1f9380060, buffer=0x7fd1fca6c070 "", pageno=1922961, type=PAGECACHE_PLAIN_PAGE, flags=36) at 10.0/storage/maria/ma_pagecache.c:665
              #6  0x0000000000d20c15 in find_block (pagecache=0x20d6540, file=0x7fd1f6bf6498, pageno=1943376, init_hits_left=0, wrmode=1 '\001', block_is_copied=0 '\000', reg_req=1 '\001', page_st=0x7fd222f4f42c) at 10.0/storage/maria/ma_pagecache.c:2133
              #7  0x0000000000d25bf3 in pagecache_write_part (pagecache=0x20d6540, file=0x7fd1f6bf6498, pageno=1943376, level=0, buff=0x7fd1f6bf67a0 "", type=PAGECACHE_PLAIN_PAGE, lock=PAGECACHE_LOCK_READ, pin=PAGECACHE_PIN, write_mode=PAGECACHE_WRITE_DELAY, page_link=0x7fd222f4f960, first_REDO_LSN_for_page=0, offset=0, size=8192) at 10.0/storage/maria/ma_pagecache.c:4011
              #8  0x0000000000d5a7c8 in write_block_record (info=0x7fd1f6bf6070, old_record=0x0, record=0x7fd1f6846c88 "\351\017mydad3105554432", row=0x7fd1f6bf60c8, bitmap_blocks=0x7fd1f6bf60c8, head_block_is_read=0 '\000', row_pos=0x7fd222f4fbf0, undo_lsn=1, old_record_checksum=0) at 10.0/storage/maria/ma_blockrec.c:3153
              #9  0x0000000000d5b6e2 in allocate_and_write_block_record (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432", row=0x7fd1f6bf60c8, undo_lsn=1) at 10.0/storage/maria/ma_blockrec.c:3528
              #10 0x0000000000d5b7f6 in _ma_write_init_block_record (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ma_blockrec.c:3568
              #11 0x0000000000d6c0d6 in maria_write (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ma_write.c:156
              #12 0x0000000000cfc23d in ha_maria::write_row (this=0x7fd1f6846288, buf=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ha_maria.cc:1280
              #13 0x00000000006f30c4 in handler::ha_write_tmp_row (this=0x7fd1f6846288, buf=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/sql/sql_class.h:4970
              #14 0x00000000006e0cff in end_write (join=0x7fd1f6943698, join_tab=0x7fd1f686dd28, end_of_records=false) at 10.0/sql/sql_select.cc:19149
              #15 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f498, rec_ptr=0x7fd1f69a2043 "U\022P\200\377i\217\060\033~") at 10.0/sql/sql_join_cache.cc:2403
              #16 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f498, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295
              #17 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f498, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092
              #18 0x00000000006dd4fe in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686da00, end_of_records=false) at 10.0/sql/sql_select.cc:17619
              #19 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f2e8, rec_ptr=0x7fd1f6988a8c "\f7]\001\003\030\203U\200\025") at 10.0/sql/sql_join_cache.cc:2403
              #20 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f2e8, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295
              #21 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f2e8, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092
              #22 0x00000000006dd4fe in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686d6d8, end_of_records=false) at 10.0/sql/sql_select.cc:17619
              #23 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f110, rec_ptr=0x7fd1f695d52e "") at 10.0/sql/sql_join_cache.cc:2403
              #24 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f110, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295
              #25 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f110, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092
              #26 0x00000000006dd41b in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686d3b0, end_of_records=true) at 10.0/sql/sql_select.cc:17599
              #27 0x00000000006dd628 in sub_select (join=0x7fd1f6943698, join_tab=0x7fd1f686d088, end_of_records=true) at 10.0/sql/sql_select.cc:17769
              #28 0x00000000006dd110 in do_select (join=0x7fd1f6943698, fields=0x0, table=0x7fd1f694b088, procedure=0x0) at 10.0/sql/sql_select.cc:17480
              #29 0x00000000006b88d6 in JOIN::exec_inner (this=0x7fd1f6943698) at 10.0/sql/sql_select.cc:2622
              #30 0x00000000006b79c6 in JOIN::exec (this=0x7fd1f6943698) at 10.0/sql/sql_select.cc:2368
              #31 0x00000000006bad5b in mysql_select (thd=0x7fd2086ff070, rref_pointer_array=0x7fd208703330, tables=0x7fd1f6823368, wild_num=0, fields=..., conds=0x7fd1f69431e0, og_num=2, order=0x7fd1f6943400, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fd1f6943678, unit=0x7fd2087029c8, select_lex=0x7fd2087030b8) at 10.0/sql/sql_select.cc:3308
              #32 0x00000000006b1345 in handle_select (thd=0x7fd2086ff070, lex=0x7fd208702900, result=0x7fd1f6943678, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
              #33 0x0000000000685a93 in execute_sqlcom_select (thd=0x7fd2086ff070, all_tables=0x7fd1f6823368) at 10.0/sql/sql_parse.cc:5274
              #34 0x000000000067ddc4 in mysql_execute_command (thd=0x7fd2086ff070) at 10.0/sql/sql_parse.cc:2562
              #35 0x00000000006885eb in mysql_parse (thd=0x7fd2086ff070, rawbuf=0x7fd1f6822088 "SELECT PROD_PRODUITS.LIBELLE AS libelle,   PROD_COMMANDES.DATE_ESTIMATION_BABEL,   PROD_COMMANDES.DATE_PRISE,   PROD_ITEMS.ID_PROD_ITEM,   PROD_COMMANDES.ID_COMMANDE_PARTENAIRE,   PROD_COMMANDES.ID_CO"..., length=936, parser_state=0x7fd222f51650) at 10.0/sql/sql_parse.cc:6529
              #36 0x000000000067afd7 in dispatch_command (command=COM_QUERY, thd=0x7fd2086ff070, packet=0x7fd20861a071 "SELECT PROD_PRODUITS.LIBELLE AS libelle,   PROD_COMMANDES.D"..., packet_length=936) at 10.0/sql/sql_parse.cc:1308
              #37 0x000000000067a2bd in do_command (thd=0x7fd2086ff070) at 10.0/sql/sql_parse.cc:999
              #38 0x00000000007982a2 in do_handle_one_connection (thd_arg=0x7fd2086ff070) at 10.0/sql/sql_connect.cc:1378
              #39 0x0000000000798001 in handle_one_connection (arg=0x7fd2086ff070) at 10.0/sql/sql_connect.cc:1293
              #40 0x00007fd222b8bb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
              
              Show
              elenst Elena Stepanova added a comment - I've uploaded the artificial data dump and the query to ftp.askmonty.org/private. To reproduce the problem: start the server (all default options are enough, just make sure that tmpdir can be safely filled up); feed the data dump to it ( MDEV-8325 -test-data.dump); run the query ( MDEV-8325 -query.sql); observe the tmp table growing; after the tmp disk space is exhausted, kill the query; wait for a while. MariaDB [test]> select @@version; +-----------------------+ | @@version | +-----------------------+ | 10.0.20-MariaDB-debug | +-----------------------+ 1 row in set (0.00 sec) MariaDB [test]> show processlist; +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ | 3 | root | localhost:34462 | test | Query | 7173 | Copying to tmp table on disk | SELECT PROD_PRODUITS.LIBELLE AS libelle, PROD_COMMANDES.D | 0.000 | | 4 | root | localhost:34463 | test | Query | 0 | init | show processlist | 0.000 | +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ 2 rows in set (0.00 sec) MariaDB [test]> select @@tmpdir; +----------------+ | @@tmpdir | +----------------+ | /data/logs/tmp | +----------------+ 1 row in set (0.00 sec) MariaDB [test]> system df -k /data/logs/tmp/ Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb 61927420 58781912 0 100% /data/logs MariaDB [test]> system ls -ltr /data/logs/tmp/\#* -rw-rw---- 1 elenst disk 8192 Jun 19 01:23 /data/logs/tmp/#sql_1fdf_0.MAI -rw-rw---- 1 elenst disk 15826214912 Jun 19 03:22 /data/logs/tmp/#sql_1fdf_0.MAD MariaDB [test]> select @@log_error; +------------------------------------+ | @@log_error | +------------------------------------+ | /home/elenst/git/10.0/data/log.err | +------------------------------------+ 1 row in set (0.00 sec) MariaDB [test]> system tail /home/elenst/git/10.0/data/log.err 150619 1:21:06 [Note] InnoDB: Highest supported file format is Barracuda. 150619 1:21:07 [Note] InnoDB: 128 rollback segment(s) are active. 150619 1:21:07 [Note] InnoDB: Waiting for purge to start 150619 1:21:07 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.24-72.2 started; log sequence number 762743680 150619 1:21:07 [Note] Server socket created on IP: '::'. 150619 1:21:08 [Note] Event Scheduler: Loaded 0 events 150619 1:21:08 [Note] /home/elenst/git/10.0/sql/mysqld: ready for connections. Version: '10.0.20-MariaDB-debug' socket: '/data/logs/tmp/mysql.sock' port: 3306 Source distribution 150619 3:21:55 [Warning] mysqld: Disk is full writing '/data/logs/tmp/#sql_1fdf_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space) 150619 3:21:55 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs MariaDB [test]> kill query 3; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> show processlist; +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ | 3 | root | localhost:34462 | test | Killed | 7209 | Copying to tmp table on disk | SELECT PROD_PRODUITS.LIBELLE AS libelle, PROD_COMMANDES.D | 0.000 | | 4 | root | localhost:34463 | test | Query | 0 | init | show processlist | 0.000 | +----+------+-----------------+------+---------+------+------------------------------+-------------------------------------------------------------+----------+ 2 rows in set (0.00 sec) -- wait a few minutes... MariaDB [test]> show processlist; ERROR 2013 (HY000): Lost connection to MySQL server during query So, apparently there are two problems here. First, there is a query which creates an enormous on-disk tmp table, while the actual size of the data involved is relatively small. Second, the server crashes after this query gets killed, when the tmp disk is filled up. Stack trace from 10.0 commit a6087e7dc1ef3561d8189c8db15e9591d0f9b520 #3 <signal handler called> #4 0x0000000000000000 in ?? () #5 0x0000000000d1d6b4 in pagecache_fwrite (pagecache=0x20d6540, filedesc=0x7fd1f9380060, buffer=0x7fd1fca6c070 "", pageno=1922961, type=PAGECACHE_PLAIN_PAGE, flags=36) at 10.0/storage/maria/ma_pagecache.c:665 #6 0x0000000000d20c15 in find_block (pagecache=0x20d6540, file=0x7fd1f6bf6498, pageno=1943376, init_hits_left=0, wrmode=1 '\001', block_is_copied=0 '\000', reg_req=1 '\001', page_st=0x7fd222f4f42c) at 10.0/storage/maria/ma_pagecache.c:2133 #7 0x0000000000d25bf3 in pagecache_write_part (pagecache=0x20d6540, file=0x7fd1f6bf6498, pageno=1943376, level=0, buff=0x7fd1f6bf67a0 "", type=PAGECACHE_PLAIN_PAGE, lock=PAGECACHE_LOCK_READ, pin=PAGECACHE_PIN, write_mode=PAGECACHE_WRITE_DELAY, page_link=0x7fd222f4f960, first_REDO_LSN_for_page=0, offset=0, size=8192) at 10.0/storage/maria/ma_pagecache.c:4011 #8 0x0000000000d5a7c8 in write_block_record (info=0x7fd1f6bf6070, old_record=0x0, record=0x7fd1f6846c88 "\351\017mydad3105554432", row=0x7fd1f6bf60c8, bitmap_blocks=0x7fd1f6bf60c8, head_block_is_read=0 '\000', row_pos=0x7fd222f4fbf0, undo_lsn=1, old_record_checksum=0) at 10.0/storage/maria/ma_blockrec.c:3153 #9 0x0000000000d5b6e2 in allocate_and_write_block_record (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432", row=0x7fd1f6bf60c8, undo_lsn=1) at 10.0/storage/maria/ma_blockrec.c:3528 #10 0x0000000000d5b7f6 in _ma_write_init_block_record (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ma_blockrec.c:3568 #11 0x0000000000d6c0d6 in maria_write (info=0x7fd1f6bf6070, record=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ma_write.c:156 #12 0x0000000000cfc23d in ha_maria::write_row (this=0x7fd1f6846288, buf=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/storage/maria/ha_maria.cc:1280 #13 0x00000000006f30c4 in handler::ha_write_tmp_row (this=0x7fd1f6846288, buf=0x7fd1f6846c88 "\351\017mydad3105554432") at 10.0/sql/sql_class.h:4970 #14 0x00000000006e0cff in end_write (join=0x7fd1f6943698, join_tab=0x7fd1f686dd28, end_of_records=false) at 10.0/sql/sql_select.cc:19149 #15 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f498, rec_ptr=0x7fd1f69a2043 "U\022P\200\377i\217\060\033~") at 10.0/sql/sql_join_cache.cc:2403 #16 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f498, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295 #17 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f498, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092 #18 0x00000000006dd4fe in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686da00, end_of_records=false) at 10.0/sql/sql_select.cc:17619 #19 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f2e8, rec_ptr=0x7fd1f6988a8c "\f7]\001\003\030\203U\200\025") at 10.0/sql/sql_join_cache.cc:2403 #20 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f2e8, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295 #21 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f2e8, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092 #22 0x00000000006dd4fe in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686d6d8, end_of_records=false) at 10.0/sql/sql_select.cc:17619 #23 0x00000000007d2d68 in JOIN_CACHE::generate_full_extensions (this=0x7fd1f686f110, rec_ptr=0x7fd1f695d52e "") at 10.0/sql/sql_join_cache.cc:2403 #24 0x00000000007d2aa6 in JOIN_CACHE::join_matching_records (this=0x7fd1f686f110, skip_last=false) at 10.0/sql/sql_join_cache.cc:2295 #25 0x00000000007d24e5 in JOIN_CACHE::join_records (this=0x7fd1f686f110, skip_last=false) at 10.0/sql/sql_join_cache.cc:2092 #26 0x00000000006dd41b in sub_select_cache (join=0x7fd1f6943698, join_tab=0x7fd1f686d3b0, end_of_records=true) at 10.0/sql/sql_select.cc:17599 #27 0x00000000006dd628 in sub_select (join=0x7fd1f6943698, join_tab=0x7fd1f686d088, end_of_records=true) at 10.0/sql/sql_select.cc:17769 #28 0x00000000006dd110 in do_select (join=0x7fd1f6943698, fields=0x0, table=0x7fd1f694b088, procedure=0x0) at 10.0/sql/sql_select.cc:17480 #29 0x00000000006b88d6 in JOIN::exec_inner (this=0x7fd1f6943698) at 10.0/sql/sql_select.cc:2622 #30 0x00000000006b79c6 in JOIN::exec (this=0x7fd1f6943698) at 10.0/sql/sql_select.cc:2368 #31 0x00000000006bad5b in mysql_select (thd=0x7fd2086ff070, rref_pointer_array=0x7fd208703330, tables=0x7fd1f6823368, wild_num=0, fields=..., conds=0x7fd1f69431e0, og_num=2, order=0x7fd1f6943400, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fd1f6943678, unit=0x7fd2087029c8, select_lex=0x7fd2087030b8) at 10.0/sql/sql_select.cc:3308 #32 0x00000000006b1345 in handle_select (thd=0x7fd2086ff070, lex=0x7fd208702900, result=0x7fd1f6943678, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373 #33 0x0000000000685a93 in execute_sqlcom_select (thd=0x7fd2086ff070, all_tables=0x7fd1f6823368) at 10.0/sql/sql_parse.cc:5274 #34 0x000000000067ddc4 in mysql_execute_command (thd=0x7fd2086ff070) at 10.0/sql/sql_parse.cc:2562 #35 0x00000000006885eb in mysql_parse (thd=0x7fd2086ff070, rawbuf=0x7fd1f6822088 "SELECT PROD_PRODUITS.LIBELLE AS libelle, PROD_COMMANDES.DATE_ESTIMATION_BABEL, PROD_COMMANDES.DATE_PRISE, PROD_ITEMS.ID_PROD_ITEM, PROD_COMMANDES.ID_COMMANDE_PARTENAIRE, PROD_COMMANDES.ID_CO"..., length=936, parser_state=0x7fd222f51650) at 10.0/sql/sql_parse.cc:6529 #36 0x000000000067afd7 in dispatch_command (command=COM_QUERY, thd=0x7fd2086ff070, packet=0x7fd20861a071 "SELECT PROD_PRODUITS.LIBELLE AS libelle, PROD_COMMANDES.D"..., packet_length=936) at 10.0/sql/sql_parse.cc:1308 #37 0x000000000067a2bd in do_command (thd=0x7fd2086ff070) at 10.0/sql/sql_parse.cc:999 #38 0x00000000007982a2 in do_handle_one_connection (thd_arg=0x7fd2086ff070) at 10.0/sql/sql_connect.cc:1378 #39 0x0000000000798001 in handle_one_connection (arg=0x7fd2086ff070) at 10.0/sql/sql_connect.cc:1293 #40 0x00007fd222b8bb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
              Hide
              aftab.khan aftab khan added a comment - - edited

              Hi Elena,

              WHERE cond1 AND cond2 ... AND cond9 AND (colX LIKE '%foo%' OR colX LIKE '%bar%')
              

              I agree with your findings and that is how it should be done. I have noticed the query completes <10 seconds when I convert theta-style joins to ANSI style, also moved GROUP BY after all filters, then explain output turns out to be completely different:

              EXPLAIN output after converting theta-joins to ANSI style

              *************************** 1. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_COMMANDES
                       type: ALL
              possible_keys: PRIMARY,ID_SITE
                        key: NULL
                    key_len: NULL
                        ref: NULL
                       rows: 702411
                      Extra: Using where; Using temporary; Using filesort
              *************************** 2. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_ITEMS
                       type: ref
              possible_keys: ID_PROD_COMMANDE_ETATE,ID_PROD_COMMANDE_PRODUI,ID_PROD_COMMANDE_ENVELOPPE,ID_PRODUIT,ID_PROD_ENVELOPPE,ID_ETATE_ID_PRODUIT
                        key: ID_PROD_COMMANDE_ETATE
                    key_len: 11
                        ref: PRODUCTION.PROD_COMMANDES.ID_PROD_COMMANDE,const
                       rows: 1
                      Extra: Using index condition; Using where
              *************************** 3. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_PRODUITS
                       type: eq_ref
              possible_keys: PRIMARY
                        key: PRIMARY
                    key_len: 4
                        ref: PRODUCTION.PROD_ITEMS.ID_PRODUIT
                       rows: 1
                      Extra: Using where
              *************************** 4. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_ENVELOPPES
                       type: eq_ref
              possible_keys: PRIMARY
                        key: PRIMARY
                    key_len: 4
                        ref: PRODUCTION.PROD_ITEMS.ID_PROD_ENVELOPPE
                       rows: 1
                      Extra: Using where
              4 rows in set (0.04 sec)
              

              Explain output of original query (you have it in the attachments), but I would share it here too:

              *************************** 1. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_PRODUITS
                       type: ALL
              possible_keys: NULL
                        key: NULL
                    key_len: NULL
                        ref: NULL
                       rows: 1857
                      Extra: Using temporary; Using filesort
              *************************** 2. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_COMMANDES
                       type: ALL
              possible_keys: PRIMARY
                        key: NULL
                    key_len: NULL
                        ref: NULL
                       rows: 702360
                      Extra: Using join buffer (flat, BNL join)
              *************************** 3. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_ITEMS
                       type: ref
              possible_keys: ID_PROD_COMMANDE_ETATE,ID_PROD_COMMANDE_PRODUI,ID_PROD_COMMANDE_ENVELOPPE
                        key: ID_PROD_COMMANDE_ETATE
                    key_len: 4
                        ref: PRODUCTION.PROD_COMMANDES.ID_PROD_COMMANDE
                       rows: 1
                      Extra:
              *************************** 4. row ***************************
                         id: 1
                select_type: SIMPLE
                      table: PROD_ENVELOPPES
                       type: ALL
              possible_keys: NULL
                        key: NULL
                    key_len: NULL
                        ref: NULL
                       rows: 3136976
                      Extra: Using join buffer (flat, BNL join)
              4 rows in set (0.03 sec)
              
              Show
              aftab.khan aftab khan added a comment - - edited Hi Elena, WHERE cond1 AND cond2 ... AND cond9 AND (colX LIKE '%foo%' OR colX LIKE '%bar%') I agree with your findings and that is how it should be done. I have noticed the query completes <10 seconds when I convert theta-style joins to ANSI style, also moved GROUP BY after all filters, then explain output turns out to be completely different: EXPLAIN output after converting theta-joins to ANSI style *************************** 1. row *************************** id: 1 select_type: SIMPLE table: PROD_COMMANDES type: ALL possible_keys: PRIMARY,ID_SITE key: NULL key_len: NULL ref: NULL rows: 702411 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: PROD_ITEMS type: ref possible_keys: ID_PROD_COMMANDE_ETATE,ID_PROD_COMMANDE_PRODUI,ID_PROD_COMMANDE_ENVELOPPE,ID_PRODUIT,ID_PROD_ENVELOPPE,ID_ETATE_ID_PRODUIT key: ID_PROD_COMMANDE_ETATE key_len: 11 ref: PRODUCTION.PROD_COMMANDES.ID_PROD_COMMANDE,const rows: 1 Extra: Using index condition; Using where *************************** 3. row *************************** id: 1 select_type: SIMPLE table: PROD_PRODUITS type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: PRODUCTION.PROD_ITEMS.ID_PRODUIT rows: 1 Extra: Using where *************************** 4. row *************************** id: 1 select_type: SIMPLE table: PROD_ENVELOPPES type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: PRODUCTION.PROD_ITEMS.ID_PROD_ENVELOPPE rows: 1 Extra: Using where 4 rows in set (0.04 sec) Explain output of original query (you have it in the attachments), but I would share it here too: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: PROD_PRODUITS type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1857 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: PROD_COMMANDES type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 702360 Extra: Using join buffer (flat, BNL join) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: PROD_ITEMS type: ref possible_keys: ID_PROD_COMMANDE_ETATE,ID_PROD_COMMANDE_PRODUI,ID_PROD_COMMANDE_ENVELOPPE key: ID_PROD_COMMANDE_ETATE key_len: 4 ref: PRODUCTION.PROD_COMMANDES.ID_PROD_COMMANDE rows: 1 Extra: *************************** 4. row *************************** id: 1 select_type: SIMPLE table: PROD_ENVELOPPES type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3136976 Extra: Using join buffer (flat, BNL join) 4 rows in set (0.03 sec)

                People

                • Assignee:
                  monty Michael Widenius
                  Reporter:
                  aftab.khan aftab khan
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated: