Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.7
    • Fix Version/s: 10.0.11
    • Component/s: None
    • Labels:
    • Environment:
      CentOS, TokuDB 7.1.0

      Description

      Hi All,

      We are running into a very weird issue with slow and hanging queries on TokuDB 7.1.0. Here's some background and configs:
      DB server has 264GB of ram. 1.7TB of RAID-10 SSD storage for data, and 876GB of RAID-1 SSD storage for binary logs.

      Since beginning to write this post, we've identified the exact issue and it seems to be a bug in the way TokuDB handles indexes when an entry is deleted. Here's some background information as part of the original message I was writing, as well as our temporary workaround:

      Previously we were running on MariaDB 5.3/XtraDB with no issues. The reason we switched to TokuDB is space savings. We are now experiencing terrible performance issues on an email table, forcing us to truncate the table every few months. We have a cleanup script running that deletes everything from the table that's older than 50 days. There are around 180M rows in the table, consisting of emails.
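
      For illustration only, the cleanup might look roughly like the following sketch (the actual script isn't included here, and using `cc1` as the insert-timestamp column is an assumption based on the obfuscated schema further down):

      -- Hypothetical sketch of the age-based cleanup, run in batches.
      -- Assumes `cc1` is the insert timestamp; the real script is not shown.
      DELETE FROM `InBox`
      WHERE `cc1` < NOW() - INTERVAL 50 DAY
      LIMIT 10000;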

      General config:

      ##Encodings/Collation 
      default-storage-engine=TokuDB 
      character-set-server=utf8 
      collation_server=utf8_unicode_ci
      max_connections = 5000 
      thread_cache_size = 900
      query_cache_size= 0 
      query_cache_type = 0
      table_cache = 4096
      

      TokuDB specific config:

      ##TokuDB
      tokudb_cache_size = 180G
      tokudb_lock_timeout = 20000
      tokudb_row_format = tokudb_small
      tokudb_directio = ON
      tokudb_read_buf_size=1048576
      tokudb_load_save_space=on
      tokudb_commit_sync=off
      

      Things run perfectly fine with simple queries of the type: select * from `InBox` where `id`=12345 limit 1

      Here's an explain (from the slave DB, prior to our latest truncate on the master), please note that the slave server is identical to the master:

      explain select * from `InBox` where `id`=752371516 limit 1;
      +------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
      | id   | select_type | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra |
      +------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
      |    1 | SIMPLE      | InBox | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
      +------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
      1 row in set (0.00 sec)
      

      The select also returns in (0.00 sec).

      Now here's one problem: if you run a query that references an item that used to exist but has been deleted, the query will scan the whole database. This can quickly kill the database.

      If you do an explain:

       explain select * from `InBox` where `id`=209685691 limit 1;
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      1 row in set (35 min 10.07 sec)
      

      or do a select:

      select * from `InBox` where `id`=209685691 limit 1;
      Empty set (1 min 32.46 sec)
      

      But if you query an id that has never existed in the index, this is not a problem:

      select * from `InBox` where `id`=99752371516 limit 1;
      Empty set (0.00 sec)
      
      explain select * from `InBox` where `id`=99209685691 limit 1;
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
      +------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
      1 row in set (0.00 sec)
      
      select * from `InBox` where `id`=99752371516 limit 1;
      Empty set (0.00 sec)
      

      As you can see from the execution times, the difference is huge, and any accidental access to an old, non-existent item is pretty catastrophic.

      Our current workaround is never to delete; we will have to wait until the table fills up, then truncate it.
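
      Concretely, the stop-gap maintenance is reduced to an occasional full truncate, roughly (a sketch, not our actual maintenance job):

      -- No row-level deletes at all; once the table gets too large,
      -- wipe it completely and start over.
      TRUNCATE TABLE `InBox`;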

      Some additional worries from the previous message:

      Additionally, on InnoDB we used to run EXPLAIN statements and rely on the estimated number of rows, which was very close to the number of rows actually returned. For example, for a query like EXPLAIN SELECT * FROM BLAH USE INDEX(`sometimeindex`) WHERE `sometimeindex` > NOW() - INTERVAL 1 DAY, the estimated row count in the EXPLAIN output would be nearly equal to what the query itself returned. But now if I do that, the number of rows reported by the EXPLAIN doesn't actually change.
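
      To make the comparison concrete (a sketch reusing the placeholder names BLAH and `sometimeindex` from above):

      -- Optimizer's row estimate for the predicate:
      EXPLAIN SELECT * FROM BLAH USE INDEX (`sometimeindex`)
      WHERE `sometimeindex` > NOW() - INTERVAL 1 DAY;

      -- Actual number of matching rows:
      SELECT COUNT(*) FROM BLAH USE INDEX (`sometimeindex`)
      WHERE `sometimeindex` > NOW() - INTERVAL 1 DAY;

      -- On InnoDB the two numbers were close; here the EXPLAIN estimate
      -- no longer tracks the real count after heavy deletes.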

      We're worried that, with the constant deletes, the indexes are getting seriously messed up.

      This is the schema for the (obfuscated) InBox table:

      CREATE TABLE `InBox` (
        `id` bigint(20) NOT NULL AUTO_INCREMENT,
        `cc1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `cc2` timestamp NULL DEFAULT NULL,
        `cc3` tinyint(4) NOT NULL,
        `cc4` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        `cc5` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        `cc6` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        `cc7` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        `cc8` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        `cc9` longtext COLLATE utf8_unicode_ci NOT NULL,
        `cc10` longtext COLLATE utf8_unicode_ci NOT NULL,
        `cc11` timestamp NULL DEFAULT NULL,
        `cc12` tinyint(4) NOT NULL,
        `cc13` tinyint(4) NOT NULL,
        `cc14` tinyint(4) NOT NULL,
        `cc15` tinyint(4) NOT NULL,
        `cc16` tinyint(4) NOT NULL,
        `cc17` tinyint(4) NOT NULL,
        `cc18` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
        `cc19` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
        `cc20` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
        `cc21` int(11) NOT NULL,
        PRIMARY KEY (`id`),
        KEY `cc18` (`cc18`,`cc19`,`cc1`),
        KEY `cc11` (`cc11`,`cc1`),
        KEY `cc3` (`cc3`,`id`),
        KEY `cc4` (`cc4`,`cc1`),
        KEY `cc5` (`cc5`,`cc1`),
        KEY `cc20` (`cc20`),
        KEY `cc1` (`cc1`),
        KEY `cc16` (`cc16`,`cc1`),
        KEY `cc18_2` (`cc18`,`cc19`,`cc16`,`cc1`),
        KEY `cc21` (`cc21`),
        KEY `cc3_2` (`cc3`,`cc21`,`id`)
      ) ENGINE=TokuDB AUTO_INCREMENT=934472644 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=TOKUDB_SMALL;
      

      Here's the DELETE sql:

      DELETE FROM `InBox` where `id` IN (1,2,3,4,5) limit 5;
      

      Thank you!
      Alex

              Activity

              azilber Alexey Zilber added a comment -

              To add some more info: this behaviour becomes observable when we begin to delete data, after approximately 50 days of inserts. At that point, there are approx. 170+ million rows and 400GB of compressed data.

              azilber Alexey Zilber added a comment -

              Could we assign this to someone? This is a very critical bug, possibly even blocker level, as deletions cannot be performed on a huge TokuDB table without rebuilding indexes.

              elenst Elena Stepanova added a comment -

              Hi Alexey,

              The issue has been discussed with TokuDB people, but none of us was able to reproduce the problem so far, so we are eagerly awaiting the new information from you. You wrote in the Google group:

              I'm currently testing it on a little server of mine.

              and (about OPTIMIZE)

              I can try this on the slave server in about two months and see if it works. But, even if it does, it won't solve the problem on the master.

              But you haven't given an update on either of the experiments.
              We are particularly interested in the "little server" test, as it might be our only chance to reproduce the problem. Tokutek, on the other hand, is apparently interested in the OPTIMIZE TABLE result – even though it won't solve the problem, it will probably give Tim Callaghan more information to analyze the issue further.

              tmcallaghan Tim Callaghan added a comment -

              The current workaround is to optimize the table.
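
              For reference, that is simply the following (a sketch; on a table of this size it can take a long time, as noted in the next comments):

              -- Ask the engine to clean up the affected table's indexes.
              OPTIMIZE TABLE `InBox`;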

              azilber Alexey Zilber added a comment -

              Optimize table is not workable on a multi-terabyte database after every delete. Not deleting at all seems to be the only solution for now.

              I will try doing further tests as per the MariaDB team's request.

              tmcallaghan Tim Callaghan added a comment -

              Alexey, partitioning might also be a workable solution for your use case.
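
              A hedged sketch of what that could look like for an InBox-style table (not tested against this exact schema; it assumes partitioning by the insert timestamp `cc1`, and the partitioning column must be part of the primary key):

              -- Hypothetical: range-partition a simplified table by month so old
              -- data can be removed with DROP PARTITION instead of row deletes.
              CREATE TABLE `InBoxPartitioned` (
                `id`  bigint(20) NOT NULL AUTO_INCREMENT,
                `cc1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
                `cc9` longtext COLLATE utf8_unicode_ci NOT NULL,
                PRIMARY KEY (`id`, `cc1`)
              ) ENGINE=TokuDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
              PARTITION BY RANGE (UNIX_TIMESTAMP(`cc1`)) (
                PARTITION p2014_02 VALUES LESS THAN (UNIX_TIMESTAMP('2014-03-01')),
                PARTITION p2014_03 VALUES LESS THAN (UNIX_TIMESTAMP('2014-04-01')),
                PARTITION pmax     VALUES LESS THAN MAXVALUE
              );

              -- Dropping a month of data is then cheap compared to DELETEs:
              ALTER TABLE `InBoxPartitioned` DROP PARTITION p2014_02;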

              azilber Alexey Zilber added a comment -

              We may have run into a different scenario with this bug where it's more testable, but the implications are potentially worse.
              We have another table with 90M rows on two different database servers (both TokuDB 7.1.0; one has 256GB of RAM, the other 128GB of RAM and non-SSD storage).

              We noticed the same symptoms as we described initially. The table status:

              MariaDB [masked_db]> show table status like 'MaskedTable1'\G;
              *************************** 1. row ***************************
                         Name: MaskedTable1
                       Engine: TokuDB
                      Version: 10
                   Row_format: tokudb_small
                         Rows: 38975004
               Avg_row_length: 70
                  Data_length: 2751257149
              Max_data_length: 9223372036854775807
                 Index_length: 3215417738
                    Data_free: 170606592
               Auto_increment: NULL
                  Create_time: 2013-08-19 07:50:23
                  Update_time: 2014-03-20 17:07:13
                   Check_time: NULL
                    Collation: utf8_unicode_ci
                     Checksum: NULL
               Create_options: row_format=TOKUDB_SMALL
                      Comment:
              1 row in set (0.00 sec)
              

              This is the original query we were running:

              SELECT id, field1, field2
              FROM MaskedTable1
              FORCE INDEX ( `active` )
              WHERE `active` =1
              AND field2 <= NOW( )
              ORDER BY field2 DESC
              LIMIT 8000;
              

              We realized, when we compared the explain of this same query to the other DB (which at that point was not having this issue, but which started having the same issue a few hours later), that the query type went from a range type to a ref type.

              We ran analyze table on 'MaskedTable1', which did nothing. We then ran optimize table on 'MaskedTable1' and this fixed the issue. Unfortunately, it fixed the issue for only approximately 8 hours. It seems the query optimiser is perhaps also having issues. We're attempting another workaround right now that seems to be working.

              It's best shown via these explain statements. These were done on the same db, seconds after each other, with no optimise table run between them:

              MariaDB [masked_db]> explain SELECT id, field1, field2 FROM MaskedTable1 FORCE INDEX ( `active` ) WHERE `active` =1 AND field2 <= NOW( ) ORDER BY field2 DESC LIMIT 8000; 
              +------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
              | id   | select_type | table        | type | possible_keys | key    | key_len | ref   | rows      | Extra                    |
              +------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
              |    1 | SIMPLE      | MaskedTable1 | ref  | active        | active | 1       | const | 114653697 | Using where; Using index |
              +------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
              1 row in set (0.00 sec)
              
              MariaDB [masked_db]> explain SELECT id, field1, field2 FROM MaskedTable1 FORCE INDEX ( `active` ) WHERE `active` =0 AND field2 <= NOW( ) ORDER BY field2 DESC LIMIT 8000; 
              +------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
              | id   | select_type | table        | type  | possible_keys | key    | key_len | ref  | rows     | Extra                    |
              +------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
              |    1 | SIMPLE      | MaskedTable1 | range | active        | active | 5       | NULL | 45986026 | Using where; Using index |
              +------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
              1 row in set (0.00 sec)
              

              Based on the explains above, the workaround is to set active=0 right before we issue a DELETE. This is what led us to believe this issue is related to the original bug filed.
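
              Roughly, the workaround looks like this (a sketch with hypothetical id values, not our exact statements):

              -- Flip the rows out of the active=1 portion of the index first,
              -- then delete them; this keeps the hot SELECT (active=1) from
              -- crawling over the freshly deleted entries.
              UPDATE MaskedTable1 SET active = 0 WHERE id IN (101, 102, 103);
              DELETE FROM MaskedTable1 WHERE id IN (101, 102, 103);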

              Since this issue is becoming more critical, and is reproducible on even smaller tables now, I believe the bug needs to be escalated. Either it's an index issue or a big problem with the query optimiser.

              serg Sergei Golubchik added a comment -

              This is a TokuDB issue. When it is fixed upstream, we will merge the new TokuDB with the fix.

              But as of TokuDB 7.1.5 it is not fixed yet.

              prohaska7 Rich Prohaska added a comment -

              TokuDB 7.1.6 fixed the long point-query times for deleted keys. See https://github.com/Tokutek/ft-index/issues/218.


                People

                 • Assignee: Unassigned
                 • Reporter: azilber Alexey Zilber
                 • Votes: 1
                 • Watchers: 7

                  Dates

                  • Created:
                  • Updated:
                  • Resolved: