Uploaded image for project: 'MariaDB Server'
  1. MDEV-7578

Slave is ~10x slower to execute set of statements compared to master when using RBR

    Details

      Description

      CREATE TABLE `t1` (
       `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
       `cm` mediumint(8) unsigned NOT NULL,
       `md` tinyint(4) NOT NULL,
       `pid` int(10) unsigned NOT NULL,
       `npid` int(10) unsigned DEFAULT NULL,
       `cb` int(10) unsigned NOT NULL,
       PRIMARY KEY (`id`),
       UNIQUE KEY `cm` (`cm`,`md`,`pid`,`npid`)
       ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
      

      First the table is not that large (1.1GB) and this number of rows:

      select count(*) from t1;
      +----------+
      | count(*) |
      +----------+
      | 13146095 |
      +----------+
      1 row in set (2.82 sec)
      

      The table is 1GB and the server has 192 GB of RAM.

      # ls -lh t1.ibd
      -rw-rw---- 1 1.1G Feb 3 13:58 t1.ibd
      

      This table exists on two different machines (lets call them machine1 and machine2). These machines are on two different replication streams and this causes that table contents on these machines may differ. Periodically, customer executes Percona tool pt-sync-table (http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html). On master this produces following kind of transactions (single transaction containing lot of statements):

      ---TRANSACTION 500973435, ACTIVE (PREPARED) 23019 sec
      308870 lock struct(s), heap size 45790760, 49395321 row lock(s), undo log entries 25930772
      MySQL thread id 47698974, OS thread handle 0x7f957722e700, query id 2253075967
      TABLE LOCK table `test`.`t1` trx id 500973435 lock mode IX
      RECORD LOCKS space id 15988 page no 65789 n bits 88 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 3604 n bits 504 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 4120 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15988 page no 3604 n bits 504 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      RECORD LOCKS space id 15988 page no 3605 n bits 512 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 4119 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15988 page no 3605 n bits 520 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      RECORD LOCKS space id 15988 page no 3603 n bits 752 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 3603 n bits 752 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
      

      On slave this same transaction is ~10x slower:

      ---TRANSACTION 455244311, ACTIVE 246994 sec setting auto-inc lock
      mysql tables in use 1, locked 1
      310972 lock struct(s), heap size 46888488, 50033767 row lock(s), undo log entries 25929883
      MySQL thread id 191, OS thread handle 0x7fa9bcc49700, query id 15186089 executing
      TABLE LOCK table `test`.`t1` trx id 455244311 lock mode IX
      RECORD LOCKS space id 15974 page no 4120 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock mode S
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock mode S locks gap before rec
      RECORD LOCKS space id 15974 page no 4119 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock mode S
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock mode S locks gap before rec
      RECORD LOCKS space id 15974 page no 53911 n bits 792 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
      

      So runtime: 246994 = ~ 2 days and 21 hours. Again vs the original ~ 7 hours (on the master) this behaviour is really not good. So the question is: why it takes so much longer and what can be done about this to prevent it?

      Server was not I/O bound. It was 100% cpu bound with a single thread pegged at 100%

      Number of different statements on that one big transaction:

      ls -la binlog.000001 # binlog of the large transaction we saw
      [mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### DELETE FROM `test`.`t1`'
      11801095
      [mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### UPDATE `test`.`t1`'
      983605
      mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### INSERT INTO `test`.`t1`'
      12162467
      

        Attachments

          Activity

            People

            • Assignee:
              jplindst Jan Lindström
              Reporter:
              jplindst Jan Lindström
            • Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 4 days
                4d