Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7578

Slave is ~10x slower to execute set of statements compared to master when using RBR

    Details

      Description

      CREATE TABLE `t1` (
       `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
       `cm` mediumint(8) unsigned NOT NULL,
       `md` tinyint(4) NOT NULL,
       `pid` int(10) unsigned NOT NULL,
       `npid` int(10) unsigned DEFAULT NULL,
       `cb` int(10) unsigned NOT NULL,
       PRIMARY KEY (`id`),
       UNIQUE KEY `cm` (`cm`,`md`,`pid`,`npid`)
       ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
      

      First the table is not that large (1.1GB) and this number of rows:

      select count(*) from t1;
      +----------+
      | count(*) |
      +----------+
      | 13146095 |
      +----------+
      1 row in set (2.82 sec)
      

      The table is 1GB and the server has 192 GB of RAM.

      # ls -lh t1.ibd
      -rw-rw---- 1 1.1G Feb 3 13:58 t1.ibd
      

      This table exists on two different machines (lets call them machine1 and machine2). These machines are on two different replication streams and this causes that table contents on these machines may differ. Periodically, customer executes Percona tool pt-sync-table (http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html). On master this produces following kind of transactions (single transaction containing lot of statements):

      ---TRANSACTION 500973435, ACTIVE (PREPARED) 23019 sec
      308870 lock struct(s), heap size 45790760, 49395321 row lock(s), undo log entries 25930772
      MySQL thread id 47698974, OS thread handle 0x7f957722e700, query id 2253075967
      TABLE LOCK table `test`.`t1` trx id 500973435 lock mode IX
      RECORD LOCKS space id 15988 page no 65789 n bits 88 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 3604 n bits 504 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 4120 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15988 page no 3604 n bits 504 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      RECORD LOCKS space id 15988 page no 3605 n bits 512 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 4119 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 500973435 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15988 page no 3605 n bits 520 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      RECORD LOCKS space id 15988 page no 3603 n bits 752 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X
      RECORD LOCKS space id 15988 page no 3603 n bits 752 index `cm` of table `test`.`t1` trx id 500973435 lock_mode X locks gap before rec
      TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
      

      On slave this same transaction is ~10x slower:

      ---TRANSACTION 455244311, ACTIVE 246994 sec setting auto-inc lock
      mysql tables in use 1, locked 1
      310972 lock struct(s), heap size 46888488, 50033767 row lock(s), undo log entries 25929883
      MySQL thread id 191, OS thread handle 0x7fa9bcc49700, query id 15186089 executing
      TABLE LOCK table `test`.`t1` trx id 455244311 lock mode IX
      RECORD LOCKS space id 15974 page no 4120 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock mode S
      RECORD LOCKS space id 15974 page no 3653 n bits 744 index `cm` of table `test`.`t1` trx id 455244311 lock mode S locks gap before rec
      RECORD LOCKS space id 15974 page no 4119 n bits 456 index `PRIMARY` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock mode S
      RECORD LOCKS space id 15974 page no 3654 n bits 776 index `cm` of table `test`.`t1` trx id 455244311 lock mode S locks gap before rec
      RECORD LOCKS space id 15974 page no 53911 n bits 792 index `cm` of table `test`.`t1` trx id 455244311 lock_mode X locks rec but not gap
      TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
      

      So runtime: 246994 = ~ 2 days and 21 hours. Again vs the original ~ 7 hours (on the master) this behaviour is really not good. So the question is: why it takes so much longer and what can be done about this to prevent it?

      Server was not I/O bound. It was 100% cpu bound with a single thread pegged at 100%

      Number of different statements on that one big transaction:

      ls -la binlog.000001 # binlog of the large transaction we saw
      [mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### DELETE FROM `test`.`t1`'
      11801095
      [mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### UPDATE `test`.`t1`'
      983605
      mysqlbinlog -vvv --start-position=4 binlog.000001 | grep -c '^### INSERT INTO `test`.`t1`'
      12162467
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            jplindst Jan Lindström added a comment -

            Lets try to repeat first using same sane defaults. From InnoDB point of view transaction should be executed exactly similar way as in master if the set of actual statements are exactly the same. If in RBR some statement is replicated differently then we have different case. Here is what we understant to this point of time. We have two machines A and B, both contain the same table but table contents differ. Thus in machine A master executes selects (to find out the rows that differ and how), inserts, replaces, deletes to modify table rows to match table in B. This creates a big transaction that takes ~7h on master, on transaction commit master writes ~1G binlog that is then transfered to slave for execution. For some reason this execution takes 10x time compared with master. I did debug the slave, and there was basically only one SQL-thread doing something, executing this one big transaction, statement by statement.

            Show
            jplindst Jan Lindström added a comment - Lets try to repeat first using same sane defaults. From InnoDB point of view transaction should be executed exactly similar way as in master if the set of actual statements are exactly the same. If in RBR some statement is replicated differently then we have different case. Here is what we understant to this point of time. We have two machines A and B, both contain the same table but table contents differ. Thus in machine A master executes selects (to find out the rows that differ and how), inserts, replaces, deletes to modify table rows to match table in B. This creates a big transaction that takes ~7h on master, on transaction commit master writes ~1G binlog that is then transfered to slave for execution. For some reason this execution takes 10x time compared with master. I did debug the slave, and there was basically only one SQL-thread doing something, executing this one big transaction, statement by statement.
            Hide
            elenst Elena Stepanova added a comment -

            Tentative test case (should be run with the increased testcase-timeout and maybe suite-timeout; although there is no need to wait till the end of the test, as soon as the problem is obvious it can be interrupted).

            
            --source include/master-slave.inc
            --source include/have_binlog_format_row.inc
            --source include/have_xtradb.inc
            
            let $num = 10000000;
            let $n = $num;
            
            CREATE TABLE t (pk int auto_increment primary key) engine=InnoDB;
            --sync_slave_with_master
            
            --connection master
            BEGIN;
            --disable_query_log
            --echo # Inserting $n rows...
            while ($n)
            {
              eval INSERT INTO t VALUES ($num + 1 - $n);
              dec $n;
            }
            --echo # Inserted $n rows
            --enable_query_log
            COMMIT;
            
            --connection slave
            while (`SELECT COUNT(*) < $num FROM t`)
            {
              --exec $MYSQL -h 127.0.0.1 -P $SLAVE_MYPORT -uroot -e "SHOW ENGINE INNODB STATUS \G" | grep -E 'ACTIVE|Number of rows inserted'
              --sleep 5
            }
            --echo # Final one:
            --exec $MYSQL -h 127.0.0.1 -P $SLAVE_MYPORT -uroot -e "SHOW ENGINE INNODB STATUS \G" | grep 'Number of rows inserted'
            
            --connection master
            DROP TABLE t;
            
            --source include/rpl_end.inc
            

            If the problem is reproducible, the 'Number of rows inserted' will grow slower and slower while the slave progresses through the huge transaction.

            I am running the test now, it will take a while. With 1M rows it's not convincing, that's why we couldn't reproduce it in the initial attempts.

            Show
            elenst Elena Stepanova added a comment - Tentative test case (should be run with the increased testcase-timeout and maybe suite-timeout; although there is no need to wait till the end of the test, as soon as the problem is obvious it can be interrupted). --source include/master-slave.inc --source include/have_binlog_format_row.inc --source include/have_xtradb.inc let $num = 10000000; let $n = $num; CREATE TABLE t (pk int auto_increment primary key) engine=InnoDB; --sync_slave_with_master --connection master BEGIN; --disable_query_log --echo # Inserting $n rows... while ($n) { eval INSERT INTO t VALUES ($num + 1 - $n); dec $n; } --echo # Inserted $n rows --enable_query_log COMMIT; --connection slave while (` SELECT COUNT(*) < $num FROM t`) { --exec $MYSQL -h 127.0.0.1 -P $SLAVE_MYPORT -uroot -e "SHOW ENGINE INNODB STATUS \G" | grep -E 'ACTIVE|Number of rows inserted' --sleep 5 } --echo # Final one: --exec $MYSQL -h 127.0.0.1 -P $SLAVE_MYPORT -uroot -e "SHOW ENGINE INNODB STATUS \G" | grep 'Number of rows inserted' --connection master DROP TABLE t; --source include/rpl_end.inc If the problem is reproducible, the 'Number of rows inserted' will grow slower and slower while the slave progresses through the huge transaction. I am running the test now, it will take a while. With 1M rows it's not convincing, that's why we couldn't reproduce it in the initial attempts.
            Hide
            jplindst Jan Lindström added a comment -

            Problem is on

            /********************************************************************//**
            This special handling is really to overcome the limitations of MySQL's
            binlogging. We need to eliminate the non-determinism that will arise in
            INSERT ... SELECT type of statements, since MySQL binlog only stores the
            min value of the autoinc interval. Once that is fixed we can get rid of
            the special lock handling.
            @return	DB_SUCCESS if all OK else error code */
            UNIV_INTERN
            dberr_t
            ha_innobase::innobase_lock_autoinc(void)
            /*====================================*/
            {
            	dberr_t		error = DB_SUCCESS;
            
            	ut_ad(!srv_read_only_mode);
            
            	switch (innobase_autoinc_lock_mode) {
            	case AUTOINC_NO_LOCKING:
            		/* Acquire only the AUTOINC mutex. */
            		dict_table_autoinc_lock(prebuilt->table);
            		break;
            
            	case AUTOINC_NEW_STYLE_LOCKING:
            		/* For simple (single/multi) row INSERTs, we fallback to the
            		old style only if another transaction has already acquired
            		the AUTOINC lock on behalf of a LOAD FILE or INSERT ... SELECT
            		etc. type of statement. */
                            /* NOTE: on master above example we have thd_sql_command(user_thd) == SQLCOM_INSERT
                               on slave we have thd_sql_command(user_thd) == SQLCOM_END */
            		if (thd_sql_command(user_thd) == SQLCOM_INSERT
            		    || thd_sql_command(user_thd) == SQLCOM_REPLACE) {
            			dict_table_t*	ib_table = prebuilt->table;
            
            			/* Acquire the AUTOINC mutex. */
            			dict_table_autoinc_lock(ib_table);
            
            			/* We need to check that another transaction isn't
            			already holding the AUTOINC lock on the table. */
            			if (ib_table->n_waiting_or_granted_auto_inc_locks) {
            				/* Release the mutex to avoid deadlocks. */
            				dict_table_autoinc_unlock(ib_table);
            			} else {
            				break;  /* On above case, master executes this not old style locking,
                                                           slave naturally takes old style lock for every INSERT */
            			}
            		}
            		/* Fall through to old style locking. */
            
            	case AUTOINC_OLD_STYLE_LOCKING:
            		error = row_lock_table_autoinc_for_mysql(prebuilt);
            
            		if (error == DB_SUCCESS) {
            
            			/* Acquire the AUTOINC mutex. */
            			dict_table_autoinc_lock(prebuilt->table);
            		}
            		break;
            
            	default:
            		ut_error;
            	}
            
            	return(error);
            }
            
            Show
            jplindst Jan Lindström added a comment - Problem is on /********************************************************************//** This special handling is really to overcome the limitations of MySQL's binlogging. We need to eliminate the non-determinism that will arise in INSERT ... SELECT type of statements, since MySQL binlog only stores the min value of the autoinc interval. Once that is fixed we can get rid of the special lock handling. @return DB_SUCCESS if all OK else error code */ UNIV_INTERN dberr_t ha_innobase::innobase_lock_autoinc(void) /*====================================*/ { dberr_t error = DB_SUCCESS; ut_ad(!srv_read_only_mode); switch (innobase_autoinc_lock_mode) { case AUTOINC_NO_LOCKING: /* Acquire only the AUTOINC mutex. */ dict_table_autoinc_lock(prebuilt->table); break; case AUTOINC_NEW_STYLE_LOCKING: /* For simple (single/multi) row INSERTs, we fallback to the old style only if another transaction has already acquired the AUTOINC lock on behalf of a LOAD FILE or INSERT ... SELECT etc. type of statement. */ /* NOTE: on master above example we have thd_sql_command(user_thd) == SQLCOM_INSERT on slave we have thd_sql_command(user_thd) == SQLCOM_END */ if (thd_sql_command(user_thd) == SQLCOM_INSERT || thd_sql_command(user_thd) == SQLCOM_REPLACE) { dict_table_t* ib_table = prebuilt->table; /* Acquire the AUTOINC mutex. */ dict_table_autoinc_lock(ib_table); /* We need to check that another transaction isn't already holding the AUTOINC lock on the table. */ if (ib_table->n_waiting_or_granted_auto_inc_locks) { /* Release the mutex to avoid deadlocks. */ dict_table_autoinc_unlock(ib_table); } else { break; /* On above case, master executes this not old style locking, slave naturally takes old style lock for every INSERT */ } } /* Fall through to old style locking. */ case AUTOINC_OLD_STYLE_LOCKING: error = row_lock_table_autoinc_for_mysql(prebuilt); if (error == DB_SUCCESS) { /* Acquire the AUTOINC mutex. */ dict_table_autoinc_lock(prebuilt->table); } break; default: ut_error; } return(error); }
            Hide
            jplindst Jan Lindström added a comment -

            Fixed on 5.5:

            commit f66fbe8ce0ff4ffcd6a6c185f9b3d25bd9f67f8d
            Author: Jan Lindström <jan.lindstrom@mariadb.com>
            Date: Thu Mar 5 12:05:59 2015 +0200

            MDEV-7578 :Slave is ~10x slower to execute set of statements compared to master when using
            RBR

            Analysis: On master when executing (single/multi) row INSERTs/REPLACEs
            InnoDB fallback to old style autoinc locks (table locks)
            only if another transaction has already acquired the AUTOINC lock.
            Instead on slave as we are executing log_events and sql_command
            is not correctly set, InnoDB does not use new style autoinc
            locks when it could.

            Fix: Use new style autoinc locks also when
            thd_sql_command(user_thd) == SQLCOM_END i.e. this is RBR event.

            Show
            jplindst Jan Lindström added a comment - Fixed on 5.5: commit f66fbe8ce0ff4ffcd6a6c185f9b3d25bd9f67f8d Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Thu Mar 5 12:05:59 2015 +0200 MDEV-7578 :Slave is ~10x slower to execute set of statements compared to master when using RBR Analysis: On master when executing (single/multi) row INSERTs/REPLACEs InnoDB fallback to old style autoinc locks (table locks) only if another transaction has already acquired the AUTOINC lock. Instead on slave as we are executing log_events and sql_command is not correctly set, InnoDB does not use new style autoinc locks when it could. Fix: Use new style autoinc locks also when thd_sql_command(user_thd) == SQLCOM_END i.e. this is RBR event.

              People

              • Assignee:
                jplindst Jan Lindström
                Reporter:
                jplindst Jan Lindström
              • Votes:
                2 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 4 days
                  4d