Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7830

parallel replication lock contention with DDL/DML on slave

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.17
    • Fix Version/s: 10.0
    • Component/s: Replication
    • Labels:

      Description

      This slave wasn't resolving its locks:

      Trx total rec lock wait time 12700 SEC on some of the parallel threads:

      MariaDB [hotspot]> CREATE TABLE UserExpiry SELECT Location, Expire FROM UserInfo;
      ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
      
      MariaDB [hotspot]> SHOW ENGINE INNODB STATUS\G
      *************************** 1. row ***************************
        Type: InnoDB
        Name: 
      Status: 
      =====================================
      2015-03-25 13:36:21 7f96d3bf4700 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 1 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 588997 srv_active, 0 srv_shutdown, 3166 srv_idle
      srv_master_thread log flush and writes: 592163
      ----------
      SEMAPHORES
      ----------
      OS WAIT ARRAY INFO: reservation count 15337153
      OS WAIT ARRAY INFO: signal count 13206239
      Mutex spin waits 42778016, rounds 498035174, OS waits 12195640
      RW-shared spins 3104395, rounds 65795945, OS waits 1890737
      RW-excl spins 3332601, rounds 52029014, OS waits 744942
      Spin rounds per wait: 11.64 mutex, 21.19 RW-shared, 15.61 RW-excl
      ------------
      TRANSACTIONS
      ------------
      Trx id counter 2374048291
      Purge done for trx's n:o < 2374048291 undo n:o < 0 state: running but idle
      History list length 1224
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 2374048272, not started
      MySQL thread id 377135, OS thread handle 0x7f9776c6b700, query id 117572064 223.27.66.13 overseer cleaning up
      ---TRANSACTION 2374048178, not started
      MySQL thread id 373677, OS thread handle 0x7f96e346b700, query id 117571821 223.27.66.13 root cleaning up
      ---TRANSACTION 0, not started
      MySQL thread id 371257, OS thread handle 0x7f96d3bf4700, query id 117572115 223.27.66.13 root init
      SHOW ENGINE INNODB STATUS
      ---TRANSACTION 0, not started
      MySQL thread id 371180, OS thread handle 0x7f971446b700, query id 117571987 223.27.66.13 root cleaning up
      ---TRANSACTION 2332704790, not started
      MySQL thread id 97948, OS thread handle 0x7f96c53ab700, query id 0 Waiting for master to send event
      ---TRANSACTION 2374048234, not started
      MySQL thread id 7, OS thread handle 0x7f96c52d0700, query id 117571901 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374048203, not started
      MySQL thread id 4, OS thread handle 0x7f96c51f5700, query id 117571860 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374048246, not started
      MySQL thread id 19, OS thread handle 0x7f96c5163700, query id 117571919 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374048238, not started
      MySQL thread id 11, OS thread handle 0x7f96c51ac700, query id 117571905 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374048247, not started
      MySQL thread id 9, OS thread handle 0x7f96c5088700, query id 117571921 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374048245, not started
      MySQL thread id 21, OS thread handle 0x7f980a86b700, query id 117571917 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2374047856, not started
      MySQL thread id 18, OS thread handle 0x7f983bc48700, query id 117571289 Waiting for prior transaction to start commit before starting next transaction
      ---TRANSACTION 2318180099, not started
      MySQL thread id 1, OS thread handle 0x7f96bc9fe700, query id 0 Waiting for background binlog tasks
      ---TRANSACTION 2374048288, ACTIVE (PREPARED) 0 sec
      4 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 2
      MySQL thread id 22, OS thread handle 0x7f9872d03700, query id 117572113 Waiting for prior transaction to commit
      Trx #rec lock waits 159407 #table lock waits 142
      Trx total rec lock wait time 12567 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."UserInfo" trx id 2374048288 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 1620 n bits 480 index "UserName" of table "hotspot"."UserInfo" trx table locks 2 total table locks 5  trx id 2374048288 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 1198 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 2 total table locks 5  trx id 2374048288 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048288 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048285, ACTIVE (PREPARED) 0 sec
      6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
      MySQL thread id 3, OS thread handle 0x7f96c523e700, query id 117572111 Waiting for prior transaction to commit
      Trx #rec lock waits 160163 #table lock waits 146
      Trx total rec lock wait time 12561 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radusage" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."UserInfo" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 1193 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048285 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."radacct" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11949 page no 138621 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048285 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048284, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      LOCK WAIT 2 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
      MySQL thread id 14, OS thread handle 0x7f9872dde700, query id 117572106 update
      REPLACE INTO radcheck (username, attribute, op, value) VALUES ('NOTREALUSER', 'password', ':=', 'NOTREALPASSWORD')
      Trx #rec lock waits 159198 #table lock waits 137
      Trx total rec lock wait time 12568 SEC
      Trx total table lock wait time 0 SEC
      ------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
      ------------------
      TABLE LOCK table "hotspot"."radcheck" trx id 2374048284 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048283, ACTIVE (PREPARED) 0 sec
      3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 2
      MySQL thread id 15, OS thread handle 0x7f9872d4c700, query id 117572104 Waiting for prior transaction to commit
      Trx #rec lock waits 160406 #table lock waits 150
      Trx total rec lock wait time 12689 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."active_sessions" trx id 2374048283 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 152 page no 10110 n bits 80 index "PRIMARY" of table "hotspot"."active_sessions" trx table locks 2 total table locks 1  trx id 2374048283 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048283 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048282, ACTIVE (PREPARED) 0 sec
      5 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
      MySQL thread id 12, OS thread handle 0x7f96c5287700, query id 117572102 Waiting for prior transaction to commit
      Trx #rec lock waits 159927 #table lock waits 139
      Trx total rec lock wait time 12700 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radcheck" trx id 2374048282 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 199 page no 533 n bits 664 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 199 page no 1179 n bits 160 index "PRIMARY" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 199 page no 533 n bits 664 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X locks gap before rec lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048282 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048281, ACTIVE (PREPARED) 0 sec
      5 lock struct(s), heap size 1184, 7 row lock(s), undo log entries 4
      MySQL thread id 20, OS thread handle 0x7f986f86f700, query id 117572100 closing tables
      Trx #rec lock waits 158976 #table lock waits 157
      Trx total rec lock wait time 12524 SEC
      Trx total table lock wait time 1 SEC
      TABLE LOCK table `hotspot`.`radreply` trx id 2374048281 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11812 page no 599 n bits 752 index `UserName` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11812 page no 138 n bits 304 index `PRIMARY` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11812 page no 599 n bits 752 index `UserName` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X locks gap before rec lock hold time 0 wait time before grant 0 
      TABLE LOCK table `mysql`.`gtid_slave_pos` trx id 2374048281 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048280, ACTIVE (PREPARED) 0 sec
      7 lock struct(s), heap size 1184, 5 row lock(s), undo log entries 7
      MySQL thread id 5, OS thread handle 0x7f96c5362700, query id 117572098 closing tables
      Trx #rec lock waits 158864 #table lock waits 153
      Trx total rec lock wait time 12568 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radusage" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."UserInfo" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 1148 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."radacct" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11949 page no 135350 n bits 144 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 6112 page no 3 n bits 336 index "PRIMARY" of table "mysql"."gtid_slave_pos" trx table locks 4 total table locks 12  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048276, ACTIVE (PREPARED) 0 sec
      6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
      MySQL thread id 13, OS thread handle 0x7f9872d95700, query id 117572093 Waiting for prior transaction to commit
      Trx #rec lock waits 159427 #table lock waits 145
      Trx total rec lock wait time 12551 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radusage" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."UserInfo" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 509 n bits 104 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048276 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "hotspot"."radacct" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
      RECORD LOCKS space id 11949 page no 138781 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048276 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
      ---TRANSACTION 2374048253, ACTIVE (PREPARED) 51 sec
      3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 2
      MySQL thread id 10, OS thread handle 0x7f9872e27700, query id 117571943 Waiting for prior transaction to commit
      Trx #rec lock waits 159074 #table lock waits 147
      Trx total rec lock wait time 12529 SEC
      Trx total table lock wait time 1 SEC
      TABLE LOCK table "hotspot"."radacct" trx id 2374048253 lock mode IX lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 11949 page no 127327 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 2 total table locks 5  trx id 2374048253 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048253 lock mode IX lock hold time 51 wait time before grant 0 
      ---TRANSACTION 2374048252, ACTIVE (PREPARED) 51 sec
      3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
      MySQL thread id 17, OS thread handle 0x7f9872cba700, query id 117571938 Waiting for prior transaction to commit
      COMMIT
      Trx #rec lock waits 159652 #table lock waits 145
      Trx total rec lock wait time 12876 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."badauth_counts" trx id 2374048252 lock mode IX lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 2864 page no 5 n bits 160 index "username" of table "hotspot"."badauth_counts" trx table locks 2 total table locks 1  trx id 2374048252 lock_mode X locks gap before rec lock hold time 51 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048252 lock mode IX lock hold time 51 wait time before grant 0 
      ---TRANSACTION 2374048251, ACTIVE (PREPARED) 51 sec
      6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
      MySQL thread id 6, OS thread handle 0x7f96c511a700, query id 117571935 Waiting for prior transaction to commit
      Trx #rec lock waits 159091 #table lock waits 152
      Trx total rec lock wait time 12655 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radusage" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
      TABLE LOCK table "hotspot"."UserInfo" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 15794 page no 1216 n bits 104 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048251 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
      TABLE LOCK table "hotspot"."radacct" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 11949 page no 138773 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048251 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
      ---TRANSACTION 2374048249, ACTIVE (PREPARED) 51 sec
      2 lock struct(s), heap size 360, 0 row lock(s), undo log entries 2
      MySQL thread id 8, OS thread handle 0x7f96c50d1700, query id 117571928 Waiting for prior transaction to commit
      Trx #rec lock waits 159554 #table lock waits 148
      Trx total rec lock wait time 12522 SEC
      Trx total table lock wait time 1 SEC
      TABLE LOCK table "hotspot"."radreply" trx id 2374048249 lock mode IX lock hold time 51 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048249 lock mode IX lock hold time 51 wait time before grant 0 
      ---TRANSACTION 2374048248, ACTIVE (PREPARED) 51 sec
      5 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
      MySQL thread id 16, OS thread handle 0x7f96c5319700, query id 117571926 Waiting for prior transaction to commit
      Trx #rec lock waits 159396 #table lock waits 168
      Trx total rec lock wait time 12611 SEC
      Trx total table lock wait time 0 SEC
      TABLE LOCK table "hotspot"."radcheck" trx id 2374048248 lock mode IX lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 199 page no 1179 n bits 160 index "PRIMARY" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
      RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X locks gap before rec lock hold time 51 wait time before grant 0 
      TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048248 lock mode IX lock hold time 51 wait time before grant 0 
      --------
      FILE I/O
      --------
      

      binlog and more detailed information in private upload (two files)

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            danblack Daniel Black added a comment -

            looks like the only active transaction, 2374048284, is waiting on the lock (199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck) held by 2374048248, ACTIVE (PREPARED)

            Show
            danblack Daniel Black added a comment - looks like the only active transaction, 2374048284, is waiting on the lock (199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck) held by 2374048248, ACTIVE (PREPARED)
            Hide
            elenst Elena Stepanova added a comment -

            Kristian Nielsen,

            Could you please take a look at the provided info to check if you see a clear bug in there, and if you do, if you need any more data to investigate it?

            Thanks.

            Show
            elenst Elena Stepanova added a comment - Kristian Nielsen , Could you please take a look at the provided info to check if you see a clear bug in there, and if you do, if you need any more data to investigate it? Thanks.
            Hide
            knielsen Kristian Nielsen added a comment -

            So we see in the output 20 parallel replication worker threads.
            1 in update (waiting for row lock?)
            2 in "closing tables"
            rest in waiting for prior transactions.

            It is not clear to me that this is not simply a long-running transaction
            that takes more time than lock wait timeout?

            ------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
            RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
            

            If the transaction has been waiting only "0 seconds" then it suggests that
            the system is actively progressing. Or alternatively, I am misunderstanding
            what InnoDB output means (i am not familiar with it).

            It is also possible that the parallel replication is hanging? There were a
            number of parallel replication deadlocks fixed recently. It is not possible
            to see from the provided information if any of these might be the cause (or
            indeed even that there is a deadlock).

            Show
            knielsen Kristian Nielsen added a comment - So we see in the output 20 parallel replication worker threads. 1 in update (waiting for row lock?) 2 in "closing tables" rest in waiting for prior transactions. It is not clear to me that this is not simply a long-running transaction that takes more time than lock wait timeout? ------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3 trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 If the transaction has been waiting only "0 seconds" then it suggests that the system is actively progressing. Or alternatively, I am misunderstanding what InnoDB output means (i am not familiar with it). It is also possible that the parallel replication is hanging? There were a number of parallel replication deadlocks fixed recently. It is not possible to see from the provided information if any of these might be the cause (or indeed even that there is a deadlock).
            Hide
            knielsen Kristian Nielsen added a comment -

            Hm, another thought: This CREATE TABLE UserExpiry SELECT ... is run as a
            direct query on the slave (not replicated) ?

            It is possible that this statement conflicts with locks needed by replicated
            transactions, causing a deadlock. Such deadlocks may not be detected.

            Say the direct transaction is U, and T1, T2 are being replicated in
            parallel. All access table UserExpiry.

            If T1 is waiting for a row lock held by U, and U is waiting for another row
            lock held by T2, then this might cause a deadlock that is not automatically
            detected (except by timeout). Because T2 is waiting for T1 to commit first.

            This is kind of a general restriction with parallel replication, that the
            DBA/application is responsible for not running any conflicting direct
            transactions on the slave.

            Though merely reading a table is not obviously something that should
            conflict with replication. It might be worth investigating further if such
            conflicts are actually possible (I did not test). Conflicts are only
            possible for statements that keep row locks for the duration of the
            transaction. It might be interesting to check if the conflict also occurs if
            the CREATE is run in READ COMMITTED and with sql_log_bin=0.

            Show
            knielsen Kristian Nielsen added a comment - Hm, another thought: This CREATE TABLE UserExpiry SELECT ... is run as a direct query on the slave (not replicated) ? It is possible that this statement conflicts with locks needed by replicated transactions, causing a deadlock. Such deadlocks may not be detected. Say the direct transaction is U, and T1, T2 are being replicated in parallel. All access table UserExpiry. If T1 is waiting for a row lock held by U, and U is waiting for another row lock held by T2, then this might cause a deadlock that is not automatically detected (except by timeout). Because T2 is waiting for T1 to commit first. This is kind of a general restriction with parallel replication, that the DBA/application is responsible for not running any conflicting direct transactions on the slave. Though merely reading a table is not obviously something that should conflict with replication. It might be worth investigating further if such conflicts are actually possible (I did not test). Conflicts are only possible for statements that keep row locks for the duration of the transaction. It might be interesting to check if the conflict also occurs if the CREATE is run in READ COMMITTED and with sql_log_bin=0.
            Hide
            danblack Daniel Black added a comment -

            It is also possible that the parallel replication is hanging?

            sounds plausible.

            There were a number of parallel replication deadlocks fixed recently. It is not possible to see from the provided information if any of these might be the cause

            Looking forward to the next release then.

            (or indeed even that there is a deadlock).

            So a MDEV-7802 back port could be useful.

            This CREATE TABLE UserExpiry SELECT ... is run as a direct query on the slave (not replicated) ?

            yes

            If T1 is waiting for a row lock held by U, and U is waiting for another row lock held by T2, then this might cause a deadlock that is not automatically detected (except by timeout). Because T2 is waiting for T1 to commit first.

            sounds plausible

            This is kind of a general restriction with parallel replication, that the DBA/application is responsible for not running any conflicting direct

            transactions on the slave.

            sounds like a worth while addition to the documentation

            It might be interesting to check if the conflict also occurs if the CREATE is run in READ COMMITTED and with sql_log_bin=0.

            Ok. I will test this out.

            Show
            danblack Daniel Black added a comment - It is also possible that the parallel replication is hanging? sounds plausible. There were a number of parallel replication deadlocks fixed recently. It is not possible to see from the provided information if any of these might be the cause Looking forward to the next release then. (or indeed even that there is a deadlock). So a MDEV-7802 back port could be useful. This CREATE TABLE UserExpiry SELECT ... is run as a direct query on the slave (not replicated) ? yes If T1 is waiting for a row lock held by U, and U is waiting for another row lock held by T2, then this might cause a deadlock that is not automatically detected (except by timeout). Because T2 is waiting for T1 to commit first. sounds plausible This is kind of a general restriction with parallel replication, that the DBA/application is responsible for not running any conflicting direct transactions on the slave. sounds like a worth while addition to the documentation It might be interesting to check if the conflict also occurs if the CREATE is run in READ COMMITTED and with sql_log_bin=0. Ok. I will test this out.

              People

              • Assignee:
                knielsen Kristian Nielsen
                Reporter:
                danblack Daniel Black
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: