  MariaDB Server / MDEV-6860

Parallel async replication hangs on a Galera node

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.13-galera
    • Fix Version/s: None
    • Component/s: Galera, Replication
    • Labels:

      Description

      When a master replicates asynchronously to a Galera node that acts as a slave with slave-parallel-threads > 1, the slave threads hang, which brings replication to a halt.
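      The slave-side setup described above can be sketched in SQL as follows. This is an assumed reconstruction, not the reporter's script: the host, port, user and GTID mode are copied from the SHOW SLAVE STATUS output in this report, the thread count of 4 is an arbitrary value satisfying "> 1", and any password is omitted.

      ```sql
      -- Sketch only: connection values taken from SHOW SLAVE STATUS in this report;
      -- slave_parallel_threads = 4 is an assumed value (any value > 1 applies).
      STOP SLAVE;
      -- slave_parallel_threads can only be changed while the slave is stopped.
      SET GLOBAL slave_parallel_threads = 4;
      CHANGE MASTER TO
        MASTER_HOST = '127.0.0.1',
        MASTER_PORT = 15999,
        MASTER_USER = 'root',
        MASTER_USE_GTID = slave_pos;
      START SLAVE;
      ```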

      How to reproduce:

      • Run the attached script on the master.

      On the slave:

      MariaDB [test]> show processlist;
      +----+-------------+-----------------+------+---------+------+--------------------------------------------------------------------------------+------------------+----------+
      | Id | User        | Host            | db   | Command | Time | State                                                                          | Info             | Progress |
      +----+-------------+-----------------+------+---------+------+--------------------------------------------------------------------------------+------------------+----------+
      |  1 | system user |                 | NULL | Sleep   |  840 | NULL                                                                           | NULL             |    0.000 |
      |  2 | system user |                 | NULL | Sleep   |  840 | wsrep aborter idle                                                             | NULL             |    0.000 |
      |  5 | system user |                 | NULL | Sleep   |  838 | NULL                                                                           | NULL             |    0.000 |
      |  6 | system user |                 | NULL | Sleep   |  838 | NULL                                                                           | NULL             |    0.000 |
      |  7 | system user |                 | NULL | Sleep   |  838 | NULL                                                                           | NULL             |    0.000 |
      |  8 | system user |                 | NULL | Connect |  263 | Waiting for prior transaction to start commit before starting next transaction | NULL             |    0.000 |
      |  9 | system user |                 | NULL | Connect |  263 | closing tables                                                                 | NULL             |    0.000 |
      | 10 | system user |                 | NULL | Connect |  263 | closing tables                                                                 | NULL             |    0.000 |
      | 11 | system user |                 | NULL | Connect |  263 | closing tables                                                                 | NULL             |    0.000 |
      | 12 | root        | localhost:52597 | test | Query   |    0 | init                                                                           | show processlist |    0.000 |
      | 13 | system user |                 | NULL | Connect |  336 | Waiting for master to send event                                               | NULL             |    0.000 |
      | 14 | system user |                 | NULL | Connect |  336 | Slave has read all relay log; waiting for the slave I/O thread to update it    | NULL             |    0.000 |
      +----+-------------+-----------------+------+---------+------+--------------------------------------------------------------------------------+------------------+----------+
      12 rows in set (0.00 sec)
      
      MariaDB [test]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 15999
                      Connect_Retry: 60
                    Master_Log_File: nirbhay-VirtualBox-bin.000001
                Read_Master_Log_Pos: 146163
                     Relay_Log_File: nirbhay-VirtualBox-relay-bin.000002
                      Relay_Log_Pos: 1463
              Relay_Master_Log_File: nirbhay-VirtualBox-bin.000001
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 1163
                    Relay_Log_Space: 146773
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 268
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 0-1-1006
      1 row in set (0.00 sec)
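      To follow only the replication threads from the listing above while the hang develops, the process list can be narrowed with a query such as the following. It is a sketch that assumes the slave's worker, I/O and SQL threads all report User = 'system user' and Command = 'Connect', as they do in the output above; the Galera threads shown with Command = 'Sleep' are filtered out.

      ```sql
      -- Sketch: list only the replication-related system threads on the slave.
      SELECT ID, COMMAND, TIME, STATE
        FROM information_schema.PROCESSLIST
       WHERE USER = 'system user'
         AND COMMAND = 'Connect'
       ORDER BY ID;
      ```

      If the workers are stuck, re-running the query should show TIME growing on the worker rows while their STATE stays the same.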
      

      The node also hangs on shutdown:

      141009 11:38:17 [Note] ./bin/mysqld: Normal shutdown
      
      141009 11:38:17 [Note] WSREP: Stop replication
      141009 11:38:17 [Note] WSREP: Provider disconnect
      141009 11:38:17 [Note] WSREP: Closing send monitor...
      141009 11:38:17 [Note] WSREP: Closed send monitor.
      141009 11:38:17 [Note] WSREP: gcomm: terminating thread
      141009 11:38:17 [Note] WSREP: gcomm: joining thread
      141009 11:38:17 [Note] WSREP: gcomm: closing backend
      141009 11:38:17 [Note] WSREP: view((empty))
      141009 11:38:17 [Note] WSREP: Received self-leave message.
      141009 11:38:17 [Note] WSREP: gcomm: closed
      141009 11:38:17 [Note] WSREP: Flow-control interval: [0, 0]
      141009 11:38:17 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      141009 11:38:17 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 10)
      141009 11:38:17 [Note] WSREP: RECV thread exiting 0: Success
      141009 11:38:17 [Note] WSREP: recv_thread() joined.
      141009 11:38:17 [Note] WSREP: Closing replication queue.
      141009 11:38:17 [Note] WSREP: Closing slave action queue.
      141009 11:38:19 [Note] WSREP: waiting for client connections to close: 5
      


            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                Nirbhay Choubey (nirbhay_c)
                Reporter:
                Nirbhay Choubey (nirbhay_c)
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated: