Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7610

MariaDB Galera Cluster Table locking issue after mysqldump execution

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.5.41-galera
    • Fix Version/s: None
    • Component/s: Galera
    • Environment:
      CentOS release 6.5 2.6.32-431.3.1.el6.x86_64
      VM, 4GB RAM
      MariaDB Galera Cluster consists of 3 nodes

      Description

      We updated MariaDB Galera Version to 5.5.41 on 19th of January.
      We start a daily mysqldump on 21:00 CET. This worked for ~ 1 year without any issues.
      Last night (18th of FEB), the following issue happend at 21:00 (9 p.m.):

      • mysqldump starts on Node 1 with the following parameters:
        mysqldump -uroot -p --create-options {DATABASE_NAME}

        -f

      • simultaneously, the following query did not complete on Node 1 and remains with the state "sql end" on the server:
        INSERT INTO ` {DBNAME}`.`{TABLE_NAME}` (isin, date, count )\n VALUES ('XXXX12345', CURDATE() , 1 )\n ON DUPLICATE KEY UPDATE count=count+1

        * full mysqldump on Node 1 completed with a duration of only a minute without any issues like every day, dumps are valid and complete

        * new incoming connections on Node 1 try to access the same table, but are waiting in state "Waiting for table level lock"
        INSERT INTO `{DBNAME}

        `.`

        {TABLE_NAME}

        ` (isin, date, count )\n VALUES ('YYYY67890', CURDATE() , 1 )\n ON DUPLICATE KEY UPDATE count=count+1

      • INSERT/UPDATE statements on the same table on Node 2/3 do also not complete. Sessions are haning with similar statements in states "update" and "query end", first sessions appeared on 21:00
      • Number of connections increases on all 3 nodes, since all nodes are waiting for first INSERT statement on Node 1 to complete, therefore replication stopped.
      • Nodes 2 and 3 ran out of free connections and resources at ~ 21:10
      • INSERT statement on Node 1 could not be killed, whole Cluster needed to be shut down and restarted
      • Errorlogs do not show any issues until the high amount of connections leads to service unavailability
      • Nodes did not write any Core Dumps
      • This Incident is (hopefully) not reproducable, as already mentioned the cluster ran wihtout any issues about a year, the mysqldump is made every day at the same time.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Assigning to Nirbhay Choubey to see if it rings a bell.

            Show
            elenst Elena Stepanova added a comment - Assigning to Nirbhay Choubey to see if it rings a bell.
            Hide
            nirbhay_c Nirbhay Choubey added a comment -

            I tried to reproduce this issue by performing a dump with records being inserted simultaneously. I found no issues whatsoever.
            Would you mind sharing the table structures? Also, are there myisam tables too?

            Show
            nirbhay_c Nirbhay Choubey added a comment - I tried to reproduce this issue by performing a dump with records being inserted simultaneously. I found no issues whatsoever. Would you mind sharing the table structures? Also, are there myisam tables too?
            Hide
            TK Tobias Kleinke added a comment -

            Hi Nirbhay,
            thanks for your response.

            I am afraid a cannot provide detailed meta-data information here, but this is what I can tell you:

            • database has 83 tables, 78 are InnoDB, 5 are myISAM (we set "wsrep_replicate_myisam = 1" in my.cnf - I know this is still experimental...)
            • locked table was an InnoDB table and has about 2 million rows (data length: ~ 140 MB) and is the biggest table of the database
            • locked table has 3 columns with the following types: varchar(15), int(11), date / primary key is: varchar(15), date

            Regards,
            Tobias

            Show
            TK Tobias Kleinke added a comment - Hi Nirbhay, thanks for your response. I am afraid a cannot provide detailed meta-data information here, but this is what I can tell you: database has 83 tables, 78 are InnoDB, 5 are myISAM (we set "wsrep_replicate_myisam = 1" in my.cnf - I know this is still experimental...) locked table was an InnoDB table and has about 2 million rows (data length: ~ 140 MB) and is the biggest table of the database locked table has 3 columns with the following types: varchar(15), int(11), date / primary key is: varchar(15), date Regards, Tobias

              People

              • Assignee:
                nirbhay_c Nirbhay Choubey
                Reporter:
                TK Tobias Kleinke
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - 2 hours
                  2h
                  Remaining:
                  Remaining Estimate - 2 hours
                  2h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified