MariaDB Server
MDEV-8323

Failed DDL execution can cause a full Galera Cluster crash

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 5.5.43-galera, 10.0.19-galera
    • Fix Version/s: N/A
    • Component/s: Galera, wsrep

      Description

      Consider the following sequence of events happening with Galera Cluster if wsrep_OSU_method is set to TOI:

      • You have a three-node cluster: node1, node2, and node3.
      • node1 is almost out of disk space.
      • You execute DDL on node1, such as: ALTER TABLE tab DROP COLUMN col;
      • node1 executes the DDL statement, and tells node2 and node3 to execute it in Total Order Isolation.
      • The ALTER TABLE statement fails on node1 because it ran out of disk space, but the statement succeeds on node2 and node3 (see the sketch after this list).
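
      For illustration, here is a minimal command sequence that could trigger this scenario. It is a sketch only: the table definition is assumed, since the report gives only the table and column names.

      -- On node1; TOI is the default online schema upgrade method:
      SET GLOBAL wsrep_OSU_method = 'TOI';
      -- Hypothetical schema and table definition (not given in the report):
      CREATE DATABASE IF NOT EXISTS db1;
      CREATE TABLE db1.tab (id INT PRIMARY KEY, col INT);
      -- Fails locally if node1's disk fills up during the table rebuild,
      -- but is still executed (and succeeds) on node2 and node3:
      ALTER TABLE db1.tab DROP COLUMN col;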

      node1 will see an error like this:

      2015-06-16 04:45:55 7f7b000c7700 InnoDB: Error: Write to file (merge) failed at offset 68157440.
      InnoDB: 1048576 bytes should have been written, only 1036288 were written.
      InnoDB: Operating system error number 28.
      InnoDB: Check that your OS and file system support files of this size.
      InnoDB: Check also that the disk is not full or a disk quota exceeded.
      InnoDB: Error number 28 means 'No space left on device'.
      InnoDB: Some operating system error numbers are described at
      InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
      150616 4:45:55 [ERROR] Slave SQL: Error 'Got error 64 'Temp file write failure' from InnoDB' on query. Default database: 'db1'. Query: 'ALTER TABLE tab DROP COLUMN col', Internal MariaDB error code: 1296
      150616 4:45:55 [Warning] WSREP: RBR event 1 Query apply warning: 1, 19743667
      150616 4:45:55 [Warning] WSREP: Ignoring error for TO isolated action: source: 9f6bdb3d-0bc1-11e5-a9f2-ca15da9a1a8b version: 3 local: 0 state: APPLYING flags: 65 conn_id: 24689371 trx_id: -1 seqnos (l: 3471879, g: 19743667, s: 19743666, d: 19743666, ts: 912410740656233)
      
      • Since node1 now has a different table definition than node2 and node3, you will eventually have consistency errors.

      node2 and node3 might see errors like this:

      150616 5:15:11 [ERROR] Slave SQL: Column 11 of table 'db1.tab' cannot be converted from type 'int' to type 'date', Internal MariaDB error code: 1677
      150616 5:15:11 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 3, 19743684
      150616 5:15:11 [ERROR] WSREP: Failed to apply trx: source: 75edc58a-0bb2-11e5-a1fe-cb59d7f111b4 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 23347068 trx_id: 59826665 seqnos (l: 3742229, g: 19743684, s: 19743683, d: 19743667, ts: 768200703670749)
      150616 5:15:11 [ERROR] WSREP: Failed to apply trx 19743684 4 times
      150616 5:15:11 [ERROR] WSREP: Node consistency compromized, aborting...
      

      And node1 will see node2 and node3 leave the cluster, causing a loss of quorum and total cluster failure:

      150616 5:15:12 [Note] WSREP: forgetting 07459bc1 (tcp://$node2_ip:4567)
      150616 5:15:12 [Note] WSREP: (75edc58a, 'tcp://0.0.0.0:4567') address 'tcp://10.0.0.72:4567' pointing to uuid 75edc58a is blacklisted, skipping
      150616 5:15:12 [Note] WSREP: forgetting 9f6bdb3d (tcp://$node3_ip:4567)
      150616 5:15:12 [Note] WSREP: Node 75edc58a state prim
      150616 5:15:12 [Note] WSREP: view(view_id(PRIM,75edc58a,10) memb {
      75edc58a,0
      } joined {
      } left {
      } partitioned {
      07459bc1,0
      9f6bdb3d,0
      })
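
      At this point, the divergence can be confirmed manually on each node. A hypothetical check (standard MariaDB statements; only the table name comes from the report):

      -- On node1 the dropped column is still present; on node2 and node3 it is gone:
      SHOW CREATE TABLE db1.tab;
      -- Per-node content comparison; differing values confirm the drift:
      CHECKSUM TABLE db1.tab;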
      

      Should it be possible for this to happen?

      Can we fix this by making a node crash when DDL fails while wsrep_OSU_method is set to TOI? Making one node crash is probably better than a total cluster failure most of the time.
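
      As a possible operational workaround (not something proposed in this report), the DDL could be applied with wsrep_OSU_method set to RSU, which desynchronizes only the local node for the duration of the statement, so a failure stays local instead of leaving the cluster schemas out of step:

      -- Sketch only; repeat on each node in turn (RSU does not replicate the DDL):
      SET SESSION wsrep_OSU_method = 'RSU';
      ALTER TABLE db1.tab DROP COLUMN col; -- a failure here affects only this node
      SET SESSION wsrep_OSU_method = 'TOI';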

        Activity

             Nirbhay Choubey added a comment (edited):

             Hi @GeoffMontee,

             • You execute DDL on node1, such as: ALTER TABLE tab DROP COLUMN col;

            This is not exactly what happens internally. In TOI, the DDL is replicated to other
            nodes during parsing phase and thus the execution happens in the same slot on
            all the nodes.
             Now, if the DDL fails on one of the nodes, the ALTERed object's definition is outdated
             on this node. As a result, subsequent DMLs (given that they have been updated post-ALTER
             and/or are compatible with the new object definition) may fail on this node,
             eventually resulting in its eviction.

            Since node1 now has a different table definition than node2 and node3, you will eventually have consistency errors.
            node2 and node3 might see errors like this:

             Post-ALTER, in my opinion, the DMLs should also be tuned accordingly (made compatible
             with the new altered definition).
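
             To make this failure mode concrete, here is a hypothetical DML sequence (using the same assumed table definition as in the sketch above):

             -- node1's ALTER failed, so node1 still has the old definition including 'col'.
             -- A write issued on node1 then produces a row event with an extra column:
             INSERT INTO db1.tab (id, col) VALUES (1, 42); -- succeeds locally on node1
             -- node2 and node3 cannot apply the replicated row event against their
             -- post-ALTER definition, retry, log "Node consistency compromized", and abort.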

             Geoff Montee added a comment:

            Hi Nirbhay Choubey,

            Now, if the DDL fails on one of the nodes, the ALTERed object definition is outdated
            on this node. As a result, the subsequent DMLs (given they have been updated post
            ALTER and/or are compatible with the the new object definition) may fail on this node,
            eventually resulting in its eviction.

            If that is what is supposed to happen, I wonder if it isn't working properly. The series of events described in the JIRA issue, including the full cluster crash, has actually happened.

            Rather than evicting node1, node2 and node3 thought that they were compromised, so they intentionally crashed, which created a loss of quorum. Only node1 (with the outdated schema) was left alive.
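
             For reference, the post-crash state can be inspected from node1 with standard Galera status variables (a sketch; the exact values depend on how the peers left the cluster):

             SHOW STATUS LIKE 'wsrep_cluster_status'; -- whether node1 kept a primary component
             SHOW STATUS LIKE 'wsrep_cluster_size';   -- 1 once node2 and node3 have aborted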


              People

              • Assignee: Nirbhay Choubey
              • Reporter: Geoff Montee
              • Votes: 1
              • Watchers: 5
