MariaDB Server / MDEV-4404

Galera Node throws "Could not read field" error and drops out of cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 5.5.29-galera
    • Fix Version/s: 5.5.41-galera
    • Component/s: Galera
    • Labels:
    • Environment:
      CentOS release 6.3

      Description

      After a couple of days of running, one node in a 2-node cluster (with arbitrator) errors out with "Could not read field" (Error 1610), followed by "Could not execute Update_rows event" (Error 1030).

      The other node continued running. The field exists in the table. The nodes were initialized using the xtrabackup method.

      130417 13:54:15 [ERROR] Slave SQL: Could not read field 'UPDT_DB_DTTM' of table 'GLOBAL.CLIENT_LAST_OPEN_PLATFORM', Error_code: 1610
      130417 13:54:15 [ERROR] Slave SQL: Could not execute Update_rows event on table GLOBAL.CLIENT_LAST_OPEN_PLATFORM; Got error 1610 from storage engine, Error_code: 1030; handler error No Error!; the event's master log FIRST, end_log_pos 272, Error_code: 1030
      130417 13:54:15 [Warning] WSREP: RBR event 2 Update_rows apply warning: 1610, 222480722
      130417 13:54:15 [ERROR] WSREP: Failed to apply trx: source: 6b76b9ab-a623-11e2-0800-0bcb2fe17662 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 7818 trx_id: 8694488863 seqnos (l: 60013290, g: 222480722, s: 222480721, d: 222480682, ts: 1366221255138895484)
      130417 13:54:15 [ERROR] WSREP: Failed to apply app buffer: seqno: 222480722, status: WSREP_FATAL
               at galera/src/replicator_smm.cpp:apply_wscoll():53
               at galera/src/replicator_smm.cpp:apply_trx_ws():120
      130417 13:54:15 [ERROR] WSREP: Node consistency compromized, aborting...
      130417 13:54:15 [Note] WSREP: Closing send monitor...
      130417 13:54:15 [Note] WSREP: Closed send monitor.
      130417 13:54:15 [Note] WSREP: gcomm: terminating thread
      130417 13:54:15 [Note] WSREP: gcomm: joining thread
      130417 13:54:15 [Note] WSREP: gcomm: closing backend
      130417 13:54:15 [Note] WSREP: view(view_id(NON_PRIM,382a82e8-a3d4-11e2-0800-2753044f506b,25) memb {
              382a82e8-a3d4-11e2-0800-2753044f506b,
      } joined {
      } left {
      } partitioned {
              6b76b9ab-a623-11e2-0800-0bcb2fe17662,
              ebe9df82-9e2c-11e2-0800-65ae0b1b9cdc,
      })
      130417 13:54:15 [Note] WSREP: view((empty))
      130417 13:54:15 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      130417 13:54:15 [Note] WSREP: gcomm: closed
      130417 13:54:15 [Note] WSREP: Flow-control interval: [16, 16]
      130417 13:54:15 [Note] WSREP: Received NON-PRIMARY.
      130417 13:54:15 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 222480778)
      130417 13:54:15 [Note] WSREP: Received self-leave message.
      130417 13:54:15 [Note] WSREP: Flow-control interval: [0, 0]
      130417 13:54:15 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      130417 13:54:15 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 222480778)
      130417 13:54:15 [Note] WSREP: RECV thread exiting 0: Success
      130417 13:54:15 [Note] WSREP: recv_thread() joined.
      130417 13:54:15 [Note] WSREP: Closing slave action queue.
      130417 13:54:15 [Note] WSREP: /usr/sbin/mysqld: Terminated.
      130417 13:54:16 mysqld_safe Number of processes running now: 0
      130417 13:54:16 mysqld_safe WSREP: not restarting wsrep node automatically
      130417 13:54:16 mysqld_safe mysqld from pid file /database/data/4c-maria-02.pid ended
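
      For triage, the sequence above can be scanned mechanically. The following is a minimal Python sketch, not part of the original report, that assumes the error-log line layout shown above (`YYMMDD HH:MM:SS [Level] message`) and pulls out the first error and the seqno of the write set that failed to apply. The helper name `summarize_failure` and the shortened sample are illustrative only.

```python
import re

# Assumed layout, taken from the log excerpt above:
#   "130417 13:54:15 [ERROR] WSREP: Failed to apply app buffer: seqno: 222480722, ..."
LINE_RE = re.compile(r"^\s*(\d{6} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")
SEQNO_RE = re.compile(r"Failed to apply app buffer: seqno: (\d+)")


def summarize_failure(lines):
    """Return the first [ERROR] entry and the failed seqno, if present (hypothetical helper)."""
    first_error = None
    failed_seqno = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Stack-trace continuation lines and mysqld_safe lines have no [Level] tag.
            continue
        timestamp, level, message = m.groups()
        if level == "ERROR" and first_error is None:
            first_error = (timestamp, message)
        s = SEQNO_RE.search(message)
        if s:
            failed_seqno = int(s.group(1))
    return {"first_error": first_error, "failed_seqno": failed_seqno}


# A few lines copied from the excerpt above.
sample = [
    "130417 13:54:15 [ERROR] Slave SQL: Could not read field 'UPDT_DB_DTTM' of table "
    "'GLOBAL.CLIENT_LAST_OPEN_PLATFORM', Error_code: 1610",
    "130417 13:54:15 [Warning] WSREP: RBR event 2 Update_rows apply warning: 1610, 222480722",
    "130417 13:54:15 [ERROR] WSREP: Failed to apply app buffer: seqno: 222480722, status: WSREP_FATAL",
    "         at galera/src/replicator_smm.cpp:apply_wscoll():53",
    "130417 13:54:16 mysqld_safe Number of processes running now: 0",
]

summary = summarize_failure(sample)
print(summary)
```

      The seqno is useful when comparing against the surviving node, since it identifies the write set that could not be applied.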
      

      From our settings:

      # logs and replication
      log-bin=mysql-bin
      binlog-format=ROW
      max_binlog_cache_size=1024G
      
      # Galera Settings
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address=gcomm://10.200.0.10
      wsrep_cluster_name='client01_cluster'
      wsrep_node_name='4c-maria-02'
      wsrep_slave_threads=24
      wsrep_retry_autocommit=10
      wsrep_sst_method=xtrabackup
      wsrep_sst_auth=galera:AhwFVAahpZfh8BVG
      
      
      # innodb (xtradb) settings
      default_storage_engine=InnoDB
      innodb_file_per_table
      innodb_file_format=barracuda
      innodb_log_file_size=2000M
      innodb_log_files_in_group=2
      innodb_flush_log_at_trx_commit=2
      innodb_autoinc_lock_mode=2
      innodb_locks_unsafe_for_binlog=1
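
      As a sanity check on settings like these, the sketch below parses an option-file fragment and compares it against three values that Galera's documentation lists as required (`binlog_format=ROW`, `default_storage_engine=InnoDB`, `innodb_autoinc_lock_mode=2`). It also reports whether `binlog_annotate_row_events` is set, since that option comes up in the discussion of this issue. This is a rough Python illustration, not something from the report: the helper names and the `REQUIRED` table are assumptions, and the fragment is a shortened copy of the settings above.

```python
# Values assumed here to be required for Galera; adjust to the documentation
# of the version actually in use.
REQUIRED = {
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
}


def parse_options(text):
    """Parse 'name=value' option lines into a dict (hypothetical helper)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        # mysqld accepts '-' and '_' interchangeably in option names.
        name = name.strip().replace("-", "_")
        # Flag-style options without '=' are stored with an empty value.
        options[name] = value.strip().strip("'\"")
    return options


def check_required(options):
    """Return {name: (expected, actual)} for required options that are missing or differ."""
    problems = {}
    for name, expected in REQUIRED.items():
        actual = options.get(name)
        if actual is None or actual.lower() != expected.lower():
            problems[name] = (expected, actual)
    return problems


fragment = """
# logs and replication
log-bin=mysql-bin
binlog-format=ROW
wsrep_slave_threads=24
default_storage_engine=InnoDB
innodb_file_per_table
innodb_autoinc_lock_mode=2
"""

options = parse_options(fragment)
problems = check_required(options)
annotate_set = "binlog_annotate_row_events" in options
print("problems:", problems)
print("binlog_annotate_row_events set:", annotate_set)
```

      An option missing from the file is not proof that it is disabled on the running server; the live value would still need to be confirmed on the node itself.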
      

      MariaDB-galera was installed from the package repositories.

            Activity

            matt.wheeler Matthew Wheeler added a comment -

            I am happy to report that this happened again, after we updated to 5.5.32 with the extra logging. I have a snapshot of both nodes in the cluster. Please let me know which files you would like me to send you.

            You have never seen a bunch of programmers and admins so happy over a crash!!

            seppo Seppo Jaakola added a comment -

            Matthew, this is interesting indeed! Please upload your MySQL error logs from both nodes. If you have a GRA_*_*.log file related to the crash, it will be needed as well. If the information is sensitive (the GRA file contains the transaction data in plain text), you can also email it to me directly (seppo.jaakola@codership.com).

            We recently tracked a problem related to binlog event annotation processing. If you have binlog_annotate_row_events enabled, it may be related to the issue you are facing.

            matt.wheeler Matthew Wheeler added a comment -

            No sensitive info but I emailed the files to you anyway. Let me know if you didn't get them.

            I included our cnf files. We do not have binlog_annotate_row_events set, and I don't think it is enabled by default.

            Let me know if you need anything else.

            Thanks again.

            matt.wheeler Matthew Wheeler added a comment -

            After the last couple of updates we have not seen this issue again.

            This could be closed.

            elenst Elena Stepanova added a comment -

            Nirbhay Choubey,

            Could you please confirm it should be closed, and close if it should be?


              People

              • Assignee: nirbhay_c Nirbhay Choubey
              • Reporter: matt.wheeler Matthew Wheeler
              • Votes: 3
              • Watchers: 8
