Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6175

MariaDB Galera 5.5.36 Signal 6, Entire Cluster Dies

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 5.5.36-galera
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:

      Description

      Galera died in a ball of fire this morning with all 3 Cluster nodes showing something like the follow (from the error log)

      140425  6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000000 of size 134217728 bytes
      140425  6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000000
      140425  7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000001 of size 191230321 bytes
      140425  7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000001
      140425  7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000002 of size 189538622 bytes
      140425  7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000002
      140425  8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 191615069 bytes
      140425  8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003
      140425  8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 182155485 bytes
      140425  9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 134217728 bytes
      140425  9:03:48 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
      
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
      
      Server version: 5.5.36-MariaDB-wsrep-log
      key_buffer_size=67108864
      read_buffer_size=1048576
      max_used_connections=558
      max_threads=302
      thread_count=356
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 702370 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      
      Thread pointer: 0x0x7f17e9012000
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      140425  9:03:48 [Note] WSREP: declaring 62a2e8b4-c73f-11e3-bda3-daee80deac33 stable
      stack_bottom = 0x7f17d18d3940 thread_stack 0x48000
      140425  9:03:48 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.29.33.169:4567 
      140425  9:03:48 [Note] WSREP: Node 3ac797f0-c73f-11e3-a58e-37c0ddd67eee state prim
      140425  9:03:49 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') reconnecting to 5efac803-cae4-11e3-bb7b-b67ea5991bad (tcp://172.29.33.169:4567), attempt 0
      140425  9:03:50 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') reconnecting to 62a2e8b4-c73f-11e3-bda3-daee80deac33 (tcp://172.29.33.167:4567), attempt 0
      140425  9:03:53 [Note] WSREP: evs::proto(3ac797f0-c73f-11e3-a58e-37c0ddd67eee, OPERATIONAL, view_id(REG,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,12)) suspecting node: 62a2e8b4-c73f-11e3-bda3-daee80deac33
      140425  9:03:54 [Note] WSREP: evs::proto(3ac797f0-c73f-11e3-a58e-37c0ddd67eee, GATHER, view_id(REG,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,12)) suspecting node: 62a2e8b4-c73f-11e3-bda3-daee80deac33
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xa9127b]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x6ea128]
      /lib64/libpthread.so.0[0x3d5c60f710]
      /lib64/libc.so.6(gsignal+0x35)[0x3d5c232925]
      /lib64/libc.so.6(abort+0x175)[0x3d5c234105]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x265)[0x7f20b495c9a5]
      /usr/lib64/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2)[0x7f20b4965112]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x91)[0x7f20b4960631]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x88)[0x7f20b4987e98]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x315)[0x7f20b4969055]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f20b49694e3]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7f20b4985a23]
      /usr/lib64/galera/libgalera_smm.so(galera_recv+0x23)[0x7f20b499f993]
      /usr/sbin/mysqld[0x69e83f]
      /usr/sbin/mysqld(start_wsrep_THD+0x385)[0x5270f5]
      /lib64/libpthread.so.0[0x3d5c6079d1]
      /lib64/libc.so.6(clone+0x6d)[0x3d5c2e8b6d]
      
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 7
      Status: NOT_KILLED
      
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
      
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      140425 09:03:56 mysqld_safe Number of processes running now: 0
      140425 09:03:56 mysqld_safe WSREP: not restarting wsrep node automatically
      140425 09:03:56 mysqld_safe mysqld from pid file /var/lib/mysql/data/redacted.pid ended
      140425 09:15:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
      140425 09:15:24 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/data/wsrep_recovery.wXdkzc' --pid-file='/var/lib/mysql/data/redacted.redacted.com-recover.pid'
      140425 09:15:34 mysqld_safe WSREP: Recovered position 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:15:34 [Note] WSREP: wsrep_start_position var submitted: '0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203'
      140425  9:15:34 [Note] WSREP: Read nil XID from storage engines, skipping position init
      140425  9:15:34 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      140425  9:15:34 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
      140425  9:15:34 [Note] WSREP: CRC-32C: using hardware acceleration.
      140425  9:15:34 [Note] WSREP: Found saved state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:-1
      140425  9:15:34 [Note] WSREP: Passing config to GCS: base_host = 172.29.33.165; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
      140425  9:15:34 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: -1
      140425  9:15:34 [Note] WSREP: wsrep_sst_grab()
      140425  9:15:34 [Note] WSREP: Start replication
      140425  9:15:34 [Note] WSREP: Setting initial position to 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:15:34 [Note] WSREP: protonet asio version 0
      140425  9:15:34 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
      140425  9:15:34 [Note] WSREP: backend: asio
      140425  9:15:34 [Note] WSREP: GMCast version 0
      140425  9:15:34 [Note] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      140425  9:15:34 [Note] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      140425  9:15:34 [Note] WSREP: EVS version 0
      140425  9:15:34 [Note] WSREP: PC version 0
      140425  9:15:34 [Note] WSREP: gcomm: connecting to group 'redacted', peer '172.29.33.165:,172.29.33.167:,172.29.33.169:'
      140425  9:15:34 [Warning] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') address 'tcp://172.29.33.165:4567' points to own listening address, blacklisting
      140425  9:15:37 [Warning] WSREP: no nodes coming from prim view, prim not possible
      140425  9:15:37 [Note] WSREP: view(view_id(NON_PRIM,11119c10-cc84-11e3-95fc-669db67d5915,1) memb {
      	11119c10-cc84-11e3-95fc-669db67d5915,0
      } joined {
      } left {
      } partitioned {
      })
      140425  9:15:38 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50214S), skipping check
      140425  9:16:07 [Note] WSREP: view((empty))
      140425  9:16:07 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
      	 at gcomm/src/pc.cpp:connect():141
      140425  9:16:07 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
      140425  9:16:07 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'redacted' at 'gcomm://172.29.33.165,172.29.33.167,172.29.33.169': -110 (Connection timed out)
      140425  9:16:07 [ERROR] WSREP: gcs connect failed: Connection timed out
      140425  9:16:07 [ERROR] WSREP: wsrep::connect() failed: 7
      140425  9:16:07 [ERROR] Aborting
      
      140425  9:16:07 [Note] WSREP: Service disconnected.
      140425  9:16:08 [Note] WSREP: Some threads may fail to exit.
      140425  9:16:08 [Note] /usr/sbin/mysqld: Shutdown complete
      
      140425 09:16:08 mysqld_safe mysqld from pid file /var/lib/mysql/data/redacted.redacted.com.pid ended
      140425 09:16:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
      140425 09:16:46 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/data/wsrep_recovery.ODDV5Q' --pid-file='/var/lib/mysql/data/redacted.redacted.com-recover.pid'
      140425 09:16:52 mysqld_safe WSREP: Recovered position 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:16:52 [Note] WSREP: wsrep_start_position var submitted: '0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203'
      140425  9:16:52 [Note] WSREP: Read nil XID from storage engines, skipping position init
      140425  9:16:52 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      140425  9:16:52 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
      140425  9:16:52 [Note] WSREP: CRC-32C: using hardware acceleration.
      140425  9:16:52 [Note] WSREP: Found saved state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:-1
      140425  9:16:52 [Note] WSREP: Passing config to GCS: base_host = 172.29.33.165; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
      140425  9:16:52 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: -1
      140425  9:16:52 [Note] WSREP: wsrep_sst_grab()
      140425  9:16:52 [Note] WSREP: Start replication
      140425  9:16:52 [Note] WSREP: Setting initial position to 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:16:52 [Note] WSREP: protonet asio version 0
      140425  9:16:52 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
      140425  9:16:52 [Note] WSREP: backend: asio
      140425  9:16:52 [Note] WSREP: GMCast version 0
      140425  9:16:52 [Note] WSREP: (3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      140425  9:16:52 [Note] WSREP: (3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      140425  9:16:52 [Note] WSREP: EVS version 0
      140425  9:16:52 [Note] WSREP: PC version 0
      140425  9:16:52 [Note] WSREP: gcomm: connecting to group 'redacted', peer ''
      140425  9:16:52 [Note] WSREP: Node 3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0 state prim
      140425  9:16:52 [Note] WSREP: view(view_id(PRIM,3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0,1) memb {
      	3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0,0
      } joined {
      } left {
      } partitioned {
      })
      140425  9:16:52 [Note] WSREP: gcomm: connected
      140425  9:16:52 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      140425  9:16:52 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      140425  9:16:52 [Note] WSREP: Opened channel 'redacted'
      140425  9:16:52 [Note] WSREP: Waiting for SST to complete.
      140425  9:16:52 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
      140425  9:16:52 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d
      140425  9:16:52 [Note] WSREP: STATE EXCHANGE: sent state msg: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d
      140425  9:16:52 [Note] WSREP: STATE EXCHANGE: got state msg: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d from 0 (redacted)
      140425  9:16:52 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 0,
      	members    = 1/1 (joined/total),
      	act_id     = 2509203,
      	last_appl. = -1,
      	protocols  = 0/5/2 (gcs/repl/appl),
      	group UUID = 0c4d2c70-c73f-11e3-8556-d312fc5c48bb
      140425  9:16:52 [Note] WSREP: Flow-control interval: [16, 16]
      140425  9:16:52 [Note] WSREP: Restored state OPEN -> JOINED (2509203)
      140425  9:16:52 [Note] WSREP: Member 0 (redacted) synced with group.
      140425  9:16:52 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2509203)
      140425  9:16:52 [Note] WSREP: New cluster view: global state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 2
      140425  9:16:52 [Note] WSREP: SST complete, seqno: 2509203
      140425  9:16:52 InnoDB: The InnoDB memory heap is disabled
      140425  9:16:52 InnoDB: Mutexes and rw_locks use GCC atomic builtins
      140425  9:16:52 InnoDB: Compressed tables use zlib 1.2.3
      140425  9:16:52 InnoDB: Using Linux native AIO
      140425  9:16:52 InnoDB: Initializing buffer pool, size = 32.0G
      140425  9:16:53 InnoDB: Completed initialization of buffer pool
      140425  9:16:53 InnoDB: highest supported file format is Barracuda.
      140425  9:16:55  InnoDB: Waiting for the background threads to start
      140425  9:16:56 Percona XtraDB (http://www.percona.com) 5.5.36-MariaDB-33.0 started; log sequence number 7825046431393
      140425  9:16:56 [Note] Plugin 'FEEDBACK' is disabled.
      140425  9:16:56 [Note] Server socket created on IP: '0.0.0.0'.
      140425  9:16:56 [Note] Event Scheduler: Loaded 1 event
      140425  9:16:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      140425  9:16:56 [Note] WSREP: REPL Protocols: 5 (3, 1)
      140425  9:16:56 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: 3
      140425  9:16:56 [Note] WSREP: Service thread queue flushed.
      140425  9:16:56 [Note] WSREP: Synchronized with group, ready for connections
      140425  9:16:56 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '5.5.36-MariaDB-wsrep-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_25.9.r3961
      140425  9:16:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      

      The servers largely handle our ColdFusion ressions which can involve large TEXT fields:

      cvardb1 mysql> show create table CDATA\G
      *************************** 1. row ***************************
             Table: CDATA
      Create Table: CREATE TABLE `CDATA` (
        `cfid` varchar(64) NOT NULL DEFAULT '',
        `app` varchar(64) NOT NULL DEFAULT '',
        `data` longtext NOT NULL,
        PRIMARY KEY (`cfid`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 KEY_BLOCK_SIZE=16
      1 row in set (0.00 sec)
      
      cvardb1 mysql> show create table CGLOBAL\G
      *************************** 1. row ***************************
             Table: CGLOBAL
      Create Table: CREATE TABLE `CGLOBAL` (
        `cfid` varchar(64) NOT NULL DEFAULT '',
        `data` longtext NOT NULL,
        `lvisit` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
        PRIMARY KEY (`cfid`),
        KEY `lvisit_idx` (`lvisit`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 KEY_BLOCK_SIZE=16
      1 row in set (0.00 sec)
      

      It also serves as a copy of some caching tables we have setup on this, and our other, Galera cluster running the same version. I suspect that it has to do with the large TEXT fields, but that's just hazarding a guess.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            txtim Tim Soderstrom added a comment -

            This node is the one that runs events, and seems to have just hung rather than full on crashed:

            140425 6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000000 of size 134217728 bytes
            140425 6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000000
            140425 7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000001 of size 191230321 bytes
            140425 7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000001
            140425 7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000002 of size 189538622 bytes
            140425 7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000002
            140425 8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 191615069 bytes
            140425 8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003
            140425 8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 182155485 bytes
            140425 9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 134217728 bytes
            140425 9:03:48 [ERROR] WSREP: Unsupported key prefix: : 71 (Protocol error)
            at galera/src/key_set.cpp:throw_bad_prefix():129
            140425 9:03:48 [Note] WSREP: applier thread exiting (code:7)
            140425 9:03:48 [ERROR] WSREP: node consistency compromised, aborting
            140425 9:03:48 [Note] WSREP: starting shutdown
            140425 9:03:48 [Note] /usr/sbin/mysqld: Normal shutdown

            140425 9:03:48 [Note] WSREP: Stop replication
            140425 9:03:48 [Note] WSREP: Closing send monitor...
            140425 9:03:48 [Note] WSREP: Closed send monitor.
            140425 9:03:48 [Note] WSREP: gcomm: terminating thread
            140425 9:03:48 [Note] WSREP: gcomm: joining thread
            140425 9:03:48 [Note] WSREP: gcomm: closing backend
            140425 9:03:48 [Note] WSREP: view(view_id(NON_PRIM,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,11) memb

            { 5efac803-cae4-11e3-bb7b-b67ea5991bad,0 }

            joined {
            } left {
            } partitioned

            { 3ac797f0-c73f-11e3-a58e-37c0ddd67eee,0 62a2e8b4-c73f-11e3-bda3-daee80deac33,0 }

            )
            140425 9:03:48 [Note] WSREP: view((empty))
            140425 9:03:48 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            140425 9:03:48 [Note] WSREP: gcomm: closed
            140425 9:03:48 [Note] WSREP: Flow-control interval: [16, 16]
            140425 9:03:48 [Note] WSREP: Received NON-PRIMARY.
            140425 9:03:48 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 2509311)
            140425 9:03:48 [Note] WSREP: Received self-leave message.
            140425 9:03:48 [Note] WSREP: Flow-control interval: [0, 0]
            140425 9:03:48 [Note] WSREP: Received SELF-LEAVE. Closing connection.
            140425 9:03:48 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2509311)
            140425 9:03:48 [Note] WSREP: RECV thread exiting 0: Success
            140425 9:03:48 [Note] WSREP: recv_thread() joined.
            140425 9:03:48 [Note] WSREP: Closing replication queue.
            140425 9:03:48 [Note] WSREP: Closing slave action queue.
            140425 9:22:34 [ERROR] Event Scheduler: [root@localhost][clientvariables.purge_sessions] Deadlock found when trying to get lock; try restarting transaction
            140425 9:22:34 [Note] Event Scheduler: [root@localhost].[clientvariables.purge_sessions] event execution failed.

            We have a 3rd node but it just sits there and doesn't actually run any queries as we use a fail-over setup for the actual application queries. It reported the following:

            140425 6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 134217728 bytes
            140425 6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003
            140425 7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 191230321 bytes
            140425 7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000004
            140425 7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 189538622 bytes
            140425 7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000005
            140425 8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000006 of size 191615069 bytes
            140425 8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000006
            140425 8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000007 of size 182155485 bytes
            140425 9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000008 of size 134217728 bytes
            140425 9:03:48 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see http://kb.askmonty.org/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 5.5.36-MariaDB-wsrep-log
            key_buffer_size=67108864
            read_buffer_size=1048576
            max_used_connections=11
            max_threads=302
            thread_count=9
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 689546 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x0x7f1696412000
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            140425 9:03:48 [Note] WSREP: declaring 3ac797f0-c73f-11e3-a58e-37c0ddd67eee stable
            140425 9:03:48 [Note] WSREP: (62a2e8b4-c73f-11e3-bda3-daee80deac33, 'tcp://0.0.0.0:4567') address 'tcp://172.29.33.167:4567' pointing to uuid 62a2e8b4-c73f-11e3-bda3-daee80deac33 is blacklisted, skipping
            stack_bottom = 0x7f167f040940 thread_stack 0x48000
            /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xa9127b]
            /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x6ea128]
            /lib64/libpthread.so.0[0x3c5680f710]
            /lib64/libc.so.6(gsignal+0x35)[0x3c56432925]
            /lib64/libc.so.6(abort+0x175)[0x3c56434105]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x265)[0x7f1f7b34e9a5]
            /usr/lib64/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2)[0x7f1f7b357112]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x91)[0x7f1f7b352631]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x88)[0x7f1f7b379e98]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x315)[0x7f1f7b35b055]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f1f7b35b4e3]
            /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7f1f7b377a23]
            /usr/lib64/galera/libgalera_smm.so(galera_recv+0x23)[0x7f1f7b391993]
            /usr/sbin/mysqld[0x69e83f]
            /usr/sbin/mysqld(start_wsrep_THD+0x385)[0x5270f5]
            /lib64/libpthread.so.0[0x3c568079d1]
            /lib64/libc.so.6(clone+0x6d)[0x3c564e8b6d]

            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x0): is an invalid pointer
            Connection ID (thread ID): 8
            Status: NOT_KILLED

            Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

            The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            information that should help you find out what is causing the crash.
            140425 09:03:49 mysqld_safe Number of processes running now: 0
            140425 09:03:49 mysqld_safe WSREP: not restarting wsrep node automatically
            140425 09:03:49 mysqld_safe mysqld from pid file /var/lib/mysql/data/cvardb2.photobiz.com.pid ended

            Show
            txtim Tim Soderstrom added a comment - This node is the one that runs events, and seems to have just hung rather than full on crashed: 140425 6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000000 of size 134217728 bytes 140425 6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000000 140425 7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000001 of size 191230321 bytes 140425 7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000001 140425 7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000002 of size 189538622 bytes 140425 7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000002 140425 8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 191615069 bytes 140425 8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003 140425 8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 182155485 bytes 140425 9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 134217728 bytes 140425 9:03:48 [ERROR] WSREP: Unsupported key prefix: : 71 (Protocol error) at galera/src/key_set.cpp:throw_bad_prefix():129 140425 9:03:48 [Note] WSREP: applier thread exiting (code:7) 140425 9:03:48 [ERROR] WSREP: node consistency compromised, aborting 140425 9:03:48 [Note] WSREP: starting shutdown 140425 9:03:48 [Note] /usr/sbin/mysqld: Normal shutdown 140425 9:03:48 [Note] WSREP: Stop replication 140425 9:03:48 [Note] WSREP: Closing send monitor... 140425 9:03:48 [Note] WSREP: Closed send monitor. 140425 9:03:48 [Note] WSREP: gcomm: terminating thread 140425 9:03:48 [Note] WSREP: gcomm: joining thread 140425 9:03:48 [Note] WSREP: gcomm: closing backend 140425 9:03:48 [Note] WSREP: view(view_id(NON_PRIM,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,11) memb { 5efac803-cae4-11e3-bb7b-b67ea5991bad,0 } joined { } left { } partitioned { 3ac797f0-c73f-11e3-a58e-37c0ddd67eee,0 62a2e8b4-c73f-11e3-bda3-daee80deac33,0 } ) 140425 9:03:48 [Note] WSREP: view((empty)) 140425 9:03:48 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 140425 9:03:48 [Note] WSREP: gcomm: closed 140425 9:03:48 [Note] WSREP: Flow-control interval: [16, 16] 140425 9:03:48 [Note] WSREP: Received NON-PRIMARY. 140425 9:03:48 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 2509311) 140425 9:03:48 [Note] WSREP: Received self-leave message. 140425 9:03:48 [Note] WSREP: Flow-control interval: [0, 0] 140425 9:03:48 [Note] WSREP: Received SELF-LEAVE. Closing connection. 140425 9:03:48 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2509311) 140425 9:03:48 [Note] WSREP: RECV thread exiting 0: Success 140425 9:03:48 [Note] WSREP: recv_thread() joined. 140425 9:03:48 [Note] WSREP: Closing replication queue. 140425 9:03:48 [Note] WSREP: Closing slave action queue. 140425 9:22:34 [ERROR] Event Scheduler: [root@localhost] [clientvariables.purge_sessions] Deadlock found when trying to get lock; try restarting transaction 140425 9:22:34 [Note] Event Scheduler: [root@localhost] . [clientvariables.purge_sessions] event execution failed. We have a 3rd node but it just sits there and doesn't actually run any queries as we use a fail-over setup for the actual application queries. It reported the following: 140425 6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 134217728 bytes 140425 6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003 140425 7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 191230321 bytes 140425 7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000004 140425 7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 189538622 bytes 140425 7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000005 140425 8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000006 of size 191615069 bytes 140425 8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000006 140425 8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000007 of size 182155485 bytes 140425 9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000008 of size 134217728 bytes 140425 9:03:48 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see http://kb.askmonty.org/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 5.5.36-MariaDB-wsrep-log key_buffer_size=67108864 read_buffer_size=1048576 max_used_connections=11 max_threads=302 thread_count=9 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 689546 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0x7f1696412000 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... 140425 9:03:48 [Note] WSREP: declaring 3ac797f0-c73f-11e3-a58e-37c0ddd67eee stable 140425 9:03:48 [Note] WSREP: (62a2e8b4-c73f-11e3-bda3-daee80deac33, 'tcp://0.0.0.0:4567') address 'tcp://172.29.33.167:4567' pointing to uuid 62a2e8b4-c73f-11e3-bda3-daee80deac33 is blacklisted, skipping stack_bottom = 0x7f167f040940 thread_stack 0x48000 /usr/sbin/mysqld(my_print_stacktrace+0x2b) [0xa9127b] /usr/sbin/mysqld(handle_fatal_signal+0x398) [0x6ea128] /lib64/libpthread.so.0 [0x3c5680f710] /lib64/libc.so.6(gsignal+0x35) [0x3c56432925] /lib64/libc.so.6(abort+0x175) [0x3c56434105] /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x265) [0x7f1f7b34e9a5] /usr/lib64/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2) [0x7f1f7b357112] /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x91) [0x7f1f7b352631] /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x88) [0x7f1f7b379e98] /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x315) [0x7f1f7b35b055] /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63) [0x7f1f7b35b4e3] /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93) [0x7f1f7b377a23] /usr/lib64/galera/libgalera_smm.so(galera_recv+0x23) [0x7f1f7b391993] /usr/sbin/mysqld [0x69e83f] /usr/sbin/mysqld(start_wsrep_THD+0x385) [0x5270f5] /lib64/libpthread.so.0 [0x3c568079d1] /lib64/libc.so.6(clone+0x6d) [0x3c564e8b6d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x0): is an invalid pointer Connection ID (thread ID): 8 Status: NOT_KILLED Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 140425 09:03:49 mysqld_safe Number of processes running now: 0 140425 09:03:49 mysqld_safe WSREP: not restarting wsrep node automatically 140425 09:03:49 mysqld_safe mysqld from pid file /var/lib/mysql/data/cvardb2.photobiz.com.pid ended
            Hide
            nirbhay_c Nirbhay Choubey added a comment -

            Duplicate of MDEV#5720.

            Show
            nirbhay_c Nirbhay Choubey added a comment - Duplicate of MDEV#5720.

              People

              • Assignee:
                nirbhay_c Nirbhay Choubey
                Reporter:
                txtim Tim Soderstrom
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: