Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5720

MariaDB custer crashes repeatedly after upgrading to 5.5.35

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 5.5.35-galera
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I have got a MariaDB cluster running on 2(3) servers:

      node1: ubuntu precise with MariaDB 5.5.33 (from MariaDB-repository), 32GB RAM
      node2: ubuntu precise with MariaDB 5.5.33 (from MariaDB-repository), 32GB RAM
      node3: ubuntu lucid with percona garbd, 12GB RAM

      Everything works fine for months since the version upgrade last weekend.

      There is an long running upgrade on node1 (>10 minutes) which causes a shutdown of the complete cluster several times (note the protocol error on node2).

      Error log from node1:
      140224 2:46:34 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
      140224 2:46:46 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 150834130 bytes
      140224 2:46:52 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000012
      140224 2:47:18 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 142653150 bytes
      140224 2:47:24 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 5.5.35-MariaDB-1~precise-wsrep-log
      key_buffer_size=1572864000
      read_buffer_size=1048576
      max_used_connections=19
      max_threads=92
      thread_count=3
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1766640 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x7f74c9012000
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      140224 2:47:24 [Note] WSREP: declaring 07637906-9ad2-11e3-887c-63c08b41925a stable
      140224 2:47:24 [Note] WSREP: (1001b738-986c-11e3-998a-1f3ad494de5f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://node2:4567
      140224 2:47:24 [Note] WSREP: Node 07637906-9ad2-11e3-887c-63c08b41925a state prim
      140224 2:47:24 [Note] WSREP: view(view_id(PRIM,07637906-9ad2-11e3-887c-63c08b41925a,8) memb

      { 07637906-9ad2-11e3-887c-63c08b41925a,0 1001b738-986c-11e3-998a-1f3ad494de5f,0 } joined {
      } left {
      } partitioned { 718d462c-986c-11e3-a9e7-3bef67f69149,0 })
      140224 2:47:24 [Note] WSREP: forgetting 718d462c-986c-11e3-a9e7-3bef67f69149 (tcp://node2:4567)
      140224 2:47:24 [Note] WSREP: (1001b738-986c-11e3-998a-1f3ad494de5f, 'tcp://0.0.0.0:4567') turning message relay requesting off
      140224 2:47:24 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
      140224 2:47:24 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      140224 2:47:24 [Note] WSREP: STATE EXCHANGE: sent state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92

      140224 2:47:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92 from 0 (garb)
      140224 2:47:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92 from 1 (node1)
      140224 2:47:24 [Note] WSREP: Quorum results:
      version = 3,
      component = PRIMARY,
      conf_id = 7,
      members = 2/2 (joined/total),
      act_id = 1866755,
      last_appl. = 1866752,
      protocols = 0/5/2 (gcs/repl/appl),
      group UUID = 6249921b-68a1-11e3-b7bc-1fd36434c4bc
      140224 2:47:24 [Note] WSREP: Flow-control interval: [23, 23]
      stack_bottom = 0x7f74a9f45a20 thread_stack 0x30000
      140224 2:47:29 [Note] WSREP: cleaning up 718d462c-986c-11e3-a9e7-3bef67f69149 (tcp://node2:4567)
      140224 2:47:33 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000015 of size 141985694 bytes
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f7a97686b0b]
      /usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7f7a972a2e41]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f7a95aedcb0]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x9b)[0x7f74b3923d9b]
      /usr/lib/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7f74b392bcb0]
      /usr/lib/galera/libgalera_smm.so(ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA+0x2c)[
      0x7f74b392bedc]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x72)[0x7f74b3925722]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification15purge_trxs_uptoElb+0x4b)[0x7f74b39532bb]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x94)[0x7f74b394d864]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x43d)[0x7f74b39318ad]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5b)[0x7f74b39324eb]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x63)[0x7f74b3951d03]
      /usr/lib/galera/libgalera_smm.so(galera_recv+0x23)[0x7f74b3961eb3]
      /usr/sbin/mysqld(+0x4a25d1)[0x7f7a972585d1]
      /usr/sbin/mysqld(start_wsrep_THD+0x409)[0x7f7a970bf3f9]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f7a95ae5e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7a952163fd]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 2
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_conditio
      n_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,pa
      rtial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_
      cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      140224 02:47:48 mysqld_safe Number of processes running now: 0
      140224 02:47:48 mysqld_safe WSREP: not restarting wsrep node automatically
      140224 02:47:48 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended






      Error log from node2:
      140224 2:46:34 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
      140224 2:46:44 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 150834130 bytes
      140224 2:46:52 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000012
      140224 2:47:16 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 142653150 bytes
      140224 2:47:24 [ERROR] WSREP: Unsupported key prefix: ^C: 71 (Protocol error)
      at galera/src/key_set.cpp:throw_bad_prefix():129
      140224 2:47:24 [Note] WSREP: applier thread exiting (code:7)
      140224 2:47:24 [ERROR] WSREP: node consistency compromised, aborting
      140224 2:47:24 [Note] WSREP: starting shutdown
      140224 2:47:24 [Note] /usr/sbin/mysqld: Normal shutdown

      140224 2:47:24 [Note] WSREP: Stop replication
      140224 2:47:24 [Note] WSREP: Closing send monitor...
      140224 2:47:24 [Note] WSREP: Closed send monitor.
      140224 2:47:24 [Note] WSREP: gcomm: terminating thread
      140224 2:47:24 [Note] WSREP: gcomm: joining thread
      140224 2:47:24 [Note] WSREP: gcomm: closing backend
      140224 2:47:24 [Note] WSREP: view(view_id(NON_PRIM,07637906-9ad2-11e3-887c-63c08b41925a,7) memb { 718d462c-986c-11e3-a9e7-3bef67f69149,0 } joined {
      } left {
      } partitioned { 07637906-9ad2-11e3-887c-63c08b41925a,0 1001b738-986c-11e3-998a-1f3ad494de5f,0 }

      )
      140224 2:47:24 [Note] WSREP: view((empty))
      140224 2:47:24 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      140224 2:47:24 [Note] WSREP: gcomm: closed
      140224 2:47:24 [Note] WSREP: Flow-control interval: [16, 16]
      140224 2:47:24 [Note] WSREP: Received NON-PRIMARY.
      140224 2:47:24 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1866755)
      140224 2:47:24 [Note] WSREP: Received self-leave message.
      140224 2:47:24 [Note] WSREP: Flow-control interval: [0, 0]
      140224 2:47:24 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      140224 2:47:24 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1866755)
      140224 2:47:24 [Note] WSREP: RECV thread exiting 0: Success
      140224 2:47:24 [Note] WSREP: recv_thread() joined.
      140224 2:47:24 [Note] WSREP: Closing replication queue.
      140224 2:47:24 [Note] WSREP: Closing slave action queue.
      140224 2:47:26 [Note] WSREP: rollbacker thread exiting
      140224 2:47:26 [Note] Event Scheduler: Purging the queue. 0 events
      140224 2:47:26 [Note] WSREP: dtor state: SYNCED
      140224 2:47:26 [Note] WSREP: Closing send monitor...
      140224 2:47:26 [Note] WSREP: Closed send monitor.
      140224 2:47:26 [Note] WSREP: mon: entered 238125 oooe fraction 0 oool fraction 4.19948e-06
      140224 2:47:26 [Note] WSREP: mon: entered 238125 oooe fraction 0.0123423 oool fraction 4.19948e-06
      140224 2:47:26 [Note] WSREP: mon: entered 241668 oooe fraction 0 oool fraction 4.13791e-06
      140224 2:47:26 [Note] WSREP: cert index usage at exit 0
      140224 2:47:26 [Note] WSREP: cert trx map usage at exit 5
      140224 2:47:26 [Note] WSREP: deps set usage at exit 0
      140224 2:47:26 [Note] WSREP: avg deps dist 4
      terminate called after throwing an instance of 'gu::Exception'
      what(): Unsupported key prefix: ^C: 71 (Protocol error)
      at galera/src/key_set.cpp:throw_bad_prefix():129
      140224 2:47:26 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 5.5.35-MariaDB-1~precise-wsrep-log
      key_buffer_size=1572864000
      read_buffer_size=1048576
      max_used_connections=25
      max_threads=92
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1766640 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...

      stack_bottom = 0x0 thread_stack 0x30000
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fb4e0c27b0b]
      /usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7fb4e0843e41]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fb4df08ecb0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb4de6f9425]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fb4de6fcb8b]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x11d)[0x7fb4dee3669d]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846)[0x7fb4dee34846]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873)[0x7fb4dee34873]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e)[0x7fb4dee3496e]
      /usr/lib/galera/libgalera_smm.so(_ZN2gu10ThrowErrorD1Ev+0xf8)[0x7faefd027fa8]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera6KeySet7KeyPart16throw_bad_prefixEh+0x6a)[0x7faefd1118fa]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x2fa)[0x7faefd122ffa]
      /usr/lib/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7faefd12acb0]
      /usr/lib/galera/libgalera_smm.so(ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA+0x2c)[
      0x7faefd12aedc]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13CertificationD2Ev+0x53)[0x7faefd124953]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMMD2Ev+0xbc)[0x7faefd14c07c]
      /usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMMD0Ev+0x9)[0x7faefd14c5c9]
      /usr/lib/galera/libgalera_smm.so(galera_tear_down+0x16)[0x7faefd1602e6]
      /usr/sbin/mysqld(wsrep_unload+0x1f)[0x7fb4e0c8940f]
      /usr/sbin/mysqld(_Z12wsrep_deinitv+0x10)[0x7fb4e07edb70]
      /usr/sbin/mysqld(+0x30b14c)[0x7fb4e066214c]
      /usr/sbin/mysqld(kill_server_thread+0xe)[0x7fb4e06622be]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fb4df086e9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb4de7b73fd]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      140224 02:47:27 mysqld_safe Number of processes running now: 0
      140224 02:47:27 mysqld_safe WSREP: not restarting wsrep node automatically
      140224 02:47:27 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            nirbhay_c Nirbhay Choubey added a comment -
            Show
            nirbhay_c Nirbhay Choubey added a comment - Looks similar to : https://bugs.launchpad.net/galera/+bug/1267507
            Hide
            marcel.normann Marcel Normann added a comment -

            You are right, definitely a duplicate!

            Show
            marcel.normann Marcel Normann added a comment - You are right, definitely a duplicate!
            Hide
            nirbhay_c Nirbhay Choubey added a comment -

            Fixed in Galera 25.3.5 (lp:1267507)

            Show
            nirbhay_c Nirbhay Choubey added a comment - Fixed in Galera 25.3.5 (lp:1267507)

              People

              • Assignee:
                nirbhay_c Nirbhay Choubey
                Reporter:
                marcel.normann Marcel Normann
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: