Details
-
Type:
Bug
-
Status: Closed
-
Priority:
Critical
-
Resolution: Duplicate
-
Affects Version/s: 5.5.35-galera
-
Fix Version/s: None
-
Component/s: None
-
Labels:None
Description
I have got a MariaDB cluster running on 2(3) servers:
node1: ubuntu precise with MariaDB 5.5.33 (from MariaDB-repository), 32GB RAM
node2: ubuntu precise with MariaDB 5.5.33 (from MariaDB-repository), 32GB RAM
node3: ubuntu lucid with percona garbd, 12GB RAM
Everything works fine for months since the version upgrade last weekend.
There is an long running upgrade on node1 (>10 minutes) which causes a shutdown of the complete cluster several times (note the protocol error on node2).
Error log from node1:
140224 2:46:34 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
140224 2:46:46 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 150834130 bytes
140224 2:46:52 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000012
140224 2:47:18 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 142653150 bytes
140224 2:47:24 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 5.5.35-MariaDB-1~precise-wsrep-log
key_buffer_size=1572864000
read_buffer_size=1048576
max_used_connections=19
max_threads=92
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1766640 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x7f74c9012000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
140224 2:47:24 [Note] WSREP: declaring 07637906-9ad2-11e3-887c-63c08b41925a stable
140224 2:47:24 [Note] WSREP: (1001b738-986c-11e3-998a-1f3ad494de5f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://node2:4567
140224 2:47:24 [Note] WSREP: Node 07637906-9ad2-11e3-887c-63c08b41925a state prim
140224 2:47:24 [Note] WSREP: view(view_id(PRIM,07637906-9ad2-11e3-887c-63c08b41925a,8) memb
} left {
} partitioned { 718d462c-986c-11e3-a9e7-3bef67f69149,0 })
140224 2:47:24 [Note] WSREP: forgetting 718d462c-986c-11e3-a9e7-3bef67f69149 (tcp://node2:4567)
140224 2:47:24 [Note] WSREP: (1001b738-986c-11e3-998a-1f3ad494de5f, 'tcp://0.0.0.0:4567') turning message relay requesting off
140224 2:47:24 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
140224 2:47:24 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140224 2:47:24 [Note] WSREP: STATE EXCHANGE: sent state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92
140224 2:47:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92 from 0 (garb)
140224 2:47:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9b9de178-9cf5-11e3-9e4a-337e6e8b2b92 from 1 (node1)
140224 2:47:24 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 7,
members = 2/2 (joined/total),
act_id = 1866755,
last_appl. = 1866752,
protocols = 0/5/2 (gcs/repl/appl),
group UUID = 6249921b-68a1-11e3-b7bc-1fd36434c4bc
140224 2:47:24 [Note] WSREP: Flow-control interval: [23, 23]
stack_bottom = 0x7f74a9f45a20 thread_stack 0x30000
140224 2:47:29 [Note] WSREP: cleaning up 718d462c-986c-11e3-a9e7-3bef67f69149 (tcp://node2:4567)
140224 2:47:33 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000015 of size 141985694 bytes
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f7a97686b0b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7f7a972a2e41]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f7a95aedcb0]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x9b)[0x7f74b3923d9b]
/usr/lib/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7f74b392bcb0]
/usr/lib/galera/libgalera_smm.so(ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA+0x2c)[
0x7f74b392bedc]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x72)[0x7f74b3925722]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification15purge_trxs_uptoElb+0x4b)[0x7f74b39532bb]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x94)[0x7f74b394d864]
/usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x43d)[0x7f74b39318ad]
/usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5b)[0x7f74b39324eb]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x63)[0x7f74b3951d03]
/usr/lib/galera/libgalera_smm.so(galera_recv+0x23)[0x7f74b3961eb3]
/usr/sbin/mysqld(+0x4a25d1)[0x7f7a972585d1]
/usr/sbin/mysqld(start_wsrep_THD+0x409)[0x7f7a970bf3f9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f7a95ae5e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7a952163fd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_conditio
n_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,pa
rtial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_
cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
140224 02:47:48 mysqld_safe Number of processes running now: 0
140224 02:47:48 mysqld_safe WSREP: not restarting wsrep node automatically
140224 02:47:48 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
Error log from node2:
140224 2:46:34 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
140224 2:46:44 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 150834130 bytes
140224 2:46:52 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000012
140224 2:47:16 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 142653150 bytes
140224 2:47:24 [ERROR] WSREP: Unsupported key prefix: ^C: 71 (Protocol error)
at galera/src/key_set.cpp:throw_bad_prefix():129
140224 2:47:24 [Note] WSREP: applier thread exiting (code:7)
140224 2:47:24 [ERROR] WSREP: node consistency compromised, aborting
140224 2:47:24 [Note] WSREP: starting shutdown
140224 2:47:24 [Note] /usr/sbin/mysqld: Normal shutdown
140224 2:47:24 [Note] WSREP: Stop replication
140224 2:47:24 [Note] WSREP: Closing send monitor...
140224 2:47:24 [Note] WSREP: Closed send monitor.
140224 2:47:24 [Note] WSREP: gcomm: terminating thread
140224 2:47:24 [Note] WSREP: gcomm: joining thread
140224 2:47:24 [Note] WSREP: gcomm: closing backend
140224 2:47:24 [Note] WSREP: view(view_id(NON_PRIM,07637906-9ad2-11e3-887c-63c08b41925a,7) memb { 718d462c-986c-11e3-a9e7-3bef67f69149,0 } joined {
} left {
} partitioned { 07637906-9ad2-11e3-887c-63c08b41925a,0 1001b738-986c-11e3-998a-1f3ad494de5f,0 }
)
140224 2:47:24 [Note] WSREP: view((empty))
140224 2:47:24 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140224 2:47:24 [Note] WSREP: gcomm: closed
140224 2:47:24 [Note] WSREP: Flow-control interval: [16, 16]
140224 2:47:24 [Note] WSREP: Received NON-PRIMARY.
140224 2:47:24 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1866755)
140224 2:47:24 [Note] WSREP: Received self-leave message.
140224 2:47:24 [Note] WSREP: Flow-control interval: [0, 0]
140224 2:47:24 [Note] WSREP: Received SELF-LEAVE. Closing connection.
140224 2:47:24 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1866755)
140224 2:47:24 [Note] WSREP: RECV thread exiting 0: Success
140224 2:47:24 [Note] WSREP: recv_thread() joined.
140224 2:47:24 [Note] WSREP: Closing replication queue.
140224 2:47:24 [Note] WSREP: Closing slave action queue.
140224 2:47:26 [Note] WSREP: rollbacker thread exiting
140224 2:47:26 [Note] Event Scheduler: Purging the queue. 0 events
140224 2:47:26 [Note] WSREP: dtor state: SYNCED
140224 2:47:26 [Note] WSREP: Closing send monitor...
140224 2:47:26 [Note] WSREP: Closed send monitor.
140224 2:47:26 [Note] WSREP: mon: entered 238125 oooe fraction 0 oool fraction 4.19948e-06
140224 2:47:26 [Note] WSREP: mon: entered 238125 oooe fraction 0.0123423 oool fraction 4.19948e-06
140224 2:47:26 [Note] WSREP: mon: entered 241668 oooe fraction 0 oool fraction 4.13791e-06
140224 2:47:26 [Note] WSREP: cert index usage at exit 0
140224 2:47:26 [Note] WSREP: cert trx map usage at exit 5
140224 2:47:26 [Note] WSREP: deps set usage at exit 0
140224 2:47:26 [Note] WSREP: avg deps dist 4
terminate called after throwing an instance of 'gu::Exception'
what(): Unsupported key prefix: ^C: 71 (Protocol error)
at galera/src/key_set.cpp:throw_bad_prefix():129
140224 2:47:26 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 5.5.35-MariaDB-1~precise-wsrep-log
key_buffer_size=1572864000
read_buffer_size=1048576
max_used_connections=25
max_threads=92
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1766640 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fb4e0c27b0b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7fb4e0843e41]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fb4df08ecb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb4de6f9425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fb4de6fcb8b]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x11d)[0x7fb4dee3669d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846)[0x7fb4dee34846]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873)[0x7fb4dee34873]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e)[0x7fb4dee3496e]
/usr/lib/galera/libgalera_smm.so(_ZN2gu10ThrowErrorD1Ev+0xf8)[0x7faefd027fa8]
/usr/lib/galera/libgalera_smm.so(_ZN6galera6KeySet7KeyPart16throw_bad_prefixEh+0x6a)[0x7faefd1118fa]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x2fa)[0x7faefd122ffa]
/usr/lib/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7faefd12acb0]
/usr/lib/galera/libgalera_smm.so(ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA+0x2c)[
0x7faefd12aedc]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13CertificationD2Ev+0x53)[0x7faefd124953]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMMD2Ev+0xbc)[0x7faefd14c07c]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMMD0Ev+0x9)[0x7faefd14c5c9]
/usr/lib/galera/libgalera_smm.so(galera_tear_down+0x16)[0x7faefd1602e6]
/usr/sbin/mysqld(wsrep_unload+0x1f)[0x7fb4e0c8940f]
/usr/sbin/mysqld(_Z12wsrep_deinitv+0x10)[0x7fb4e07edb70]
/usr/sbin/mysqld(+0x30b14c)[0x7fb4e066214c]
/usr/sbin/mysqld(kill_server_thread+0xe)[0x7fb4e06622be]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fb4df086e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb4de7b73fd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
140224 02:47:27 mysqld_safe Number of processes running now: 0
140224 02:47:27 mysqld_safe WSREP: not restarting wsrep node automatically
140224 02:47:27 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
Looks similar to : https://bugs.launchpad.net/galera/+bug/1267507