MariaDB galera got signal 11 when altering table add unique index

Description

Updated at 2016/1/15
From the stack trace and the code, it seems wsrep thread is trying to abort a QUERY_COMMITING?
I'm not sure what does that mean, from the literal, wsrep is aborting a committing tran? So what's happened?
Does the alter table fail cause this?(current node fail, but another node unique index created), Or
alter table add unique index has been succefully commited but not flushed to disk yet, then wsrep
failed to abort but hit signal 11 that caused the current node remains unchange?

wsrep_grant_mdl_exception()
... ...
else if (granted_thd->wsrep_query_state == QUERY_COMMITTING)
{
WSREP_DEBUG("MDL granted, but committing thd abort scheduled");
ticket->wsrep_report(wsrep_debug);
mysql_mutex_unlock(&granted_thd->LOCK_wsrep_thd);
wsrep_abort_thd((void*)request_thd, (void*)granted_thd, 1);
ret = FALSE;
ha_abort_transaction()
... ...
THD_TRANS *trans= &victim_thd->transaction.all;
Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;

for (; ha_info; ha_info= ha_info_next)
{
handlerton *hton= ha_info->ht();
if (!hton->abort_transaction) <--- signal 11 should occur here

I assume we are trying to abort the wsrep thread for this alter table command replication.
It seems we don't regist the corrent storage engine info info the transaction
==============================================================================
In an online enviroment, I need to modify the index type to be unique, so I fire an alter
command
ALTER TABLE `ht_user_loginfo`
DROP INDEX `Index_uniqueid`,
ADD UNIQUE INDEX `Index_uniqueid` (`uniqueid`)

to my mariadb galera cluster 10.1.9 with 3 nodes (two data nodes and one arbitrator node),
unfortunately, the mariadb server crashed, from the errorlog,I can see
it htis signal 11,
Then I check another data node of cluster, I found the unique index is created successfully,
while current node iis failed.

The status showed below is captured after I recovered the cluster from the crash.

MariaDB [(none)]> show status like 'wsrep%';
-----------------------------------------------------------------------------------+

Variable_name

Value

-----------------------------------------------------------------------------------+

wsrep_apply_oooe

0.695295

wsrep_apply_oool

0.001338

wsrep_apply_window

7.136138

wsrep_causal_reads

0

wsrep_cert_deps_distance

116.807631

wsrep_cert_index_size

85

wsrep_cert_interval

8.518146

wsrep_cluster_conf_id

3

wsrep_cluster_size

3

wsrep_cluster_state_uuid

41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd

wsrep_cluster_status

Primary

wsrep_commit_oooe

0.000000

wsrep_commit_oool

0.000000

wsrep_commit_window

6.429097

wsrep_connected

ON

wsrep_evs_delayed

 

wsrep_evs_evict_list

 

wsrep_evs_repl_latency

0.000330851/0.000559662/0.00195228/0.000179164/1643

wsrep_evs_state

OPERATIONAL

wsrep_flow_control_paused

0.009547

wsrep_flow_control_paused_ns

1819437628239

wsrep_flow_control_recv

534776

wsrep_flow_control_sent

0

wsrep_gcomm_uuid

c6e2e2f9-ad43-11e5-874e-e29556fbb355

wsrep_incoming_addresses

192.168.6.101:3306,192.168.6.21:3306,

wsrep_last_committed

63491248

wsrep_local_bf_aborts

0

wsrep_local_cached_downto

63263077

wsrep_local_cert_failures

13

wsrep_local_commits

63491216

wsrep_local_index

1

wsrep_local_recv_queue

0

wsrep_local_recv_queue_avg

0.005204

wsrep_local_recv_queue_max

4

wsrep_local_recv_queue_min

0

wsrep_local_replays

0

wsrep_local_send_queue

0

wsrep_local_send_queue_avg

1.841741

wsrep_local_send_queue_max

62

wsrep_local_send_queue_min

0

wsrep_local_state

4

wsrep_local_state_comment

Synced

wsrep_local_state_uuid

41c2f17c-b8dd-11e5-a2f0-d29d6194f4fd

wsrep_protocol_version

7

wsrep_provider_name

Galera

wsrep_provider_vendor

Codership Oy <info@codership.com>

wsrep_provider_version

25.3.9(r3385)

wsrep_ready

ON

wsrep_received

620329

wsrep_received_bytes

4963695

wsrep_repl_data_bytes

31427131711

wsrep_repl_keys

272911424

wsrep_repl_keys_bytes

3643668604

wsrep_repl_other_bytes

0

wsrep_replicated

63491247

wsrep_replicated_bytes

39134240123

wsrep_thread_count

2

-----------------------------------------------------------------------------------+

below is the errorlog

2016-01-12 9:35:24 140579879008000 [Warning] IP address '168.8.44.132' could not be resolved: Temporary failure in name resolution
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start
2016-01-12 11:29:15 140579879008000 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start merge-sorting index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 20.0000
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting has estimated 32 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 1 estimated 32 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 2 estimated 16 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 3 estimated 8 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 4 estimated 4 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : merge-sorting current run 5 estimated 2 runs
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : End of merge-sorting index `Index_uniqueid`--temporary-- (1 / 1)
2016-01-12 11:29:16 140579879008000 [Note] InnoDB: Online DDL : Start building index `Index_uniqueid`--temporary-- (1 / 1), estimated cost : 30.0000
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of building index `Index_uniqueid`--temporary-- (1 / 1)
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Start applying row log
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : End of applying row log
2016-01-12 11:29:17 140579879008000 [Note] InnoDB: Online DDL : Completed
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.
2016-01-12 11:29:17 140579879008000 [Warning] WSREP: Cannot abort transaction.

160112 11:29:17 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

Server version: 10.1.9-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=171
max_threads=3002
thread_count=152
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27793254 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7fdb78869008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fdb4dc02dd8 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbf822e]
sql/signal_handler.cc:160(handle_fatal_signal)[0x760e24]
/lib64/libpthread.so.0[0x35c500f710]
sql/handler.cc:6084(ha_abort_transaction(THD*, THD*, char))[0x7637ca]
sql/wsrep_thd.cc:603(wsrep_abort_thd(void*, void*, char))[0x6ff448]
sql/wsrep_mysqld.cc:1732(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x6f2423]
sql/mdl.cc:1173(MDL_lock::Ticket_list::add_ticket(MDL_ticket*))[0x6928cf]
psi/mysql_thread.h:662(inline_mysql_mutex_lock)[0x695580]
sql/mdl.cc:2241(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double))[0x695c94]
sql/sql_base.cc:1882(wait_while_table_is_used)[0x5612c7]
sql/sql_table.cc:7068(mysql_inplace_alter_table)[0x637846]
sql/sql_table.cc:8874(mysql_alter_table(THD*, char*, char*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x63fb0a]
sql/sql_alter.cc:327(Sql_cmd_alter_table::execute(THD*))[0x68dff3]
sql/sql_parse.cc:5677(mysql_execute_command(THD*))[0x5afa9c]
sql/sql_parse.cc:7302(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b7ed9]
sql/sql_parse.cc:7126(wsrep_mysql_parse)[0x5b7f68]
sql/sql_parse.cc:1489(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5ba035]
sql/sql_parse.cc:1111(do_command(THD*))[0x5bb2f7]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x689883]
sql/sql_connect.cc:1263(handle_one_connection)[0x689aa2]
perfschema/pfs.cc:1863(pfs_spawn_thread)[0xafa679]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fdb690c1020): is an invalid pointer
Connection ID (thread ID): 5105
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160112 11:29:19 mysqld_safe Number of processes running now: 0
160112 11:29:19 mysqld_safe WSREP: not restarting wsrep node automatically
160112 11:29:19 mysqld_safe mysqld from pid file /mysqldata/3306/mysqld.pid ended

Environment

Server version: 10.1.9-MariaDB-log MariaDB Server
CentOS 6.6
2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

3 nodes of mariadb galera cluster

Assignee

Nirbhay Choubey

Reporter

Chengrui Wang

Labels

Components

Affects versions

Priority

Critical
Configure