Potential lock_sys->mutex deadlock

Description

Some users have reported that MariaDB Galera Cluster 10.0.19 consistently hangs under certain workloads. Once the server hangs, the only way to recover is to kill the process.

I'm going to ask the users to confirm whether this issue is still present on MariaDB Galera Cluster 10.0.21.

The users' logs seem to suggest there might be a deadlock related to lock_sys->mutex. An excerpt of the log is here:

This might be related to these PXC bugs:

https://bugs.launchpad.net/percona-xtradb-cluster/5.6/+bug/1233301

https://bugs.launchpad.net/percona-server/+bug/1233690

https://bugs.launchpad.net/percona-xtradb-cluster/5.6/+bug/1231518

The users' stack trace does show one thread grabbing the mutex in innobase_kill_connection, which is where the PXC bug reports say the issue occurs. The users' stack trace:

This could also be related to this Codership bug:

https://github.com/codership/mysql-wsrep/issues/184

However, I'm not sure if that bug can lead to deadlocks specific to lock_sys->mutex.

The users' stack trace shows one thread that is similar to thread 41 in that issue. The users' stack trace:

Environment

None

Assignee

Jan Lindström

Reporter

Geoff Montee

Labels

Sprint

None

Fix versions

Affects versions

Priority

Blocker