Some users have noticed that under certain workloads, MariaDB Galera Cluster 10.0.19 consistently hangs. The only way to unhang the server at that point is to kill the process.
I'm going to ask the users to confirm whether this issue is still present on MariaDB Galera Cluster 10.0.21.
The users' logs suggest a possible deadlock involving lock_sys->mutex. An excerpt of the log is here:
This might be related to these PXC bugs:
The users' stack trace does show one thread grabbing the mutex in innobase_kill_connection, which is where the PXC bug reports say the issue occurs. The users' stack trace:
This could also be related to this Codership bug:
However, I'm not sure whether that bug can lead to deadlocks specific to lock_sys->mutex.
The users' stack trace shows one thread that is similar to thread 41 in that issue. The users' stack trace: