Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version/s: 10.0.14-galera
- Fix Version/s: 10.0.16-galera
- Component/s: Galera
- Labels: None
Description
This may already be fixed, but I thought I'd file it so that it is written down somewhere.
I was running a 3-node cluster on 3 separate VMs.
I had a bash loop
while :; do mysql -e "insert..."; sleep 1; done
I killed 2 of the VMs (but left running the one that my bash loop connected to). The remaining node formed a new non-primary component, as expected. I ran SET GLOBAL wsrep_cluster_address='gcomm://' to make the remaining cluster node primary. But the client started by the bash loop got stuck for quite a long time (10+ minutes?):
| 737 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0.000 |
I tried to kill the thread, but it stays in "Killed":
| 737 | unauthenticated user | connecting host | NULL | Killed | NULL | login | NULL | 0.000 |
Even killing the mysql process itself that's trying to connect doesn't make this MySQL thread go away.
So it looks like there is a race condition that can leave a client stuck, perhaps permanently, in the "login" state while wsrep is changing the node/cluster state in some way.
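For anyone trying to reproduce this, a rough sketch of how the stuck threads could be spotted automatically. It assumes the processlist rows are in the same pipe-delimited layout as the ones pasted above (Id, User, Host, db, Command, Time, State, Info, Progress); the function name stuck_login_ids is made up for illustration and is not part of any MariaDB tooling.

```shell
# Hypothetical helper (not part of MariaDB): print the Id of every
# processlist row whose User is "unauthenticated user" and whose State
# is "login". Reads pipe-delimited rows, as pasted above, from stdin.
stuck_login_ids() {
  awk -F'|' '
    NF >= 9 {
      # Fields: $2=Id, $3=User, $8=State. Trim surrounding blanks first.
      for (i = 2; i <= 8; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($3 == "unauthenticated user" && $8 == "login") print $2
    }'
}
```

One way to feed it would be `mysql --table -e "SHOW PROCESSLIST" | stuck_login_ids`, run from a session that can still log in. Given the hang described here, a new connection may itself get stuck, so this is only useful from a connection opened beforehand or on a node that still accepts logins.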
Activity
This was pretty easy to reproduce on 10.0.14, but I haven't seen it yet on 10.0.16, so maybe it's fixed ...