Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 10.0.13
- Fix Version/s: 10.0.14
- Component/s: Data Definition - Alter Table, Events, Storage Engine - InnoDB
- Labels: None
- Environment: Ubuntu 12.04 LTS Linux db1062 3.2.0-60-generic #91-Ubuntu SMP; Server version: 10.0.13-MariaDB-log Source distribution
Description
Am running the following on a slave:
- Largish (24h, 600M rows, 200G) ALTER TABLE
- Events with INFORMATION_SCHEMA queries
- Threadpool pool-of-threads active
- Replication active
- No other significant traffic
After several hours, MariaDB locks up with 0% CPU and disk activity, and no response on existing or new connections on port, extra_port, or socket.
Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G.
Am presently trialing the ALTER outside the threadpool using the extra_port, with all other settings unchanged.
Other notes:
- It doesn't seem to be a thread pool overload, as there aren't enough threads in the backtrace.
- The INFORMATION_SCHEMA event traffic uses GET_LOCK to serialize some activity and prevent pile-up.
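As a sketch of the GET_LOCK serialization pattern the notes describe, assuming a hypothetical event and lock name (the actual event bodies are in the attachments, not shown here):

```sql
DELIMITER //
CREATE EVENT monitor_is_stats
  ON SCHEDULE EVERY 1 MINUTE
DO
BEGIN
  -- Skip this run instead of piling up if a previous run still holds the lock.
  -- Timeout of 0 means: do not wait, bail out immediately if the lock is taken.
  IF GET_LOCK('monitor_is_stats', 0) = 1 THEN
    -- Illustrative INFORMATION_SCHEMA query; table `stats_log` is hypothetical.
    INSERT INTO stats_log (captured_at, table_schema, total_bytes)
      SELECT NOW(), table_schema, SUM(data_length + index_length)
        FROM information_schema.TABLES
        GROUP BY table_schema;
    DO RELEASE_LOCK('monitor_is_stats');
  END IF;
END//
DELIMITER ;
```

The zero timeout is what prevents pile-up: overlapping runs return immediately rather than queueing behind the lock.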
Activity
Running the ALTER via extra_port still locked up, but I managed to catch it in action. See attached processlist and gdb trace #3. Similar to https://mariadb.atlassian.net/browse/MDEV-5551 ?
The client connections in Command="Killed" and State="Cleaning up" successfully connected according to client logs, and apparently got past authentication, but the first query always failed with "Got timeout reading communication packets".
Afterwards the client no longer showed those TCP connections in lsof -i tcp output, yet the server continued to show them as open in lsof, and reported nothing in the error log even with log_warnings=2.
Notice the REPLACE INTO `heartbeat` query stuck in state Update for many seconds; that is Percona Toolkit's pt-heartbeat writing to an InnoDB table, which is normally instant. Attempting SHOW ENGINE INNODB STATUS to investigate simply hung indefinitely.
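When SHOW ENGINE INNODB STATUS itself hangs, the INFORMATION_SCHEMA InnoDB tables can sometimes still answer, since they take a different path to the same data. A sketch (these tables exist in 10.0's InnoDB/XtraDB, though they may block on the same internal mutex during a hang like this):

```sql
-- Active transactions, their state, and the statement each is running
SELECT trx_id, trx_state, trx_started, trx_query
  FROM information_schema.INNODB_TRX
  ORDER BY trx_started;

-- Which transaction is blocked behind which lock holder
SELECT requesting_trx_id, blocking_trx_id
  FROM information_schema.INNODB_LOCK_WAITS;
```

If these also hang, that points at a mutex held inside InnoDB rather than ordinary row-lock contention.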