Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 10.0.14-galera
- Fix Version/s: None
- Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
- Labels: None
- Environment: Linux db1 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64 GNU/Linux
Description
We recently had an issue with one node of our 3-node Galera cluster consuming all available disk space. The cause turned out to be the ibdata1 file, which had grown excessively. However, this growth cannot be explained by the cluster's everyday usage.
The cluster hosts one database that has reached 25GB over a span of 5 years. By the time all free space was consumed, ibdata1 had grown to ~12GB, which is about 48% of the whole database size. Using the 'innochecksum' and 'innodb_space' tools, we found that 97% of the pages in the ibdata1 file are undo log pages.
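For anyone who wants to cross-check the page-type breakdown without the tools named above, here is a minimal Python sketch. It is not how 'innochecksum' or 'innodb_space' work internally. It only reads the 2-byte page type stored at byte offset 24 of each page's FIL header and tallies the values. The 16KB page size is an assumption (the InnoDB default), and the function names are made up for illustration. It should be run against a copy of the file, not one in use by a live server.

```python
import struct
from collections import Counter

PAGE_SIZE = 16 * 1024        # assumption: default innodb_page_size (16KB)
FIL_PAGE_TYPE_OFFSET = 24    # 2-byte big-endian page type in the FIL header

# Common InnoDB page type codes; anything else is reported as UNKNOWN_<n>.
PAGE_TYPE_NAMES = {
    0: "ALLOCATED",
    2: "UNDO_LOG",
    3: "INODE",
    4: "IBUF_FREE_LIST",
    5: "IBUF_BITMAP",
    6: "SYS",
    7: "TRX_SYS",
    8: "FSP_HDR",
    9: "XDES",
    10: "BLOB",
    17855: "INDEX",
}


def count_page_types(path, page_size=PAGE_SIZE):
    """Tally page types in an InnoDB tablespace file (hypothetical helper)."""
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                break  # end of file (or a trailing partial page)
            (page_type,) = struct.unpack_from(">H", page, FIL_PAGE_TYPE_OFFSET)
            counts[PAGE_TYPE_NAMES.get(page_type, f"UNKNOWN_{page_type}")] += 1
    return counts


def share(counts, name):
    """Fraction of all counted pages that have the given type name."""
    total = sum(counts.values())
    return counts[name] / total if total else 0.0
```

Calling `share(count_page_types("ibdata1.copy"), "UNDO_LOG")` on the saved copy (the file name here is a placeholder) would give the undo log fraction to compare against the 97% figure.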
Unfortunately, the cluster was only loosely monitored at the time of the event, so we do not know whether the increase was rapid or incremental over a longer period of time. In any case, 12GB seems too big to have been caused by normal usage. After recovering from the situation (we rebootstrapped the cluster on one of the other two nodes and rejoined the affected node after removing its data files), we started monitoring ibdata1 on all 3 nodes. With the same everyday usage on the cluster, ibdata1 now remains the same size (~76MB) on all 3 nodes, so we are not seeing any incremental increase in size.
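The kind of size monitoring described above can be approximated with a short Python sketch. This is not our actual monitoring setup. The helper names are hypothetical, and the path in the usage note assumes the default data directory. The idea is to record (timestamp, size) samples periodically and compute an average growth rate, which would distinguish a sudden jump from slow growth.

```python
import os
import time


def sample_size(path):
    """Return (unix_timestamp, size_in_bytes) for the file at path."""
    return (time.time(), os.path.getsize(path))


def growth_rate(samples):
    """Average growth in bytes per second between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (s1 - s0) / (t1 - t0)
```

A cron job or loop could append `sample_size("/var/lib/mysql/ibdata1")` to a list or log file at a fixed interval (the path is an assumption) and alert when `growth_rate` exceeds a threshold chosen for the workload.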
Some other, possibly relevant, facts:
- At the time of the event, we got innodb status output with a "History list length" value of 40957337 which seems pretty big.
- We had recently (before the event) introduced some changes so that one of our applications was issuing multiple queries of the form "INSERT INTO ... SELECT ...", which is considered bad practice as it introduces locks that can be avoided if the SELECT is done separately (we have since fixed this).
- We had also configured the nodes with "innodb_locks_unsafe_for_binlog=1".
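Since the "History list length" value is the one number in this list that can be tracked over time, here is a minimal Python sketch for extracting it from the text of SHOW ENGINE INNODB STATUS. The function name is hypothetical, and how the status text is captured (for example by saving the output of the mysql client) is left as an assumption.

```python
import re


def history_list_length(status_text):
    """Extract the 'History list length' value from InnoDB status text.

    Returns the value as an int, or None if the line is not present.
    """
    match = re.search(r"History list length\s+(\d+)", status_text)
    return int(match.group(1)) if match else None
```

Logging this value at regular intervals alongside the ibdata1 size would show whether the purge backlog and the file growth move together.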
We have kept a copy of the 12GB ibdata1 file for further investigation, but we are not sure what to look for. Perhaps there is some repetition in the data that would suggest an endless loop wrote 12GB of data before all space was consumed, or perhaps there is something else.
We would really appreciate any advice on this from InnoDB / XtraDB experts.
Thank you in advance.
Activity
Jan Lindström,
Could you please take a look at this and maybe advise on how to proceed with the investigation more efficiently?