Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4867

Incomplete Binlog writes on disk-full error, also breaking slave replication

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.3, 5.5.32, 5.3.12
    • Fix Version/s: 10.0, 5.5
    • Component/s: None
    • Labels:
      None
    • Environment:
      Linux, probably any

      Description

      It appears the binlog and replication code still has write calls that aren't checked for successful completion. On disk full, the binlog gets truncated.

      As a direct consequence and example, a slave will report this error:
      Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event'

      Obviously it's not just a problem with replication, but also for having correct binlogs for point-in-time recovery.

      Suggested solution: writes should check for success, and in case of a disk-full error, report and retry correctly. The "wait until space has been freed up" approach that's used on other disk-full conditions, would be fine here also.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Hi,

            I'm getting "wait until" when the binary log cannot be written (on 5.5 and on 10.0, didn't check 5.3):

            130902 0:25:22 [Warning] mysqld: Disk is full writing '/data/smalldisk/master-bin.000001' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
            130902 0:25:22 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

            I do the following:

            • start server with default options and with log-bin=<small partition>/master-bin (the datadir is on a different partition)
            • set binlog_format=row (for faster growing)
            • create a table;
            • run some INSERTs until the next one hangs and the message above is printed into the error log.

            Is there anything special in the use case you described that I've missed?

            Show
            elenst Elena Stepanova added a comment - Hi, I'm getting "wait until" when the binary log cannot be written (on 5.5 and on 10.0, didn't check 5.3): 130902 0:25:22 [Warning] mysqld: Disk is full writing '/data/smalldisk/master-bin.000001' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space) 130902 0:25:22 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs I do the following: start server with default options and with log-bin=<small partition>/master-bin (the datadir is on a different partition) set binlog_format=row (for faster growing) create a table; run some INSERTs until the next one hangs and the message above is printed into the error log. Is there anything special in the use case you described that I've missed?
            Hide
            martijnotto Martijn Otto added a comment -

            We recently ran into exactly this problem too. The server does indeed continue to write to the binlog after free space has been created again, however that does leave it with a broken binlog.

            Say you are trying to write 1024 bytes to the binlog, but only 512 bytes are available. 512 bytes will be written, after which the disk is full and the message is written to the error log.

            After creating free space, the remaining 512 bytes are not written to the file. Instead, the next binlog entry is written. Any slaves replicating this server will notice this and stop replicating. The only way to get them to work again is to guess at which position the binlog continues and set that as the new master_log_pos.

            Show
            martijnotto Martijn Otto added a comment - We recently ran into exactly this problem too. The server does indeed continue to write to the binlog after free space has been created again, however that does leave it with a broken binlog. Say you are trying to write 1024 bytes to the binlog, but only 512 bytes are available. 512 bytes will be written, after which the disk is full and the message is written to the error log. After creating free space, the remaining 512 bytes are not written to the file. Instead, the next binlog entry is written. Any slaves replicating this server will notice this and stop replicating. The only way to get them to work again is to guess at which position the binlog continues and set that as the new master_log_pos.
            Hide
            elenst Elena Stepanova added a comment -

            I see, thanks for the info, I will check it out.

            Show
            elenst Elena Stepanova added a comment - I see, thanks for the info, I will check it out.
            Hide
            elenst Elena Stepanova added a comment -

            In my manual tests on the current 5.5 tree the rest of the event (or the event group) was written to the binary log after the space was freed.
            That's how it was:

            • there is a little space left;
            • statement/transaction gets committed;
            • binary log grows a bit, using up all remaining disk space;
              => the message in the error log appears, the statement/commit execution doesn't finish, the slave doesn't replicate the event, the binary log is unreadable;
            • disk space is freed
              => the binlog grows further and becomes readable again, the statement/transaction execution finishes, the event is successfully replicated on the slave.

            I tried it with RBR (atomic transactions) and with SBR (atomic and multi-statement transactions).

            Possibly the initially described erroneous behavior somehow depends on the event type(s) and their contents, or on server startup parameters, or on the file system, or it's caused by a specific combination of free/required disk space, or it is only possible in the concurrent scenario.

            I'm assigning this to Kristian, maybe he'll be able to see the reason of the problem by code inspection (when he has a chance to inspect it).

            Show
            elenst Elena Stepanova added a comment - In my manual tests on the current 5.5 tree the rest of the event (or the event group) was written to the binary log after the space was freed. That's how it was: there is a little space left; statement/transaction gets committed; binary log grows a bit, using up all remaining disk space; => the message in the error log appears, the statement/commit execution doesn't finish, the slave doesn't replicate the event, the binary log is unreadable; disk space is freed => the binlog grows further and becomes readable again, the statement/transaction execution finishes, the event is successfully replicated on the slave. I tried it with RBR (atomic transactions) and with SBR (atomic and multi-statement transactions). Possibly the initially described erroneous behavior somehow depends on the event type(s) and their contents, or on server startup parameters, or on the file system, or it's caused by a specific combination of free/required disk space, or it is only possible in the concurrent scenario. I'm assigning this to Kristian, maybe he'll be able to see the reason of the problem by code inspection (when he has a chance to inspect it).
            Hide
            monty Michael Widenius added a comment -

            Elena did all the rights steps to try to reproduce this, so there is not much more I can do regarding testing.

            I took a look at all code in log.cc and related code but could not find anything wrong.

            The logic for file full is quite simply and is the same for any event:
            In the lowest file write call, mysys/my_write.c:my_write() we detect if we on write() get
            an ENOSPC or EDQUOT error. If yes, we give a warning and wait 60 seconds and try again by writing the data we did not manage to write before.

            I am reasonable confident in the my_write() code, so the error is very likely somewhere else. I would however need more data to be able to find out where.

            If you could upload the last 100K of the binary log where things went wrong to the private folder at ftp://ftp.askmonty.org, then we would have more to go on to find and fix this.

            Show
            monty Michael Widenius added a comment - Elena did all the rights steps to try to reproduce this, so there is not much more I can do regarding testing. I took a look at all code in log.cc and related code but could not find anything wrong. The logic for file full is quite simply and is the same for any event: In the lowest file write call, mysys/my_write.c:my_write() we detect if we on write() get an ENOSPC or EDQUOT error. If yes, we give a warning and wait 60 seconds and try again by writing the data we did not manage to write before. I am reasonable confident in the my_write() code, so the error is very likely somewhere else. I would however need more data to be able to find out where. If you could upload the last 100K of the binary log where things went wrong to the private folder at ftp://ftp.askmonty.org , then we would have more to go on to find and fix this.
            Hide
            monty Michael Widenius added a comment -

            I have changed the priority of this to "minor" until we find a way to repeat it or more information of what could be different from our test and the system where this happens.

            Show
            monty Michael Widenius added a comment - I have changed the priority of this to "minor" until we find a way to repeat it or more information of what could be different from our test and the system where this happens.
            Hide
            arjenlentz Arjen Lentz added a comment -

            Thanks Monty
            Unfortunately I don't think we had a copy of the log available from the last occurrence. 
            We haven't spotted this in recent times, obviously we try to avoid disk full situations for clients.
            We'll endeavour to provide a log should we spot it again. Or perhaps another user will encounter the issue...

            Regards
            Arjen. 

            -------- Original message --------
            From: "Michael Widenius (JIRA)" <jira@mariadb.atlassian.net>
            Date:24/03/2014 04:13 (GMT+10:00)
            To: arjen@openquery.com
            Subject: [JIRA] (MDEV-4867) Incomplete Binlog writes on disk-full error,
            also breaking slave replication

                [ https://mariadb.atlassian.net/browse/MDEV-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44497#comment-44497 ]

            Michael Widenius commented on MDEV-4867:
            ----------------------------------------

            Elena did all the rights steps to try to reproduce this, so there is not much more I can do regarding testing.

            I took a look at all code in log.cc and related code but could not find anything wrong.

            The logic for file full is quite simply and is the same for any event:
            In the lowest file write call, mysys/my_write.c:my_write() we detect if we on write() get
            an ENOSPC or EDQUOT error.   If yes, we give a warning and wait 60 seconds and try again by writing the data we did not manage to write before.

            I am reasonable confident in the my_write() code, so the error is very likely somewhere else. I would however need more data to be able to find out where.

            If you could upload the last 100K of the binary log where things went wrong to the private folder at ftp://ftp.askmonty.org, then we would have more to go on to find and fix this.


            This message was sent by Atlassian JIRA
            (v6.2-OD-10-004-WN#6253)

            Show
            arjenlentz Arjen Lentz added a comment - Thanks Monty Unfortunately I don't think we had a copy of the log available from the last occurrence.  We haven't spotted this in recent times, obviously we try to avoid disk full situations for clients. We'll endeavour to provide a log should we spot it again. Or perhaps another user will encounter the issue... Regards Arjen.  -------- Original message -------- From: "Michael Widenius (JIRA)" <jira@mariadb.atlassian.net> Date:24/03/2014 04:13 (GMT+10:00) To: arjen@openquery.com Subject: [JIRA] ( MDEV-4867 ) Incomplete Binlog writes on disk-full error, also breaking slave replication     [ https://mariadb.atlassian.net/browse/MDEV-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44497#comment-44497 ] Michael Widenius commented on MDEV-4867 : ---------------------------------------- Elena did all the rights steps to try to reproduce this, so there is not much more I can do regarding testing. I took a look at all code in log.cc and related code but could not find anything wrong. The logic for file full is quite simply and is the same for any event: In the lowest file write call, mysys/my_write.c:my_write() we detect if we on write() get an ENOSPC or EDQUOT error.   If yes, we give a warning and wait 60 seconds and try again by writing the data we did not manage to write before. I am reasonable confident in the my_write() code, so the error is very likely somewhere else. I would however need more data to be able to find out where. If you could upload the last 100K of the binary log where things went wrong to the private folder at ftp://ftp.askmonty.org , then we would have more to go on to find and fix this. – This message was sent by Atlassian JIRA (v6.2-OD-10-004-WN#6253)

              People

              • Assignee:
                monty Michael Widenius
                Reporter:
                arjen Arjen Lentz
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated: