Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.0.20
    • Fix Version/s: N/A
    • Component/s: OTHER
    • Labels:
      None
    • Environment:
      AWS EC2 / NixOS / Linux-amd64

      Description

      A few times we got a damaged binlog index on two of our servers running MariaDB 10.0.20.

      I can't tell anything exact, but here is what we observed.

      Once the index looked fine but probably included invisible characters, and none of the master commands (RESET MASTER, SHOW MASTER STATUS) were working.

      Another time the index contained some binary data (attached). This happened on both the master and the slave (both running the same version, 10.0.20, with the same config).

      Yet another time the index on the master had something on the first line, and mysqldump --master-data=1/2 consistently caught SIGSEGV:

      # mysqldump --master-data=2 --single-transaction=1 --compact=1 --comments=0 bob_live_tw > /dev/null
      Segmentation fault (core dumped)
      

      mysqld printed something weird on stderr (attached).
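
      A quick way to check the index file for invisible or binary characters (a sketch only; /var/mysql/db/mysql-bin.index is a guess, the actual path is whatever log_bin_index points to in your config):

      # Show non-printing characters, then a canonical hex dump of the start of the file.
      # The path below is an assumption; substitute the real log_bin_index value.
      cat -A /var/mysql/db/mysql-bin.index
      hexdump -C /var/mysql/db/mysql-bin.index | head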

            Activity

            elenst Elena Stepanova added a comment -

            Igor Pashev,

            Judging by the logs, you have more problems than just binlog index corruption, and they appear to be related.

            Do you have some kind of wrapper around mysqld / mysqld_safe which starts/stops/restarts the server? For example, where do these records in the error log come from?

            2015-07-21 14:22:46 --MARK--
            

            and more importantly

            ready to mysql
            

            The obvious problem (which might or might not be the reason for the binlog issue, but there is a good chance it is) is that your server does not shut down properly upon restart, so during a restart you have concurrent use of essential files. Maybe the binlog index is one of those; if both the "old" and the "new" instances write into it at once, it's not surprising that it ends up with garbage.

            There are visible indications that the concurrency problem does exist.
            First, look at the shutdown/restart (a relatively normal one, without a crash or anything):

            2015-07-21 18:48:37 --MARK--
            150721 20:06:11 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld: Normal shutdown
            
            150721 20:06:11 [Note] Event Scheduler: Purging the queue. 0 events
            ready to mysql
            150721 20:06:12 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 26891 ...
            

            So, the normal shutdown has just started; it is not completed (it would have said "Shutdown complete"). But somehow, something decides it's "ready to mysql" and starts the new one. Apparently, it does not even SIGKILL the previous server before doing so (that would have been a bad idea, but starting a new one without stopping the old one is even worse):

            150721 20:06:12 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 26891 ...
            150721 20:06:12 [ERROR] mysqld: Can't lock aria control file '/var/mysql/db/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds
            

            As the log says, something is still using the aria control file. Given the above, it's apparently the old process which is still shutting down.
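
            As a side note, one way to confirm which process still holds the file is to ask the kernel about open file handles (a sketch; assumes fuser from psmisc is installed, lsof would work similarly):

            # List the processes (PID and access mode) that currently have the aria control file open.
            fuser -v /var/mysql/db/aria_log_control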

            There are other weird artefacts which might be caused by the same issue:

            2015-07-21 11:11:15 7fa2e2a37700  InnoDB: Operating system error number 11 in a file operation.
            InnoDB: Error number 11 means 'Resource temporarily unavailable'.
            InnoDB: Some operating system error numbers are described at
            InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
            150721 11:11:15 [ERROR] InnoDB: File (unknown): 'read' returned OS error 111. Cannot continue operation
            
            InnoDB: Page directory corruption: infimum not pointed to
            2015-07-23 19:05:49 7f4a68826700 InnoDB: Page dump in ascii and hex (16384 bytes):
            

            etc.

            I suggest getting rid of this problem (e.g. fixing the wrapper so it waits for the shutdown to finish, with all proper checks on the PID), starting fresh, and seeing whether the binlog index corruption re-appears. If the wrapper is something we provide that I'm not aware of, please let me know.
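
            For illustration only, a minimal sketch of what such a wrapper could do before launching a new server: wait until the previous mysqld has actually exited. The PID file and config paths below are assumptions; your real wrapper and supervisor setup are unknown to me:

            #!/bin/sh
            # Sketch: do not start a new mysqld while the old one is still alive.
            PIDFILE=/var/mysql/db/mysqld.pid      # assumed location, adjust to your setup
            if [ -f "$PIDFILE" ]; then
                OLDPID=$(cat "$PIDFILE")
                # kill -0 only checks that the process exists; poll until it is gone.
                while kill -0 "$OLDPID" 2>/dev/null; do
                    sleep 1
                done
            fi
            exec mysqld_safe --defaults-file=/etc/my.cnf   # assumed config path
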
            We'll be waiting for your feedback.

            ip1981 Igor Pashev added a comment -

            Yes, you are right. Our server is running under a supervisor, which itself runs under a super-supervisor.
            And we have recently discovered that the super-supervisor sends SIGKILL to mysqld.

            Thank you for your analysis! I'd close this case as "invalid".

            elenst Elena Stepanova added a comment -

            Closing as not-a-bug for now. If the issue with the binlog index corruption re-appears on a cleaner setup, please comment to re-open.


              People

              • Assignee:
                Unassigned
                Reporter:
                ip1981 Igor Pashev
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: