MariaDB Server: MDEV-4464

MariaDB 5.5.29 + Galera on Ubuntu 12.04 Crash

    Details

      Description

      3-node Galera cluster using rsync SST. It runs fine for a few days, then does this in the middle of the night, with no load on the server, plenty of RAM and no swapping:

      May 1 01:54:38 site-db2 mysqld: 130501 1:54:38 [ERROR] mysqld got signal 11 ;
      May 1 01:54:38 site-db2 mysqld: This could be because you hit a bug. It is also possible that this binary
      May 1 01:54:38 site-db2 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
      May 1 01:54:38 site-db2 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
      May 1 01:54:38 site-db2 mysqld:
      May 1 01:54:38 site-db2 mysqld: To report this bug, see http://kb.askmonty.org/en/reporting-bugs
      May 1 01:54:38 site-db2 mysqld:
      May 1 01:54:38 site-db2 mysqld: We will try our best to scrape up some info that will hopefully help
      May 1 01:54:38 site-db2 mysqld: diagnose the problem, but since we have already crashed,
      May 1 01:54:38 site-db2 mysqld: something is definitely wrong and this may fail.
      May 1 01:54:38 site-db2 mysqld:
      May 1 01:54:38 site-db2 mysqld: Server version: 5.5.29-MariaDB-mariadb1~precise
      May 1 01:54:38 site-db2 mysqld: key_buffer_size=268435456
      May 1 01:54:38 site-db2 mysqld: read_buffer_size=131072
      May 1 01:54:38 site-db2 mysqld: max_used_connections=6
      May 1 01:54:38 site-db2 mysqld: max_threads=802
      May 1 01:54:38 site-db2 mysqld: thread_count=3
      May 1 01:54:38 site-db2 mysqld: It is possible that mysqld could use up to
      May 1 01:54:38 site-db2 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2018916 K bytes of memory
      May 1 01:54:38 site-db2 mysqld: Hope that's ok; if not, decrease some variables in the equation.
      May 1 01:54:38 site-db2 mysqld:
      May 1 01:54:38 site-db2 mysqld: Thread pointer: 0x0x0
      May 1 01:54:38 site-db2 mysqld: Attempting backtrace. You can use the following information to find out
      May 1 01:54:38 site-db2 mysqld: where mysqld died. If you see no messages after this, something went
      May 1 01:54:38 site-db2 mysqld: terribly wrong...
      May 1 01:54:38 site-db2 mysqld: stack_bottom = 0x0 thread_stack 0x40000
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4fd0682fb]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4fcc8daa1]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4fb501cb0]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f856ec65]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f856ee39]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f8570438]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f8627d2e]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f862d8b7]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4f8633181]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4fb4f9e9a]
      May 1 01:54:38 site-db2 mysqld: :0()[0x7fc4fac2accd]
      May 1 01:54:38 site-db2 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      May 1 01:54:38 site-db2 mysqld: information that should help you find out what is causing the crash.
      May 1 01:54:38 site-db2 mysqld_safe: Number of processes running now: 0
      May 1 01:54:38 site-db2 mysqld_safe: WSREP: not restarting wsrep node automatically
      May 1 01:54:38 site-db2 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

      Any help greatly appreciated!

      Tim
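The memory line in the log is the crash handler's worst-case estimate, and it does not print sort_buffer_size. Below is a minimal Python sketch of the printed formula using the logged values. The 2 MiB sort_buffer_size passed in is an assumption (the MariaDB 5.5 default), and the function names are illustrative only. The sketch also back-solves the per-thread sort buffer implied by each logged total.

```python
# Worst-case memory estimate printed by the mysqld crash handler:
#   key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
# sort_buffer_size is not in the log, so callers must supply it (assumption).


def estimate_kib(key_buffer_size, read_buffer_size, sort_buffer_size, max_threads):
    """Evaluate the printed formula and truncate to KiB, as the log does."""
    total_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
    return total_bytes // 1024


def implied_sort_buffer(logged_kib, key_buffer_size, read_buffer_size, max_threads):
    """Back-solve sort_buffer_size from a logged total.

    Only approximate: the logged total is truncated to KiB, and any extra
    per-thread overhead the server adds is folded into the result.
    """
    per_thread = (logged_kib * 1024 - key_buffer_size) // max_threads
    return per_thread - read_buffer_size


# First log (site-db2); the 2 MiB sort buffer is an assumed default.
print(estimate_kib(268435456, 131072, 2 * 1024 * 1024, 802))  # 2007296
print(implied_sort_buffer(2018916, 268435456, 131072, 802))   # 2111988
# Second log (Vladimir's node).
print(implied_sort_buffer(1936541, 536870912, 1048576, 153))  # 8403365
```

With the assumed 2 MiB sort buffer, the printed formula gives 2,007,296 KiB. That is about 11 MiB below the logged 2,018,916 K. If the sort buffers were set to round values (2 MiB here, 8 MiB on the second node), both logs leave a similar surplus of roughly 14.8 KB per thread. This suggests the logged totals include per-thread overhead beyond the three printed terms.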


            Activity

            Tim Clark added a comment:

            Checked AppArmor this evening: it has the standard MariaDB 'blank' policy for usr.sbin.mysqld, so I don't think that's the problem.

            Tim

            Vladimir Perepechin added a comment (edited):

            Seppo, I thought it was the same problem because of these log entries after my crash:

            terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<asio::system_error> >'
            what(): Transport endpoint is not connected
            130514 15:39:08 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see http://kb.askmonty.org/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 5.5.29-MariaDB
            key_buffer_size=536870912
            read_buffer_size=1048576
            max_used_connections=13
            max_threads=153
            thread_count=6
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1936541 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x0 thread_stack 0x48000
            ??:0(my_print_stacktrace)[0xa9d44e]
            ??:0(handle_fatal_signal)[0x6e3c2b]
            :0()[0x7f9f6f778500]
            :0()[0x7f9f6e02e8a5]
            :0()[0x7f9f6e030085]
            :0()[0x7f9f6e6d1a5d]
            :0()[0x7f9f6e6cfbe6]
            :0()[0x7f9f6e6cfc13]
            :0()[0x7f9f6e6cfd0e]
            :0()[0x7f9f6baecf8a]
            :0()[0x7f9f6baed0ab]
            :0()[0x7f9f6baed231]
            :0()[0x7f9f6bade497]
            :0()[0x7f9f6badf0a3]
            :0()[0x7f9f6bae8cf0]
            :0()[0x7f9f6bb092c9]
            :0()[0x7f9f6bb03616]
            :0()[0x7f9f6bb1c227]
            :0()[0x7f9f6bb1fb19]
            :0()[0x7f9f6f770851]
            :0()[0x7f9f6e0e490d]
            The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            information that should help you find out what is causing the crash.

            It's not the best backtrace, but searching for "'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<asio::system_error> >'" together with mariadb brought me to this report.
            And Elena's report shows the same entries:

            > May 10 23:22:33 mysqld: terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<asio::system_error> >'
            > May 10 23:22:33 mysqld: what(): Transport endpoint is not connected
            > May 10 23:22:33 mysqld: 130510 23:22:33 [ERROR] mysqld got signal 6 ;

            So I thought it was the same problem.
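The what() text "Transport endpoint is not connected" matches the C library's description of the POSIX error ENOTCONN on Linux. Below is a minimal, standalone Python illustration, not Galera code. It shows the same errno when getpeername() is called on a TCP socket that has no connected peer. It is an assumption, not something these logs show, that Galera hit this error by querying a connection that a port scanner had already dropped.

```python
import errno
import os
import socket


def enotconn_demo():
    """Return the OSError raised by getpeername() on a socket with no peer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.getpeername()  # never connected, so there is no remote endpoint
    except OSError as exc:
        return exc
    finally:
        sock.close()
    return None


err = enotconn_demo()
print(err.errno == errno.ENOTCONN)  # True
print(os.strerror(errno.ENOTCONN))  # on Linux/glibc: Transport endpoint is not connected
```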

            Seppo Jaakola added a comment:

            @Vladimir: your crash does indeed look the same.

            This issue is also probably the same as the one reported against Percona XtraDB Cluster (PXC) in these reports:
            https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1153727
            https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1184034

            As a workaround, the cluster ports can be protected, e.g. with iptables.
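Here is a minimal sketch of that workaround. It assumes the default Galera ports (4567 for group communication, 4568 for IST, 4444 for rsync SST) and uses hypothetical peer addresses from the 192.0.2.0/24 documentation range. The Python below only builds and prints iptables commands. Those commands accept traffic to the listed ports from the listed peers and drop it from everyone else. Review the output before running it as root, and adapt it if your nodes use non-default ports.

```python
# Hypothetical peer addresses (documentation range): replace with your nodes.
PEERS = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
# Default Galera TCP ports: 4567 group communication, 4568 IST, 4444 SST.
GALERA_PORTS = [4567, 4568, 4444]


def iptables_rules(peers, ports):
    """Build per-port ACCEPT rules for each peer, followed by a DROP for all others."""
    rules = []
    for port in ports:
        for peer in peers:
            rules.append(["iptables", "-A", "INPUT", "-p", "tcp", "-s", peer,
                          "--dport", str(port), "-j", "ACCEPT"])
        # Anything not accepted above for this port is dropped.
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                      "--dport", str(port), "-j", "DROP"])
    return rules


for rule in iptables_rules(PEERS, GALERA_PORTS):
    print(" ".join(rule))
```

The rules are appended with -A. They only take effect if no earlier rule in the INPUT chain already accepts the same traffic. Client traffic on 3306 is deliberately left untouched.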

            Tim Clark added a comment:

            Hi Seppo,

            Quick update: since we protected the port, the issue hasn't recurred. We also noticed that we had a scheduled Nessus scan on a Friday, which coincides nicely with the crash, so that's a very likely cause.

            Tim

            Seppo Jaakola added a comment:

            The issue is now in the Fix Released state in the Galera portal. Galera plugin 2.6 contains the fix and is included in MariaDB Galera Cluster (MGC) 5.5.32 and later releases.
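A minimal Python sketch can check whether a node already runs a release that should include the fix. It compares the server version string, as returned by SELECT VERSION() or shown on the crash log's "Server version" line, against 5.5.32. The function names are illustrative. The check only looks at the server version. The fix itself lives in the Galera provider library, so the wsrep_provider_version status variable is worth checking as well.

```python
import re

# First MGC release said in this issue to ship Galera 2.6 with the fix.
FIXED_IN = (5, 5, 32)


def parse_server_version(version):
    """Extract (major, minor, patch) from strings like '5.5.29-MariaDB-...'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """True if the server version is 5.5.32 or later (tuple comparison)."""
    return parse_server_version(version) >= FIXED_IN


print(has_fix("5.5.29-MariaDB-mariadb1~precise"))  # False: the reporter's version
print(has_fix("5.5.32-MariaDB"))                   # True (hypothetical string)
```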


              People

              • Assignee:
                Seppo Jaakola
                Reporter:
                Tim Clark
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: