MariaDB Server / MDEV-4158

Crash on applying updates in MariaDB-Galera

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: CentOS 5.8

      Description

      130209 5:48:19 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 5.5.28a-MariaDB-log
      key_buffer_size=0
      read_buffer_size=131072
      max_used_connections=0
      max_threads=501
      thread_count=2
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 194588 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x14c383b0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x426bd0c8 thread_stack 0x80000
      ??:0(my_print_stacktrace)[0xa9a3ae]
      ??:0(handle_fatal_signal)[0x6e383b]
      :0()[0x3b8a40ebe0]
      ??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]
      ??:0(ha_checktype(THD*, legacy_db_type, bool, bool))[0x6e925f]
      ??:0(open_table_def(THD*, TABLE_SHARE*, unsigned int))[0x62c420]
      ??:0(_Z15get_table_shareP3THDP10TABLE_LISTPcjjPij.clone.7)[0x546ffc]
      ??:0(open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*))[0x550036]
      ??:0(open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*))[0x5514c1]
      ??:0(open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x552294]
      ??:0(Rows_log_event::do_apply_event(Relay_log_info const*))[0x7ae815]
      ??:0(_ZL15wsrep_apply_rbrP3THDPKhm)[0x58fff6]
      ??:0(wsrep_apply_cb(void*, void const*, unsigned long, long))[0x5905f6]
      :0()[0x2aaaab57f94a]
      :0()[0x2aaaab588372]
      :0()[0x2aaaab588f05]
      :0()[0x2aaaab560f94]
      :0()[0x2aaaab5617d8]
      :0()[0x2aaaab57ee4d]
      :0()[0x2aaaab599023]
      ??:0(wsrep_replication_process(THD*))[0x58fbb3]
      ??:0(start_wsrep_THD)[0x50af2c]
      :0()[0x3b8a40677d]
      :0()[0x3b89cd3c1d]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 2
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      130209 05:48:19 mysqld_safe mysqld from pid file /var/lib/mysql/devdb02.corp.wepay-inc.com.pid ended
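For triage, a backtrace like the one above can be reduced to its resolved frames. The following is a minimal Python sketch and not part of the original report: `parse_frames`, `likely_crash_frame`, and `SAMPLE` are hypothetical names introduced here, and the frame pattern is an assumption inferred only from the lines in this log.

```python
import re

# Assumed frame layout, inferred from this log only:
#   "??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]"  (resolved symbol)
#   ":0()[0x3b8a40ebe0]"                                 (unresolved frame)
FRAME_RE = re.compile(
    r"^\s*\S*:\d+\((?P<symbol>.*)\)\[(?P<addr>0x[0-9a-fA-F]+)\]\s*$"
)

# Frames that belong to the crash handler itself, not to the faulting code.
SIGNAL_HANDLER_FRAMES = ("my_print_stacktrace", "handle_fatal_signal")

# First five frames copied from the backtrace in this report.
SAMPLE = """
??:0(my_print_stacktrace)[0xa9a3ae]
??:0(handle_fatal_signal)[0x6e383b]
:0()[0x3b8a40ebe0]
??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]
??:0(ha_checktype(THD*, legacy_db_type, bool, bool))[0x6e925f]
"""


def parse_frames(log_text):
    """Return a list of (symbol or None, address) for every frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbol = match.group("symbol") or None  # "" means unresolved
            frames.append((symbol, int(match.group("addr"), 16)))
    return frames


def likely_crash_frame(frames):
    """Return the first resolved symbol that is not a crash-handler frame."""
    for symbol, _addr in frames:
        if symbol is None or symbol.startswith(SIGNAL_HANDLER_FRAMES):
            continue
        return symbol
    return None


if __name__ == "__main__":
    print(likely_crash_frame(parse_frames(SAMPLE)))
```

On the sample frames this picks `plugin_lock(THD*, st_plugin_int*)`, the frame the reporter also names in the comments. Unresolved frames (the `:0()` lines, which in this trace sit between `wsrep_apply_cb` and `wsrep_replication_process`) are kept with a `None` symbol so they can still be counted or matched by address.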

            Activity

            elenst Elena Stepanova added a comment -

            Hi Aleksey,

            There isn't really much to go on for an investigation.
            Is the problem persistent?
            Was there anything in the error log prior to the signal?
            Do you still have the log of the other node(s) this one was replicating from? Would it be possible to upload the datadir of the crashed node, along with the binlog it was replicating from at the time of the crash, and the cnf file from this node?

            Thanks

            aleksey.sanin Aleksey Sanin added a comment -

            I've seen the crash a couple of times in 3 days, with the same stack trace in plugin_lock() and the same NULL pointer access each time. There was nothing interesting in the logs on this or other nodes. I am continuing testing, and if I see it again I will definitely get all the data you've asked about.

            elenst Elena Stepanova added a comment -

            Moving the discussion of this bug here from a recent comment on MDEV-4179:

            >> Lastly, I've actually remembered that we've seen a similar issue on the dev environment, though the stack trace was different:

            >> https://mariadb.atlassian.net/browse/MDEV-4158

            >> It was the same upgrade process, though it didn't crash 100% of the time. Maybe it is a timing issue somewhere?

            If it was also happening on slave restart, I guess it might be the same issue. To make sure, I need confirmation that during wsrep-recovery not only is the slave started and the IO thread running, but the SQL thread can start applying events too. If that's the case, it could explain both the stack trace in this report and the database corruption in MDEV-4179, if the event was ALTER TABLE or something equally crash-unsafe.

            A side note:
            How did you get key_buffer_size=0? MDEV-4179 shows a real value; is it a different build, a different package, or a different config?

            aleksey.sanin Aleksey Sanin added a comment -

            MDEV-4158 was also happening on server restart with the slave enabled, and there is a good chance there was an ALTER TABLE involved.

            For key_buffer_size, I think that MySQL sets it to the default 32K value if it is set to 0 in the config.

            elenst Elena Stepanova added a comment -

            I suppose for now we can assume it's the same issue as MDEV-4179. If it continues to happen and/or there is any new information, we can always re-open it.


              People

              • Assignee: elenst Elena Stepanova
              • Reporter: aleksey.sanin Aleksey Sanin
              • Votes: 0
              • Watchers: 3
