  1. MariaDB Server
  2. MDEV-6340

Mariadb 10.0.12 fatal "Lost connection" error w/ GCC 4.9 'Release' build; workaround ~ CFLAGS="-fno-delete-null-pointer-checks"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.12
    • Fix Version/s: 10.0.13
    • Component/s: None
    • Labels: None

      Description

      After a clean/new install of MariaDB 10.0.11, undertaking a completely NEW drush-install from clean Drupal v7.28 source, I get the following fatal error + crash:

          SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
      


              Activity

              grantk GrantK added a comment - - edited

              GCC 4.9.1 is neither released nor shipping with any distribution; GCC 4.9.0 is.

              Is the decision, then, to simply ignore RELEASE builds of MariaDB being broken with a RELEASE GCC, and kick the ball down the road to GCC 4.9.1, whenever it's released?

              How, exactly, do we RE-OPEN this?

              serg Sergei Golubchik added a comment -

              I've reopened it.

              But 4.9.0 is pretty much the bleeding edge, most distributions don't ship it (and, as you can see, they have good reasons not to). On the other hand, 4.9.1 is already in Mageia Cauldron (which is in the development stage and won't be declared stable anytime soon).

              I will try to see if we can change something in MariaDB to avoid this gcc bug. But given that it is apparently a gcc bug, and given everything I wrote above, this won't be a high-priority bug, sorry.

              grantk GrantK added a comment -

              Can you provide a reference to the specific GCC bug that you suggest is fixed?

              In apparent reference to

              "Operational Notification – Changes in gcc Code Optimization Can Cause a Crash in BIND"
              https://kb.isc.org/article/AA-01167

              as pointed out by showaz in bind's #irc, the bind dev team posted a GCC bug here,

              "GCC 4.9 generates incorrect object code"
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61236,

              for which a workaround is the similar

              -fno-delete-null-pointer-checks

              @ GCC, that bug has been resolved as INVALID by the GCC team, and, as a result, the bind team committed fixes to their repository branches to address the crash and work around the optimization issue.

              In that bug report, it's glibc that's called into question, not gcc.

              Noting as posted above here, in the mariadb backtrace,

              ...
              /lib64/libpthread.so.0(+0x80db)[0x7fcf4ff230db]
              /lib64/libc.so.6(clone+0x6d)[0x7fcf4ebd390d]

              So, is it in fact GCC, as you've ascribed, or glibc/other, that's involved in the MariaDB crashes?

              serg Sergei Golubchik added a comment -

              Jan Lindström, please take a look at the following patch:

              === modified file 'storage/innobase/include/lock0lock.h'
              --- storage/innobase/include/lock0lock.h        2014-05-07 15:32:23 +0000
              +++ storage/innobase/include/lock0lock.h        2014-07-30 19:36:42 +0000
              @@ -277,31 +277,31 @@
               UNIV_INTERN
               dberr_t
               lock_rec_insert_check_and_lock(
               /*===========================*/
                      ulint           flags,  /*!< in: if BTR_NO_LOCKING_FLAG bit is
                                              set, does nothing */
                      const rec_t*    rec,    /*!< in: record after which to insert */
                      buf_block_t*    block,  /*!< in/out: buffer block of rec */
                      dict_index_t*   index,  /*!< in: index */
                      que_thr_t*      thr,    /*!< in: query thread */
                      mtr_t*          mtr,    /*!< in/out: mini-transaction */
                      ibool*          inherit)/*!< out: set to TRUE if the new
                                              inserted record maybe should inherit
                                              LOCK_GAP type locks from the successor
                                              record */
              -       __attribute__((nonnull, warn_unused_result));
              +       __attribute__((nonnull(2,3,4,6,7), warn_unused_result));
               /*********************************************************************//**
              

              (the same for xtradb, of course).

              Here's why: the old declaration promises that thr can never be NULL, and gcc-4.9.0 trusts that and optimizes accordingly. But in fact, the function starts with

              lock_rec_insert_check_and_lock(
              /*===========================*/
                      ...
              	ibool*		inherit)
              {
                      ...
              	if (flags & BTR_NO_LOCKING_FLAG) {
              		return(DB_SUCCESS);
              	}
              
              	trx = thr_get_trx(thr);
              

              so when BTR_NO_LOCKING_FLAG is set, thr can be NULL (and it is NULL in this stack trace: btr_insert_on_non_leaf_level_func → btr_cur_optimistic_insert → btr_cur_ins_lock_and_undo → lock_rec_insert_check_and_lock). The patch fixes this by removing the nonnull attribute for thr (parameter 5) while keeping it for the other pointer parameters. Another solution would be to move the check for BTR_NO_LOCKING_FLAG out of the function and keep the nonnull attribute.
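The effect of the patch can be illustrated with a minimal sketch. This is not InnoDB code: `NO_LOCKING_FLAG`, `struct thread_ctx` and `check_and_lock` are hypothetical stand-ins for `BTR_NO_LOCKING_FLAG`, `que_thr_t` and `lock_rec_insert_check_and_lock`, and the parameter list is shortened. It only shows the general GCC/Clang technique of listing parameter indices in `nonnull(...)` so that a pointer which may legitimately be NULL is left out of the promise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BTR_NO_LOCKING_FLAG. */
#define NO_LOCKING_FLAG 1u

/* Hypothetical stand-in for que_thr_t. */
struct thread_ctx {
	int trx_id;
};

/* Only parameter 3 (out) is promised to be non-NULL. thr (parameter 2)
   is deliberately left out of the index list, because callers may pass
   NULL for it when NO_LOCKING_FLAG is set. A bare "nonnull" would cover
   every pointer parameter, including thr. */
int check_and_lock(unsigned flags, const struct thread_ctx *thr, int *out)
	__attribute__((nonnull(3), warn_unused_result));

int check_and_lock(unsigned flags, const struct thread_ctx *thr, int *out)
{
	if (flags & NO_LOCKING_FLAG) {
		*out = 0;
		return 0;	/* early return: thr is never dereferenced */
	}

	/* thr is only required on this path. */
	assert(thr != NULL);
	*out = thr->trx_id;
	return 1;
}
```

With this declaration, a call such as `check_and_lock(NO_LOCKING_FLAG, NULL, &out)` stays within what the declaration promises. Had thr been covered by the attribute, the same call would violate that promise, and the compiler would be entitled to optimize on the assumption that it never happens.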

              jplindst Jan Lindström added a comment -

              The patch is correct. I just do not follow why we bother to call this function at all if BTR_NO_LOCKING_FLAG is set. Removing the call(s) could need a deeper fix.
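The alternative discussed in this thread (checking the flag before the call and keeping the full nonnull attribute) might look roughly like the sketch below. All names here are hypothetical stand-ins rather than InnoDB functions, and the real call chain has more parameters; it is only meant to show the shape of the change, not the fix that was committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BTR_NO_LOCKING_FLAG. */
#define NO_LOCKING_FLAG 1u

/* Hypothetical stand-in for que_thr_t. */
struct thread_ctx {
	int trx_id;
};

/* The callee keeps a bare nonnull attribute, so every pointer parameter
   is promised to be non-NULL, and it no longer looks at the flag. */
int lock_for_insert(const struct thread_ctx *thr, int *inherit)
	__attribute__((nonnull, warn_unused_result));

int lock_for_insert(const struct thread_ctx *thr, int *inherit)
{
	*inherit = (thr->trx_id != 0);
	return 0;
}

/* The caller decides whether locking is needed at all, so a NULL thr
   never reaches the nonnull callee. */
int insert_with_optional_lock(unsigned flags, const struct thread_ctx *thr,
			      int *inherit)
{
	if (flags & NO_LOCKING_FLAG) {
		*inherit = 0;
		return 0;	/* skip the call entirely */
	}

	assert(thr != NULL);
	return lock_for_insert(thr, inherit);
}
```

The trade-off is that every call site has to perform the flag check, which is why removing the call(s) may need a deeper change than adjusting the attribute.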


                People

                • Assignee: serg Sergei Golubchik
                • Reporter: grantk GrantK
                • Votes: 0
                • Watchers: 5

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0 minutes
                • Time Spent: 3 hours, 30 minutes