Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.1.67, 5.2.14, 5.3.12, 5.5.36, 10.0.9
    • Fix Version/s: 5.5.37, 10.0.10
    • Component/s: None
    • Labels: None
    • Environment: Windows Server 2012 (only?)

      Description

      As reported by Elena: mysqld.exe crashes on shutdown on Windows Server 2012. Possibly only the debug build is affected.

      The stacktrace looks like this:

       	mysqld.exe!_db_enter_(const char * _func_, const char * _file_, unsigned int _line_, _db_stack_frame_ * _stack_frame_)  Line 1101 + 0x5 bytes	C
      >	mysqld.exe!my_free(void * ptr)  Line 209	C
       	mysqld.exe!delete_dynamic(st_dynamic_array * array)  Line 302	C
       	mysqld.exe!cleanup_instrument_config()  Line 238	C++
       	mysqld.exe!cleanup_performance_schema()  Line 165	C++
       	mysqld.exe!shutdown_performance_schema()  Line 209	C++
       	mysqld.exe!mysqld_exit(int exit_code)  Line 1968	C++
       	mysqld.exe!unireg_abort(int exit_code)  Line 1948	C++
       	mysqld.exe!win_main(int argc, char * * argv)  Line 5441	C++
       	mysqld.exe!mysql_service(void * p)  Line 5560	C++
       	mysqld.exe!mysqld_main(int argc, char * * argv)  Line 5753	C++
       	mysqld.exe!main(int argc, char * * argv)  Line 26	C++
       	mysqld.exe!__tmainCRTStartup()  Line 278 + 0x19 bytes	C
       	mysqld.exe!mainCRTStartup()  Line 189	C
      

      CORRECTION: I forgot the first frames of the stack. They look like this:

       	vrfcore.dll!000007fdca3537ed() 	
       	[Frames below may be incorrect and/or missing, no symbols loaded for vrfcore.dll]	
       	vfbasics.dll!000007fdca2ea777() 	
      >	mysqld.exe!code_state()  Line 345	C
       	mysqld.exe!_db_enter_(const char * _func_, const char * _file_, unsigned int _line_, _db_stack_frame_ * _stack_frame_)  Line 1101 + 0x5 bytes	C
       	mysqld.exe!my_free(void * ptr)  Line 209	C
      

            Activity

            psergey Sergei Petrunia added a comment -

            Debugging, I see that the crash happens inside this call:

            pthread_mutex_init(&THR_LOCK_dbug, NULL);

            pthread_mutex_init translates to InitializeCriticalSection on Windows. InitializeCriticalSection only requires that valid memory is passed to it (which is true).

            psergey Sergei Petrunia added a comment -

            My guess is that we're trying to initialize a critical section in memory where an earlier critical section is still initialized. MSDN mentions that CRITICAL_SECTION objects cannot be moved in memory, so an attempt to initialize one over another may be considered an invalid operation.

            psergey Sergei Petrunia added a comment -

            The following patch makes the crash go away:

            === modified file 'dbug/dbug.c'
            --- dbug/dbug.c 2013-11-20 11:05:39 +0000
            +++ dbug/dbug.c 2014-03-20 13:45:44 +0000
            @@ -342,6 +342,7 @@ static CODE_STATE *code_state(void)
                 sstdout->file= stdout;
                 sstderr->file= stderr;
                 pthread_mutex_init(&THR_LOCK_dbug, NULL);
            +       fprintf(stderr, "psergey: initing THR_LOCK_dbug\n");
                 bzero(&init_settings, sizeof(init_settings));
                 init_settings.out_file= sstderr;
                 init_settings.flags=OPEN_APPEND;
            @@ -1642,6 +1643,9 @@ void _db_end_()
             
               cs->stack= &init_settings;
               FreeState(cs, 0);
            +  //psergey:
            +  fprintf(stderr, "psergey: freeing THR_LOCK_dbug\n");
            +  pthread_mutex_destroy(&THR_LOCK_dbug);
               init_done= 0;
             }
            

            When running with the patch, I see:

            psergey: initing THR_LOCK_dbug

            (standard messages about MariaDB startup)

            psergey: freeing THR_LOCK_dbug
            psergey: initing THR_LOCK_dbug

            psergey Sergei Petrunia added a comment -

            So, it could be that this particular Windows Server 2012 machine started being picky about programs placing one CRITICAL_SECTION object over another.
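
            The init/destroy pairing that the patch restores can be sketched as below. This is only an illustration with plain POSIX threads, not the real dbug.c code: the demo_* names and counters are hypothetical, and the lazy-init-behind-a-flag shape is assumed from the patch context (init_done is reset in _db_end_, and code_state() initializes the mutex again on the next call).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for THR_LOCK_dbug and init_done; not the real dbug.c globals. */
static pthread_mutex_t demo_lock;
static int demo_init_done = 0;
static int demo_init_count = 0;     /* times the mutex was initialized */
static int demo_destroy_count = 0;  /* times the mutex was destroyed */

/* Lazily initialize the lock on first use (assumed to mirror code_state()). */
static void demo_ensure_init(void)
{
  if (!demo_init_done)
  {
    int rc = pthread_mutex_init(&demo_lock, NULL);
    assert(rc == 0);
    (void) rc;                      /* silence unused warning under NDEBUG */
    demo_init_count++;
    demo_init_done = 1;
  }
}

/* Tear down: destroy the lock before clearing the flag, so that a later
   demo_ensure_init() never initializes over a still-live mutex. */
static void demo_end(void)
{
  if (demo_init_done)
  {
    int rc = pthread_mutex_destroy(&demo_lock);
    assert(rc == 0);
    (void) rc;
    demo_destroy_count++;
    demo_init_done = 0;
  }
}
```

            Calling demo_ensure_init(), demo_end(), demo_ensure_init() reproduces the init, free, init sequence from the log output, with each initialization preceded by a matching destroy. Without the destroy in demo_end(), the second initialization would run over an object that was never released, which is the situation the verifier appears to object to.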

            elenst Elena Stepanova added a comment -

            As Sergei found out, the machine had Application Verifier set up for mysqld.exe. That is what made the machine picky about these critical section specifics.

            The crash only happens under the verifier on debug builds, both 5.5 and 10.0. I don't know whether the verifier points to a real problem here; if it does, then it should probably be fixed. If it's just a whim of the verifier that does not reveal a code flaw, I suppose it can be left as is.

            How it happened:
            Application Verifier is used in buildbot tests on this machine.
            We turn off appverif for mysqld.exe at the first test step as a precaution, then turn it on for one test run on a non-debug build, turn it off again immediately when the test finishes, and turn it off once more at the next step before collecting the data. So it should always be off except during a single test run.
            Apparently, the server was rebooted during this very test run, as sometimes happens with Windows machines and their critical updates. It turns out that the appverif configuration is sticky: once turned on, it stays on until it is explicitly turned off. After that, the server started crashing at the build step, when bootstrap is run and the initial data is created. When buildbot fails at a build step, it goes no further, so it never reached the first test step where the verifier would be turned off. As another precaution, I will add a step that switches it off at the very beginning of the factory.

            psergey Sergei Petrunia added a comment -

            Sergei Golubchik, please review the fix (assuming the fprintf calls will be removed).


              People

              • Assignee: serg Sergei Golubchik
              • Reporter: psergey Sergei Petrunia
                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0m
                  Logged: 10m