Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4334

Flush DBUG output in DBUG_ASSERT() in a way that does not cause crashes

    Details

    • Type: Task
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This revision adds a _db_flush() call to every DBUG_ASSERT():

          revid:monty@mysql.com-20110130104144-v3osvsbyl45zd5o3
      
          -#define DBUG_ASSERT(A) assert(A)
          +#define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)
      

      Unfortunately this causes occasional crashes inside DBUG_ASSERT. I am starting
      to hit this a lot in my MDEV-26 tree, but it is also visible elsewhere, like
      here:

      http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/2456/steps/test_2/logs/stdio

      I suspect that the original change is just wrong and that it needs protection
      against concurrent threads updating the DBUG data structures. Unfortunately
      this is not entirely trivial to do. I assume adding lock/unlock to every
      DBUG_ASSERT would make the test suite a lot slower, which is undesirable. On
      the other hand, the obvious

        #define DBUG_ASSERT(A) do { if (!A) _db_flush_(); assert(A); } while(0)
      

      evaluates A twice, which may cause problems, and assigning A to a temp
      variable also is not good, as we want the assert() macro to have access to the
      actual expression A to print it in the error log. So I am not sure how to fix
      this properly.

      For now, I will remove the _db_flush() from DBUG_ASSERT(), so I avoid
      crashes. So I filed this task for you in case you want to bring it back in, in
      a way that does not cause crash.

      Here are details of my analysis of the crash:

      #3  <signal handler called>
      #4  0x0000000000d0aa44 in _db_flush_ () at /home/knielsen/my/10.0/work-10.0-mdev26/dbug/dbug.c:2139
      #5  0x00000000005b3dd6 in next_event (rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:5316
      #6  0x00000000005ad98e in exec_relay_log_event (thd=0x4859480, rli=0x4820038) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3042
      #7  0x00000000005b016f in handle_slave_sql (arg=0x481e940) at /home/knielsen/my/10.0/work-10.0-mdev26/sql/slave.cc:3965
      #8  0x0000000000ca07fd in pfs_spawn_thread (arg=0x4915890) at /home/knielsen/my/10.0/work-10.0-mdev26/storage/perfschema/pfs.cc:1015
      #9  0x00007f98cd89cb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      

      The crash is inside an assert:

      5316	      DBUG_ASSERT(rli->cur_log_fd == -1); // foreign descriptor
      
      #define DBUG_ASSERT(A) do { _db_flush_(); assert(A); } while(0)
      
      2135	void _db_flush_()
      2136	{
      2137	  CODE_STATE *cs;
      2138	  get_code_state_or_return;
      2139	  (void) fflush(cs->stack->out_file->file);
      2140	}
      

      The crash happens because cs->stack->out_file is NULL, as shown by
      disassembly:

      Dump of assembler code for function _db_flush_:
         0x0000000000d0aa20 <+0>:	push   %rbp
         0x0000000000d0aa21 <+1>:	mov    %rsp,%rbp
         0x0000000000d0aa24 <+4>:	sub    $0x10,%rsp
         0x0000000000d0aa28 <+8>:	callq  0xd06f48 <code_state>
         0x0000000000d0aa2d <+13>:	mov    %rax,-0x8(%rbp)
         0x0000000000d0aa31 <+17>:	cmpq   $0x0,-0x8(%rbp)
         0x0000000000d0aa36 <+22>:	je     0xd0aa4f <_db_flush_+47>
         0x0000000000d0aa38 <+24>:	mov    -0x8(%rbp),%rax
         0x0000000000d0aa3c <+28>:	mov    0x20(%rax),%rax
         0x0000000000d0aa40 <+32>:	mov    0x10(%rax),%rax
      => 0x0000000000d0aa44 <+36>:	mov    (%rax),%rax
         0x0000000000d0aa47 <+39>:	mov    %rax,%rdi
         0x0000000000d0aa4a <+42>:	callq  0x584280 <fflush@plt>
         0x0000000000d0aa4f <+47>:	leaveq 
         0x0000000000d0aa50 <+48>:	retq   
      End of assembler dump.
      (gdb) info reg
      rax            0x0	0
      

      However, the cs->stack->out_file is not NULL when printed in the
      debugger. Maybe because there is a race between different threads?

      (gdb) p cs->stack->out_file
      $19 = (sFILE *) 0x142a5c0
      (gdb) p *cs->stack->out_file
      $20 = {file = 0x7f98cc051880, used = 1073741823}
      

      The "used" value seems quite suspicious, looks corrupt. Maybe it was free()d?

      It looks to me that there is missing protection in _db_flush() against another
      thread changing cs->stack->out_file, eg FreeState() sets it to NULL... ?

        Gliffy Diagrams

          Attachments

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                monty Michael Widenius
                Reporter:
                knielsen Kristian Nielsen
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: