Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7189

main.processlist fails sporadically in buildbot

    Details

      Description

      http://buildbot.askmonty.org/buildbot/builders/win32-debug2/builds/6568/steps/test/logs/stdio

      main.processlist                         [ fail ]
              Test ended at 2014-11-24 05:14:33
      
      CURRENT_TEST: main.processlist
      --- C:/bzr/bb-win32/win32-debug2/build/mysql-test/r/processlist.result	2014-10-29 01:18:28.187152000 -0700
      +++ C:\bzr\bb-win32\win32-debug2\build\mysql-test\r\processlist.reject	2014-11-24 05:14:30.816286800 -0800
      @@ -12,5 +12,5 @@
       0
       select command, time < 5 from information_schema.processlist where id != connection_id();
       command	time < 5
      -Sleep	1
      +Query	0
       set debug_sync='reset';
      
      mysqltest: Result content mismatch
      

      Variation with ps-protocol:
      http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/4530/steps/test_3/logs/stdio

      main.processlist                         w2 [ fail ]
              Test ended at 2014-12-22 01:07:55
      
      CURRENT_TEST: main.processlist
      --- /home/buildbot/build/mysql-test/r/processlist.result	2014-12-21 20:37:16.794532000 +0200
      +++ /home/buildbot/build/mysql-test/r/processlist.reject	2014-12-22 01:07:55.454312832 +0200
      @@ -16,5 +16,5 @@
       SET DEBUG_SYNC = 'now WAIT_FOR query_done';
       select command, time < 5 from information_schema.processlist where id != connection_id();
       command	time < 5
      -Sleep	1
      +Execute	0
       set debug_sync='reset';
      
      mysqltest: Result length mismatch
      

      Another variation:
      http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/6424

      main.processlist                         w1 [ fail ]
              Test ended at 2014-12-02 05:34:58
      
      CURRENT_TEST: main.processlist
      --- /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/mysql-test/r/processlist.result	2014-12-01 23:07:01.000000000 +0200
      +++ /var/lib/buildbot/maria-slave/work-opensuse-amd64/build/mysql-test/r/processlist.reject	2014-12-02 05:34:58.000000000 +0200
      @@ -6,7 +6,6 @@
       1
       SET DEBUG_SYNC = 'now SIGNAL fill_schema_proceed';
       INFO	TIME	TIME_MS
      -NULL	0	0.000
       select sleep(5);
       sleep(5)
       0
      
      mysqltest: Result length mismatch
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              Updated affected versions according to the cross-reference report

              Show
              elenst Elena Stepanova added a comment - Updated affected versions according to the cross-reference report
              Hide
              elenst Elena Stepanova added a comment -

              Kristian Nielsen,

              I see you already made some attempts to fix the test, but apparently there are still some flaws.

              The first two variations in the description seem to be the same problem that you were fixing; but the 3rd one is puzzling, looking at the test I can't see how it can possibly happen.

              Show
              elenst Elena Stepanova added a comment - Kristian Nielsen , I see you already made some attempts to fix the test, but apparently there are still some flaws. The first two variations in the description seem to be the same problem that you were fixing; but the 3rd one is puzzling, looking at the test I can't see how it can possibly happen.
              Hide
              knielsen Kristian Nielsen added a comment -

              I think the third one is already fixed, in revid:knielsen@knielsen-hq.org-20141031114817-art3oxugmfgfd6ca

              This fix appeared in 5.5 around 2014-10-31 and in 10.0 around 2014-11-03, and there are no failures of that kind seen in cross-reference after those dates.

              However, I do not understand why it still fails (even in 5.5) with the first two variants.

              Show
              knielsen Kristian Nielsen added a comment - I think the third one is already fixed, in revid:knielsen@knielsen-hq.org-20141031114817-art3oxugmfgfd6ca This fix appeared in 5.5 around 2014-10-31 and in 10.0 around 2014-11-03, and there are no failures of that kind seen in cross-reference after those dates. However, I do not understand why it still fails (even in 5.5) with the first two variants.
              Hide
              knielsen Kristian Nielsen added a comment -

              Ok, I found it.

              The test fails only in --ps-protocol (variation 2). Cross-reference shows some
              failures without --ps-protocol (variation 1), but these are in trees where the
              existing fix in 5.5 was not yet merged at the time of the failure. Same for
              variation 3, also only in trees in which the fix is not yet merged from 5.5.

              The problem is that the test case sets up a DEBUG_SYNC that is meant to
              trigger at the end of the execution of the SELECT SLEEP(5). But in
              --ps-protocol, a query happens in two steps, Prepare followed by Execute. And
              the DEBUG_SYNC ends up triggering at the end of the Prepare step, which is too
              early, allowing a race to cause the test to fail sporadically.

              Show
              knielsen Kristian Nielsen added a comment - Ok, I found it. The test fails only in --ps-protocol (variation 2). Cross-reference shows some failures without --ps-protocol (variation 1), but these are in trees where the existing fix in 5.5 was not yet merged at the time of the failure. Same for variation 3, also only in trees in which the fix is not yet merged from 5.5. The problem is that the test case sets up a DEBUG_SYNC that is meant to trigger at the end of the execution of the SELECT SLEEP(5). But in --ps-protocol, a query happens in two steps, Prepare followed by Execute. And the DEBUG_SYNC ends up triggering at the end of the Prepare step, which is too early, allowing a race to cause the test to fail sporadically.
              Hide
              knielsen Kristian Nielsen added a comment -

              Pushed to 5.5.42:

              http://lists.askmonty.org/pipermail/commits/2015-January/007230.html

              Note that the test failures might be visible in Buildbot for some time yet,
              until the fix becomes merged from 5.5 into each tree.

              Show
              knielsen Kristian Nielsen added a comment - Pushed to 5.5.42: http://lists.askmonty.org/pipermail/commits/2015-January/007230.html Note that the test failures might be visible in Buildbot for some time yet, until the fix becomes merged from 5.5 into each tree.

                People

                • Assignee:
                  knielsen Kristian Nielsen
                  Reporter:
                  elenst Elena Stepanova
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: