  1. MDEV-3802

mysql_get_timeout_value() starts returning (unsigned)-1 in 10.0-monty

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects versions: 10.0.0
    • Fix versions: 10.0.0, 5.5.28
    • Components: None
    • Labels: None

      Description

      In 10.0-monty, we start seeing failures in main.non_blocking_api.
      The failure is seen on BSD, but the root problem exists on all platforms.

      The issue is that we get the flag MYSQL_WAIT_TIMEOUT back from,
      e.g., mysql_real_connect_cont(), but mysql_get_timeout_value() then returns
      (unsigned)-1. This is incorrect, and a change from existing behaviour.

      The symptom in the test suite is that tests compute the timeout for poll(2) as
      mysql_get_timeout_value()*1000, which ends up as -1000. That value is invalid
      for poll(2) on BSD (and incorrect in any case).
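
      To make the arithmetic concrete, here is a minimal sketch of the computation
      the tests perform. The function name naive_poll_timeout_ms is hypothetical,
      and the result assumes a 32-bit unsigned int on a two's-complement platform.

```c
/* Hypothetical helper mirroring the test-suite computation:
   the value from mysql_get_timeout_value() is multiplied by 1000
   and handed to poll(2), which takes an int timeout. */
int naive_poll_timeout_ms(unsigned int timeout_value)
{
    /* Unsigned multiplication wraps modulo UINT_MAX + 1, so
       (unsigned)-1 * 1000 wraps to UINT_MAX - 999. */
    unsigned int ms = timeout_value * 1000u;

    /* Converting an out-of-range unsigned value to int is
       implementation-defined; on common two's-complement
       platforms the bit pattern is kept, giving -1000. */
    return (int)ms;
}
```

      With an input of (unsigned)-1 this yields -1000 on such platforms, matching
      the value seen in the strace output; an ordinary value such as 30 yields 30000.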

      If no timeout is desired, the MYSQL_WAIT_TIMEOUT flag should not be set.
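
      The contract stated above can be sketched from the caller's side as follows.
      This is only an illustration under assumptions: WAIT_TIMEOUT_FLAG is a
      hypothetical stand-in for MYSQL_WAIT_TIMEOUT (its numeric value here is
      assumed), poll_timeout_ms is a made-up helper, and the clamp is a defensive
      choice in the caller, not a substitute for fixing the library.

```c
#include <limits.h>

/* Hypothetical stand-in for MYSQL_WAIT_TIMEOUT from mysql.h;
   the numeric value is an assumption for this sketch. */
#define WAIT_TIMEOUT_FLAG 8

/* Derive a poll(2) timeout in milliseconds from the wait status
   returned by a non-blocking call and a timeout given in seconds. */
int poll_timeout_ms(int wait_status, unsigned int timeout_seconds)
{
    /* No timeout flag set: the caller should wait indefinitely. */
    if (!(wait_status & WAIT_TIMEOUT_FLAG))
        return -1;

    /* Clamp rather than overflow if the value is unreasonably large,
       for example (unsigned)-1. */
    if (timeout_seconds > (unsigned int)(INT_MAX / 1000))
        return INT_MAX;

    return (int)(timeout_seconds * 1000u);
}
```

      A caller would pass the returned value as the third argument to poll(2), so
      that -1 is only ever used when the flag is absent.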

      As far as I can see, the problem is a wrong merge of new VIO stuff in
      10.0/10.0-monty. It breaks the non-blocking client library code in 10.0-base
      rather badly:

      • The timeout values were changed from seconds to milliseconds, but the
        non-blocking part was not updated to reflect this.
      • vio_io_wait() does not seem to handle non-blocking operation at all, so
        it will halt any application that uses it.
      • There are probably other problems hidden...
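
      As an illustration of the first point, a unit conversion is needed wherever a
      millisecond value crosses into code that still reports seconds. The helper
      below is hypothetical (it is not taken from the server source) and only
      sketches one reasonable choice, rounding up so that a short non-zero timeout
      is not reported as zero.

```c
/* Hypothetical helper: convert an internal timeout in milliseconds
   to whole seconds for an interface that reports seconds.
   Rounds up, so 1 ms becomes 1 s rather than 0 s. */
unsigned int timeout_ms_to_seconds(unsigned int ms)
{
    /* Divide first, then add one if there is a remainder;
       this avoids overflow for large inputs. */
    return ms / 1000u + (ms % 1000u != 0u ? 1u : 0u);
}
```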

      An easy way to repeat the problem is to run client/async_example against a
      running server with strace:

      $ strace -e trace=poll bld/client/async_example 127.0.0.1 root rootpass > /dev/null
      poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
      poll([{fd=3, events=POLLIN}], 1, -1000) = 1 ([{fd=3, revents=POLLIN}])

      Note the second poll() call passing -1000 as the timeout - this is incorrect,
      and is caused by the issue described above.

      However, note that this is not the only problem. All of the new VIO stuff needs
      to be fixed for non-blocking operation.

      It is particularly important to ensure, 110%, that the non-blocking client
      code never blocks. Blocking would be a subtle problem that is not easily seen
      in the test suite, but that will cause large applications that use
      non-blocking mode to become slow or fail.

            People

            • Assignee: Kristian Nielsen (knielsen)
            • Reporter: Kristian Nielsen (knielsen)

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0m
                • Logged: 1d