Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.11, 10.1.0
    • Fix Version/s: 10.0.18
    • Labels:
    • Environment:
      CentOS 6, GTID based replication, 1 master 1 slave

      Description

      I've been updating one of my applications from OQGraph v2 to v3, and while initial tests showed everything working, I'm now getting crashes when I put some load on the OQGraph table.

      The application is basically a webservice (in PHP) that queries the OQGraph table (schema attached) and returns the result. The query in itself should be fairly simple:

      SELECT db.* FROM db_history AS db INNER JOIN version_history AS v ON db.nodeID = v.linkid WHERE origid = 1 AND destid = 3 AND latch = 'dijkstras';

      When doing single sequential requests in the browser, everything works fine. But once I started to put some load on the webservice (using siege as load testing tool), MariaDB crashes quickly after 1-2 requests (crash dump attached), always with the same crash dump.

      I should note that the database setup is a master-slave replication setup. I'm not sure that has anything to do with the crashes I'm seeing though.

      I can reproduce this fairly easily and reliably in my test environment, on both the slave and the master node, but have not been successful so far in producing a test case that does not involve running siege.
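
      A siege-free test case would need to issue the same query from several connections at once. The following is a minimal sketch of such a driver, not a verified reproduction of the crash. It assumes a hypothetical `run_query` callable, supplied by the caller, that opens its own database connection, executes the SQL it is given, and returns the rows. The worker and request counts are arbitrary illustrative values.

```python
from concurrent.futures import ThreadPoolExecutor

# The query from the report, unchanged.
QUERY = (
    "SELECT db.* FROM db_history AS db "
    "INNER JOIN version_history AS v ON db.nodeID = v.linkid "
    "WHERE origid = 1 AND destid = 3 AND latch = 'dijkstras'"
)


def run_concurrently(run_query, workers=8, requests=100):
    """Call run_query(QUERY) `requests` times across `workers` threads.

    run_query is a hypothetical callable (an assumption of this sketch): it
    should open its own connection, execute the SQL, and return the rows.
    Returns a list of (ok, result_or_exception) tuples, one per request.
    """
    def one_request(_):
        try:
            return (True, run_query(QUERY))
        except Exception as exc:  # e.g. a connection lost mid-query
            return (False, exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_request, range(requests)))
```

      With a real database driver plugged in as `run_query`, any `(False, exc)` entries in the result would indicate requests that failed, for example because the server went away mid-query.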

          Attachments

          1. backtrace_2.log
            6 kB
          2. backtrace.log
            8 kB
          3. carbon.log
            3 kB
          4. crash.log
            4 kB
          5. data.sql
            2 kB
          6. debug.log
            410 kB
          7. explain.log
            3 kB
          8. oqgraph_crash_without_latch.log
            863 kB
          9. oqgraph_load.core.tar.xz
            1.12 MB
          10. schema.sql
            0.8 kB
          11. show_variables.log
            385 kB
          12. threaddump_2.log
            61 kB
          13. threaddump.log
            88 kB
          14. valgrind.log
            10 kB
          15. variables
            69 kB

              Activity

              serg Sergei Golubchik added a comment -

              Yes. If you plan to fix the bug (or make some other change) in both 10.0 and 10.1, then you only need to do it and provide a pull request for 10.0; I'll take care of propagating it to 10.1. Of course, if you want something to be done in 10.1 only, then you just create a pull request for 10.1.

              andymc73 Andrew McDonnell added a comment -

              https://github.com/MariaDB/server/pull/17

              andymc73 Andrew McDonnell added a comment -

              I did a forward merge test to 10.1. There are a couple of instances where a merge conflict needs to be resolved (C,B or similar).

              I pushed a branch that has the correct result, in case you need something to compare against:

              https://github.com/pastcompute/server/tree/10.1-oqgraph-6282-6345-6784-cherrypick-test

              pprkut Heinz Wiesinger added a comment -

              I checked pull/16 and pull/17. Both fix the problem and result in a crash-free load test here.

              Thank you very much!

              andymc73 Andrew McDonnell added a comment -

              No worries!

              BTW, pull #16 has been cancelled, although it is a good test.

              My understanding is that #17 will get merged into the 10.0 branch first, then Sergei will at some point merge 10.0 -> 10.1.


                People

                • Assignee:
                  serg Sergei Golubchik
                  Reporter:
                  pprkut Heinz Wiesinger
                • Votes:
                  0
                  Watchers:
                  5

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Time Tracking

                    Estimated:
                    Not Specified
                    Remaining:
                    0m
                    Logged:
                    20m