Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4726

Race in mysql-test/suite/rpl/t/rpl_gtid_stop_start.test

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.3
    • Fix Version/s: 10.0.9
    • Component/s: None
    • Labels:
      None

      Description

      There's a race in mysql-test/suite/rpl/t/rpl_gtid_stop_start.test as follows:

      The rpl_gtid_stop_start test used SELECT COUNT(*) as a wait_condition when waiting for rows to appear from the replication stream. Since the table it is waiting on is MyISAM, rows appearing there do not guarantee in any way that the GTID information will have been updated by the slave thread (and can't guarantee that). This causes a race in the test which is especially exacerbated by making the gtid_slave_pos table InnoDB, since then the COMMIT that is done as part of recording the GTID is bound to take some tens of milliseconds.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              jeremycole Jeremy Cole added a comment -

              I attached a patch which changes the wait_condition to wait on the proper GTID to appear instead of waiting on a row count. I believe this to be the only actual way to wait without a race when using MyISAM for gtid_slave_pos; nonetheless this is safe and effective anyway.

              Show
              jeremycole Jeremy Cole added a comment - I attached a patch which changes the wait_condition to wait on the proper GTID to appear instead of waiting on a row count. I believe this to be the only actual way to wait without a race when using MyISAM for gtid_slave_pos; nonetheless this is safe and effective anyway.
              Hide
              jeremycole Jeremy Cole added a comment -

              We've had another failing test this morning which is also due to using SELECT COUNT as a wait_condition. Really, all of the tests in the rpl suite should be reviewed to not use SELECT COUNT as a wait_condition, lest they may all flake eventually.

              Show
              jeremycole Jeremy Cole added a comment - We've had another failing test this morning which is also due to using SELECT COUNT as a wait_condition. Really, all of the tests in the rpl suite should be reviewed to not use SELECT COUNT as a wait_condition, lest they may all flake eventually.
              Hide
              elenst Elena Stepanova added a comment - - edited

              I'm not quite sure Kristian meant it to be a MyISAM table, maybe it just happened this way. I'll check with him before modifying tests.

              Show
              elenst Elena Stepanova added a comment - - edited I'm not quite sure Kristian meant it to be a MyISAM table, maybe it just happened this way. I'll check with him before modifying tests.
              Hide
              elenst Elena Stepanova added a comment -

              Hi Kristian,

              See the above – did you mean the tables to be MyISAM?
              If you did, I can modify the tests as Jeremy suggested. If you didn't, I can add explicit engine=InnoDB in all cases when there is no engine in the DDL.

              Show
              elenst Elena Stepanova added a comment - Hi Kristian, See the above – did you mean the tables to be MyISAM? If you did, I can modify the tests as Jeremy suggested. If you didn't, I can add explicit engine=InnoDB in all cases when there is no engine in the DDL.
              Hide
              knielsen Kristian Nielsen added a comment -

              Thanks Jeremy for the nice analysis!

              I do not explicitly need the table to be MyISAM, on the other hand I did try
              in the various test cases to use a mix of MyISAM and InnoDB, to improve
              coverage of different combinations.

              The wait_condition.inc with COUNT was anyway meant as a workaround for not
              yet having proper MASTER_GTID_WAIT(), so would be good to fix.

              The best solution is probably to implement include/save_master_gtid_pos.inc
              and include/sync_with_master_gtid.inc, doing basically what Jeremy
              suggests. And then rewrite include/sync_with_master_gtid.inc when
              MASTER_GTID_WAIT() is implemented to use it. If noone beats me to it I can
              look into that after the parallel replication / MDEV-4506 stuff (which
              currently has priority).

              Show
              knielsen Kristian Nielsen added a comment - Thanks Jeremy for the nice analysis! I do not explicitly need the table to be MyISAM, on the other hand I did try in the various test cases to use a mix of MyISAM and InnoDB, to improve coverage of different combinations. The wait_condition.inc with COUNT was anyway meant as a workaround for not yet having proper MASTER_GTID_WAIT(), so would be good to fix. The best solution is probably to implement include/save_master_gtid_pos.inc and include/sync_with_master_gtid.inc, doing basically what Jeremy suggests. And then rewrite include/sync_with_master_gtid.inc when MASTER_GTID_WAIT() is implemented to use it. If noone beats me to it I can look into that after the parallel replication / MDEV-4506 stuff (which currently has priority).
              Hide
              knielsen Kristian Nielsen added a comment -

              Should be fixed now. The synchronisation is now made with MASTER_GTID_WAIT() in the places where SELECT COUNT=N was used before. Let me know if I missed some places.

              Show
              knielsen Kristian Nielsen added a comment - Should be fixed now. The synchronisation is now made with MASTER_GTID_WAIT() in the places where SELECT COUNT =N was used before. Let me know if I missed some places.

                People

                • Assignee:
                  knielsen Kristian Nielsen
                  Reporter:
                  jeremycole Jeremy Cole
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: