MariaDB Server / MDEV-587

LP:713561 - "Duplicate entry" error and time datatype cause slave provisioning to fail

    Details

    • Type: Bug
    • Status: Closed
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      When executing an RQG test that performs non-concurrent INSERTs into various tables and then uses mysqldump to clone a new slave, the new slave diverges from the master. The slave starts properly and applies all binlog events from the master; however, when the master and the slave are dumped and the dumps are diffed, they are no longer identical.

      Some observations:

      • The problem appears to occur only when duplicate key errors are seen on the master, which happens due to the randomness of the workload.
      • The duplicate key error is reported against the table that has a non-auto-increment PK; however, the diff shows that the table with no PK at all is the one that has diverged.
      • In some instances, the test reports that the slave thread has failed with the error "Slave SQL: Error 'Table 'test.table1_innodb_int_autoinc' doesn't exist' on opening tables, Error_code: 1146"; however, the table does exist on the slave.
      • The issue may be related to the different InnoDB rollback rules that apply depending on the type of error (statement rollback vs. transaction rollback).
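
The dump-and-diff check described above can be sketched as follows. This is only a minimal illustration, assuming both dumps are already available as text; the function name `diff_dumps` and the sample INSERT statements are hypothetical and are not taken from the RQG test itself.

```python
import difflib


def diff_dumps(master_dump, slave_dump):
    """Return unified-diff lines; an empty list means the dumps are identical."""
    return list(
        difflib.unified_diff(
            master_dump.splitlines(),
            slave_dump.splitlines(),
            fromfile="master.sql",
            tofile="slave.sql",
            lineterm="",
        )
    )


# Hypothetical dump fragments: the second row differs on the slave.
master = "INSERT INTO t1 VALUES (1,'a');\nINSERT INTO t1 VALUES (2,'b');"
slave = "INSERT INTO t1 VALUES (1,'a');\nINSERT INTO t1 VALUES (2,'c');"

for line in diff_dumps(master, slave):
    print(line)
```

A strict line diff like this also flags dumps that contain the same rows in a different order, so a non-empty diff on a table without a PK does not by itself say whether the row contents differ.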

            Activity

            philipstoev Philip Stoev added a comment -

            Re: "Duplicate entry" error causes slave provisioning using binlog_snapshot_position to fail
            This issue is not specific to binlog_snapshot_position; it is also observed when using old-style mysqldump. It affects both MariaDB and MySQL, and requires the use of the time datatype. It may be caused by different time zones on the master and on the slave; however, MTR operates in GMT, so it is not easy to provision a slave with exactly the same time zone configuration.

            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 713561

            elenst Elena Stepanova added a comment -

            While there is not enough information to fully confirm it, bug MDEV-4255 provides a plausible explanation for the data discrepancy reported here.

            The origin of the problem in MDEV-4255 is that when the dump is restored on the slave, the rows are written into the tables in a different order than on the master.
            The CloneSlave reporter, which was probably used for the tests here, restores the dump the same way.

            There are cases where the order of rows becomes important for further replication, even on 5.5 and even with MBR, which is supposed to be reasonably safe. One such case is described in MDEV-4255; other, more subtle ones were observed during replication tests.
            If the issue reported here was observed on 5.1, or if SBR was used, the explanation could be even simpler, since there are many unsafe statements that might make the data diverge.
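
As a rough illustration of why row order can matter, the toy model below replays one order-dependent statement on two copies of the same rows stored in different orders. It is a sketch under stated assumptions, not the scenario from MDEV-4255: the table is modelled as a Python list, and the statement is a hypothetical `UPDATE t SET flag = 1 WHERE flag = 0 LIMIT 1` with no ORDER BY, replayed as a statement on the slave.

```python
def apply_update_limit_1(rows):
    """Model UPDATE t SET flag = 1 WHERE flag = 0 LIMIT 1 (no ORDER BY):
    only the first matching row in storage order is changed."""
    result = [dict(r) for r in rows]  # work on a copy
    for row in result:
        if row["flag"] == 0:
            row["flag"] = 1
            break
    return result


def as_set(rows):
    """Order-insensitive view of the table contents."""
    return {(r["id"], r["flag"]) for r in rows}


# Hypothetical table without a PK; "id" is an ordinary column.
master = [{"id": 1, "flag": 0}, {"id": 2, "flag": 0}]
# The restored slave holds the same rows in a different storage order.
slave = [dict(r) for r in reversed(master)]

assert as_set(master) == as_set(slave)  # identical data before replication

master_after = apply_update_limit_1(master)
slave_after = apply_update_limit_1(slave)

# The same statement touched a different row on each side.
print(as_set(master_after) == as_set(slave_after))
```

In this model the two servers hold identical data after the restore, yet a single replayed statement is enough to make them diverge.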

            Regarding the note that the problem was specific to time types, I can only guess that this was caused by the nature of the RQG flow. Date/time fields are most often assigned wrong literals (like datetime_field = 'a'), so they end up with zero values ('0000-00-00 00:00:00'), hence the duplicate key errors. I also initially observed the problem described in MDEV-4255 on datetime values, although it turned out to be unrelated to the type as such.
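
The path from wrong literals to duplicate key errors can be sketched with a small simulation. This is not server code: it assumes a non-strict SQL mode in which an unparsable datetime literal is stored as the zero value, and the helper names (`coerce_datetime`, `insert`, `DuplicateEntry`) and sample values are hypothetical.

```python
from datetime import datetime

ZERO = "0000-00-00 00:00:00"
FORMAT = "%Y-%m-%d %H:%M:%S"


class DuplicateEntry(Exception):
    """Stands in for a duplicate key error on a unique datetime key."""


def coerce_datetime(literal):
    """Return the normalized value, or the zero value for a wrong literal
    (modelling non-strict mode, which is an assumption of this sketch)."""
    try:
        return datetime.strptime(literal, FORMAT).strftime(FORMAT)
    except ValueError:
        return ZERO


def insert(table, literal):
    """Insert into a table whose datetime column is a unique key."""
    key = coerce_datetime(literal)
    if key in table:
        raise DuplicateEntry(f"Duplicate entry '{key}' for key 'PRIMARY'")
    table[key] = literal


table = {}
insert(table, "2011-02-05 10:00:00")  # valid literal, stored as-is
insert(table, "a")                    # wrong literal, stored as the zero value
try:
    insert(table, "b")                # second wrong literal collides on the zero value
except DuplicateEntry as exc:
    print(exc)
```

Any two wrong literals collide on the same zero value, which would make duplicate key errors far more frequent on date/time keys than on other types in a random workload.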

            There may have been other reasons as well, and the theory above doesn't explain the "table doesn't exist" errors mentioned here, but it's impossible to analyze further based on the description alone.


              People

              • Assignee: Unassigned
              • Reporter: Philip Stoev (philipstoev)
              • Votes: 0
              • Watchers: 1
