Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4992

'DROP DATABASE IF EXISTS' causes replication to break (does not break in MySQL)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.4, 5.5.33
    • Fix Version/s: 10.0, 5.5
    • Component/s: None
    • Labels:
    • Environment:
      Ubuntu 5.5.31-MariaDB-1~lucid-log and 5.5.32-MariaDB-1~precise-log

      Description

      We have a script that refreshes our staging DB environment. Our staging environment is a master <-> master with each master having another slave.

      I have been replacing mysql database servers with mariadb servers (5.5) and have run in to this most interesting issue.

      masterdb1 - mysql
      masterdb2 - mariadb
      slavedb1 - mysql (slave of masterdb1)
      slavedd2 - maria (slave of masterdb2)

      This refresh script takes a dump of the production DB and loads into masterdb1 BUT it changes the database name used in the DROP DATABASE, CREATE DATABASE and USE commands so that we will not interrupt the automatic testing until we have the complete database loaded. Then we drop all of the tables in the target DB and rename all of the tables from the freshly loaded database so they are now in the correctly named database. This is only background for the bug.

      The problem is the script issues this command (in case it exited prematurely)

      DROP DATABASE IF EXISTS temp_vstreams;
      

      The first server I changed to mariadb was slavedb2. This worked fine.

      Then I changed masterdb2 (the master for slavedb2). The DORP DB command is issued on masterdb1 and it works (the db does not exist) and it is replicated to masterdb2 where the command is also successful. The command is replicated to slavedb2 where it fails:

      Last_SQL_Error: Query caused different errors on master and slave.     Error on master: message (format)='Cannot load from mysql.%s. The table is probably corrupted' error code=1548 ; Error on slave: actual message='no error', error code=0. Default database: 'temp_vstreams'. Query: 'DROP DATABASE IF EXISTS temp_vstreams'
      

      They are not exactly the same version as the os is newer on masterdb2.

      masterdb2:

       
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 49
      Server version: 5.5.32-MariaDB-1~precise-log mariadb.org binary distribution
      
      Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
      

      slavedb2:

      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 313625
      Server version: 5.5.31-MariaDB-1~lucid-log mariadb.org binary distribution
      
      Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
      

      The replication setting appear to be identical.

      This is running in AWS and I could arrange access or run whatever diagnostics that might help you (us) figure this out.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            frankflynn Frank Flynn added a comment -

            Correct - this is just the standard MariaDB not Galera.

            But in preparing the files to send to you I have resolved the issue. I still believe there is a bug which can be fixed (perhaps through a better message). What was happening was DROP DATABASE on the master was not causing any errors (the error log is empty) but it was generating 2 warnings - these broke replication on the slave.

            use fmf_temp;
            Database changed
            drop database if exists fmf_temp;
            Query OK, 1 row affected (0.00 sec)

            Database changed
            Query OK, 0 rows affected, 2 warnings (0.00 sec)

            MariaDB [(none)]> show warnings;
            -----------------------------------------------------------------------

            Level Code Message

            -----------------------------------------------------------------------

            Error 1548 Cannot load from mysql.proc. The table is probably corrupted
            Error 1545 Failed to open mysql.event

            -----------------------------------------------------------------------

            We never saw any warnings because the statement was run on masterdb1 (which had no warnings) then replicated to masterdb2 (where it would generate a warning but never show it to anyone) then replicated to slavedb2 (where it broke replication because the error messages were different).

            running mysql_upgrade --force (needed because it said it was already upgraded to 5.5.32-MariaDB) did fix the issue because now there were no warnings on masterdb2.

            I'm not sure how you'd like to proceed - the issues is resolved for me. The bug is there is a statement that will break replication but it is not being logged anywhere making it difficult to track down.

            Show
            frankflynn Frank Flynn added a comment - Correct - this is just the standard MariaDB not Galera. But in preparing the files to send to you I have resolved the issue. I still believe there is a bug which can be fixed (perhaps through a better message). What was happening was DROP DATABASE on the master was not causing any errors (the error log is empty) but it was generating 2 warnings - these broke replication on the slave. use fmf_temp; Database changed drop database if exists fmf_temp; Query OK, 1 row affected (0.00 sec) Database changed Query OK, 0 rows affected, 2 warnings (0.00 sec) MariaDB [(none)] > show warnings; ------ ---- ------------------------------------------------------------- Level Code Message ------ ---- ------------------------------------------------------------- Error 1548 Cannot load from mysql.proc. The table is probably corrupted Error 1545 Failed to open mysql.event ------ ---- ------------------------------------------------------------- We never saw any warnings because the statement was run on masterdb1 (which had no warnings) then replicated to masterdb2 (where it would generate a warning but never show it to anyone) then replicated to slavedb2 (where it broke replication because the error messages were different). running mysql_upgrade --force (needed because it said it was already upgraded to 5.5.32-MariaDB) did fix the issue because now there were no warnings on masterdb2. I'm not sure how you'd like to proceed - the issues is resolved for me. The bug is there is a statement that will break replication but it is not being logged anywhere making it difficult to track down.
            Hide
            elenst Elena Stepanova added a comment -

            Hi Frank,

            > use fmf_temp;
            > Database changed
            > drop database if exists fmf_temp;
            > Query OK, 1 row affected (0.00 sec)
            >
            > Database changed
            > Query OK, 0 rows affected, 2 warnings (0.00 sec)

            Could you please comment on this fragment?

            So, first you switched to fmf_temp, the response was "Database changed". That's expected.
            Then, you did "drop database if exists fmf_temp". The response was "Query OK", no warnings. That's good.
            But then, we see two other responses without any queries:

            Database changed
            Query OK, 0 rows affected, 2 warnings (0.00 sec)

            Where do they come from? What were you executing here?

            Thanks.

            Show
            elenst Elena Stepanova added a comment - Hi Frank, > use fmf_temp; > Database changed > drop database if exists fmf_temp; > Query OK, 1 row affected (0.00 sec) > > Database changed > Query OK, 0 rows affected, 2 warnings (0.00 sec) Could you please comment on this fragment? So, first you switched to fmf_temp, the response was "Database changed". That's expected. Then, you did "drop database if exists fmf_temp". The response was "Query OK", no warnings. That's good. But then, we see two other responses without any queries: Database changed Query OK, 0 rows affected, 2 warnings (0.00 sec) Where do they come from? What were you executing here? Thanks.
            Hide
            frankflynn Frank Flynn added a comment -

            Not exactly - since this is a script I ran three commands on the same line and got the results for all commands in one block - I should have written it like this:

            MariaDB [(none)]>create database fmf_temp;
            Query OK, 1 row affected (0.00 sec)

            MariaDB [(none)]>use fmf_temp;
            Database changed

            MariaDB [(fmf_temp)]> drop database if exists fmf_temp;
            Query OK, 0 rows affected, 2 warnings (0.00 sec)

            MariaDB [(none)]> show warnings;
            -----------------------------------------------------------------------

            Level Code Message

            -----------------------------------------------------------------------

            Error 1548 Cannot load from mysql.proc. The table is probably corrupted
            Error 1545 Failed to open mysql.event

            -----------------------------------------------------------------------

            The issues being that the masterdb2 had not have mysqlupgrade run on it (at least running mysqlupgrade on masterdb2 solved the issue) and the bigger issue was that since masterdb1 ran this with no warnings and replicated to to masterdb2 where the warnings did happen shouldn't masterdb2 have stopped replication saying the commands produced different errors? Instead masterdb2 ran these commands and produced warnings (which no one saw because it was the replication thread - they were not written to the error log either) then slavedb2 got the command and ran it with no warnings and broke because it's errors / warnings were different (none) than masterdb2.

            Trivial for me to fix - difficult to find.

            Show
            frankflynn Frank Flynn added a comment - Not exactly - since this is a script I ran three commands on the same line and got the results for all commands in one block - I should have written it like this: MariaDB [(none)] >create database fmf_temp; Query OK, 1 row affected (0.00 sec) MariaDB [(none)] >use fmf_temp; Database changed MariaDB [(fmf_temp)] > drop database if exists fmf_temp; Query OK, 0 rows affected, 2 warnings (0.00 sec) MariaDB [(none)] > show warnings; ------ ---- ------------------------------------------------------------- Level Code Message ------ ---- ------------------------------------------------------------- Error 1548 Cannot load from mysql.proc. The table is probably corrupted Error 1545 Failed to open mysql.event ------ ---- ------------------------------------------------------------- The issues being that the masterdb2 had not have mysqlupgrade run on it (at least running mysqlupgrade on masterdb2 solved the issue) and the bigger issue was that since masterdb1 ran this with no warnings and replicated to to masterdb2 where the warnings did happen shouldn't masterdb2 have stopped replication saying the commands produced different errors? Instead masterdb2 ran these commands and produced warnings (which no one saw because it was the replication thread - they were not written to the error log either) then slavedb2 got the command and ran it with no warnings and broke because it's errors / warnings were different (none) than masterdb2. Trivial for me to fix - difficult to find.
            Hide
            elenst Elena Stepanova added a comment -

            Thanks Frank,
            It's much clearer now.

            I'd say there's definitely a bug here. The bug is in the confusion between errors and warnings of level "error".

            When the server executes the DROP, it considers it successful, but generates warnings.
            However, when the master writes the event into the binary log, it adds the error code of one of the warnings (I guess because of the level "error").
            The slave, in turn, also considers DROP successful, although it produces warnings as well. But it compares the non-zero error code which it reads from the binary log only with its own zero error code; it does not take into account the warnings it just generated.

            Thus, replication failure is caused by the asymmetry between master and slave behavior. Either the master shouldn't write the error code into the binary log when it's actually a code of a warning, or the slave should compare the master error code from the binary log not only to its own query error code, but also to warning codes.

            It works exactly the same way on MySQL (checked 5.5, 5.6, 5.7). You might want to file a bug report at bugs.mysql.com as well.

            Show
            elenst Elena Stepanova added a comment - Thanks Frank, It's much clearer now. I'd say there's definitely a bug here. The bug is in the confusion between errors and warnings of level "error". When the server executes the DROP, it considers it successful, but generates warnings. However, when the master writes the event into the binary log, it adds the error code of one of the warnings (I guess because of the level "error"). The slave, in turn, also considers DROP successful, although it produces warnings as well. But it compares the non-zero error code which it reads from the binary log only with its own zero error code; it does not take into account the warnings it just generated. Thus, replication failure is caused by the asymmetry between master and slave behavior. Either the master shouldn't write the error code into the binary log when it's actually a code of a warning, or the slave should compare the master error code from the binary log not only to its own query error code, but also to warning codes. It works exactly the same way on MySQL (checked 5.5, 5.6, 5.7). You might want to file a bug report at bugs.mysql.com as well.
            Hide
            elenst Elena Stepanova added a comment -

            Here is also serg's quick analysis:

            <serg> with DROP there's an error when trying to drop SPs
            <serg> and later the error flag is cleared, because the DB and tables in it were, in fact, dropped. So the DROP operation itself was successful.
            <serg> it's not push_warning(), but my_error() - real error generating function
            <serg> and later thd->clear_error()
            <serg> the bug, I think, is that
            <serg> this code calls thd->clear_error() but only after writing a query to binlog
            <serg> which, indeed, looks plain wrong

            Show
            elenst Elena Stepanova added a comment - Here is also serg's quick analysis: <serg> with DROP there's an error when trying to drop SPs <serg> and later the error flag is cleared, because the DB and tables in it were, in fact, dropped. So the DROP operation itself was successful. <serg> it's not push_warning(), but my_error() - real error generating function <serg> and later thd->clear_error() <serg> the bug, I think, is that <serg> this code calls thd->clear_error() but only after writing a query to binlog <serg> which, indeed, looks plain wrong

              People

              • Assignee:
                Unassigned
                Reporter:
                frankflynn Frank Flynn
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: