MariaDB Server / MDEV-605

LP:865108 - Could not execute Delete_rows event

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      For a few weeks now, replication from one 5.2.8 install to another occasionally fails, where it previously ran fine with no configuration changes according to our puppet files. The only change I see on that server is an upgrade from 5.2.7, but I'm not 100% sure that is when it started happening. Replication stops with the full following:

      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.1.203
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.001802
                Read_Master_Log_Pos: 87563717
                     Relay_Log_File: relay-bin.000705
                      Relay_Log_Pos: 259550619
              Relay_Master_Log_File: mariadb-bin.001797
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1032
                         Last_Error: Could not execute Delete_rows event on table zabbix.history_uint; Can't find record in 'history_uint', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.001797, end_log_pos 259552671
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 259550472
                    Relay_Log_Space: 1353580511
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1032
                     Last_SQL_Error: Could not execute Delete_rows event on table zabbix.history_uint; Can't find record in 'history_uint', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.001797, end_log_pos 259552671
      1 row in set (0.00 sec)
      

      I presume you boys need more info, so feel free to ask
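As a first diagnostic step, the failing event itself can be inspected on the master. A minimal sketch, assuming the binlog file, Exec_Master_Log_Pos, and end_log_pos reported in the SHOW SLAVE STATUS output above:

```shell
# Decode the row event the slave failed on, from Exec_Master_Log_Pos up to
# the end_log_pos named in Last_SQL_Error. --base64-output=decode-rows -v
# prints row-based events as pseudo-SQL comments so the affected key values
# become visible.
mysqlbinlog --base64-output=decode-rows -v \
    --start-position=259550472 --stop-position=259552671 \
    mariadb-bin.001797
```

The decoded DELETE comments show which row the slave could not find, which is the starting point for comparing master and slave.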

            Activity

            walterheck Walter Heck added a comment -

            Re: Could not execute Delete_rows event
            After reporting this bug, I was told on IRC that it was probably just data drift. I couldn't dispute that at the time, so I let it go. Now I've hit almost exactly the same problem, except the chances are microscopic that it's data drift this time: I cloned a slave by stopping the original slave, rsyncing the datadir and binary logs over, and starting it in the new location with the same my.cnf. Within hours the new slave stopped with the following error, while the original machine has been humming along for months.

            Could not execute Update_rows event on table yomamma.albums; Can't find record in 'albums', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.002006, end_log_pos 680669835

            That seems too much of a coincidence to be data drift, right?
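Whether this is drift can be settled mechanically rather than argued. A hypothetical sketch using pt-table-checksum from Percona Toolkit, run on the master (host, user, and password here are placeholders, not values from this report):

```shell
# Checksum the suspect table in chunks on the master; the checksum statements
# replicate to the slave, where differing chunks are flagged. A non-zero exit
# status or DIFFS > 0 in the output indicates the slave's copy has drifted.
pt-table-checksum --databases=yomamma --tables=albums \
    h=master-host,u=checksum_user,p=secret
```

If the tables checksum clean right up to the failure, data drift becomes much harder to defend as the explanation.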

            knielsen Kristian Nielsen added a comment -

            Re: Could not execute Delete_rows event
            Well, is it data drift or not?
            You need to compare the table between the master and the slave to check.
            Or at least check if the row that the replication is complaining about is indeed missing on the slave.

            If the row is missing, the problem seems to be data drift and the bug happened earlier; it is then necessary to track down which event was replicated incorrectly to cause this.

            If the row is not missing, it seems to be a problem with the replication of a specific event; ideally we need the relevant binlog and table data to reproduce.
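The row check described above amounts to running the same lookup on both servers. A hypothetical sketch, with the table name taken from the first error in this report; the key columns and values are placeholders to be filled in from the decoded binlog event:

```shell
# Does the row the Delete_rows event targets exist on each server?
# Replace <itemid> and <clock> with the key values from the decoded event.
mysql -h 192.168.1.203 -e "SELECT * FROM zabbix.history_uint WHERE itemid=<itemid> AND clock=<clock>"
mysql -h slave-host    -e "SELECT * FROM zabbix.history_uint WHERE itemid=<itemid> AND clock=<clock>"
```

A row present on the master but absent on the slave points to drift introduced by some earlier event; a row present on both points to a problem applying this specific event.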

            ratzpo Rasmus Johansson added a comment -

            Launchpad bug id: 865108

            elenst Elena Stepanova added a comment -

            There has been no response in the LP bug report, and we don't have any information to analyze here, so closing it as incomplete.

            j j added a comment - edited

            Hello,

            I just experienced this issue with MariaDB 5.5.40. One node fell over with the same error, but the other kept on chugging along. What information is needed? I have the mysqlbinlog output, the exact query, and the crash information from the logs. I can also guarantee that this is not resolved.

            elenst Elena Stepanova added a comment -

            Hi,

            First of all, what do you mean by node? Are you running a Galera cluster or traditional replication? Please describe your replication topology.
            Investigation paths can be quite different depending on the answer, so I won't start asking the next questions until we know which path to choose.

            j j added a comment -

            The environment uses Galera WAN replication through an IPsec VPN with 100ms latency. We have three nodes, one in each datacenter:

            • MariaDB 5.5.40 in datacenter A.
            • MariaDB 5.5.40 in datacenter B.
            • Galera Arbitrator 25.3.5.rXXXX in datacenter C.
            elenst Elena Stepanova added a comment -

            Thanks.

            Then, if you don't mind, please create a separate bug report. While the error looks the same (it's generic by nature), it has nothing to do with the causes behind the original report, whatever they were. Galera works quite differently from traditional replication; e.g. the main suspect, "data drift", mentioned in earlier comments on this report can hardly apply to your case.

            In the new report, please provide the exact error you got, preferably a generous excerpt from the error log, and the structure of the table on which it happened. The assignee of the report will then ask additional questions.
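The requested details can be collected directly with the mysql client. A minimal sketch, with the table name taken from the error quoted earlier in this thread:

```shell
# Capture the table structure for the new report; \G prints it vertically.
mysql -e "SHOW CREATE TABLE yomamma.albums\G"

# Grab the surrounding error-log context as well (path is an assumption;
# check log_error in your my.cnf for the actual location).
tail -n 200 /var/log/mysql/error.log
```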


              People

              • Assignee: Unassigned
              • Reporter: walterheck Walter Heck
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                • Updated:
                • Resolved: