Slave loses master binlog filename when master crashes in the middle of writing an event group

Description

If the master crashes in the middle of writing an event group to the binlog,
the slave can receive this partial event group once the master is restarted.

The slave code is written to handle this by recognising the master restart from
the format description event logged at the restart (it has a special flag only
set for the first format description logged after restart). The slave will
roll back the partial transaction and drop all temporary tables.

But there is a bug in the update of the master binlog position. The current
master filename is normally updated when Rotate events are processed. However,
due to the crash, no Rotate event occurs at the end of master-bin.000001. And
the fake Rotate event sent from the master at reconnect after restart is
ignored, because at that point the slave is still inside the partial event
group, and Rotate events in the middle of an event group are not processed.

The result is that the slave ends up in a state where it has processed events
in master-bin.000002, but the filename part of the replication position
(e.g. as shown by SHOW SLAVE STATUS) is still master-bin.000001.
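
The sequence described above can be illustrated with a small toy model. This
is only a sketch to make the failure mode concrete: the class ToySlave, the
replay() helper and the tuple-based event encoding are hypothetical and do not
correspond to the actual server code.

```python
from dataclasses import dataclass, field


@dataclass
class ToySlave:
    """Toy model of slave position tracking (hypothetical, not server code)."""
    master_file: str = "master-bin.000001"
    in_group: bool = False
    pending: list = field(default_factory=list)   # rows of the open event group
    applied: list = field(default_factory=list)   # rows of committed groups

    def handle(self, event):
        kind = event[0]
        if kind == "BEGIN":
            self.in_group = True
        elif kind == "ROW":
            # event[1] is the master binlog file the row was read from.
            self.pending.append(event[1])
        elif kind == "COMMIT":
            self.applied.extend(self.pending)
            self.pending.clear()
            self.in_group = False
        elif kind == "ROTATE":
            # Rotate events in the middle of an event group are skipped;
            # this is the step that loses the new filename.
            if not self.in_group:
                self.master_file = event[1]
        elif kind == "FORMAT_DESC":
            # event[1] is True only for the first format description
            # logged after a master restart: roll back the partial group.
            if event[1] and self.in_group:
                self.pending.clear()
                self.in_group = False


def replay(events):
    slave = ToySlave()
    for ev in events:
        slave.handle(ev)
    return slave


EVENTS = [
    ("BEGIN",), ("ROW", "master-bin.000001"),  # master crashes mid-group
    ("ROTATE", "master-bin.000002"),           # fake Rotate sent at reconnect
    ("FORMAT_DESC", True),                     # restart detected, rollback
    ("BEGIN",), ("ROW", "master-bin.000002"), ("COMMIT",),
]

slave = replay(EVENTS)
print(slave.master_file, slave.applied)
```

In this model the slave applies a row from master-bin.000002 while its
recorded filename stays at master-bin.000001, which matches the inconsistent
state described in this report.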

I checked MySQL; it seems to avoid this problem during binlog recovery at
master restart after a crash. If it finds a partial event group at the end of
the last master binlog, it truncates the file to just before that event group.
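
As a rough illustration of that truncation idea (not MySQL's actual recovery
code), the sketch below works on a list of event names instead of a binlog
file. The function truncate_partial_group() and the event names are
hypothetical.

```python
def truncate_partial_group(events):
    """Return the events with a trailing unfinished event group removed."""
    last_complete = 0   # number of leading events known to be safe to keep
    in_group = False
    for i, ev in enumerate(events):
        if ev == "BEGIN":
            in_group = True
        elif ev == "COMMIT":
            in_group = False
            last_complete = i + 1
        elif not in_group:
            # Stand-alone event outside any group (e.g. format description).
            last_complete = i + 1
    return events[:last_complete]


crashed_log = ["FORMAT_DESC", "BEGIN", "ROW", "COMMIT", "BEGIN", "ROW"]
print(truncate_partial_group(crashed_log))
```

With the partial group removed on the master side, a slave would never
receive an unfinished event group, so the fake Rotate event at reconnect would
not arrive in the middle of a group.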

Here is an MTR test case. The test case fails by --sync-with-master not
completing because of the incorrect binlog position. This test case uses
parallel replication, but I tested that the same bug is present for
non-parallel replication.

Environment

None

Assignee

Kristian Nielsen

Reporter

Kristian Nielsen

Labels

Affects versions

Priority

Critical