
[PATCH] Slave disconnects and fails to reconnect on Error_code: 1159

Description

While replicating, the slave server intermittently prints this error and disconnects from the master:

[ERROR] Slave I/O: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: , Error_code: 1159
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.xxxxxx', position xxxxxx

Error code 1159 is ER_NET_READ_INTERRUPTED: Got timeout reading communication packets

Executing STOP SLAVE; START SLAVE; on the slave server resumes replication without any problem. The slave server should reconnect automatically, but it does not.

I believe the issue is in mariadb-sources/sql/slave.cc

That file has a function called is_network_error(), which checks whether the given error is network-related. It is missing a check for ER_NET_READ_INTERRUPTED. The patch is trivial:

--- sql/slave.cc	2013-07-17 09:51:31.000000000 -0500
+++ sql/slave.cc	2014-02-19 02:06:55.591593796 -0600
@@ -1215,6 +1215,7 @@ bool is_network_error(uint errorno)
       errorno == ER_CON_COUNT_ERROR ||
       errorno == ER_CONNECTION_KILLED ||
       errorno == ER_NEW_ABORTING_CONNECTION ||
+      errorno == ER_NET_READ_INTERRUPTED ||
       errorno == ER_SERVER_SHUTDOWN)
     return TRUE;

With this change, MariaDB will treat the error as network-related and will try to reconnect automatically.
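To illustrate the effect of the patch, here is a minimal standalone sketch. It is not the actual code from sql/slave.cc. The names is_network_error_sketch(), IoThreadAction and on_io_error() are hypothetical, and the sketch covers only the error codes visible in the diff above. The numeric values other than 1159 are assumptions written from memory and should be checked against the server's mysqld_error.h.

```cpp
#include <cassert>

// Error codes visible in the patch hunk. Only 1159 is confirmed by this
// report; the other numeric values are assumptions to verify against
// mysqld_error.h of the server version in use.
constexpr unsigned ER_CON_COUNT_ERROR         = 1040;
constexpr unsigned ER_SERVER_SHUTDOWN         = 1053;
constexpr unsigned ER_NET_READ_INTERRUPTED    = 1159;
constexpr unsigned ER_NEW_ABORTING_CONNECTION = 1184;
constexpr unsigned ER_CONNECTION_KILLED       = 1927;

// Hypothetical stand-in for is_network_error() with the patch applied.
// The real function checks additional codes that the hunk does not show.
constexpr bool is_network_error_sketch(unsigned errorno)
{
  return errorno == ER_CON_COUNT_ERROR ||
         errorno == ER_CONNECTION_KILLED ||
         errorno == ER_NEW_ABORTING_CONNECTION ||
         errorno == ER_NET_READ_INTERRUPTED ||   // the line the patch adds
         errorno == ER_SERVER_SHUTDOWN;
}

// Hypothetical model of the decision the I/O thread makes after an error:
// retry the connection for network errors, stop the thread otherwise.
enum class IoThreadAction { Reconnect, Stop };

constexpr IoThreadAction on_io_error(unsigned errorno)
{
  return is_network_error_sketch(errorno) ? IoThreadAction::Reconnect
                                          : IoThreadAction::Stop;
}

// With the patch, the timeout from this report leads to a reconnect.
static_assert(on_io_error(ER_NET_READ_INTERRUPTED) == IoThreadAction::Reconnect,
              "1159 should be classified as a network error");

// Runtime check: a non-network error (1045, access denied) still stops.
inline void self_check()
{
  assert(on_io_error(1045) == IoThreadAction::Stop);
}
```

Without the added line, error 1159 would fall through to the Stop branch of this model, which matches the reported behaviour of the I/O thread exiting until STOP SLAVE; START SLAVE; is run manually.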

Environment

Linux (slackware)

Status

Assignee

Kristian Nielsen

Reporter

Tomas Matejicek

Labels

Fix versions

Affects versions

5.5.35

Priority

Major