Sven Sandberg authored
Background: The GTID feature (WL#3584) introduces a new replication protocol. In the new protocol, the slave connects to the master with a different initial handshake: the slave sends the master the set of GTIDs that the slave has executed. The master then sends the slave all transactions that are *not* in this set. If the master has executed a transaction with a GTID that the slave does not have, and this transaction was in a binary log that has been purged, then the connection attempt fails and the slave IO thread stops with error 1236. This is expected (continuing at that point could corrupt the slave's database). Moreover, when the slave reconnects to the same master, e.g., after STOP SLAVE; START SLAVE, the handshake also includes the binary log filename and offset, so that (1) the master does not have to scan the binary log for GTIDs and (2) the master gracefully continues at the correct position even if the slave was stopped in the middle of a transaction.

Problem: If the slave reconnected to the same master, e.g., by executing STOP SLAVE and then START SLAVE with no CHANGE MASTER in between, and the master had ever purged one or more binary logs, then the connection attempt *always* failed and the slave stopped with error 1236. This happened even if the slave had already executed all transactions in the purged binary logs. The reason was that the modified handshake sent the filename and position together with an empty GTID set, instead of the correct GTID set. The check that GTID_LOST must be contained in the slave's GTID set then failed as soon as GTID_LOST was nonempty.

Solution: Always send the GTID set.
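The containment check described above can be illustrated with a small sketch. This is not the server's implementation: it assumes a GTID can be modeled as a (source UUID, sequence number) pair and a GTID set as a plain Python set, and the names `gtid_range`, `master_accepts`, and `transactions_to_send` are hypothetical helpers introduced only for this illustration.

```python
# Hypothetical model: a GTID is a (source_uuid, sequence_number) pair and a
# GTID set is a plain Python set of such pairs. The real server uses its own
# interval-based data structures; this only illustrates the logic.

def gtid_range(uuid, first, last):
    """Build the GTID set uuid:first-last (inclusive)."""
    return {(uuid, n) for n in range(first, last + 1)}

def master_accepts(slave_gtid_set, gtid_lost):
    """The master can serve the slave only if every purged GTID
    (GTID_LOST) is already contained in the slave's GTID set."""
    return gtid_lost <= slave_gtid_set

def transactions_to_send(master_executed, slave_gtid_set):
    """The master sends everything the slave has not executed yet."""
    return master_executed - slave_gtid_set

master_executed = gtid_range("uuid-a", 1, 10)
gtid_lost = gtid_range("uuid-a", 1, 4)       # transactions in purged binary logs
slave_executed = gtid_range("uuid-a", 1, 7)  # slave already has all purged ones

# Behavior described under "Problem": the reconnect sent an empty GTID set,
# so the check failed whenever GTID_LOST was nonempty.
buggy = master_accepts(set(), gtid_lost)

# Behavior described under "Solution": the real GTID set is always sent.
fixed = master_accepts(slave_executed, gtid_lost)

# Sequence numbers the master would still send to this slave.
to_send = sorted(n for _, n in transactions_to_send(master_executed, slave_executed))
```

In this model, an empty GTID set passes the check only when GTID_LOST is itself empty, which matches the observation that the failure appeared as soon as the master had purged at least one binary log.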