    BUG#14756691: CANNOT RECONNECT WITH GTID PROTOCOL IF MASTER PURGED BINARY LOG · 79cdbb06
    Sven Sandberg authored
    Background: The GTID feature (WL#3584) introduces a new replication protocol.
    In the new protocol, the slave connects to the master with a different
    initial handshake: the slave sends the master the set of GTIDs that the slave
    has executed. The master then sends the slave all transactions that are *not*
    in this set. If the master has executed a transaction that has a GTID that
    the slave does not have, and this transaction was in a binary log that was
    purged, then the connection attempt fails and the slave IO thread stops with
    error 1236. This is expected (continuing at that point could corrupt the
    slave's database). Moreover, when the slave reconnects to the same master,
    e.g., after STOP SLAVE; START SLAVE, then the handshake also includes binary
    log filename and offset, so that (1) the master does not have to scan the
    binary log for GTIDs and (2) the master will gracefully continue at the
    correct position even if the slave was stopped in the middle of a transaction.
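
The handshake rule described above can be sketched as follows. This is only an illustrative model, not the server's code: GTID sets are represented as plain Python sets of (uuid, sequence number) pairs, and the names `HandshakeError` and `transactions_to_send` are hypothetical.

```python
class HandshakeError(Exception):
    """Hypothetical stand-in for the slave IO thread stopping with error 1236."""


def transactions_to_send(slave_executed, master_executed, master_lost):
    """Model of the master's side of the GTID handshake.

    All three arguments are sets of (uuid, seqno) tuples (an assumption made
    for this sketch). master_lost holds the GTIDs whose binary logs were purged.
    """
    # Transactions that the slave still needs but the master can no longer send.
    missing_and_purged = master_lost - slave_executed
    if missing_and_purged:
        raise HandshakeError(
            "slave needs purged transactions: %r" % sorted(missing_and_purged)
        )
    # Otherwise send everything the slave has not executed yet.
    return sorted(master_executed - slave_executed)
```

For example, with a master that executed transactions 1 to 3 and purged the log containing transaction 1, a slave that already has transactions 1 and 2 receives only transaction 3, while a slave with an empty set is rejected.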
    
    Problem: If the slave reconnected to the same master, e.g., by executing STOP
    SLAVE and then START SLAVE with no CHANGE MASTER in between, and the master
    had ever purged one or more binary logs, then the connection attempt *always*
    failed and the slave stopped with error 1236. This happened even if the slave
    had already executed all transactions in the purged binary log. The reason was
    that the modified handshake sent the filename and position and an empty GTID
    set, instead of the correct GTID set. The check that GTID_LOST must be
    contained in the slave's GTID set then failed as soon as GTID_LOST was
    nonempty.
    
    Solution: Always send the GTID set.
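
A minimal sketch of the difference between the faulty and the fixed reconnect handshake, under the same simplifying assumption that a GTID set is a Python set of (uuid, sequence number) pairs. The function names, the dictionary layout, the `always_send_gtids` flag, and the file name in the usage note are all hypothetical and exist only for illustration.

```python
def build_reconnect_handshake(executed_gtids, log_file, log_pos,
                              always_send_gtids=True):
    """Build the reconnect handshake the slave would send (illustrative model).

    With always_send_gtids=False this mimics the faulty behaviour: filename
    and position are sent together with an empty GTID set.
    """
    gtid_set = set(executed_gtids) if always_send_gtids else set()
    return {"log_file": log_file, "log_pos": log_pos, "gtid_set": gtid_set}


def master_accepts(handshake, master_lost):
    """The master's check: GTID_LOST must be contained in the slave's GTID set."""
    return master_lost <= handshake["gtid_set"]
```

With a nonempty `master_lost`, `master_accepts(build_reconnect_handshake(executed, "master-bin.000002", 4, always_send_gtids=False), master_lost)` is false no matter what the slave has executed, which matches the failure described above. Sending the real set makes the check pass whenever the slave has all purged transactions.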