Venkatesh Venugopal authored
ALL THE RIGHT PARAMETERS

Problem
-------
When a multi-threaded slave (with GTID + Auto Position) crashes in a
state where gaps are present in MTS execution, crash recovery of the
relay logs fails on restart if the relay logs required to fill the gaps
are missing.

Analysis
--------
When GTID is enabled, MTS need not care about gaps. If the auto position
protocol is enabled, the slave sends, in the initial handshake, a GTID
set containing the transactions that it has already committed. The
master responds by sending all transactions recorded in its binary log
whose GTID is not included in the GTID set sent by the slave. This
exchange ensures that the master only sends transactions with a GTID
that the slave has not already received or committed.

So, when auto_position is enabled, restarting the server after an OS
crash should make the recovery succeed. But, due to this bug, the server
was still trying to read the relay logs to fill the gaps and was failing
on the partial events, because the un-synced relay log events were lost
during the OS crash.

Fix
---
During recovery, check whether Auto Position is enabled. If it is, set
recovery_parallel_workers to 0 and hint the server to skip MTS recovery,
as the auto position protocol will take care of filling the gaps.

This patch makes recovery of a multi-threaded slave always succeed when
GTID Auto Position is enabled, regardless of the value of
--relay-log-recovery and the repository type.

RB:22329
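
Illustration
------------
The following is a minimal sketch of the reasoning above, not the code
from this patch. RecoveryState, prepare_mts_recovery() and
transactions_to_send() are hypothetical names invented for the example,
and GTIDs are modelled as plain strings. It assumes only what the
message states: with Auto Position enabled the worker count used for
recovery is set to 0 and relay-log gap filling is skipped, because the
master resends every transaction missing from the slave's GTID set.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the applier's recovery state.
// These names are illustrative and are not taken from the server source.
struct RecoveryState {
  bool auto_position;                  // is GTID Auto Position enabled?
  unsigned recovery_parallel_workers;  // workers whose gaps recovery would fill
  bool mts_recovery_skipped;           // hint: skip relay-log based gap filling
};

// Returns true when relay-log based MTS gap recovery can be skipped.
bool prepare_mts_recovery(RecoveryState &state) {
  if (state.auto_position) {
    // The auto position handshake will refetch the missing transactions,
    // so the (possibly truncated) relay logs need not be read at all.
    state.recovery_parallel_workers = 0;
    state.mts_recovery_skipped = true;
    return true;
  }
  return false;  // without Auto Position, the gaps must come from relay logs
}

// Models the handshake: the master sends every transaction in its binary
// log whose GTID is absent from the set reported by the slave.
std::set<std::string> transactions_to_send(
    const std::set<std::string> &master_binlog,
    const std::set<std::string> &slave_gtid_set) {
  std::set<std::string> missing;
  for (const std::string &gtid : master_binlog) {
    if (slave_gtid_set.count(gtid) == 0) missing.insert(gtid);
  }
  return missing;
}
```

For example, if the slave committed transactions 1, 2 and 4 before the
crash, transaction 3 is a gap: it is absent from the slave's GTID set,
so the master sends it again and the lost relay log is never needed.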