Sujatha Sivakumar authored
MYSQL.SLAVE_RELAY_LOG_INFO

Problem:
========
RESET SLAVE / RESET SLAVE ALL will not remove errant relay log entries
from the mysql.slave_relay_log_info table for group replication
channels.

Analysis:
=========
The existing code for the RESET SLAVE / RESET SLAVE ALL command does
not include the group-replication-specific 'group_replication_applier'
and 'group_replication_recovery' channels in the RESET SLAVE [ALL]
operation. Because of this, the group replication channels are not
affected by the RESET SLAVE [ALL] command.

Fix:
====
Implemented code changes so that the group-replication-specific
channels are included in the RESET SLAVE [ALL] command. Please note
that RESET SLAVE [ALL] performs the RESET operation only when the
group member is OFFLINE. Executing RESET SLAVE [ALL] on an ONLINE
group member results in an error.

Bug#20280946: RESTARTING SLAVE SERVER POST 'RESET SLAVE' COMMAND,
CLEANS UP SLAVE SETTING

Problem:
========
If the slave server is restarted after a 'RESET SLAVE' command, the
slave server forgets its replication configuration, and CHANGE MASTER
TO has to be run again to redo the replication setup.

Analysis:
=========
RESET SLAVE followed by a restart of the slave has the same effect as
the RESET SLAVE ALL command: it clears the recovery channel's
in-memory credentials. In highly available systems such as Group
Replication, a RESET SLAVE issued for cleanup followed by a server
crash will prevent the member from rejoining the group, because the
recovery credentials are gone.

Fix:
====
During the RESET SLAVE command, identify the channel that is in the
initialized state and do the cleanup for that channel only. Preserve
the channel-specific connection credentials in the crash-safe master
info repository table. This ensures that the credentials remain
available in spite of restarts or crashes.

BUG#27636289: RPL BREAKS WITH RESTART AFTER RESET SLAVE IF
--RELAY-LOG-PURGE=0

Problem:
========
If the slave server is restarted, followed by RESET MASTER on both
master and slave and RESET SLAVE on the slave server, the slave goes
into an ERROR state as shown below.

Last_IO_Error: Got fatal error 1236 from master when reading data from
binary log: 'Slave has more GTIDs than the master has, using the
master's SERVER_UUID. This may indicate that the end of the binary log
was truncated or that the last binary log file was lost, e.g., after a
power or disk failure when sync_binlog != 1. The master may or may not
have rolled back transactions that were already replica'

The slave fails with the above error even though both master and slave
have an empty GLOBAL.GTID_EXECUTED.

Analysis:
=========
The relay log purge procedure called by RESET SLAVE purges all
existing relay log files and generates the first new one before the
channel's received_gtid_set is cleared. So the received_gtid_set used
to generate the PREVIOUS_GTIDS of the first relay log file after a
RESET SLAVE contains old (garbage) information. If this first relay
log file with the wrong PREVIOUS_GTIDS is still present after a slave
server restart, the slave will use its PREVIOUS_GTIDS, leading to
ER_MASTER_FATAL_ERROR_READING_BINLOG with the following error message:
"Slave has more GTIDs than the master has, using the master's
SERVER_UUID."

Fix:
====
During RESET SLAVE, first clear the received_gtid_set and then purge
the relay log files.
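The first fix's channel walk can be sketched roughly as follows. This is an illustrative Python model, not MySQL source code: the channel names and the OFFLINE check come from the commit text, while the function, the state strings, and the in-memory maps standing in for mysql.slave_relay_log_info rows are hypothetical.

```python
# Hypothetical model of RESET SLAVE [ALL] including GR channels.
GROUP_REPLICATION_CHANNELS = {"group_replication_applier",
                              "group_replication_recovery"}

class MemberOnlineError(Exception):
    """RESET SLAVE [ALL] is refused while the group member is ONLINE."""

def reset_slave(channels, member_state, reset_all=False):
    """Clear relay-log info for every channel, GR channels included.

    `channels` maps channel name -> per-channel state dict (a stand-in
    for rows of the mysql.slave_relay_log_info table).
    """
    # Refuse the whole operation if a GR channel exists and the member
    # is still ONLINE, mirroring the behavior described in the commit.
    for name in channels:
        if name in GROUP_REPLICATION_CHANNELS and member_state == "ONLINE":
            raise MemberOnlineError(
                f"cannot RESET SLAVE for {name!r} while member is ONLINE")
    for name in list(channels):
        channels[name]["relay_log_info"] = None  # drop errant entries
        if reset_all:
            del channels[name]  # RESET SLAVE ALL removes the channel itself
    return channels
```

The point of the sketch is only the inclusion of the two `group_replication_*` names in the same loop as ordinary channels, which is what the pre-fix code omitted.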
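The Bug#20280946 fix, persisting credentials so they survive a restart, can be modeled as below. Again a hedged sketch: `CrashSafeMasterInfoRepo` stands in for the crash-safe master info repository table mentioned in the commit, and all names here are illustrative assumptions rather than the server's actual interfaces.

```python
# Illustrative model: RESET SLAVE wipes replication state but first
# persists the channel's connection credentials to a crash-safe store.
class CrashSafeMasterInfoRepo:
    """Stand-in for the crash-safe master info repository table."""
    def __init__(self):
        self._rows = {}

    def persist(self, channel_name, user, password):
        self._rows[channel_name] = (user, password)

    def load(self, channel_name):
        return self._rows.get(channel_name)

def reset_slave_preserving_credentials(channel, repo):
    """RESET SLAVE: clear positions and logs, keep credentials safe."""
    # Persist credentials BEFORE clearing in-memory state, so a crash
    # or restart after RESET SLAVE cannot lose them.
    repo.persist(channel["name"], channel["user"], channel["password"])
    channel["relay_log_file"] = None
    channel["relay_log_pos"] = 0
    channel["user"] = channel["password"] = None  # in-memory copy cleared
    return channel

def restart_and_reload(channel, repo):
    """Simulated server restart: credentials return from the repository."""
    creds = repo.load(channel["name"])
    if creds:
        channel["user"], channel["password"] = creds
    return channel
```

The design point is the write-before-clear ordering: without the `persist` call, a restart after RESET SLAVE leaves the recovery channel without credentials and a GR member unable to rejoin, which is exactly the failure the commit describes.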
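The BUG#27636289 ordering problem is easy to see in a minimal model. The new first relay log file gets its PREVIOUS_GTIDS from the current received_gtid_set, so the set must be cleared before the purge regenerates that file; the sketch below (function and field names are illustrative, not the server's) shows the buggy and fixed orderings side by side.

```python
# Minimal model of the relay-log purge ordering fixed in BUG#27636289.
def purge_relay_logs(state):
    """Delete all relay logs and create a fresh first one whose
    PREVIOUS_GTIDS is built from the current received_gtid_set."""
    state["relay_logs"] = [{"previous_gtids": set(state["received_gtid_set"])}]

def reset_slave_buggy(state):
    purge_relay_logs(state)             # PREVIOUS_GTIDS built from stale set
    state["received_gtid_set"] = set()  # cleared too late

def reset_slave_fixed(state):
    state["received_gtid_set"] = set()  # clear first ...
    purge_relay_logs(state)             # ... then regenerate the relay log

def make_state():
    return {"received_gtid_set": {"uuid:1-5"}, "relay_logs": []}

buggy, fixed = make_state(), make_state()
reset_slave_buggy(buggy)
reset_slave_fixed(fixed)
# In the buggy ordering, the first relay log still carries the old
# GTIDs; in the fixed ordering, its PREVIOUS_GTIDS set is empty.
```

After a restart, a slave that still has the stale PREVIOUS_GTIDS appears to "have more GTIDs than the master", producing the error 1236 quoted above; the fixed ordering leaves nothing stale behind.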