Sujatha Sivakumar authored
SYNC_MASTER_INFO

Problem:
========
With master_info_repository=TABLE and sync_master_info=0, the table mysql.slave_master_info should be updated only on log file rotation. That is the documented and intended design. It can significantly reduce load on slaves and allow much greater replication throughput.

The code that handles skips of already-executed GTID events uses a dummy heartbeat (treated as a fake rotate event) to signal the end of a skip period. The slave handling code takes that event and forces a master info sync on every heartbeat, even these dummy ones. Even in a fairly simple setup, this can lead to a ~5X increase in write load on a slave. That is a large amount of wasted resources and a severe limit on replication throughput.

Analysis:
=========
After writing events into the relay log, the receiver thread updates its progress in the Master_info object, i.e. it updates the 'master_log_file' and 'master_log_pos' of the 'mi' object to match the last event written. For all events other than skipped/ignored events, these positions are flushed using the following call, with force=false:

  flush_master_info(mi, false /*force*/, lock_count == 0 /*need_lock*/, false /*flush_relay_log*/)

'force=true' means: ignore the sync period and flush the information. When 'force' is false, the flush happens only when it is time to flush.

At present a forced flush is done when the receiver thread is coming to a halt and during CHANGE MASTER command execution. This ensures that upon restart, we start fetching from where we stopped. However, also doing a forced flush for skipped/ignored events degrades performance. The skipped and ignored events are listed below.

HEARTBEAT_LOG_EVENT: Under the GTID protocol, if the master skips transactions, it sends a heartbeat event to the slave at the end of the last skipped transaction to update the coordinates. The I/O thread receives the heartbeat event and updates 'mi' only if the received heartbeat position is greater than mi->get_master_log_pos(). This event is written to the relay log as an ignored Rotate event. The SQL thread reads the rotate event only to update the coordinates corresponding to the last skipped transaction.

PREVIOUS_GTIDS_LOG_EVENT: This event is sent by the master to show the slave that it is making progress. It contains information specific to the rotated master binary log. These coordinates are written as ignored events.

Events that originated from the same server: These are events received through circular replication. For each skipped event, its position is tracked in rli->ign_master_log_name_end and rli->ign_master_log_pos_end. When the receiver thread stops, these positions are flushed to the 'slave_master_info' table.

These ignored events are special because the applier does not have to apply them; it only has to advance its position. Otherwise, ignored events are similar to other events.

A forced flush can therefore be avoided for HEARTBEAT events and for events received through circular replication. A forced flush is needed only for the PREVIOUS_GTIDS_LOG_EVENT, which carries the rotated master log file name and position.

Fix:
===
Avoid forced flushes to 'slave_master_info' on HEARTBEAT events and on IGNORED events received through circular replication.

RB: 21432
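The flush decision described above can be modelled in a few lines. What follows is a minimal C++ sketch, not the server's implementation. MasterInfoSketch, EventKind, should_force_flush, on_event, on_receiver_stop and run_scenario are hypothetical names. The binlog file name and positions are made up. The sketch assumes, following the Problem section, that with sync_master_info=0 a non-forced flush never writes the table. It contrasts the old policy, which forces a flush on every skipped/ignored event, with the fix, which forces a flush only on the rotation carried by PREVIOUS_GTIDS_LOG_EVENT, plus the usual forced flush when the receiver stops.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical event classification; the server itself works with
// Log_event_type values such as HEARTBEAT_LOG_EVENT.
enum class EventKind {
  Regular,           // normal replicated event
  Heartbeat,         // end-of-skip heartbeat sent under the GTID protocol
  PreviousGtids,     // carries the rotated master binlog name/position
  SameServerIgnored  // event that came back through circular replication
};

// Hypothetical stand-in for the receiver's Master_info ('mi').
struct MasterInfoSketch {
  std::string master_log_file;
  std::uint64_t master_log_pos = 0;
  unsigned sync_period = 0;   // plays the role of sync_master_info
  unsigned pending = 0;       // events since the last table write
  unsigned table_writes = 0;  // counts writes to mysql.slave_master_info

  // force == true ignores the sync period. force == false writes only when
  // the period has elapsed; with sync_period == 0 it never writes.
  void flush(bool force) {
    ++pending;
    if (force || (sync_period != 0 && pending >= sync_period)) {
      ++table_writes;
      pending = 0;
    }
  }
};

// with_fix == false models the old behaviour (every skipped/ignored event
// forced a flush); with_fix == true forces only for PREVIOUS_GTIDS.
bool should_force_flush(EventKind kind, bool with_fix) {
  if (kind == EventKind::Regular) return false;
  if (!with_fix) return true;
  return kind == EventKind::PreviousGtids;
}

// Simplified handling of one event already written to the relay log.
void on_event(MasterInfoSketch& mi, EventKind kind, const std::string& file,
              std::uint64_t pos, bool with_fix) {
  if (kind == EventKind::Heartbeat) {
    // A heartbeat updates the coordinates only if it moves them forward.
    if (pos > mi.master_log_pos) mi.master_log_pos = pos;
  } else {
    mi.master_log_file = file;
    mi.master_log_pos = pos;
  }
  mi.flush(should_force_flush(kind, with_fix));
}

// Receiver thread coming to a halt: always a forced flush.
void on_receiver_stop(MasterInfoSketch& mi) { mi.flush(true); }

// Hypothetical scenario with sync_master_info = 0: one rotation, then
// `heartbeats` end-of-skip heartbeats, then a stop. Returns table writes.
unsigned run_scenario(unsigned heartbeats, bool with_fix) {
  MasterInfoSketch mi;
  on_event(mi, EventKind::PreviousGtids, "binlog.000002", 4, with_fix);
  for (unsigned i = 1; i <= heartbeats; ++i)
    on_event(mi, EventKind::Heartbeat, "binlog.000002", 4 + 100ULL * i,
             with_fix);
  assert(mi.master_log_pos == 4 + 100ULL * heartbeats);  // last heartbeat wins
  on_receiver_stop(mi);
  return mi.table_writes;
}
```

In this toy model, run_scenario(1000, true) returns 2 (the rotation plus the stop), while run_scenario(1000, false) returns 1002, one extra table write per heartbeat. The ratio comes from the made-up scenario and is not a measurement of the server.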