    Bug#28815555: HEARTBEATS/FAKEROTATE CAUSE A FORCED SYNC_MASTER_INFO
    Commit f76cd6de, authored by Sujatha Sivakumar
    
    Problem:
    ========
    With master_info_repository=TABLE and variable
    sync_master_info=0, we should only see table
    mysql.slave_master_info updated on log file rotation. That
    is the documented and intended design, and can significantly
    reduce load on slaves, allowing much greater replication
    throughput.
    
    The code that handles skipping [already-executed] GTID
    events uses a dummy heartbeat (treated as a fake rotate
    event) to signal the end of a skip period. The slave
    handling code takes that event and forces a master info
    sync on every heartbeat, even these dummy ones. Even in a
    fairly simple setup this can lead to a ~5X increase in
    write load on a slave: a large waste of resources that
    severely limits replication throughput.
    
    Analysis:
    =========
    Upon writing events into the relay log, the receiver thread
    updates its progress in the Master_info object, i.e. it
    updates 'master_log_file' and 'master_log_pos' of the 'mi'
    object according to the last event written. For all events
    other than skipped/ignored events, the positions are flushed
    using the following call with force=false.
    
    flush_master_info(mi, false /*force*/, lock_count == 0
                      /*need_lock*/, false /*flush_relay_log*/)
    
    'force=true' means flush the information immediately,
    without respecting the sync period. When 'force' is false,
    the flush happens only if it is time to flush.
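
The difference between a forced and a non-forced flush can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: `MasterInfoSketch`, `flush_master_info_sketch`, and the counter-based reading of the sync period are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the receiver's repository state; this is
// NOT the server's Master_info class.
struct MasterInfoSketch {
  unsigned sync_period;        // models sync_master_info (0 = no periodic sync)
  unsigned events_since_sync;  // events seen since the last write
  unsigned table_writes;       // simulated writes to the repository table

  explicit MasterInfoSketch(unsigned period)
      : sync_period(period), events_since_sync(0), table_writes(0) {}
};

// Returns true if the positions were written out on this call.
bool flush_master_info_sketch(MasterInfoSketch &mi, bool force) {
  ++mi.events_since_sync;
  // Assumption: "time to flush" means sync_period events have accumulated.
  const bool period_elapsed =
      mi.sync_period != 0 && mi.events_since_sync >= mi.sync_period;
  if (!force && !period_elapsed) return false;  // respect the sync period
  mi.events_since_sync = 0;
  ++mi.table_writes;  // a forced call always lands here
  return true;
}
```

With a sync period of 0 in this model, non-forced calls never write, while every forced call does, which is why forcing on each heartbeat multiplies the write load.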
    
    At present a forced flush is done in the following cases:
    - The receiver thread is coming to a halt.
    - During execution of the CHANGE MASTER command.
    
    This ensures that upon restart, we start fetching from
    where we stopped. But doing a forced flush for
    skipped/ignored events results in performance degradation.
    
    The skipped and ignored events are listed below.
    
    HEARTBEAT_LOG_EVENT: In the GTID protocol, if the master
    skips transactions, a heartbeat event is sent to the slave
    at the end of the last skipped transaction to update
    coordinates. The I/O thread receives the heartbeat event
    and updates mi only if the received heartbeat position is
    greater than mi->get_master_log_pos(). This event is written
    to the relay log as an ignored Rotate event. The SQL thread
    reads the rotate event only to update the coordinates
    corresponding to the last skipped transaction.
    
    PREVIOUS_GTIDS_LOG_EVENT:
    This event is sent by the master to show the slave that it
    is making progress. It contains information specific to the
    rotated master binary log. These coordinates are written as
    ignored events.
    
    Events that originated from the same server:
    These are the events received through circular replication.
    For each skipped event, its position is tracked in
    rli->ign_master_log_name_end and
    rli->ign_master_log_pos_end. When the receiver thread stops,
    these positions are flushed to the 'slave_master_info' table.
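
Two of the rules in the list above can be sketched as follows: a heartbeat advances the receiver position only when it moves forward, and positions of same-server events are merely remembered until the receiver stops. All type and function names here are hypothetical and exist only for illustration; the real logic lives in the server's receiver code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the receiver coordinates kept in 'mi'.
struct ReceiverPosSketch {
  std::uint64_t master_log_pos;
};

// Apply a heartbeat position: update only if it is strictly greater than
// the current position. Returns true when the position advanced (the point
// at which an ignored Rotate event would be written to the relay log).
bool apply_heartbeat_pos(ReceiverPosSketch &mi, std::uint64_t heartbeat_pos) {
  if (heartbeat_pos <= mi.master_log_pos) return false;
  mi.master_log_pos = heartbeat_pos;
  return true;
}

// Hypothetical model of rli->ign_master_log_name_end and
// rli->ign_master_log_pos_end, plus a counter of simulated table writes.
struct IgnoredEndSketch {
  std::string log_name;
  std::uint64_t log_pos;
  unsigned table_writes;
};

// Remember the end position of a skipped same-server event in memory only.
void note_skipped_event(IgnoredEndSketch &ign, const std::string &log_name,
                        std::uint64_t end_pos) {
  ign.log_name = log_name;
  ign.log_pos = end_pos;
}

// On receiver stop, the remembered position is written out once.
void on_receiver_stop(IgnoredEndSketch &ign) { ++ign.table_writes; }
```

In this model, any number of skipped events costs a single repository write at stop time.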
    
    These ignored events are special because the applier doesn't
    have to apply them; it only has to advance its position.
    Otherwise ignored events are similar to other events.
    A forced flush can be avoided for HEARTBEAT events and
    events received through circular replication.
    
    A forced flush needs to be done only for the
    PREVIOUS_GTIDS_LOG_EVENT, which carries the rotated master
    log file name and position.
    
    Fix:
    ====
    Avoid forced flushes to 'slave_master_info' on HEARTBEAT
    events and on IGNORED events received through circular
    replication.
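
The decision this fix describes can be summarized in a small sketch. The enum and function below are hypothetical names chosen for illustration and do not appear in the patch; they only encode which kinds of ignored event still warrant a forced flush according to the analysis.

```cpp
#include <cassert>

// Hypothetical classification of the skipped/ignored events discussed above.
enum class IgnoredKind {
  Heartbeat,      // HEARTBEAT_LOG_EVENT marking the end of a skip period
  PreviousGtids,  // PREVIOUS_GTIDS_LOG_EVENT carrying rotated log coordinates
  SameServerId    // event that came back through circular replication
};

// Returns the value that would be passed as the 'force' argument.
bool needs_forced_flush(IgnoredKind kind) {
  switch (kind) {
    case IgnoredKind::PreviousGtids:
      return true;  // log rotation: persist the new file name and position
    case IgnoredKind::Heartbeat:
    case IgnoredKind::SameServerId:
      return false;  // leave the write to the normal sync period
  }
  return false;
}
```

A caller would pass the result as the `force` argument, so heartbeats and same-server events fall back to the regular sync-period behaviour.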
    
    RB: 21432