    9ab03d0d
    Bug#19975697 5.6: SLAVE IO_THREAD MAY GET STUCK WHEN USING
    GTID AND LOW SLAVE_NET_TIMEOUTS
    Venkatesh Duggirala authored
    
    Problem: When GTID is enabled, the dump thread does not check
    whether a heartbeat event is due while it scans the binary
    log files and skips GTID groups that are already present
    on the Slave.
    
    Analysis: The dump thread sends a heartbeat event to the Slave
    if there have been no events to send for "heartbeat_period"
    seconds, to keep the connection between Master and Slave active.
    But while the dump thread is scanning a binary log file and
    finds many GTID groups (events) that need to be skipped,
    it does not track this time period and does not
    send a heartbeat event to the Slave.
    
    There are two problems with the existing code in this
    scenario:
    
    Problem 1: If the dump thread spends a long time skipping groups
    (because many groups need to be skipped) and sends no
    heartbeat event, the Slave assumes that the Master is dead and
    tries to reestablish the connection.
    Problem 2: The dump thread has two while loops to process the events on the
    Master side:
      a) outer loop: processes all binary log files one by one
      b) inner loop: processes all the events in a file one by one
    
    The outer loop checks the 'thd->killed' flag to see whether the dump thread
    was killed between processing different files; if so, it exits the while loop.
    
    But the inner loop has no such check, so it ends up processing the full
    binary log file (which takes longer if the file is huge). That work is
    unnecessary if the dump thread has been killed for some reason (for example,
    because it was detected as a zombie thread by a new dump request from
    the Slave).
    
    Fix:
    1) The dump thread now checks whether it is time to send a heartbeat event
    before skipping an event. If so, it sends one heartbeat event to the Slave.
    2) The inner loop also checks the thd->killed flag to avoid unnecessary work.
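
    The two fixes can be pictured with a small simulation of the inner loop.
    This is only a sketch under stated assumptions, not the actual server
    code: Event, DumpStats, scan_one_file, the simulated clock, and the
    counters are all hypothetical names invented for illustration, and an
    atomic flag stands in for thd->killed.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a binary log event (not a MySQL type).
struct Event {
  bool already_on_slave;     // true if the Slave's GTID set already has it
  std::uint64_t cost_ticks;  // simulated time spent reading/skipping it
};

// Hypothetical counters so the behaviour can be observed.
struct DumpStats {
  int events_examined = 0;
  int events_sent = 0;
  int heartbeats_sent = 0;
};

// Simulated inner loop over the events of one binary log file.
// `killed` plays the role of thd->killed in this sketch.
DumpStats scan_one_file(const std::vector<Event>& file,
                        std::uint64_t heartbeat_period_ticks,
                        const std::atomic<bool>& killed) {
  DumpStats stats;
  std::uint64_t now = 0;        // simulated clock
  std::uint64_t last_send = 0;  // time the Slave last received anything

  for (const Event& ev : file) {
    // Fix 2: stop scanning as soon as the thread is marked killed.
    if (killed.load()) break;

    ++stats.events_examined;
    now += ev.cost_ticks;

    if (ev.already_on_slave) {
      // Fix 1: before skipping, check whether a heartbeat is due.
      if (now - last_send >= heartbeat_period_ticks) {
        ++stats.heartbeats_sent;
        last_send = now;
      }
      continue;  // skip the event itself
    }

    // A real event also counts as traffic and resets the timer.
    ++stats.events_sent;
    last_send = now;
  }
  return stats;
}
```

    In this model, a file made up entirely of skipped events still produces
    periodic heartbeats, and setting the killed flag ends the scan before
    any further events are examined.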