    Bug#28830834: MTS NOT REPLICATION CRASH-SAFE WITH GTID AND
                  ALL THE RIGHT PARAMETERS

    Commit 1dcad705, authored by Venkatesh Venugopal.
    
    Problem
    -------
    When a multi-threaded slave (MTS) running with GTIDs and
    auto-positioning crashes while there are gaps in MTS
    execution (that is, some transactions have been committed
    while earlier ones in the relay log have not), crash recovery
    of the relay logs fails on restart if the relay logs needed
    to fill those gaps are missing.
    
    Analysis
    --------
    When GTIDs are used with auto-positioning, MTS does not need
    to fill gaps from the relay logs.
    
    If the auto-position protocol is enabled, then during the
    initial handshake the slave sends a GTID set containing the
    transactions that it has already received, committed, or
    both. The master responds by sending all transactions
    recorded in its binary log whose GTIDs are not included in
    that set. This exchange ensures that the master sends only
    the transactions that the slave has not already received or
    committed.
    
    Therefore, when auto-positioning is enabled, restarting the
    server after an OS crash should always let recovery succeed:
    any transactions lost from the relay logs are fetched again
    from the master.
    
    However, because of this bug, the server still tried to read
    the relay logs to fill the gaps, and failed to read the
    partial events, because relay log events that had not been
    synced to disk were lost in the OS crash.
    
    Fix
    ---
    During recovery, check whether auto-positioning is enabled.
    If it is, set recovery_parallel_workers to 0 and signal the
    server to skip MTS recovery, since the auto-position protocol
    will fill the gaps.

    With this patch, recovery of a multi-threaded slave always
    succeeds when GTID auto-positioning is enabled, regardless of
    the value of --relay-log-recovery and of the repository type.
    
    RB:22329