    c72d8f95
    Bug#17450876: REPLICATION STOP WITH "ERROR IN XID_LOG_EVENT: COMMIT COULD NOT BE COMPLETED"
    Sujatha Sivakumar authored
    
    Problem:
    ========
    When an SQL thread that is waiting for the commit lock is killed
    and then restarted, a transaction is skipped on the slave.
    
    Analysis:
    =========
    When the SQL thread is in a state where a DML statement is
    waiting for the MDL commit lock and the thread is killed, its
    positions have already been updated in memory. In the existing
    design, positions are flushed before the actual commit, so the
    rli object has its positions updated even though the
    transaction has not yet been committed. When the SQL thread is
    restarted, it reads the position from the rli object, and hence
    the last transaction is skipped on the slave.
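    The ordering problem described above can be reduced to a toy
    model. This is a minimal sketch, not MySQL code: ToySlave,
    group_master_log_pos as a single field, and
    apply_xid_event_buggy() are hypothetical stand-ins for the rli
    object and the XID event commit path. The `killed` flag models
    the SQL thread being killed while waiting for the commit lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for the rli object.
struct ToyRli {
  std::uint64_t group_master_log_pos = 0;  // position read on restart
};

struct ToySlave {
  ToyRli rli;
  int committed_trx = 0;  // transactions actually committed
};

// Pre-fix ordering: the in-memory position is advanced *before* the
// commit. If the commit then fails, rli already points past the
// transaction, so a restarted SQL thread resumes after it.
bool apply_xid_event_buggy(ToySlave& s, std::uint64_t end_pos, bool killed) {
  s.rli.group_master_log_pos = end_pos;  // positions updated first
  if (killed) {
    return false;  // commit could not be completed
  }
  ++s.committed_trx;  // commit succeeds
  return true;
}
```

    With a starting position of 100 and a killed commit ending at
    200, the function returns false, no transaction is committed,
    yet the position reads 200, which is exactly the skipped
    transaction.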
    
    Fix:
    ====
    When the SQL thread is killed while it is waiting for the
    commit lock, the commit fails and an error is reported: "Commit
    could not be completed and Query execution was interrupted". As
    part of the fix, the SQL thread's positions as they existed
    before the commit are saved, and they are restored on error.
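    The save-and-restore idea can be sketched in the same toy model.
    This is an illustration under stated assumptions, not the actual
    patch: ToyRli, ToySlave and apply_xid_event_fixed() are
    hypothetical, and the two position fields only loosely mirror
    the relay log and master log positions kept by rli.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the rli object.
struct ToyRli {
  std::uint64_t group_master_log_pos = 0;
  std::uint64_t group_relay_log_pos = 0;
};

struct ToySlave {
  ToyRli rli;
  int committed_trx = 0;
};

// Sketch of the fix: snapshot the positions before they are advanced
// and put them back if the commit fails, so a restart re-reads the
// transaction instead of skipping it.
bool apply_xid_event_fixed(ToySlave& s, std::uint64_t new_master_pos,
                           std::uint64_t new_relay_pos, bool killed) {
  const ToyRli saved = s.rli;  // positions that existed before the commit
  s.rli.group_master_log_pos = new_master_pos;
  s.rli.group_relay_log_pos = new_relay_pos;
  if (killed) {      // commit could not be completed
    s.rli = saved;   // restore the old positions on error
    return false;
  }
  ++s.committed_trx;
  return true;
}
```

    Copying the whole struct keeps every position field consistent
    with the others; restoring only one of them would leave rli in a
    mixed state.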
    
    Similar symptoms exist with MTS (multi-threaded slave) as well.
    In MTS, the error "The slave coordinator and worker threads are
    stopped, possibly leaving data in inconsistent state" is
    reported. MTS maintains a bitmap of successful commits. On
    error this bitmap is cleared, and the old positions are
    retrieved from the checkpoint, which still points to them.
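    One reading of the MTS part can be modelled as follows. This is
    a hypothetical sketch, not MySQL's worker code: ToyWorker,
    commit_job() and jobs_to_reapply() are invented names, the
    bitmap is a std::vector<bool> with one bit per transaction after
    the checkpoint, and clearing the bit of the failed transaction
    is an assumption about how "cleared on error" applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one MTS worker's recovery state.
struct ToyWorker {
  std::uint64_t checkpoint_pos = 0;  // recovery starts scanning here
  std::vector<bool> executed;        // one bit per trx since checkpoint
  explicit ToyWorker(std::size_t n) : executed(n, false) {}
};

// The bit for a transaction is set on the commit path; if the commit
// fails, the bit is cleared so recovery does not treat it as applied.
bool commit_job(ToyWorker& w, std::size_t idx, bool killed) {
  w.executed[idx] = true;
  if (killed) {
    w.executed[idx] = false;  // clear on error
    return false;
  }
  return true;
}

// On restart, every transaction after the checkpoint whose bit is not
// set has to be applied again.
std::vector<std::size_t> jobs_to_reapply(const ToyWorker& w) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < w.executed.size(); ++i) {
    if (!w.executed[i]) out.push_back(i);
  }
  return out;
}
```

    For three transactions after the checkpoint where only the
    second commit is interrupted, jobs_to_reapply() returns just
    index 1, so that transaction is re-executed rather than skipped.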