    Bug#16856735: SLAVE DEADLOCK CAUSED BY STOP SLAVE,
    SHOW SLAVE STATUS AND GLOBAL READ LOCK.
    Commit cedcb093, authored by Sujatha Sivakumar
    
    Problem:
    =======
    SLAVE DEADLOCK CAUSED BY STOP SLAVE, SHOW SLAVE STATUS AND
    GLOBAL READ LOCK
    
    Analysis:
    ========
    "FLUSH TABLES WITH READ LOCK" blocks all DML and DDL
    operations by taking the global read lock.  Any DML or DDL
    statement that the slave receives after the lock has been
    acquired is therefore blocked, which in turn blocks the
    slave SQL thread.
    
    If STOP SLAVE is now issued from a second connection on the
    slave, it acquires "LOCK_active_mi" and waits until the SQL
    thread terminates.
    
    If SHOW SLAVE STATUS is then issued in the first connection,
    it also needs "LOCK_active_mi" and blocks until STOP SLAVE
    releases that lock.  "UNLOCK TABLES" can only be issued from
    the first connection, which is now blocked, so the global
    read lock is never released and the three waiters deadlock.
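
The wait chain described above can be modelled as a small wait-for graph. This is only an illustrative sketch, not server code: the node names `conn1`, `conn2` and `sql_thread` and the helper `find_cycle` are hypothetical, and each waiter is assumed to wait on exactly one other party.

```python
def find_cycle(waits_for):
    """Follow each node's single wait edge; return the cycle as a list, or None."""
    for start in waits_for:
        path = []
        node = start
        # Walk the chain until it ends or revisits a node already on the path.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


# Hypothetical labels for the three parties named in the analysis.
waits_for = {
    "conn1": "conn2",       # SHOW SLAVE STATUS needs LOCK_active_mi held by STOP SLAVE
    "conn2": "sql_thread",  # STOP SLAVE waits for the SQL thread to terminate
    "sql_thread": "conn1",  # SQL thread is blocked by conn1's global read lock
}

cycle = find_cycle(waits_for)
print(cycle)
```

A cycle that contains all three parties means none of them can make progress unless one of the waits is bounded or abandoned.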
    
    Fix:
    ===
    Provide a new option that makes the "STOP SLAVE" command
    time out after a specified number of seconds rather than
    waiting for a year.
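
The effect of a bounded wait can be sketched with Python threading primitives. This is an assumption-laden model rather than the actual patch: the function `stop_slave`, the parameter `timeout_s`, and the objects `lock_active_mi` and `sql_thread_stopped` are hypothetical stand-ins, and the commit message does not name the real option.

```python
import threading


def stop_slave(lock_active_mi, sql_thread_stopped, timeout_s):
    """Hypothetical model: hold the lock and wait at most timeout_s seconds."""
    with lock_active_mi:
        # Event.wait returns False if the timeout expires before the event is set.
        stopped = sql_thread_stopped.wait(timeout=timeout_s)
    return "stopped" if stopped else "timed out"


lock_active_mi = threading.Lock()
# Never set here: models a SQL thread stuck behind the global read lock.
sql_thread_stopped = threading.Event()

result = stop_slave(lock_active_mi, sql_thread_stopped, timeout_s=0.05)

# Once the wait has timed out the lock is free again, so a statement such as
# SHOW SLAVE STATUS could acquire it and the first connection could continue.
lock_free = lock_active_mi.acquire(blocking=False)
if lock_free:
    lock_active_mi.release()

print(result, lock_free)
```

Because the wait is bounded, the lock holder eventually gives up and releases the lock, which breaks the cycle and lets the first connection reach "UNLOCK TABLES".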
    
    While testing the fix, another deadlock was identified
    with "STOP SLAVE" + "MTS" + "FLUSH TABLES WITH READ LOCK".
    It was caused by the "rli->run_lock" being moved as part of
    the BUG#13635612 fix to eliminate false suspects.  Since
    that change is not necessary, the "rli->run_lock" is moved
    back to its appropriate place.