    Bug#21440793: "DEADLOCK" ON START SLAVE WITH CRASH-SAFE MTS
    SLAVE AND AUTOCOMMIT=OFF
    Commit 12ec3438, authored by Sujatha Sivakumar
    
    Problem:
    =======
    Enable crash-safe MTS and run START SLAVE. Once the slave is up,
    stop it and set autocommit=0. Executing START SLAVE again in
    the same session hangs for the duration of the lock wait
    timeout. After the timeout expires, START SLAVE proceeds and
    the server crashes during a call to 'Rpl_info::remove_info'.
    
    Analysis:
    ========
    The hang occurred because the
    Rpl_info_table::do_check_info() method, used while initializing
    the slave's internal structures, did not finish the
    transaction it started to access the info tables. Because the
    tables were still locked by the access from do_check_info(),
    the subsequent initialization procedures failed to acquire
    locks on the info tables.
    
    Regarding the crash: during the MTS recovery process, workers
    are created to complete the recovery. When the initialization
    problem described above occurs within the 'Create_worker'
    call, the error "Failed to initialize the worker info
    structure" is reported and the worker object is deleted
    inside 'Create_worker'. After control returns to
    'Relay_log_info::mts_finalize_recovery', the code tries to
    access the already deleted worker object, which results in
    a crash.
    
    Fix:
    ===
    When info tables are used and autocommit=0, we force a new
    transaction to start and commit, which avoids the deadlock on
    START SLAVE with a crash-safe MTS slave.
    
    Added a check that the worker object exists before the
    finalize recovery code accesses it.