Sujatha Sivakumar authored
SHOW SLAVE STATUS AND GLOBAL READ LOCK.

Problem:
=======
SLAVE DEADLOCK CAUSED BY STOP SLAVE, SHOW SLAVE STATUS AND GLOBAL READ LOCK

Analysis:
========
The "FLUSH TABLES WITH READ LOCK" command blocks all DML and DDL operations by taking the "global read lock". Hence, when a DML or DDL statement arrives at the slave after the "global read lock" has been acquired, the statement is blocked, which in turn blocks the slave SQL thread. If a STOP SLAVE command is now issued in a new connection to the slave, it acquires "LOCK_active_mi" and waits until the SQL thread terminates. A SHOW SLAVE STATUS command issued in the first connection also needs "LOCK_active_mi", so it is blocked until STOP SLAVE releases the lock. "UNLOCK TABLES" can only be executed in the first connection, which is now blocked, so the global read lock is never released and the result is a deadlock.

Fix:
===
Provide a new option that makes the "STOP SLAVE" command time out after a specified number of seconds rather than waiting for a year.

While testing the fix, another deadlock was identified with "STOP SLAVE" + "MTS" + "FLUSH TABLES WITH READ LOCK". It was caused by "rli->run_lock" having been moved as part of the BUG#13635612 fix to eliminate false suspects. Since that change is not necessary, "rli->run_lock" is moved back to the appropriate place.
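The idea behind the fix, a bounded wait for thread termination instead of an effectively unbounded one, can be sketched in standard C++ as follows. This is only an illustration under stated assumptions, not the server's code: `WorkerState`, `stop_with_timeout` and `mark_stopped` are hypothetical names, and `std::mutex` / `std::condition_variable` stand in for the server's own locking primitives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state of the slave SQL thread.
struct WorkerState {
  std::mutex run_lock;                   // protects the two flags below
  std::condition_variable stopped_cond;  // signalled when the worker exits
  bool running = true;
  bool abort_requested = false;
};

// Request termination and wait at most `timeout` for the worker to exit.
// Returns true if the worker stopped, false if the wait timed out.
bool stop_with_timeout(WorkerState &w, std::chrono::milliseconds timeout) {
  assert(timeout.count() >= 0);
  std::unique_lock<std::mutex> guard(w.run_lock);
  w.abort_requested = true;
  // wait_for releases run_lock while sleeping and returns the value of the
  // predicate once it holds or the timeout expires.
  return w.stopped_cond.wait_for(guard, timeout,
                                 [&w] { return !w.running; });
}

// Called by the worker just before it exits.
void mark_stopped(WorkerState &w) {
  {
    std::lock_guard<std::mutex> guard(w.run_lock);
    w.running = false;
  }
  w.stopped_cond.notify_all();
}
```

In this sketch, a caller that receives `false` can report the timeout and release any locks it holds, so other connections waiting on those locks are no longer blocked indefinitely by a worker that cannot terminate.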