Sujatha Sivakumar authored
COMMIT COULD NOT BE COMPLETED"

Problem:
========
When an SQL thread that is waiting for the commit lock is killed and then restarted, a transaction is skipped on the slave.

Analysis:
========
If the SQL thread is killed while a DML statement is waiting for the MDL commit lock, its positions have already been updated in memory. In the existing design, positions are flushed before the actual commit, so the rli object holds the updated positions even though the transaction has not yet been committed. When the SQL thread is restarted, it reads its position from the rli object, and the last transaction is therefore skipped on the slave.

Fix:
===
When the SQL thread is killed while it is waiting for the commit lock, the commit fails and an error is reported: "Commit could not be completed and Query execution was interrupted". As part of the fix, the SQL thread's positions as they were before the commit are saved, and they are restored on error.

Similar symptoms exist with MTS. In MTS, the error "The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state" is reported. MTS maintains a bitmap of successful commits. On error, this bitmap is cleared and the old positions are retrieved from the checkpoint, which still points to them.
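
The save-and-restore idea described in the fix can be illustrated with a small, self-contained sketch. This is not the server code: `RelayLogInfo`, `apply_transaction`, `CommitInterrupted`, and `MtsState` are hypothetical names invented for illustration, and the position is reduced to a single integer. The sketch only shows why restoring the pre-commit position lets the transaction be re-applied after a restart, and how clearing a record of successful commits and falling back to a checkpoint might look in the MTS case.

```python
class CommitInterrupted(Exception):
    """Hypothetical error: the commit was killed while waiting for the commit lock."""


class RelayLogInfo:
    """Toy stand-in for the rli object: holds the applier's in-memory position."""

    def __init__(self, position=0):
        self.position = position


def apply_transaction(rli, end_position, commit):
    """Advance the position, then commit; restore the old position if the commit fails."""
    saved_position = rli.position      # remember the pre-commit position
    rli.position = end_position        # position is updated before the commit
    try:
        commit()
    except CommitInterrupted:
        rli.position = saved_position  # restore, so a restart re-applies the transaction
        raise


class MtsState:
    """Toy stand-in for the MTS bookkeeping (assumed shape, not the real structure)."""

    def __init__(self, checkpoint_position):
        self.checkpoint_position = checkpoint_position
        self.committed = set()         # stands in for the bitmap of successful commits

    def on_error(self):
        """Clear the commit record and fall back to the checkpoint position."""
        self.committed.clear()
        return self.checkpoint_position


def killed_commit():
    raise CommitInterrupted("Commit could not be completed")


# Single-threaded case: a killed commit must not advance the position.
rli = RelayLogInfo(position=100)
try:
    apply_transaction(rli, 200, killed_commit)
except CommitInterrupted:
    pass
position_after_kill = rli.position

# After a restart the same transaction is applied again and commits normally.
apply_transaction(rli, 200, lambda: None)
position_after_retry = rli.position

# MTS case: on error, forget recorded commits and restart from the checkpoint.
mts = MtsState(checkpoint_position=100)
mts.committed.update({0, 1})
mts_restart_position = mts.on_error()
```

Without the restore step in `apply_transaction`, the position would remain at the end of the uncommitted transaction after the kill, which is the skipped-transaction symptom described in the analysis.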