Venkatesh Duggirala authored
ERROR CASES

Problem:
========
When two worker threads are involved in a "commit order deadlock", the slave worker thread retries a transaction even when the transaction error is a non-temporary error.

Analysis:
=========
As part of the fix for BUG#20136704, a commit order deadlock checking mechanism was introduced. Whenever a transaction needs to wait for another transaction to release a row lock, InnoDB calls a slave function to check whether there is a commit order deadlock. If it finds one, it sets a deadlock flag on the slave worker that is holding the row lock. That worker then rolls its transaction back and retries it.

At this point, the retrying transaction *should* have no non-temporary errors. Having a non-temporary error may be a sign of:
a) The slave has diverged from the master;
b) There is an issue in the logical clock allowing a transaction to be applied in parallel with its dependencies (the two transactions are trying to change the same record in parallel).

For (a), a retry of this transaction will produce the same error.
For (b), this transaction might succeed upon retry, allowing the slave to progress without manual intervention, but it is a sign of problems in logical clock (LC) generation at the master.

Fix:
====
The slave server makes the worker retry a transaction (one that is involved in a commit order deadlock) only if there are no fatal errors.
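
The retry rule described under "Fix" can be sketched roughly as follows. This is a minimal illustration, not the server's actual code. The names `ApplyError`, `is_temporary_error`, `has_fatal_error` and `should_retry_after_commit_order_deadlock` are hypothetical. The choice of which errors count as temporary is an assumption made for the example.

```cpp
// Hypothetical error categories a worker might see while applying a
// transaction. These are illustrative, not the server's real error codes.
enum class ApplyError {
  kNone,             // no error recorded
  kLockDeadlock,     // assumed temporary: may disappear on retry
  kLockWaitTimeout,  // assumed temporary: may disappear on retry
  kDuplicateKey,     // assumed non-temporary: e.g. slave diverged from master
  kKeyNotFound       // assumed non-temporary: e.g. slave diverged from master
};

// Assumption for this sketch: only lock-related errors are temporary.
constexpr bool is_temporary_error(ApplyError e) {
  return e == ApplyError::kLockDeadlock || e == ApplyError::kLockWaitTimeout;
}

// A fatal (non-temporary) error is any recorded error that is not temporary.
constexpr bool has_fatal_error(ApplyError e) {
  return e != ApplyError::kNone && !is_temporary_error(e);
}

// The rule from the "Fix" section: a worker flagged for a commit order
// deadlock retries its transaction only if it has no fatal error.
constexpr bool should_retry_after_commit_order_deadlock(bool deadlock_flag,
                                                        ApplyError e) {
  return deadlock_flag && !has_fatal_error(e);
}
```

Under these assumptions, a flagged worker with no error or with a lock wait timeout would retry, while a flagged worker that hit a duplicate key error would stop and report the error instead.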