sql/rpl_mts_submode.cc · 3a6ab0930968371c67710a6539520547d334c5f7 · Rasoul Jahanshahi / Mysql Server

Sep 21, 2017

BUG#25585436 ASSERT `!RLI_DESCRIPTION_EVENT || IS_PARALLEL_EXEC()' AT RPL_RLI.CC:2393 · cf2aa612

Joao Gramacho authored Sep 21, 2017

Problem and Analysis
--------------------

The issue happens when the MTS coordinator is trying to determine which
worker should handle a new transaction that, because of the MTS logical
clock, depends on a transaction still in progress in a worker.

If the worker fails to apply that transaction, the logical clock value
the coordinator is waiting for will never be reached. Instead of being
notified of the error, the coordinator assigns the new transaction to
itself before seeing that an error happened.

By the time the MTS coordinator becomes aware of the error, it has
already scheduled (and applied) some events of the new transaction,
which interferes with some of the cleanup logic.

Requesting the MTS coordinator to stop (STOP SLAVE) while it is applying
a transaction that should have been handled by workers makes debug
binaries hit the assert.

Fix
---

Make schedule_next_event return an error when, while scheduling a
dependent transaction, the coordinator learns that a transaction it was
waiting on has failed.
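
The idea behind the fix can be sketched as follows. This is a minimal,
self-contained illustration and not the actual patch: CoordinatorState,
wait_for_dependency and schedule_next_event_sketch are hypothetical
names, and the polling loop stands in for whatever waiting mechanism
the server really uses. It assumes only what is described above: the
coordinator waits for a logical clock value, and it must stop waiting
and report an error once a worker has failed.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical, simplified stand-in for the coordinator's view of the
// workers. None of these names come from the server source.
struct CoordinatorState {
  // Highest logical clock value known to be committed by the workers.
  std::atomic<long long> committed_clock{0};
  // Set when any worker fails to apply its transaction.
  std::atomic<bool> worker_failed{false};
};

enum class WaitResult { kReached, kAborted };

// Wait until the logical clock reaches `last_committed`, but give up as
// soon as a worker failure is observed, because in that case the clock
// value will never be reached.
WaitResult wait_for_dependency(const CoordinatorState &state,
                               long long last_committed) {
  while (state.committed_clock.load() < last_committed) {
    if (state.worker_failed.load()) return WaitResult::kAborted;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return WaitResult::kReached;
}

// Returns 0 on success and -1 on error. The point of the sketch is that
// an aborted wait is propagated as an error instead of letting the
// coordinator continue and apply the dependent transaction itself.
int schedule_next_event_sketch(const CoordinatorState &state,
                               long long last_committed) {
  if (wait_for_dependency(state, last_committed) == WaitResult::kAborted)
    return -1;
  // A real scheduler would choose a worker for the transaction here.
  return 0;
}
```

With the error propagated this way, the caller can stop scheduling
before any event of the dependent transaction is applied by the
coordinator.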
