Bug#26883680 COMMIT_ORDER_MANAGER CAN'T TERMINATE MTS WORKER PROPERLY WHEN DEADLOCK HAPPENS
commit 742a720f, authored by Kapil Agrawal
    This patch is for mysql-5.7
    
    Background
    ----------
In general, if a replication applier thread fails to execute a
transaction because of an InnoDB deadlock or because the transaction's
execution time exceeds InnoDB's innodb_lock_wait_timeout, it
automatically retries the transaction up to slave_transaction_retries
times before stopping with an error.
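
The retry behaviour can be pictured with a small self-contained C++
sketch; Apply_result, apply_transaction and apply_with_retries below
are hypothetical names used for illustration, not server code.

  #include <cstdio>
  #include <functional>

  // Hypothetical result codes standing in for the real InnoDB/applier errors.
  enum class Apply_result { OK, DEADLOCK, LOCK_WAIT_TIMEOUT, OTHER_ERROR };

  // Bounded retry loop: temporary errors (deadlock, lock-wait timeout) are
  // retried up to max_retries times; any other error, or an exhausted retry
  // budget, makes the worker stop with an error.
  bool apply_with_retries(const std::function<Apply_result()> &apply_transaction,
                          unsigned max_retries) {
    for (unsigned attempt = 0;; ++attempt) {
      Apply_result res = apply_transaction();
      if (res == Apply_result::OK) return true;
      bool temporary = res == Apply_result::DEADLOCK ||
                       res == Apply_result::LOCK_WAIT_TIMEOUT;
      if (!temporary || attempt >= max_retries) {
        std::fprintf(stderr, "worker stops after %u attempt(s)\n", attempt + 1);
        return false;
      }
      // roll back here and loop to retry
    }
  }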
    
    
When --slave_preserve_commit_order is enabled, the replica server
ensures that transactions are externalized on the replica in the same
order as they appear in the replica's relay log, and prevents gaps in
the sequence of transactions executed from the relay log. If a worker
finishes executing its transaction before the workers handling the
preceding transactions, it waits until all of those transactions have
committed before committing its own.
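
The wait-for-preceding-commit idea can be sketched with standard C++
primitives. Commit_order_gate below is a hypothetical, simplified
stand-in for what Commit_order_manager provides, not the server's
actual implementation.

  #include <condition_variable>
  #include <mutex>

  // Simplified commit-order gate: each transaction carries a sequence number
  // (its position in the relay log), and a worker may commit only when every
  // earlier sequence number has already committed.
  class Commit_order_gate {
   public:
    void wait_for_turn(unsigned long seqno) {
      std::unique_lock<std::mutex> lk(m_mutex);
      m_cond.wait(lk, [&] { return m_next_to_commit == seqno; });
    }
    void committed(unsigned long seqno) {
      std::lock_guard<std::mutex> lk(m_mutex);
      if (seqno == m_next_to_commit) {
        ++m_next_to_commit;
        m_cond.notify_all();  // wake the worker holding the next seqno
      }
    }
   private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    unsigned long m_next_to_commit = 0;
  };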
    
    
Problem
-------
When --slave_preserve_commit_order is enabled on the replica and the
waiting thread holds row locks needed by the thread executing the
preceding transaction (as per their order in the relay log), the InnoDB
deadlock detection algorithm detects the deadlock between the workers
and asks the waiting thread to roll back (the waiting thread is chosen
as the victim because the blocked transaction's sequence number is
lower than its own).
    
    
When this happens, the waiting thread wakes up from the SPCO cond_wait
and learns that it was asked to roll back by its preceding transaction
because it was holding a lock that the other transaction needs in order
to progress. It then rolls back its transaction so that the preceding
transaction can commit, and retries the transaction.
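
A rough sketch of the victim's side of this protocol, assuming a
per-worker commit-order-deadlock flag (the names below are
hypothetical, not the server's actual code):

  #include <atomic>

  // Hypothetical per-worker flag; the real worker keeps a similar flag that
  // InnoDB's deadlock detection sets when it picks the waiting transaction
  // as the victim.
  struct Worker_state {
    std::atomic<bool> commit_order_deadlock{false};
  };

  // Victim side, simplified: the worker sleeping in the SPCO wait is woken,
  // notices the flag, rolls back so the preceding transaction can commit,
  // and then re-executes its transaction (subject to the retry budget).
  void on_wakeup_from_spco_wait(Worker_state &self) {
    if (self.commit_order_deadlock.load()) {
      // rollback_current_transaction();  // release the locks the other worker needs
      // retry_transaction();             // bounded by slave_transaction_retries
    }
  }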
    
    
The above logic sometimes caused the worker thread to miss the
commit-order signals, causing the replica server to hang. One such hang
is described below.
    
Consider a replica server configured with slave_parallel_workers=3,
slave_parallel_type=LOGICAL_CLOCK, slave_preserve_commit_order=1 and
slave_transaction_retries=0. With MTS enabled, workers can easily end
up executing transactions out of order, leading to the state below.
    
    Worker 1 - Processing the events of Transaction T1
    Worker 2 - Executed Transaction T2 and is waiting for T1 to commit.
    Worker 3 - Processing the events of Transaction T3
    
1. If T1 and T2 modify the same rows in InnoDB, then Worker 1 detects
the deadlock and asks Worker 2 to roll back by signalling it.
    
2. Worker 2 wakes up from the cond_wait, learns that it was asked to
roll back by the other transaction, and returns with an error.
    
3. Worker 2 rolls back the transaction, reaches the retry logic, and
checks the value of slave_transaction_retries. Since it is 0, it
returns from the handle_slave_worker loop and enters the error handling
code.
    
4. As part of error handling, Worker 2 notifies the co-ordinator that
it is exiting, and then calls the report_rollback() function to
unregister itself from the SPCO queue.
    
5. While executing report_rollback(), Worker 2 again enters
wait_for_its_turn(). But before entering the wait, it checks the
commit-order-deadlock flag. Since the flag is already set, Worker 2
immediately returns from the function with an error.
    
6. The co-ordinator thread gets this information, sets
rli->abort_slave=1 to stop replication, and waits until all workers
exit.
    
7. Worker 2 exits. There is no Worker 2 from here onwards.
    
At this point the status is:
    Worker 1 - Processing the events of Transaction T1
    Worker 2 - Not running.
    Worker 3 - Processing the events of Transaction T3
    
8. Worker 1 proceeds, executes the transaction, and enters
Commit_order_manager::wait_for_its_turn().
    
9. Worker 1 finds out that the previous worker (Worker 2) failed with
an error.
    
10. Worker 1 signals the next transaction/worker to proceed.
    
11. Worker 3 executes the transaction and enters
Commit_order_manager::wait_for_its_turn().
    
12. Worker 1 rolls back the transaction and eventually exits.
    
13. There is no one left to signal Worker 3, so it waits forever.
    
This results in a hang of the replica: the co-ordinator thread waits
for the worker thread to finish, while the worker thread waits for the
signal to proceed with its commit.
    
mysql> show processlist;
+----+-------------+-----------------+------+---------+------+----------------------------------------------+-------------------+-----------+---------------+
| Id | User        | Host            | db   | Command | Time | State                                        | Info              | Rows_sent | Rows_examined |
+----+-------------+-----------------+------+---------+------+----------------------------------------------+-------------------+-----------+---------------+
| 2  | root        | localhost:55708 | test | Query   | 0    | starting                                     | show processlist  | 0         | 0             |
| 3  | system user |                 | NULL | Connect | 107  | Waiting for master to send event             | NULL              | 0         | 0             |
| 4  | system user |                 | NULL | Connect | 77   | Waiting for workers to exit                  | NULL              | 0         | 0             |
| 7  | system user |                 | NULL | Connect | 84   | Waiting for preceding transaction to commit  | NULL              | 0         | 0             |
+----+-------------+-----------------+------+---------+------+----------------------------------------------+-------------------+-----------+---------------+
    
Analysis
--------
Considering the above flow, the root cause of the deadlock is step 5:
Worker 2, which is in the middle of the commit-order queue, exits
abruptly without signalling Worker 3, causing Worker 3 to wait forever
for the signal.
    
If Worker 2 had waited for its turn during report_rollback(), instead
of returning immediately from Commit_order_manager::wait_for_its_turn()
after checking the m_order_commit_deadlock flag, then Worker 3 would
have received the signal from Worker 2 once Worker 2 proceeded (after
being woken up by Worker 1's signal), and no signals would have been
missed.
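
The difference between the two behaviours can be sketched as follows
(hypothetical function shapes, not the actual Commit_order_manager
code):

  // Buggy shape: when the deadlock flag is already set, the worker returns
  // at once. It never waits in the queue again, so when the preceding worker
  // later signals "your turn", nobody forwards that signal to the next
  // worker in the queue.
  bool wait_for_its_turn_buggy(bool commit_order_deadlock) {
    if (commit_order_deadlock) return false;  // leave immediately with an error
    // ... otherwise wait until the preceding transaction has committed ...
    return true;
  }

  // Desired shape: even when the worker is rolling back and about to exit,
  // it still waits for its turn and, on the way out, signals the next
  // worker, so the wake-up chain is never broken.
  bool wait_for_its_turn_fixed(bool commit_order_deadlock) {
    // ... wait until the preceding transaction has committed or failed ...
    // signal_next_worker();  // keep the chain alive before unregistering
    return !commit_order_deadlock;
  }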
    
    Fix
    ---
To fix the first issue, a class `Retry_context_sentry` has been created
to perform proper cleanup for the workers in the `retry_transaction`
function.
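
A minimal sketch of the sentry idea, assuming the cleanup amounts to
resetting a commit-order-deadlock flag (the real class in the patch may
clear more worker state):

  #include <atomic>

  // RAII sentry: the per-retry state that must be cleared is reset in the
  // destructor, so every exit path out of the retry logic, including error
  // paths, performs the same cleanup instead of relying on each caller to
  // remember it.
  class Retry_context_sentry_sketch {
   public:
    explicit Retry_context_sentry_sketch(std::atomic<bool> &deadlock_flag)
        : m_deadlock_flag(deadlock_flag) {}
    ~Retry_context_sentry_sketch() { clean_context(); }

    // May also be called explicitly once a retry attempt has completed.
    void clean_context() { m_deadlock_flag.store(false); }

   private:
    std::atomic<bool> &m_deadlock_flag;
  };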
    
    RB#25242