mysql-test/extra/rpl_tests/rpl_gtid_mts_relay_log_recovery.test · 45cf75598cf1f64c98fa367a100901c7deb70c37 · Rasoul Jahanshahi / Mysql Server

Jun 26, 2019

Venkatesh Venugopal authored Jun 26, 2019 · 1dcad705

Bug#28830834: MTS NOT REPLICATION CRASH-SAFE WITH GTID AND
              ALL THE RIGHT PARAMETERS

Problem
-------
When a multi-threaded slave (with GTID + Auto Position)
crashes in a state where gaps are present in MTS execution,
crash recovery of the relay logs fails on restart if the
relay logs required to fill the gaps are missing.

Analysis
--------
When GTID is enabled, MTS need not care about gaps.

If the auto position protocol is enabled, in the initial
handshake, the slave sends a GTID set containing the
transactions that it has already committed. The master
responds by sending all transactions recorded in its binary
log whose GTID is not included in the GTID set sent by the
slave. This exchange ensures that the master only sends the
transactions with a GTID that the slave has not already
received or committed.
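
The exchange above amounts to a set difference, which can be sketched
as follows. This is a simplified model, not the server's
implementation: a GTID is represented as a (UUID, transaction number)
pair, and the function name transactions_to_send and the example UUID
are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Simplified model of a GTID: (source server UUID, transaction number).
Gtid = Tuple[str, int]


def transactions_to_send(master_binlog: List[Gtid],
                         slave_committed: Set[Gtid]) -> List[Gtid]:
    """Return, in binary log order, every transaction whose GTID is
    not in the set the slave reported during the handshake."""
    return [gtid for gtid in master_binlog if gtid not in slave_committed]


# Example UUID, chosen only for illustration.
UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"

# The master has transactions 1..7 in its binary log.
master_binlog = [(UUID, n) for n in range(1, 8)]

# The slave crashed with gaps: workers committed 1, 2, 4 and 6 only.
slave_committed = {(UUID, n) for n in (1, 2, 4, 6)}

missing = transactions_to_send(master_binlog, slave_committed)
print([n for _, n in missing])  # prints [3, 5, 7]
```

In this model the gap transactions (3 and 5) are re-sent together with
the tail (7), so the slave does not need its own relay logs to fill
the gaps.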

So, when auto_position is enabled, recovery should succeed
when the server is restarted after an OS crash.

But, due to this bug, the server still tried to read the
relay logs to fill the gaps and failed on the partial
events, because the un-synced relay log events were lost
during the OS crash.

Fix
---
During recovery, check whether Auto Position is enabled.
If it is, set recovery_parallel_workers to 0 and hint
the server to skip MTS recovery, as the auto position
protocol will take care of filling the gaps.
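
The decision described above can be sketched as below. This is a
minimal illustration of the control flow only, not the patch itself:
the RecoveryState class, the skip_mts_recovery flag and the
prepare_recovery function are hypothetical names; only
recovery_parallel_workers comes from the description.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical container for the settings the fix looks at."""
    auto_position: bool
    recovery_parallel_workers: int
    skip_mts_recovery: bool = False  # hypothetical hint flag


def prepare_recovery(state: RecoveryState) -> RecoveryState:
    """With Auto Position enabled, leave gap filling to the protocol."""
    if state.auto_position:
        state.recovery_parallel_workers = 0
        state.skip_mts_recovery = True
    # Without Auto Position the state is left unchanged, so gaps
    # would still have to be filled from the relay logs.
    return state


with_auto = prepare_recovery(RecoveryState(True, 4))
without_auto = prepare_recovery(RecoveryState(False, 4))
print(with_auto.recovery_parallel_workers, with_auto.skip_mts_recovery)
print(without_auto.recovery_parallel_workers, without_auto.skip_mts_recovery)
```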

This patch makes recovery of a multi-threaded slave always
succeed when GTID Auto Position is enabled, regardless of the
value of --relay-log-recovery and the repository type.

RB:22329
