Frazer Clement authored
This patch modifies the Ndb replication slave layer code to detect epoch skipping.

There have been issues where the generic replication layer's slave retry-on-temp-error code did not function correctly, so that an epoch transaction which encountered a temporary error was skipped entirely instead of being retried. Retry-on-temp-error is critical to replication correctness, and is explicitly relied on to implement multi-pass apply when using transactional conflict detection.

To guard against this situation recurring, the Ndb slave code now checks that every epoch which is started (identified by an ndb_apply_status write_row event) is completed before a new epoch is started. The one exception is when the Slave SQL thread is stopped and restarted, since a CHANGE MASTER could have occurred in between.

This gives some protection against replication layer errors, and avoids data corruption and harder-to-debug symptoms appearing later or further downstream. It can be considered an extension of the existing check for epoch decline.
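As an illustration only, the check described above could be sketched roughly as follows. This is not the code from the patch: `EpochTracker`, `EpochCheck` and all member names are hypothetical. The sketch also assumes that a second start of the same epoch is a legitimate retry, and that starting a different epoch while one is still open is the skip being detected.

```cpp
#include <cstdint>

// Hypothetical sketch; none of these names come from the Ndb sources.
enum class EpochCheck { Ok, PreviousEpochSkipped };

class EpochTracker {
public:
    // Call when an ndb_apply_status write_row event marks the start of an epoch.
    EpochCheck on_epoch_start(std::uint64_t epoch) {
        // Assumption: re-starting the same epoch is a retry after a temp error.
        // Starting a different epoch while one is open means the open epoch
        // was never completed, i.e. it was skipped.
        const bool skipped = has_open_epoch_ && open_epoch_ != epoch;
        has_open_epoch_ = true;
        open_epoch_ = epoch;
        return skipped ? EpochCheck::PreviousEpochSkipped : EpochCheck::Ok;
    }

    // Call when the epoch transaction has been committed.
    void on_epoch_commit() { has_open_epoch_ = false; }

    // Call when the Slave SQL thread is (re)started. A CHANGE MASTER may have
    // happened, so the next epoch start is accepted unconditionally.
    void on_sql_thread_restart() { has_open_epoch_ = false; }

private:
    bool has_open_epoch_ = false;
    std::uint64_t open_epoch_ = 0;
};
```

In this sketch a caller would treat `PreviousEpochSkipped` as a hard error and stop the applier, rather than continuing with a gap in the applied epochs.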