Frazer Clement authored
The previous debugging push gave enough context to understand the intermittent failures with this testcase.

The testcase restores a Master backup on a Slave, then uses the restore point epoch to resume replication. It checks that the backup restore point and the binlog content following this point are mutually exclusive, based on the number of inserted rows restored by each.

However, the logic to set the binlog position from the restore epoch assumed that there would always be a following epoch in the master's ndb_binlog_index file which could be used to set the position. When this was not the case, the query returned no results, the previous settings of the file and position variables were used, and the outcome was non-deterministic. Presumably this occurs when the backup runs slowly on slow hosts, so that its restore point falls after all of the changes (which cause epochs to be recorded) have finished.

This problem is fixed by improving the logic to set the binlog position to handle three scenarios (see the SQL sketch below):

  a) An epoch record exists for the restore point
     (probably never hit for a restore, but common for channel cutover)
  b) An epoch record exists after the restore point
     (common when there is ongoing activity on the master)
  c) No epoch record exists after the restore point, but one does
     exist before it (common when the master is quiet)

So if the backup completion point is after the end of the changes, we go to case c) and choose a binlog position after the backup.

To handle the case where there is nothing to restore from the log, we map a max row value of NULL from the log to 200.

Approved by: Priyanka Sangam <priyanka.sangam@oracle.com>
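A minimal sketch of the improved position selection described above, as SQL against mysql.ndb_binlog_index. It assumes the standard File/Position/epoch columns plus the next_file/next_position columns present in newer versions of the table; @restore_epoch and the other variable names are illustrative, and cases a) and b) collapse into a single ">=" lookup:

    -- @restore_epoch: the restore point epoch reported by ndb_restore
    SET @next_file = NULL, @next_pos = NULL,
        @prev_file = NULL, @prev_pos = NULL;

    -- Cases a) and b): first epoch record at or after the restore point.
    SELECT SUBSTRING_INDEX(File, '/', -1), Position
      INTO @next_file, @next_pos
      FROM mysql.ndb_binlog_index
     WHERE epoch >= @restore_epoch
     ORDER BY epoch ASC LIMIT 1;

    -- Case c): last epoch record before the restore point; start from the
    -- position just past it (next_file / next_position).
    SELECT SUBSTRING_INDEX(next_file, '/', -1), next_position
      INTO @prev_file, @prev_pos
      FROM mysql.ndb_binlog_index
     WHERE epoch < @restore_epoch
     ORDER BY epoch DESC LIMIT 1;

    -- Prefer a record at or after the restore point; otherwise fall back.
    SET @binlog_file = IFNULL(@next_file, @prev_file),
        @binlog_pos  = IFNULL(@next_pos,  @prev_pos);

The selected file and position would then be substituted into CHANGE MASTER TO, e.g. via eval in a mysqltest script, since CHANGE MASTER requires literal values.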