Bug #30411728 NDB_RPL.NDB_RPL_BACKUP_EPOCH FAILS IN PB2 WITH RESULT CONTENT
    Frazer Clement authored
    
    The previous debugging push gave enough context to understand the
    intermittent failures with this testcase.
    
The testcase restores a Master backup on a Slave, then uses the
backup's restore point epoch to resume replication.
The testcase checks that the rows restored from the backup and the
rows replicated from the binlog after the restore point are mutually
exclusive, based on the number of inserted rows applied by each.
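
For reference, the restore point epoch is conventionally read from the
slave's mysql.ndb_apply_status table, which ndb_restore populates when
run with its restore-epoch option; a minimal sketch (the variable name
is illustrative, not the testcase's own):

  -- On the slave: highest applied epoch, i.e. the restore point
  SELECT @restore_epoch := MAX(epoch)
    FROM mysql.ndb_apply_status;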
    
However, the logic setting the binlog position from the restore
epoch assumed that there would always be a following epoch in the
master's mysql.ndb_binlog_index table which could be used to set
the position.
    
If this was not the case then the query returned no rows, the stale
previous values of the file and position variables were used, and the
results were non-deterministic.
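
The lookup presumably followed the standard pattern from the NDB
replication documentation, which only returns a row when a later
epoch exists (a sketch; the variable names are illustrative):

  -- On the master: start position of the first epoch after the
  -- restore point. Returns no row when no later epoch was binlogged,
  -- silently leaving @file/@pos at their previous values.
  SELECT @file := SUBSTRING_INDEX(File, '/', -1),
         @pos  := Position
    FROM mysql.ndb_binlog_index
   WHERE epoch > @restore_epoch
   ORDER BY epoch ASC
   LIMIT 1;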
    
Presumably this is caused by the backup running more slowly on slow
hosts, so that its restore point falls after all of the changes (which
cause epochs to be recorded) have finished.
    
This problem is fixed by improving the logic that sets the binlog
position to handle the 3 scenarios (sketched in SQL after this list):
 a) An epoch record exists for the restore point
    (Probably never hit for a restore, but common for channel cutover)
 b) An epoch record exists after the restore point
    (Common when there is ongoing activity on the master)
 c) No epoch record exists after the restore point, but one does
    exist before it
    (Common when the master is quiet)
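
A minimal sketch of queries covering the three cases, assuming the
File/Position and next_file/next_position columns of
mysql.ndb_binlog_index; in practice each later query would only be
run if the previous one found no row (variable names illustrative):

  -- Case a): a record for the restore point epoch itself exists;
  -- its content is already restored, so resume just after it
  SELECT @file := SUBSTRING_INDEX(next_file, '/', -1),
         @pos  := next_position
    FROM mysql.ndb_binlog_index
   WHERE epoch = @restore_epoch;

  -- Case b): a later epoch exists; resume at its start
  SELECT @file := SUBSTRING_INDEX(File, '/', -1),
         @pos  := Position
    FROM mysql.ndb_binlog_index
   WHERE epoch > @restore_epoch
   ORDER BY epoch ASC LIMIT 1;

  -- Case c): only earlier epochs exist; resume just after the last
  SELECT @file := SUBSTRING_INDEX(next_file, '/', -1),
         @pos  := next_position
    FROM mysql.ndb_binlog_index
   WHERE epoch < @restore_epoch
   ORDER BY epoch DESC LIMIT 1;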
    
So if the backup completion point falls after the end of the changes,
we hit case c) and choose a binlog resume position after the backup's
content.
    
To handle the case where there is nothing to replicate from the log,
we map a MAX() row value of NULL from the log to 200.
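
A one-line sketch of that mapping, with a hypothetical table and
column (t1, a) standing in for the testcase's own:

  -- MAX() over an empty set is NULL; map it to the sentinel 200
  SELECT IFNULL(MAX(a), 200) AS max_row FROM t1;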
    
Approved by: Priyanka Sangam <priyanka.sangam@oracle.com>