    7025a4c5
    Bug #27497461 ERROR 1032: CAN'T FIND RECORD IN TABLE. DATA INACCESSIBLE AFTE
    Frazer Clement authored
    Problem with backup consistency under load.
    
    Description
    -----------
    
    Ndb online backups have a fuzzy DATA part and a redo/undo LOG part.
    
    To restore to a consistent state it is necessary to ensure that the
    LOG contains all of the changes spanning the capture of the fuzzy
    DATA part and beyond to a consistent snapshot point.
    
    This is achieved by waiting for a cluster GCI boundary to be passed
    *after* the DATA capture is completed before stopping change logging
    and recording the stop GCI in the backup metadata.
    
    At restore time, the LOG is replayed up to the stop GCI, restoring
    the system to the state it had at the consistent stop GCI.
    
    However, under load it is possible for the chosen GCI boundary to be
    too early, so that it does not span the last parts of the data
    captured.
    
    This can mean that when the backup is restored, there are inconsistencies
    in some of the data which changed towards the end of the backup.
    
    This may show up as broken constraints or, for example, corrupted
    Blob entries.
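
    The restore rule and the failure mode described above can be
    illustrated with a toy model. This is a minimal sketch, not Ndb
    code: the `restore` function, the `(gci, key, value)` log layout and
    the sample rows are all hypothetical, chosen only to show why the
    stop GCI must span the last rows captured.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (gci in which the change committed, key, new value).
LogEntry = Tuple[int, str, int]


def restore(fuzzy_data: Dict[str, int], log: List[LogEntry], stop_gci: int) -> Dict[str, int]:
    """Load the fuzzy DATA, then replay LOG entries up to and including stop_gci."""
    state = dict(fuzzy_data)
    for gci, key, value in log:
        if gci <= stop_gci:
            state[key] = value
    return state


# Assumed scenario: rows "a" and "b" must always be equal (the constraint).
# The scan captures "a" in GCI 10, then a transaction in GCI 11 sets both
# rows to 2, then the scan captures "b" in GCI 11.
fuzzy_data = {"a": 1, "b": 2}
log = [(11, "a", 2), (11, "b", 2)]

# A stop GCI that spans the capture gives a consistent result.
good = restore(fuzzy_data, log, stop_gci=11)

# A stop GCI that is too early leaves "a" stale while "b" is already new.
bad = restore(fuzzy_data, log, stop_gci=10)
```

    With `stop_gci=11` both rows end up at 2. With `stop_gci=10` the
    restored data mixes the old "a" with the new "b", which is the kind
    of broken constraint described above.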
    
    Solution
    --------
    
    Ensure that the chosen StopGCI always spans the capture of the
    fuzzy DATA.
    Ensure that the Backup LOG always contains all data within StopGCI.
    Improve test coverage to check backup consistency under load.
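
    The first point can be pictured as a simple selection rule. The
    sketch below is only an assumption about the shape of that rule, not
    the actual change in this commit: `choose_stop_gci`,
    `capture_end_gci` and the stream of completed GCIs are hypothetical
    names.

```python
from typing import Iterable


def choose_stop_gci(capture_end_gci: int, completed_gcis: Iterable[int]) -> int:
    """Return the first completed GCI that is not older than the GCI in
    which the fuzzy DATA capture finished.

    completed_gcis is a hypothetical stream of GCI-completion
    notifications, which may still contain GCIs from before the capture
    ended.
    """
    for gci in completed_gcis:
        # Skip boundaries that predate the end of the capture: stopping
        # the LOG there would not span the last rows captured.
        if gci >= capture_end_gci:
            return gci
    raise RuntimeError("no GCI boundary observed after DATA capture ended")
```

    For example, if capture ended in GCI 11 and the completions seen are
    9, 10, 12, the rule skips 9 and 10 and picks 12.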