Frazer Clement authored
Problem with backup consistency under load.

Description
-----------
Ndb online backups have a fuzzy DATA part and a redo/undo LOG part.
To restore to a consistent state it is necessary to ensure that the
LOG contains all of the changes spanning the capture of the fuzzy
DATA part and beyond to a consistent snapshot point.

This is achieved by waiting for a cluster GCI boundary to be passed
*after* the DATA capture is completed before stopping change logging
and recording the stop GCI in the backup metadata.

At restore time, the LOG is replayed up to the stop GCI, restoring
the system to the state it had at the consistent stop GCI.

However, under load it is possible for a GCI boundary to be chosen
which is too early, and does not span the last parts of the data
captured. This can mean that when the backup is restored, there are
inconsistencies in some of the data which changed towards the end of
the backup. This may be noticed as broken constraints, or for
example, corrupted Blob entries.

Solution
--------
Ensure that StopGCI chosen is always spanning the duration of the
capture of the fuzzy DATA.

Ensure that Backup LOG always contains all data within StopGCI.

Improve test coverage to check backup consistency under load.
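
The failure mode described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a redo-style LOG, two rows with the invariant a + b == 100, and hypothetical names (`restore`, `is_safe_stop_gci`, `SCAN_END_GCI`) that do not come from the Ndb source.

```python
# Toy model of restoring an online backup: fuzzy DATA plus a redo LOG.
# All names here are hypothetical; this is not Ndb's actual code.

Log = list[tuple[int, str, int]]  # (gci, row key, new value), in commit order


def is_safe_stop_gci(stop_gci: int, scan_end_gci: int) -> bool:
    """A stop GCI is safe only if it is not earlier than the GCI
    in which the fuzzy DATA scan finished."""
    return stop_gci >= scan_end_gci


def restore(fuzzy_data: dict[str, int], log: Log, stop_gci: int) -> dict[str, int]:
    """Load the fuzzy DATA, then replay LOG entries up to and including stop_gci."""
    rows = dict(fuzzy_data)
    for gci, key, value in log:
        if gci <= stop_gci:
            rows[key] = value
    return rows


# Rows "a" and "b" both start at 50. The scan reads "a" during GCI 3,
# a transfer of 10 from "a" to "b" commits in GCI 4, and the scan then
# reads "b" while GCI 4 is still in progress.
FUZZY_DATA = {"a": 50, "b": 60}
LOG: Log = [(4, "a", 40), (4, "b", 60)]
SCAN_END_GCI = 4

if __name__ == "__main__":
    for stop_gci in (3, 4):
        rows = restore(FUZZY_DATA, LOG, stop_gci)
        print(stop_gci, is_safe_stop_gci(stop_gci, SCAN_END_GCI), rows, sum(rows.values()))
```

In this model a stop GCI of 4 spans the whole scan and the restored rows satisfy the invariant, while a stop GCI of 3 is too early: the captured value of "b" already reflects a change from GCI 4 that the replay never applies to "a", so the rows sum to 110.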