Frazer Clement authored
For cluster shutdown, or the ALL STOP / ALL RESTART management actions, different nodes can attempt to stop on different GCI boundaries.

If they succeed in stopping on different GCIs, the following System Restart (SR) will be slower, because the nodes with the earlier stop GCI must undergo a 'takeover' process as part of SR.

Alternatively, if the set of nodes stopping on the first GCI boundary leaves the surviving nodes non-viable, the surviving nodes suffer an arbitration failure. This arbitration failure has the positive effect of making them 'stop' in the correct GCI, but the negative effect of looking like a bug or a testcase failure, and of being ugly.

A testcase is added to testSystemRestart which delays shutdown to show the problem.

Extra synchronisation is added to 'graceful' shutdown to reduce the chance that different data nodes attempt to shut down in different GCIs.

This should avoid spurious arbitration errors during shutdowns and restarts (perhaps more common with larger clusters), and may also reduce the use of Takeover during SR.

Approved by : Maitrayi Sabaratnam <maitrayi.sabaratnam@oracle.com>
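To make the cost of a mismatched stop concrete, here is a minimal sketch of which nodes would need takeover at the next SR, given the GCI each node stopped in. It is not taken from the NDB source: the `NodeId` and `Gci` aliases and both function names are hypothetical, and it assumes only what the message above states, namely that nodes with an earlier stop GCI than the others must go through takeover.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only; not the NDB kernel's own.
using NodeId = std::uint32_t;
using Gci = std::uint64_t;

// Latest GCI in which any node stopped (0 if the map is empty).
Gci latestStopGci(const std::map<NodeId, Gci>& stopGci) {
  Gci latest = 0;
  for (const auto& entry : stopGci) {
    latest = std::max(latest, entry.second);
  }
  return latest;
}

// Nodes that stopped in an earlier GCI than the latest one. Per the
// description above, these are the nodes that need takeover during SR.
std::vector<NodeId> nodesNeedingTakeover(const std::map<NodeId, Gci>& stopGci) {
  const Gci latest = latestStopGci(stopGci);
  std::vector<NodeId> behind;
  for (const auto& entry : stopGci) {
    if (entry.second < latest) {
      behind.push_back(entry.first);
    }
  }
  return behind;
}
```

When every node stops in the same GCI the returned list is empty, which is the outcome the extra shutdown synchronisation aims to make more likely.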