Mauritz Sundell authored
Problem
=======

With the delay in sending NODE_FAILREP introduced in QMGR, it is more likely that the TAKE_OVERTCCONF signal reaches DBTC before NODE_FAILREP has arrived. This is due both to how the local path of NODE_FAILREP has changed and to the lack of synchronization between when other nodes may send TAKE_OVERTCCONF to DBTC and when the receiving DBTC has received NODE_FAILREP, or even whether QMGR has yet seen COMMIT_FAILCONF.

If TAKE_OVERTCCONF arrives early, it fails the node with:

  DBTC (Line: 10809) 0x00000002 Check i != end failed

Solution
========

The number of the last failure that a DBTC has seen is sent along with the TAKE_OVERTCCONF signal.

If the receiving DBTC has not yet seen the node failure for the taken-over node, it can resend the signal to itself with a delay, provided it has not yet seen all failures that the sender has.

If the receiving DBTC has seen all failures that the sending DBTC has, and the taken-over node is not marked as failed, the node fails as before.

Note
====

Also remove support for the TAKE_OVERTCCONF signal with only one word, which is not sent from any data node newer than version 7.0.6.

Reviewed-by: Mikael Ronström <mikael.ronstrom@oracle.com>
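
The decision described under Solution can be sketched as a small standalone function. This is an illustrative sketch only, not the actual DBTC code: the names `TakeOverAction`, `decideTakeOverConf`, `takenOverNodeMarkedFailed`, `localLastFailureNo` and `senderLastFailureNo` are hypothetical, and the sketch assumes that failure numbers increase monotonically and ignores wraparound.

```cpp
#include <cstdint>

// Hypothetical outcome of handling an incoming TAKE_OVERTCCONF.
enum class TakeOverAction {
  Process,     // taken-over node is already known as failed: handle the signal
  RetryLater,  // receiver lags behind sender: resend the signal to self with a delay
  Crash        // receiver is up to date but node is not failed: fail as before
};

// Sketch of the receiving side's decision (all names are assumptions).
// Assumes failure numbers grow monotonically; wraparound is not handled.
constexpr TakeOverAction decideTakeOverConf(bool takenOverNodeMarkedFailed,
                                            std::uint32_t localLastFailureNo,
                                            std::uint32_t senderLastFailureNo)
{
  if (takenOverNodeMarkedFailed) {
    // NODE_FAILREP for the taken-over node has already been seen.
    return TakeOverAction::Process;
  }
  if (localLastFailureNo < senderLastFailureNo) {
    // The sender has seen failures this node has not; NODE_FAILREP may
    // still be on its way, so wait instead of failing the node.
    return TakeOverAction::RetryLater;
  }
  // All failures known to the sender are known here, yet the taken-over
  // node is not marked as failed: this is a genuine inconsistency.
  return TakeOverAction::Crash;
}
```

In a real block the `RetryLater` case would correspond to resending the signal to itself with a delay, and `Crash` to the existing consistency check failing.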