    Bug#29832974 Handle early TAKE_OVERTCCONF signals. · 512749d4
    Mauritz Sundell authored
    
    
    Problem
    =======
    With the delay in sending NODE_FAILREP introduced in QMGR, it is more
    likely that a TAKE_OVERTCCONF signal to DBTC arrives before NODE_FAILREP
    has arrived.
    
    This is due both to how the local path of NODE_FAILREP has changed and to
    the lack of synchronization between when other nodes may send
    TAKE_OVERTCCONF to DBTC and when the receiving DBTC has received
    NODE_FAILREP, or even whether QMGR has yet seen COMMIT_FAILCONF.
    
    If TAKE_OVERTCCONF arrives early, the receiving node fails with:
    
    DBTC (Line: 10809) 0x00000002 Check i != end failed
    
    Solution
    ========
    The number of the last failure that a DBTC has seen is sent along with
    the TAKE_OVERTCCONF signal.
    
    If the receiving DBTC has not yet seen the node failure for the taken-over
    node, and has not yet seen all failures that the sender has, it resends
    the signal to itself with a delay.
    
    If the receiving DBTC has seen all failures that the sending DBTC has, and
    the taken-over node is still not marked as failed, the node fails as
    before.
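
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the enum, the function name and the parameter names are all hypothetical, and it assumes failure numbers grow monotonically so that a plain comparison tells whether the receiver is behind the sender.

```cpp
#include <cstdint>

// Hypothetical outcome of handling one TAKE_OVERTCCONF signal.
enum class TakeOverAction {
  Proceed,          // failure already known: handle the signal normally
  ResendWithDelay,  // signal arrived early: send it to self again later
  Crash             // receiver is up to date but node not failed: fail node
};

// Hypothetical helper; none of these names come from the NDB source.
// Assumption: failure numbers increase monotonically, so "receiver is
// behind sender" can be tested with a plain less-than comparison.
TakeOverAction decideTakeOverAction(bool takenOverNodeMarkedFailed,
                                    std::uint32_t receiverLastFailureNo,
                                    std::uint32_t senderLastFailureNo)
{
  if (takenOverNodeMarkedFailed) {
    // NODE_FAILREP for the taken-over node has already been processed.
    return TakeOverAction::Proceed;
  }
  if (receiverLastFailureNo < senderLastFailureNo) {
    // The sender has seen failures this DBTC has not seen yet, so
    // NODE_FAILREP may still be on its way. Retry after a delay.
    return TakeOverAction::ResendWithDelay;
  }
  // The receiver has seen everything the sender has, yet the node is not
  // marked as failed: an inconsistency, so fail as before.
  return TakeOverAction::Crash;
}
```

In a real block the ResendWithDelay case would presumably map onto a delayed signal to self, but which call is used for that is not stated in this message.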
    
    Note
    ====
    Also remove support for the one-word TAKE_OVERTCCONF signal, which is not
    sent by any data node newer than version 7.0.6.
    
    Reviewed-by: Mikael Ronström <mikael.ronstrom@oracle.com>