    7d543a84
    BUG#32050454: GR_ACF_SWITCH_HIGHEST_WEIGHT_AUTO TESTCASE ASSERT FAIL ON WEEKLY TRUNK · 7d543a84
    Nuno Carvalho authored
    During the gr_acf_switch_highest_weight_auto test a code assert was
    being triggered. That assert validates that, while the Monitor IO
    thread is running, at least one IO thread with
    SOURCE_CONNECTION_AUTO_FAILOVER=1 is running.
    
    The issue was that the Monitor IO thread restarts the IO thread in
    two steps:
      1) stop IO thread;
      2) start IO thread.
    The replica locks are only acquired within each step; between stop
    and start no locks are held.
    This allows the following interleaving:
      a) Monitor IO thread stops IO thread;
      b) STOP REPLICA is executed;
      c) Monitor IO thread starts IO thread.
    The assert was triggered on b), since at that point the IO thread is
    stopped while the Monitor IO thread is still running.
    This can also leave IO threads and the Monitor IO thread running,
    which is why the Monitor IO health check connections are seen being
    killed on the sources (BUG#32050607: GR_ACF_MSR_2GROUPS_FAILOVER
    TESTCASE FAILING ON WEEKLY TRUNK) on non-debug builds, where the
    assert is stripped.
    
    To solve this, the Monitor IO thread now holds the replica locks
    during the complete restart procedure (stop followed by start).
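
The race and the fix can be illustrated with a toy model. This is only a
sketch, not the server code: `Channel`, `stop_replica`, `restart_two_steps`
and `restart_locked` are hypothetical names, a single `threading.Lock`
stands in for the replica locks, and the interleaved STOP REPLICA is
simulated by a callback instead of a real second thread.

```python
import threading


class Channel:
    """Toy model of one replication channel (hypothetical, for illustration)."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the replica locks
        self.io_running = True        # IO thread state
        self.monitor_running = True   # Monitor IO thread state


def stop_replica(ch):
    """Model of STOP REPLICA arriving while the monitor restarts the IO thread.

    Returns "blocked" if the lock is held (the real statement would wait),
    "assert" if the invariant is violated on entry, and "ok" otherwise.
    """
    if not ch.lock.acquire(blocking=False):
        return "blocked"
    try:
        # Invariant from the commit message: a running monitor implies
        # at least one running IO thread.
        invariant_ok = ch.io_running or not ch.monitor_running
        ch.io_running = False
        return "ok" if invariant_ok else "assert"
    finally:
        ch.lock.release()


def restart_two_steps(ch, interleave):
    """Old behaviour: lock taken per step, released in between."""
    with ch.lock:
        ch.io_running = False      # 1) stop IO thread
    outcome = interleave(ch)       # window in which no lock is held
    with ch.lock:
        ch.io_running = True       # 2) start IO thread
    return outcome


def restart_locked(ch, interleave):
    """Fixed behaviour: lock held for the complete stop/start procedure."""
    with ch.lock:
        ch.io_running = False      # 1) stop IO thread
        outcome = interleave(ch)   # concurrent statement cannot get the lock
        ch.io_running = True       # 2) start IO thread
    return outcome
```

In this model `restart_two_steps(Channel(), stop_replica)` reports the
invariant violation and still leaves the IO thread running afterwards,
while `restart_locked(Channel(), stop_replica)` keeps the interleaved
statement out until the restart has finished.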
    
    RB: 25403