Nuno Carvalho authored
During the gr_acf_switch_highest_weight_auto test a code assert was being triggered. That assert validated that, while the Monitor IO thread was running, at least one IO thread with SOURCE_CONNECTION_AUTO_FAILOVER=1 was also running.

The issue was that the Monitor IO thread restarted the IO thread in two steps: 1) stop IO thread; 2) start IO thread. The replica locks were acquired only within each step, that is, no locks were held between stop and start. This allowed the following interleaving: a) Monitor IO thread stops the IO thread; b) STOP REPLICA is executed; c) Monitor IO thread starts the IO thread.

The assert was triggered on b), since at that point the IO thread is stopped while the Monitor IO thread is still running. This interleaving can also leave IO threads and the Monitor IO thread running, which is why, on non-debug builds where the assert is stripped, we see the Monitor IO health check connections being killed on the sources (BUG#32050607: GR_ACF_MSR_2GROUPS_FAILOVER TESTCASE FAILING ON WEEKLY TRUNK).

To solve the above issue, the Monitor IO thread now holds the replica locks during the complete restart procedure.

RB: 25403
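The interleaving described above can be sketched with a toy model. This is not the server code: Channel, stop_replica, restart_io_two_steps and restart_io_locked are hypothetical names, a single std::mutex stands in for the replica locks, and the `between` callback is only a deterministic way to inject step b) into the unlocked window.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Toy model only; every name here is hypothetical, not a MySQL identifier.
struct Channel {
  std::mutex lock;              // stands in for the replica locks
  bool io_running = true;       // state of the IO thread
  bool monitor_running = true;  // state of the Monitor IO thread
};

// Simplified STOP REPLICA: takes the lock, reports whether the invariant
// "monitor running implies IO thread running" held on entry, then stops
// the IO thread. The real statement does far more than this.
bool stop_replica(Channel &ch) {
  std::lock_guard<std::mutex> guard(ch.lock);
  const bool invariant_held = !ch.monitor_running || ch.io_running;
  ch.io_running = false;
  return invariant_held;
}

// Old behaviour: the lock is taken per step, so `between` runs in a window
// where the IO thread is stopped and no lock is held.
void restart_io_two_steps(Channel &ch, const std::function<void()> &between) {
  {
    std::lock_guard<std::mutex> guard(ch.lock);
    ch.io_running = false;  // step 1: stop IO thread
  }
  between();  // unlocked window: another session can run here
  {
    std::lock_guard<std::mutex> guard(ch.lock);
    ch.io_running = true;  // step 2: start IO thread
  }
}

// Fixed behaviour: one lock scope covers stop and start, so no other
// session can observe the intermediate stopped state.
void restart_io_locked(Channel &ch) {
  std::lock_guard<std::mutex> guard(ch.lock);
  ch.io_running = false;  // step 1: stop IO thread
  ch.io_running = true;   // step 2: start IO thread
  assert(!ch.monitor_running || ch.io_running);  // invariant at lock release
}
```

In this model, calling stop_replica from the `between` callback of restart_io_two_steps sees the invariant broken and leaves the IO thread running afterwards, while a stop_replica issued around restart_io_locked can only run before or after the whole restart.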