Nuno Carvalho authored
After extensive investigation and discussion, we concluded that the bind failures on Windows are expected, for two reasons:

a) Group Replication uses SO_EXCLUSIVEADDRUSE on its sockets on Windows. This prevents a second application from eavesdropping on another application's socket. In some situations the option requires a time interval to elapse between closing a socket and reopening it, which is exactly what we are seeing on PB2 and Hudson. https://msdn.microsoft.com/en-us/library/windows/desktop/ms740621(v=vs.85).aspx

b) Group Replication has both client and server sockets and, because our group communication library is event driven, we do not control the order in which the sockets are closed. When the server socket closes after the client socket, we hit one of the situations in which SO_EXCLUSIVEADDRUSE requires a time interval between socket close and reopen.

To minimize this behaviour we tried several approaches. All of them reduced the number of bind failures, but none brought it down to 0. That leaves us with only one solution: make MTR deal with bind failures gracefully.

The approach is: after START GROUP_REPLICATION, we check whether the command failed due to a bind failure. If it did:
  on Windows systems: the test will skip itself;
  on other systems: the test will fail.
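
The decision described above can be sketched roughly as follows. This is only an illustration in Python, not the MTR include file changed by this commit. The function name, the `BIND_FAILURE_MARKER` text, and the treatment of non-bind errors as plain failures are all assumptions, since the commit message does not spell them out.

```python
import sys

# Hypothetical marker: the exact error text reported by the server on a
# bind failure is not given in the commit message.
BIND_FAILURE_MARKER = "bind failure"


def decide_outcome(start_succeeded, error_text, platform=sys.platform):
    """Return 'pass', 'skip' or 'fail' after START GROUP_REPLICATION.

    start_succeeded: whether the START GROUP_REPLICATION command succeeded.
    error_text: error output collected when the command failed.
    platform: platform identifier in the style of sys.platform.
    """
    if start_succeeded:
        return "pass"

    is_bind_failure = BIND_FAILURE_MARKER in error_text.lower()

    # On Windows a bind failure is expected (SO_EXCLUSIVEADDRUSE may need a
    # delay between socket close and reopen), so the test skips itself.
    if is_bind_failure and platform.startswith("win"):
        return "skip"

    # On other systems a bind failure is a real failure. Treating every
    # other error as a failure too is an assumption of this sketch.
    return "fail"
```

Passing the platform as a parameter keeps the decision testable on any machine; in a real test run it would default to the host platform.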