    BUG#24964923: SPORADIC GR LOCAL PORT BIND FAILURES ON WINDOWS · 2cc06083
    Nuno Carvalho authored
    After extensive investigation and discussion, we concluded that the
    bind failures on Windows are expected, for two reasons:
    a) Group Replication uses SO_EXCLUSIVEADDRUSE on its sockets on
       Windows. This prevents a second application from eavesdropping
       on another application's socket.
       In some situations this option requires a certain amount of time
       to elapse between closing a socket and reopening it, which is
       exactly what we are seeing on PB2 and Hudson.
       https://msdn.microsoft.com/en-us/library/windows/desktop/ms740621(v=vs.85).aspx
    b) Group Replication has both client and server sockets and,
       because our group communication library is event driven, we do
       not control the order in which the sockets are closed.
       When the server socket closes after the client socket, we hit
       one of the situations in which SO_EXCLUSIVEADDRUSE requires a
       delay between socket close and reopen.
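
As a rough illustration of point a), the following Python sketch opens a listening socket that requests exclusive use of its address where the platform offers the option, and probes whether the same address can be bound again. It is not the Group Replication code: the function names `open_exclusive_listener` and `can_bind` are hypothetical, and `socket.SO_EXCLUSIVEADDRUSE` is only defined on Windows builds of Python, so the sketch falls back to a plain bind elsewhere.

```python
import socket


def open_exclusive_listener(host="127.0.0.1", port=0):
    """Open a listening TCP socket, requesting exclusive address use if available.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_EXCLUSIVEADDRUSE exists only on Windows; skip it on other systems.
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        # Bind failure: release the descriptor and let the caller decide.
        sock.close()
        raise
    return sock


def can_bind(host, port):
    """Return True if a new listener can bind host:port right now."""
    try:
        probe = open_exclusive_listener(host, port)
    except OSError:
        return False
    probe.close()
    return True
```

While the first listener is open, `can_bind` for the same host and port returns False. On Windows, according to the commit message, the address may also stay unavailable for some time after the listener is closed, which is the sporadic failure described here.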
    
    To minimize this behaviour, we tried several approaches. Although
    all of them reduced the number of bind failures, none brought the
    number down to 0.
    That leaves us with only one solution: make MTR deal with bind
    failures gracefully.
    The approach is:
      After START GROUP_REPLICATION, we check whether the command
      failed due to a bind failure. If it did:
        on Windows systems: the test will skip itself;
        on other systems:   the test will fail.
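
The decision described above can be sketched as a small Python function. This is only an illustration of the rule, not the MTR change itself; the name `outcome_after_start` and the returned strings are assumptions made for the example.

```python
import sys


def outcome_after_start(bind_failed, platform=sys.platform):
    """Decide what a test does after START GROUP_REPLICATION.

    Hypothetical helper: returns "continue", "skip" or "fail".
    """
    if not bind_failed:
        # The command did not hit a bind failure; the test runs as usual.
        return "continue"
    if platform.startswith("win"):
        # Bind failures are expected on Windows, so the test skips itself.
        return "skip"
    # On any other system a bind failure is a genuine error.
    return "fail"
```

For example, `outcome_after_start(True, "win32")` gives "skip", while `outcome_after_start(True, "linux")` gives "fail".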