    dc30732c
    Bug#11762221 - 54790: Use of non-blocking mode for sockets limits performance
    Davi Arnaut authored
    Bug#11758972 - 51244: wait_timeout fails on OpenSolaris
    
    The problem was that an optimization for the case when the server
    uses alarms for timeouts could cause a slowdown when socket
    timeouts are used instead. When alarms are used for timeouts,
    a non-blocking read is attempted first in order to avoid the
    cost of setting up an alarm; if this non-blocking read fails,
    the socket mode is changed to blocking and an alarm is armed.
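
    For illustration, a minimal POSIX sketch of that old pattern
    (hypothetical helper; the actual server code paths differ, and a
    SIGALRM handler without SA_RESTART is assumed to be installed):

      #include <errno.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Sketch of the removed pattern: try a non-blocking read first;
         only if it would block, make the socket blocking and arm an
         alarm to bound the wait. */
      static ssize_t read_with_alarm(int fd, void *buf, size_t len,
                                     unsigned int timeout)
      {
        ssize_t n= recv(fd, buf, len, 0);           /* socket is non-blocking */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
          int flags= fcntl(fd, F_GETFL);
          fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);  /* switch to blocking ... */
          alarm(timeout);                           /* ... and arm SIGALRM    */
          n= recv(fd, buf, len, 0);
          alarm(0);
        }
        return n;
      }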
    
    If a socket timeout is used, there is no point in attempting a
    non-blocking read first, as the timeout will be enforced
    automatically by the OS. Yet the server would attempt a
    non-blocking read first and later switch the socket to blocking
    mode. This could inadvertently hurt performance, as switching
    the blocking mode of a socket requires at least two calls into
    the kernel on Linux, apart from the scalability problems
    inherent in fcntl(2).
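
    The mode switch itself is a read-modify-write of the descriptor
    flags, roughly as follows (sketch, not the server's actual helper):

      #include <fcntl.h>

      /* Sketch: toggling O_NONBLOCK takes at least two system calls. */
      static int set_blocking_mode(int fd, int blocking)
      {
        int flags= fcntl(fd, F_GETFL);      /* first call into the kernel  */
        if (flags < 0)
          return -1;
        flags= blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
        return fcntl(fd, F_SETFL, flags);   /* second call into the kernel */
      }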
    
    The solution is to remove alarm-based timeouts from the
    protocol layer and push timeout handling down to the virtual
    I/O (VIO) layer. This approach allows socket timeouts to be
    handled on a platform-specific basis. The blocking mode of the
    socket is no longer exported, and VIO read and write operations
    either complete or fail with an error or timeout.
    
    On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
    send and receive operations. If the operation would block,
    poll() is used to wait for readiness or until a timeout occurs.
    This strategy avoids the need to set the socket timeout and
    blocking mode twice per query.
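
    A minimal sketch of that receive path (hypothetical helper, not
    the actual vio_read() implementation; the send path is analogous
    with POLLOUT):

      #include <errno.h>
      #include <poll.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Sketch: non-blocking receive via MSG_DONTWAIT; if it would
         block, wait with poll() for readability or until timeout_ms
         milliseconds elapse. */
      static ssize_t recv_with_timeout(int fd, void *buf, size_t len,
                                       int timeout_ms)
      {
        for (;;)
        {
          ssize_t n= recv(fd, buf, len, MSG_DONTWAIT);
          if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;                         /* data, EOF, or hard error */

          struct pollfd pfd;
          pfd.fd= fd;
          pfd.events= POLLIN;
          pfd.revents= 0;
          if (poll(&pfd, 1, timeout_ms) <= 0) /* 0: timeout, -1: error */
            return -1;
        }
      }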
    
    On Windows, as before, the timeout is set on a per-socket
    basis. On all remaining operating systems, the socket is set
    to non-blocking mode and poll() is used to wait for readiness
    or until a timeout occurs.
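
    On Windows the per-socket timeout amounts to something like the
    following (sketch; SO_RCVTIMEO/SO_SNDTIMEO take milliseconds as a
    DWORD there):

      #include <winsock2.h>

      /* Sketch: attach receive and send timeouts to the socket itself. */
      static int set_socket_timeouts(SOCKET s, DWORD timeout_ms)
      {
        if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                       (const char*) &timeout_ms, sizeof(timeout_ms)))
          return -1;
        return setsockopt(s, SOL_SOCKET, SO_SNDTIMEO,
                          (const char*) &timeout_ms, sizeof(timeout_ms));
      }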
    
    In order to clean up the code after the removal of alarm-based
    timeouts, the low-level packet reading loop is unrolled into
    two specific sequences: reading the packet header and reading
    the payload. This makes error handling easier down the road.
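
    Conceptually, the unrolled code reads the fixed-size packet header
    first and then the payload it announces, roughly as in this sketch
    (hypothetical helpers; read_fully() stands in for a loop over the
    timed VIO read operation):

      #include <stddef.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Hypothetical helper: loop until 'len' bytes arrive or the
         (timed) read fails. */
      static int read_fully(int fd, unsigned char *buf, size_t len)
      {
        while (len > 0)
        {
          ssize_t n= recv(fd, buf, len, 0);
          if (n <= 0)
            return -1;                        /* error, timeout or EOF */
          buf+= n;
          len-= (size_t) n;
        }
        return 0;
      }

      /* Sketch of the two sequences: the 4-byte header (3-byte
         little-endian length plus 1-byte sequence number), then the
         payload it describes. */
      static int read_packet(int fd, unsigned char *buffer, size_t bufsize)
      {
        unsigned char header[4];
        size_t payload_len;

        if (read_fully(fd, header, sizeof(header)))
          return -1;
        payload_len= (size_t) header[0] |
                     ((size_t) header[1] << 8) |
                     ((size_t) header[2] << 16);
        if (payload_len > bufsize)
          return -1;                          /* oversized packet */
        return read_fully(fd, buffer, payload_len);
      }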
    
    In conclusion, benchmarks have shown that these changes do not
    introduce any performance regressions and actually slightly
    improve server throughput at higher thread counts.
    
    - Incompatible changes:
    
    A timeout is now always applied to an individual receive or
    send I/O operation. In contrast, an alarm-based timeout was
    applied to an entire send or receive packet operation. That
    is, before this patch the timeout was really a time limit
    for sending or reading one whole packet; a packet transferred
    in several I/O operations may now take longer in total, as
    long as each individual operation completes within the timeout.
    
    Building and running MySQL on POSIX systems now requires
    support for poll() and O_NONBLOCK. These should be available
    in any modern POSIX system. In other words, except for Windows,
    legacy (non-POSIX) systems which only support O_NDELAY and
    select() are no longer supported.
    
    On Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT
    is no longer 20 seconds. The default is now no timeout
    (infinite), the same as on all other platforms.
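
    Clients that relied on the old Windows default can request a
    connect timeout explicitly through the C API, for example
    (sketch):

      #include <mysql.h>

      /* Sketch: restore a 20-second connect timeout explicitly
         instead of relying on the old Windows-only default. */
      static void set_connect_timeout(MYSQL *mysql)
      {
        unsigned int connect_timeout= 20;   /* seconds */
        mysql_options(mysql, MYSQL_OPT_CONNECT_TIMEOUT,
                      (const char *) &connect_timeout);
      }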
    
    Packets bigger than the maximum allowed packet size are no
    longer skipped. Before this patch, if an application sent a
    packet bigger than the maximum allowed packet size, or if
    the server failed to allocate a buffer sufficiently large
    to hold the packet, the server would keep reading the packet
    until its end. Now the session is simply disconnected if the
    server cannot handle such large packets.
    
    The client socket buffer is no longer cleared (drained)
    before sending commands to the server. Before this patch,
    any data left in the socket buffer would be drained (removed)
    before a command was sent to the server, in order to work
    around bugs where the server would violate the protocol and
    send more data. The only check left is a debug-only assertion
    to ensure that the socket buffer is empty.