Davi Arnaut authored
Bug#11758972 - 51244: wait_timeout fails on OpenSolaris

The problem was that an optimization for the case when the server uses alarms for timeouts could cause a slowdown when socket timeouts are used instead. When alarms are used for timeouts, a non-blocking read is attempted first in order to avoid the cost of setting up an alarm; if the non-blocking read fails, the socket is switched to blocking mode and an alarm is armed. When a socket timeout is used, there is no point in attempting a non-blocking read first, as the timeout is armed automatically by the OS. Yet the server would still attempt a non-blocking read first and only then switch the socket to blocking mode. This could hurt performance, as switching the blocking mode of a socket requires at least two calls into the kernel on Linux, apart from the scalability problems inherent to fcntl(2).

The solution is to remove alarm-based timeouts from the protocol layer and push timeout handling down to the virtual I/O (VIO) layer. This approach allows socket timeouts to be handled in a platform-specific way. The blocking mode of the socket is no longer exported, and VIO read and write operations either complete or fail with an error or a timeout.

On Linux, the MSG_DONTWAIT flag is used for non-blocking send and receive operations. If the operation would block, poll() is used to wait for readiness or until the timeout expires. This strategy avoids having to set the socket timeout and blocking mode twice per query. On Windows, as before, the timeout is set on a per-socket basis. On all remaining operating systems, the socket is set to non-blocking mode and poll() is used to wait for readiness or until the timeout expires.

To clean up the code after the removal of alarm-based timeouts, the low-level packet reading loop is unrolled into two specific sequences: reading the packet header and reading the payload. This makes error handling easier down the road.

Benchmarks have shown that these changes do not introduce any performance regressions and in fact slightly improve server throughput at higher thread counts.

Incompatible changes:

- A timeout is now always applied to an individual receive or send I/O operation. In contrast, an alarm-based timeout applied to an entire packet send or receive; that is, before this patch the timeout was really a time limit for sending or reading one whole packet.

- Building and running MySQL on POSIX systems now requires support for poll() and O_NONBLOCK, which should be available on any modern POSIX system. In other words, except for Windows, legacy (non-POSIX) systems that only support O_NDELAY and select() are no longer supported.

- On Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT is no longer 20 seconds. The default is now no timeout (infinite), the same as on all other platforms.

- Packets bigger than the maximum allowed packet size are no longer skipped. Before this patch, if an application sent a packet bigger than the maximum allowed packet size, or if the server failed to allocate a buffer large enough to hold the packet, the server would keep reading until the end of the packet. Now the session is simply disconnected if the server cannot handle such a large packet.

- The client socket buffer is no longer cleared (drained) before a command is sent to the server. Before this patch, any data left in the socket buffer was drained (removed) before sending a command, in order to work around bugs where the server violated the protocol and sent extra data. The only check that remains is a debug-only assertion that the socket buffer is empty.
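To illustrate the Linux strategy described above, here is a minimal sketch (not the actual VIO code) of a receive that tries MSG_DONTWAIT first and falls back to poll() when the operation would block. The helper name read_with_timeout and its signature are invented for the example.

    /* Minimal sketch of the Linux strategy: try a non-blocking recv()
     * first and, if it would block, wait with poll() until the socket
     * is readable or the timeout expires.  Illustration only; not the
     * actual VIO implementation. */
    #include <errno.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* timeout_ms < 0 means wait indefinitely (no timeout). */
    ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
    {
      for (;;)
      {
        ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
          return n;                       /* data read or orderly shutdown */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
          return -1;                      /* hard error */

        /* The operation would block: wait for readiness or a timeout. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
        {
          errno = ETIMEDOUT;              /* report the timeout to the caller */
          return -1;
        }
        if (ready < 0 && errno != EINTR)
          return -1;
        /* readable (or interrupted): retry the recv() */
      }
    }

The same shape covers the non-Linux POSIX case, except that the socket itself is put into O_NONBLOCK mode instead of passing MSG_DONTWAIT per call.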
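The Windows approach of handing the timeout to the kernel on a per-socket basis can be sketched with plain Winsock calls; the helper below is illustrative only, and the real code may set the options elsewhere. Values are in milliseconds, and 0 means no timeout.

    /* Sketch of per-socket timeouts on Windows via setsockopt(). */
    #include <winsock2.h>

    int set_socket_timeouts(SOCKET s, DWORD timeout_ms)
    {
      if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                     (const char *) &timeout_ms, sizeof(timeout_ms)) != 0)
        return -1;
      if (setsockopt(s, SOL_SOCKET, SO_SNDTIMEO,
                     (const char *) &timeout_ms, sizeof(timeout_ms)) != 0)
        return -1;
      return 0;
    }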
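The unrolled header-then-payload read sequence can be pictured roughly as follows. The 4-byte header layout (3-byte little-endian payload length plus 1-byte sequence number) is the MySQL protocol's; the read_fully() and read_packet() helpers are invented for the sketch and reuse read_with_timeout() from the earlier example.

    #include <stddef.h>
    #include <stdlib.h>
    #include <sys/types.h>

    ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms); /* sketch above */

    /* Read exactly 'len' bytes or fail (error, timeout, or EOF). */
    static int read_fully(int fd, unsigned char *buf, size_t len, int timeout_ms)
    {
      while (len > 0)
      {
        ssize_t n = read_with_timeout(fd, buf, len, timeout_ms);
        if (n <= 0)
          return -1;
        buf += n;
        len -= (size_t) n;
      }
      return 0;
    }

    #define PACKET_HEADER_SIZE 4   /* 3-byte payload length + 1-byte sequence number */

    /* Read one packet in two explicit steps: header first, then payload. */
    int read_packet(int fd, unsigned char **payload_out, size_t *len_out,
                    size_t max_allowed_packet, int timeout_ms)
    {
      unsigned char header[PACKET_HEADER_SIZE];

      /* Step 1: the fixed-size header. */
      if (read_fully(fd, header, sizeof(header), timeout_ms))
        return -1;

      size_t payload_len =
          (size_t) header[0] | ((size_t) header[1] << 8) | ((size_t) header[2] << 16);

      /* Oversized packets are no longer skipped: fail so the caller can
       * disconnect the session. */
      if (payload_len > max_allowed_packet)
        return -1;

      /* Step 2: the payload. */
      unsigned char *payload = malloc(payload_len ? payload_len : 1);
      if (payload == NULL || read_fully(fd, payload, payload_len, timeout_ms))
      {
        free(payload);
        return -1;
      }

      *payload_out = payload;
      *len_out = payload_len;
      return 0;
    }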
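Clients that relied on the old 20-second Windows default for MYSQL_OPT_CONNECT_TIMEOUT can set an explicit connect timeout through the standard C API; the connection parameters below are placeholders.

    #include <mysql.h>
    #include <stdio.h>

    int main(void)
    {
      MYSQL *conn = mysql_init(NULL);
      if (conn == NULL)
        return 1;

      unsigned int connect_timeout = 20;  /* seconds; the old Windows default */
      mysql_options(conn, MYSQL_OPT_CONNECT_TIMEOUT, (const void *) &connect_timeout);

      if (mysql_real_connect(conn, "localhost", "user", "password",
                             NULL, 0, NULL, 0) == NULL)
      {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        return 1;
      }

      mysql_close(conn);
      return 0;
    }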