    dc30732c
    Bug#11762221 - 54790: Use of non-blocking mode for sockets limits performance
    Davi Arnaut authored
    Bug#11758972 - 51244: wait_timeout fails on OpenSolaris
    
    The problem was that an optimization for the case when the server
    uses alarms for timeouts could cause a slowdown when socket
    timeouts are used instead. When alarms are used for timeouts,
    a non-blocking read is attempted first in order to avoid the
    cost of setting up an alarm; if this non-blocking read fails,
    the socket mode is changed to blocking and an alarm is armed.
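
    For illustration, a minimal POSIX sketch of that old pattern
    (hypothetical helper; the actual server code paths differ, and a
    SIGALRM handler without SA_RESTART is assumed to be installed):

      #include <errno.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Sketch of the removed pattern: try a non-blocking read first;
         only if it would block, make the socket blocking and arm an
         alarm to bound the wait. */
      static ssize_t read_with_alarm(int fd, void *buf, size_t len,
                                     unsigned int timeout)
      {
        ssize_t n= recv(fd, buf, len, 0);           /* socket is non-blocking */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
          int flags= fcntl(fd, F_GETFL);
          fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);  /* switch to blocking ... */
          alarm(timeout);                           /* ... and arm SIGALRM    */
          n= recv(fd, buf, len, 0);
          alarm(0);
        }
        return n;
      }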
    
    If a socket timeout is used, there is no point in attempting a
    non-blocking read first, as the timeout will be enforced
    automatically by the OS. Yet the server would attempt a
    non-blocking read first and later switch the socket to blocking
    mode. This could inadvertently hurt performance, as switching
    the blocking mode of a socket requires at least two calls into
    the kernel on Linux, apart from the scalability problems
    inherent in fcntl(2).
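
    The mode switch itself is a read-modify-write of the descriptor
    flags, roughly as follows (sketch, not the server's actual helper):

      #include <fcntl.h>

      /* Sketch: toggling O_NONBLOCK takes at least two system calls. */
      static int set_blocking_mode(int fd, int blocking)
      {
        int flags= fcntl(fd, F_GETFL);      /* first call into the kernel  */
        if (flags < 0)
          return -1;
        flags= blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
        return fcntl(fd, F_SETFL, flags);   /* second call into the kernel */
      }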
    
    The solution is to remove alarm-based timeouts from the
    protocol layer and push timeout handling down to the virtual
    I/O (VIO) layer. This approach allows socket timeouts to be
    handled on a platform-specific basis. The blocking mode of the
    socket is no longer exported, and VIO read and write operations
    either complete or fail with an error or timeout.
    
    On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
    send and receive operations. If the operation would block,
    poll() is used to wait for readiness or until a timeout occurs.
    This strategy avoids the need to set the socket timeout and
    blocking mode twice per query.
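
    A minimal sketch of that receive path (hypothetical helper, not
    the actual vio_read() implementation; the send path is analogous
    with POLLOUT):

      #include <errno.h>
      #include <poll.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Sketch: non-blocking receive via MSG_DONTWAIT; if it would
         block, wait with poll() for readability or until timeout_ms
         milliseconds elapse. */
      static ssize_t recv_with_timeout(int fd, void *buf, size_t len,
                                       int timeout_ms)
      {
        for (;;)
        {
          ssize_t n= recv(fd, buf, len, MSG_DONTWAIT);
          if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;                         /* data, EOF, or hard error */

          struct pollfd pfd;
          pfd.fd= fd;
          pfd.events= POLLIN;
          pfd.revents= 0;
          if (poll(&pfd, 1, timeout_ms) <= 0) /* 0: timeout, -1: error */
            return -1;
        }
      }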
    
    On Windows, as before, the timeout is set on a per-socket
    basis. On all remaining operating systems, the socket is set
    to non-blocking mode and poll() is used to wait for readiness
    or until a timeout occurs.
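
    On Windows the per-socket timeout amounts to something like the
    following (sketch; SO_RCVTIMEO/SO_SNDTIMEO take milliseconds as a
    DWORD there):

      #include <winsock2.h>

      /* Sketch: attach receive and send timeouts to the socket itself. */
      static int set_socket_timeouts(SOCKET s, DWORD timeout_ms)
      {
        if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
                       (const char*) &timeout_ms, sizeof(timeout_ms)))
          return -1;
        return setsockopt(s, SOL_SOCKET, SO_SNDTIMEO,
                          (const char*) &timeout_ms, sizeof(timeout_ms));
      }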
    
    In order to clean up the code after the removal of alarm-based
    timeouts, the low-level packet reading loop is unrolled into
    two specific sequences: reading the packet header and reading
    the payload. This makes error handling easier down the road.
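
    Conceptually, the unrolled code reads the fixed-size packet header
    first and then the payload it announces, roughly as in this sketch
    (hypothetical helpers; read_fully() stands in for a loop over the
    timed VIO read operation):

      #include <stddef.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Hypothetical helper: loop until 'len' bytes arrive or the
         (timed) read fails. */
      static int read_fully(int fd, unsigned char *buf, size_t len)
      {
        while (len > 0)
        {
          ssize_t n= recv(fd, buf, len, 0);
          if (n <= 0)
            return -1;                        /* error, timeout or EOF */
          buf+= n;
          len-= (size_t) n;
        }
        return 0;
      }

      /* Sketch of the two sequences: the 4-byte header (3-byte
         little-endian length plus 1-byte sequence number), then the
         payload it describes. */
      static int read_packet(int fd, unsigned char *buffer, size_t bufsize)
      {
        unsigned char header[4];
        size_t payload_len;

        if (read_fully(fd, header, sizeof(header)))
          return -1;
        payload_len= (size_t) header[0] |
                     ((size_t) header[1] << 8) |
                     ((size_t) header[2] << 16);
        if (payload_len > bufsize)
          return -1;                          /* oversized packet */
        return read_fully(fd, buffer, payload_len);
      }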
    
    In conclusion, benchmarks have shown that these changes do not
    introduce any performance regressions and actually slightly
    improve server throughput at higher thread counts.
    
    - Incompatible changes:
    
    A timeout is now always applied to an individual receive or
    send I/O operation. In contrast, an alarm-based timeout was
    applied to an entire send or receive packet operation. That
    is, before this patch the timeout was really a time limit
    for sending or reading one whole packet; a packet transferred
    in several I/O operations may now take longer in total, as
    long as each individual operation completes within the timeout.
    
    Building and running MySQL on POSIX systems now requires
    support for poll() and O_NONBLOCK. These should be available
    in any modern POSIX system. In other words, except for Windows,
    legacy (non-POSIX) systems which only support O_NDELAY and
    select() are no longer supported.
    
    On Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT
    is no longer 20 seconds. The default is now no timeout
    (infinite), the same as on all other platforms.
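
    Clients that relied on the old Windows default can request a
    connect timeout explicitly through the C API, for example
    (sketch):

      #include <mysql.h>

      /* Sketch: restore a 20-second connect timeout explicitly
         instead of relying on the old Windows-only default. */
      static void set_connect_timeout(MYSQL *mysql)
      {
        unsigned int connect_timeout= 20;   /* seconds */
        mysql_options(mysql, MYSQL_OPT_CONNECT_TIMEOUT,
                      (const char *) &connect_timeout);
      }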
    
    Packets bigger than the maximum allowed packet size are no
    longer skipped. Before this patch, if an application sent a
    packet bigger than the maximum allowed packet size, or if
    the server failed to allocate a buffer sufficiently large
    to hold the packet, the server would keep reading the packet
    until its end. Now the session is simply disconnected if the
    server cannot handle such large packets.
    
    The client socket buffer is no longer cleared (drained)
    before sending commands to the server. Before this patch,
    any data left in the socket buffer would be drained (removed)
    before a command was sent to the server, in order to work
    around bugs where the server would violate the protocol and
    send more data. The only check left is a debug-only assertion
    to ensure that the socket buffer is empty.