mysql-test/suite/rpl/t/rpl_io_thd_wait_for_disk_space_stress.test · mysql-8.0.3 · Rasoul Jahanshahi / Mysql Server

Mar 07, 2017

WL#10406: Improve usability when receiver thread is waiting for disk space · 6de594ad

Joao Gramacho authored Mar 07, 2017

Step 1
======

This patch changed the lock required to access the Master_info format
description event from Master_info->data_lock to relay_log->LOCK_log.

It also changed the locking order at queue_event() to
relay_log->LOCK_log, Master_info->data_lock.

The SQL thread no longer relies on the relay log LOCK_log in order to
be stopped.
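
The locking order described above can be illustrated with a minimal
sketch. It is not the server code: RelayLogSketch, MasterInfoSketch and
queue_event_sketch() are hypothetical stand-ins that use std::mutex, and
the fields they guard are assumptions made for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for the relay log object.
struct RelayLogSketch {
  std::mutex LOCK_log;
  std::string format_description = "fde";  // read under LOCK_log
  unsigned long long end_pos = 0;
};

// Hypothetical stand-in for Master_info.
struct MasterInfoSketch {
  std::mutex data_lock;
  unsigned long long master_log_pos = 0;
  RelayLogSketch relay_log;
};

// Queue one event of `event_len` bytes, always taking the locks in the
// same order: relay_log.LOCK_log first, then data_lock.
inline void queue_event_sketch(MasterInfoSketch &mi,
                               unsigned long long event_len) {
  assert(event_len > 0);
  std::lock_guard<std::mutex> log_guard(mi.relay_log.LOCK_log);
  // The format description event is consulted while holding LOCK_log only.
  const bool has_fde = !mi.relay_log.format_description.empty();
  if (has_fde) mi.relay_log.end_pos += event_len;
  // data_lock is taken second and held only for the position update.
  std::lock_guard<std::mutex> data_guard(mi.data_lock);
  if (has_fde) mi.master_log_pos += event_len;
}
```

Taking the two mutexes in one fixed order everywhere is what avoids
deadlocks between threads that need both.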

Step 2
======

Truncating the relay log in correct event boundaries
----------------------------------------------------

This patch introduced MYSQL_BIN_LOG::truncate_relaylog_file(). This
function is called after an error writing events to the relay log. It
receives the relay log end position (the end of the last event known to
be written successfully) so that the applier thread is less likely to
read a partial (bad) event.
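
The idea of truncating at the last good event boundary can be sketched
as follows. This is an in-memory illustration under stated assumptions,
not the real function: RelayLogFileSketch and its members are
hypothetical, and a short write is simulated by the bytes_written
argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory stand-in for a relay log file.
class RelayLogFileSketch {
 public:
  // Append an event. If only `bytes_written` bytes reach the "file"
  // (simulating a failed write), roll back to the last good boundary.
  bool append_event(const std::string &event, std::size_t bytes_written) {
    assert(bytes_written <= event.size());
    data_.append(event, 0, bytes_written);
    if (bytes_written < event.size()) {
      truncate_to(end_pos_);  // discard the partial event
      return false;
    }
    end_pos_ = data_.size();  // new last-known-good boundary
    return true;
  }

  // Drop everything after `pos`; never grows the file.
  void truncate_to(std::size_t pos) {
    if (pos < data_.size()) data_.resize(pos);
  }

  std::size_t end_pos() const { return end_pos_; }
  const std::string &contents() const { return data_; }

 private:
  std::string data_;
  std::size_t end_pos_ = 0;  // end of the last fully written event
};
```

A reader that only consumes bytes up to end_pos() never sees the
partial event, because the failed write is rolled back before the
boundary moves.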

Displaying "Waiting for disk space" on status
---------------------------------------------

This patch introduced enter_stage_hook so that my_write() can set the
current thread stage to "Waiting for disk space" before calling
wait_for_free_space() and restore the previous thread stage after the
call returns.

As a result, any thread waiting for disk space in my_write() reports
this information not only in the error log but also in the thread
status interfaces (performance schema tables, SHOW SLAVE STATUS, etc.).
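
The set-then-restore behaviour around the wait can be sketched with a
small RAII guard. This is only an illustration: ThreadStateSketch,
StageGuardSketch and wait_with_stage_sketch() are hypothetical names,
and the callback stands in for wait_for_free_space(). The actual hook
signature is not shown in this message.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical per-thread state; the server tracks stages differently.
struct ThreadStateSketch {
  std::string stage = "previous stage";
};

// Sets a new stage on construction and restores the old one on
// destruction, so the restore also happens on early exit.
class StageGuardSketch {
 public:
  StageGuardSketch(ThreadStateSketch &thd, const std::string &new_stage)
      : thd_(thd), previous_(thd.stage) {
    thd_.stage = new_stage;
  }
  ~StageGuardSketch() { thd_.stage = previous_; }
  StageGuardSketch(const StageGuardSketch &) = delete;
  StageGuardSketch &operator=(const StageGuardSketch &) = delete;

 private:
  ThreadStateSketch &thd_;
  std::string previous_;
};

// `wait_for_space` plays the role of wait_for_free_space().
inline void wait_with_stage_sketch(
    ThreadStateSketch &thd, const std::function<void()> &wait_for_space) {
  assert(static_cast<bool>(wait_for_space));
  StageGuardSketch guard(thd, "Waiting for disk space");
  wait_for_space();  // stage is visible to observers during the wait
}
```

While the callback runs, anything that inspects the thread state sees
"Waiting for disk space"; once it returns, the earlier stage is back.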

WL related bug fixes
====================

BUG#26111422 ASSERTION `IS_OPEN()' FAILED AT
             MYSQL_BIN_LOG::TRUNCATE_RELAYLOG_FILE

Problem
-------

The replica server tries to truncate a closed relay log file after an
unrecoverable error occurs while rotating the relay log.

Analysis
--------

In the case of an unrecoverable error when rotating the relay log, the
server will take the configured BINLOG_ERROR_ACTION.

When BINLOG_ERROR_ACTION=ABORT_SERVER, the server will be shut down.

When BINLOG_ERROR_ACTION=IGNORE_ERROR, the server will close the relay
log. The only way of recovering the closed relay log is to restart the
whole server.

The code in truncate_relaylog_file() asserts that the relay log is open
when the function is called, to prevent an attempt to truncate a closed
relay log.

Fix
---

Because the function can be called after an error rotating the relay
log, truncate_relaylog_file() should not assert that the relay log is
open, and it should take no action when the relay log is closed.
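
A minimal sketch of the fixed behaviour, assuming a hypothetical
ClosableRelayLogSketch type with an `open` flag. The method below only
mirrors the name of the real function and is not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a relay log that may have been closed
// after a rotate error.
struct ClosableRelayLogSketch {
  bool open = true;
  std::string data;

  // Returns true if the log was truncated, false if nothing was done.
  bool truncate_relaylog_file_sketch(std::size_t end_pos) {
    if (!open) return false;  // closed log: take no action, no assert
    assert(end_pos <= data.size());
    data.resize(end_pos);
    return true;
  }
};
```

The open check replaces the former assertion, so a closed log becomes
a no-op instead of a crash in debug builds.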

BUG#26161405 EXECUTING STOP SLAVE WHEN IO_THREAD IS "WAITING FOR DISK
             SPACE" CAUSES PROBLEMS

Problems
--------

STOP SLAVE [IO_THREAD] sets the mi->abort_slave flag and waits until
the I/O thread has stopped.

When the I/O thread is waiting for disk space, it does not check the
mi->abort_slave flag until it has finished queuing the current event.
So, "STOP SLAVE" is blocked (until STOP SLAVE times out with an
error).

Also, any thread waiting for disk space at "my_write" (thread that used
the MY_WAIT_IF_FULL flag) could report itself as "Waiting for disk
space".

Shutting down the server while an I/O thread was waiting for disk space
would hang the server, without accepting new connections, until disk
space was freed.

Fixes
-----

STOP SLAVE [IO_THREAD] now writes a warning message to the server error
log recommending either freeing some disk space or using 'KILL' to
abort the I/O thread operation.

Only the relay log related operations will change the thread status to
"Waiting for disk space". A new flag was used to signal the my_write
function to change the thread status.
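
The flag-gated reporting can be sketched as below. The flag names and
bit values are invented for this illustration (the message does not
name the new flag), and stage_while_disk_full_sketch() is a
hypothetical helper, not part of my_write().

```cpp
#include <cassert>
#include <string>

// Hypothetical flag bits; names and values are assumptions.
constexpr unsigned SKETCH_WAIT_IF_FULL = 1u << 0;    // wait on full disk
constexpr unsigned SKETCH_REPORT_WAITING = 1u << 1;  // also change stage

// Returns the stage a thread would display while its write is blocked
// on a full disk, given the flags passed by the caller.
inline std::string stage_while_disk_full_sketch(
    const std::string &current_stage, unsigned flags) {
  // Reporting only makes sense for writes that wait at all.
  assert(!(flags & SKETCH_REPORT_WAITING) || (flags & SKETCH_WAIT_IF_FULL));
  if (flags & SKETCH_REPORT_WAITING) return "Waiting for disk space";
  return current_stage;  // other waiters keep their current stage
}
```

Only callers that pass the reporting flag, which in this sketch would
be the relay log writers, change what the thread displays.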

Shutting down the server while an I/O thread is waiting for disk space
will now kill the I/O thread, truncating the current relay log file if
possible.

Fixed a doxygen issue at MYSQL_BIN_LOG::truncate_relaylog_file().

Fixed an issue in "performance_schema.threads", which was not showing
"Waiting for disk space" in the PROCESSLIST_STATE field.
