-
Paweł Olchawa authored
WAITING FOR CHECKPOINTER 1. We missed a margin between boundary up to which log writer may write and boundary up to which log_free_check() calls do not have to wait (after they included the concurrency margin). This margin has been restored and it is called "extra_margin". 2. There was a completely useless safety-margin equal to 10% of true redo capacity. The safety margin is supposed not to be used (never, ever). We decrease size of the margin to: min(10% of true redo capacity, 1024 * UNIV_PAGE_SIZE). 3. The concurrency margin became increased by 5% of redo capacity (after the safety_margin is subtracted from the redo capacity). Therefore the initial value of concurrency_margin is now greater by a value which does not depend on number of concurrent threads. This is an extra protection for two cases: - when innodb_thread_concurrency = 0, - when we miss a call to log_free_check() in some flow (bug). 4. What's more, the concurrency_margin became adaptively increased when log_writer enters the extra_margin. If that happens, we had to either encounter a missing call to log_free_check() or we have the unlimited thread concurrency (innodb_thread_concurrency = 0) case. Then next log_free_check() calls are stopped even earlier and the log_writer thread proceeds further (it still has the extra_margin in which it can operate (5% of redo capacity)). We hopefully will resolve the problems (thanks to log_writer writing further, which allows to unlock the oldest dirty pages, flush them and advance the checkpoint) and next time we will stop users at log_free_check() calls earlier (this point might be moved up to even 50% of redo). When log_writer exits the extra_margin (by checkpoint advanced), it notices the fact and next time it re-enters the extra_margin, it increases the concurrency_margin again. Each time it increases the concurrency_margin, it does that by 20%. 5. Do not kill MySQL explicitly when log_writer runs out of space in redo. Keep waiting and emit error every 5s. In case of real deadlock it is error monitor thread which kills MySQL (after 10s, so we would have the error and we could recognize the reason). 6. Abort redo's archiver when log_writer started to overwrite non-archived yet region of the redo log. 7. Decrease sleep time in loop which is executed when log_writer ran out of free space in redo log (or reached non-archived region). 8. Do not emit warnings about running out of space in redo soon (we emit only error and only when we hit the final limit, and only once per 5s). 9. Emit warning about overwriting non-archived region of redo only once per provided period. 10. Introduced new monitor counters which allow to monitor scenarios in which we run out of free space in redo. 11. Reclaim space in log recent_written buffer when log_writer is waiting for free space in redo log / archiver's progress.
Paweł Olchawa authoredWAITING FOR CHECKPOINTER 1. We missed a margin between boundary up to which log writer may write and boundary up to which log_free_check() calls do not have to wait (after they included the concurrency margin). This margin has been restored and it is called "extra_margin". 2. There was a completely useless safety-margin equal to 10% of true redo capacity. The safety margin is supposed not to be used (never, ever). We decrease size of the margin to: min(10% of true redo capacity, 1024 * UNIV_PAGE_SIZE). 3. The concurrency margin became increased by 5% of redo capacity (after the safety_margin is subtracted from the redo capacity). Therefore the initial value of concurrency_margin is now greater by a value which does not depend on number of concurrent threads. This is an extra protection for two cases: - when innodb_thread_concurrency = 0, - when we miss a call to log_free_check() in some flow (bug). 4. What's more, the concurrency_margin became adaptively increased when log_writer enters the extra_margin. If that happens, we had to either encounter a missing call to log_free_check() or we have the unlimited thread concurrency (innodb_thread_concurrency = 0) case. Then next log_free_check() calls are stopped even earlier and the log_writer thread proceeds further (it still has the extra_margin in which it can operate (5% of redo capacity)). We hopefully will resolve the problems (thanks to log_writer writing further, which allows to unlock the oldest dirty pages, flush them and advance the checkpoint) and next time we will stop users at log_free_check() calls earlier (this point might be moved up to even 50% of redo). When log_writer exits the extra_margin (by checkpoint advanced), it notices the fact and next time it re-enters the extra_margin, it increases the concurrency_margin again. Each time it increases the concurrency_margin, it does that by 20%. 5. Do not kill MySQL explicitly when log_writer runs out of space in redo. Keep waiting and emit error every 5s. In case of real deadlock it is error monitor thread which kills MySQL (after 10s, so we would have the error and we could recognize the reason). 6. Abort redo's archiver when log_writer started to overwrite non-archived yet region of the redo log. 7. Decrease sleep time in loop which is executed when log_writer ran out of free space in redo log (or reached non-archived region). 8. Do not emit warnings about running out of space in redo soon (we emit only error and only when we hit the final limit, and only once per 5s). 9. Emit warning about overwriting non-archived region of redo only once per provided period. 10. Introduced new monitor counters which allow to monitor scenarios in which we run out of free space in redo. 11. Reclaim space in log recent_written buffer when log_writer is waiting for free space in redo log / archiver's progress.
Loading