Paweł Olchawa authored
0. Log buffer became a ring buffer; data inside is no longer shifted.

1. User threads are able to write concurrently to the log buffer.

2. Relaxed order of dirty pages in flush lists - no need to synchronize
   the order in which dirty pages are added to flush lists.

3. Concurrent MTR commits can interleave at different stages of commit.

4. Introduced dedicated log threads which keep writing the log buffer:
   * log_writer: writes log buffer to system buffers,
   * log_flusher: flushes system buffers to disk.
   As soon as they have finished writing (flushing) and there is new data
   to write (flush), they start the next write (flush).

5. User threads no longer write / flush the log buffer to disk; they only
   wait for notification, either by spinning or on an event. They do not
   have to compete for the responsibility of writing / flushing.

6. Introduced a ring buffer of events (one per log block) which user
   threads use to wait for written/flushed redo log, to avoid:
   * contention on a single event,
   * false wake-ups of all waiting threads whenever some write/flush has
     finished (we can wake up only those waiting in related blocks).

7. Introduced dedicated notifier threads so that the next writes/fsyncs
   are not delayed:
   * log_write_notifier: notifies user threads about written redo,
   * log_flush_notifier: notifies user threads about flushed redo.

8. Master thread no longer has to flush the log buffer.

9. Introduced a dedicated log thread which is responsible for writing
   checkpoints. Concurrent user threads no longer need to compete for
   this responsibility.

10. Master thread no longer has to take care of periodical checkpoints.
    Log checkpointer thread writes a checkpoint at least once per second
    (before, it was once per 7 seconds).

11. The following exposed system variables can now be changed at runtime:
    * innodb_log_buffer_size,
    * innodb_log_write_ahead_size.

12. Master thread measures average global CPU usage in the OS.

13. Introduced new exposed system variables:
    * innodb_log_wait_for_flush_spin_hwm,
    * innodb_log_spin_cpu_abs_lwm,
    * innodb_log_spin_cpu_pct_hwm.
    They control when spinning is used for the best performance, to
    reduce latency which would otherwise come from communication between
    log threads and user threads. The first one is based on average flush
    time; the other two are based on CPU usage.

14. Introduced new CMake option: ENABLE_EXPERIMENT_SYSVARS=0/1.
    System variables can be marked as hidden unless the experiment mode
    is turned on.

15. There is a list of hidden new system variables for experiments with
    redo log. We skip listing them here.

16. Created a dedicated tester for redo log alone (as gtest).

17. Created doxygen documentation for the new redo log.

18. The dict_persist margin is updated when the number of dirty pages
    changes, instead of being calculated on demand.

19. Mechanism used to copy the last incomplete block for Clone has been
    changed, because the log buffer is concurrent now.

20. Added more useful MONITOR counters for redo, including average lsn
    rate.

21. Introduced a sharded rw-lock to have an option to stop the world in
    redo, because log_mutex is removed.

22. Invented and implemented a concurrent data structure which tracks
    progress of concurrent operations and can answer up to which point
    they have all finished (when some order is defined, but the
    operations are allowed to execute out of that order). This structure
    is used for concurrent writes to the log buffer and re-used for
    concurrent additions to flush lists.

23. Introduced a universal mechanism to wait on an event, which starts
    with a provided number of spin delays, then falls back to waits on
    the event, starting at a small timeout but increasing the timeout
    every few waits. This mechanism is used in communication between user
    and log threads, and in communication between different log threads.

24. We slow down the redo log writer when there is no space in redo,
    allowing checkpoints to progress and rescue the state of redo.

25. Log buffer can be resized at runtime - the size can also be
    decreased.

26. Simplified shutdown procedure to avoid possible returns in logic to
    previous phases.

27. Removed the concept of multiple log groups.

28. Relaxed conditions required for checkpoint_lsn. It can now point to
    any data byte within redo (it does not need to point to the beginning
    of a records group).

29. Windows: always use buffered IO for redo log.

30. Mysql test runner received a new feature (thanks to Marcin):
    --exec_in_background.

Review: RB#15134

Reviewers:
    - Marcin Babij <marcin.babij@oracle.com>,
    - Debarun Banerjee <debarun.banerjee@oracle.com>.

Performance tests:
    - Dimitri Kravtchuk <dimitri.kravtchuk@oracle.com>,
    - Daniel Blanchard <daniel.blanchard@oracle.com>,
    - Amrendra Kumar <amrendra.x.kumar@oracle.com>.

QA and MTR tests:
    - Vinay Fisrekar <vinay.fisrekar@oracle.com>.
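
Illustration for item 22: a minimal sketch of how a structure tracking
out-of-order completion could work. It is not the code from this commit.
The class name ProgressTracker, its methods and the fixed slot count are
hypothetical, and the sketch assumes that many threads may call
add_link() while only one thread calls advance_tail().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: tracks finished ranges [from, to) of some ordered
// sequence (for example lsn ranges) that may complete out of order.
class ProgressTracker {
 public:
  explicit ProgressTracker(std::size_t capacity) : links_(capacity), tail_(0) {
    assert(capacity > 0);
    for (auto &slot : links_) slot.store(0, std::memory_order_relaxed);
  }

  // Called by any thread once its operation covering [from, to) is done.
  // Assumption: the caller guarantees from - tail() < capacity, i.e. it
  // waits for free space before starting an operation that far ahead.
  void add_link(std::uint64_t from, std::uint64_t to) {
    assert(to > from);
    assert(from >= tail() && from - tail() < links_.size());
    links_[from % links_.size()].store(to, std::memory_order_release);
  }

  // Called by a single consumer thread. Follows links from the current
  // tail as long as they are contiguous and returns the new tail: every
  // position before it belongs to a finished operation.
  std::uint64_t advance_tail() {
    std::uint64_t pos = tail_.load(std::memory_order_acquire);
    for (;;) {
      auto &slot = links_[pos % links_.size()];
      const std::uint64_t next = slot.load(std::memory_order_acquire);
      if (next <= pos) break;  // gap: operation starting at pos not done
      slot.store(0, std::memory_order_relaxed);  // free slot for reuse
      pos = next;
    }
    tail_.store(pos, std::memory_order_release);
    return pos;
  }

  std::uint64_t tail() const { return tail_.load(std::memory_order_acquire); }

 private:
  std::vector<std::atomic<std::uint64_t>> links_;
  std::atomic<std::uint64_t> tail_;
};
```

In this sketch, finishing the range [3, 5) before [0, 3) leaves the tail
at 0; once [0, 3) is added as well, advance_tail() moves the tail to 5.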