Andrei Elkin authored
Here is the total changeset combining all revisions made since Sep 2010. Comments from the original commits are pasted in reverse chronological order.
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could not produce deterministic output: the list of binlogs contained one binlog more than expected.
  @ mysql-test/suite/rpl/r/rpl_rotate_logs.result
    results updated.
  @ mysql-test/suite/rpl/t/rpl_rotate_logs.test
    Refining the method of verifying binlog rotation due to max size: we check whether the first log has been rotated by comparing its name before and after feeding load to the master. Notice that neither the former nor the new proof method is perfect, as that part of the test really needs to demonstrate that every binlog file is smaller than @@max_binlog_size.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active.
  The failure occurred because the relay-log parsing done by MTS recovery gap computing did not make sure to use the relay log's own Format_description events, which carry the checksum info for all events in the log.
  Fixed by determining the checksum algorithm of every relay log as the first step of MTS recovery gap computing.
  @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum-master.opt
    forcing master to checksum.
  @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum-slave.opt
    forcing slave to *not* checksum.
  @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum.test
    same as rpl_mixed_mts_rec_crash_safe, but runs with a master that checksums and a slave without its own checksum. The test verifies that checksums do not affect recovery. The lack of checksumming on the slave allows more scenarios to be tested.
  @ sql/rpl_slave.cc
    A search for the FD event carrying the checksum algorithm is added. Notice that reading the first three events is enough to find out the master-side checksum algorithm.
------------------------------------------------------------
revno: 3361
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  merge from trunk, which required resolving a few semantic conflicts caused by changes to THD::enter_cond() in the trunk.
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed failure in test rpl_mts_check_concurrency when running in the MTS collection.
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages about unsafe execution.
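The checksum-discovery step of revision 3362 can be illustrated with a small sketch. It assumes a relay log starts with the slave's own Format_description (FD) event, a Rotate event, and then the master's FD event, so the last FD among the first three events announces the master-side algorithm. The types and the function name `find_master_checksum_alg` are hypothetical simplifications, not the server's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified relay-log model: only the event type and, for
// Format_description events, the checksum algorithm they announce.
enum class EventType { FORMAT_DESCRIPTION, ROTATE, QUERY, OTHER };
enum class ChecksumAlg { OFF, CRC32 };

struct RelayLogEvent {
  EventType type;
  ChecksumAlg checksum_alg;  // meaningful only for FORMAT_DESCRIPTION
};

// Returns the algorithm announced by the last FD event found among the first
// `max_events` events (three by default, assuming the layout described
// above), or std::nullopt if that window holds no FD event.
std::optional<ChecksumAlg> find_master_checksum_alg(
    const std::vector<RelayLogEvent>& log, std::size_t max_events = 3) {
  std::optional<ChecksumAlg> alg;
  for (std::size_t i = 0; i < log.size() && i < max_events; ++i) {
    if (log[i].type == EventType::FORMAT_DESCRIPTION)
      alg = log[i].checksum_alg;  // a later FD (the master's) wins
  }
  return alg;
}
```

Taking the last FD in the window, rather than the first, is what lets a non-checksumming slave read a checksumming master's events correctly, which is the scenario the new rpl_mixed_mts_rec_crash_safe_checksum test targets.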
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Removed a suppressed warning introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with several warning messages about unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug output was removed from the error log. However, this is a temporary solution as it prevents enabling traces.
------------------------------------------------------------
revno: 3355
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported a stack-related error on rpl_savepoint.
  The problem appears to be that, when computing slave_sql_running_state in show_master_info(), the SQL thread's proc_info pointer could refer to a value on a stack frame that no longer existed.
  Fixed by making proc_info point to a string literal.
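The fix in revision 3354 relies on a basic C++ lifetime rule: a string literal has static storage duration, whereas a local buffer dies when its function returns. A minimal sketch, assuming a simplified thread descriptor; `ThreadInfo`, `set_state`, `running_state` and the state text are hypothetical names, not the server's.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a thread descriptor whose proc_info is read by
// another thread (e.g. while the SQL thread state is being reported).
struct ThreadInfo {
  const char* proc_info = "";
};

// Unsafe alternative (described only, not compiled here): copying the text
// into a local char array and storing its address would leave proc_info
// dangling as soon as the function returned.

// Safe pattern: the literal lives for the whole program, so readers can
// dereference proc_info at any later time.
void set_state(ThreadInfo& t) {
  t.proc_info = "Waiting for an event from Coordinator";
}

// Reader side, e.g. code computing a running-state column.
const char* running_state(const ThreadInfo& t) { return t.proc_info; }
```

After `set_state()` has returned, `running_state()` still yields valid text, which is exactly the guarantee a stack buffer cannot give.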
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running MTS crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch source code.
  @ sql/binlog.cc
    Comments improved.
  @ sql/log_event.cc
    Warning text is improved.
  @ sql/log_event.h
    More comments are added.
  @ sql/rpl_rli.h
    More comments are added.
  @ sql/rpl_slave.cc
    Error constant was changed.
  @ sql/share/errmsg-utf8.txt
    Error constant is changed.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to please rpl_*_mts_crash_safe.
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  fixing a build issue for embedded.
  Public visibility of Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
  @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result
    results updated.
  @ sql/binlog.cc
    correcting comments.
  @ sql/field.cc
    renaming.
  @ sql/log_event.cc
    renaming, and separating a block of code in Log_event::get_slave_worker() out into a new method of the Slave_job_group class; some cleanup.
  @ sql/log_event.h
    Extending and improving comments; renaming to follow the is_, get_, set_ pattern; restoring private access to do_apply_event() in Rows_log_event.
  @ sql/mysqld.cc
    removing an extra declaration.
  @ sql/rpl_info_factory.cc
    Minor comments are added.
  @ sql/rpl_rli.cc
    renaming to add the _cnt suffix to all entities that have a counter meaning in MTS; improving comments.
  @ sql/rpl_rli.h
    Renaming, and improving comments for the new members of Relay_log_info.
  @ sql/rpl_rli_pdb.cc
    renaming.
  @ sql/rpl_rli_pdb.h
    Improving comment readability by adding legends that define MTS-specific abbreviations.
  @ sql/rpl_slave.cc
    Renaming; minor cleanup in sql_slave_killed(); adding comments on the Seconds_behind_master update policy with MTS.
  @ sql/share/errmsg-utf8.txt
    Improving the text of a few errors.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP

  MTS stopped with an error in the middle of the test.
  The reason is that a group of events originating from the slave itself was processed only partly, yet the group position was modified. On the following restart, the wrong group boundary made the slave either error out or assert.
  Fixed by addressing a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the slave server's own group can't be started.
  Notice, this is a trial patch since I can't reproduce the failure at all on the hosts available to me.
  @ sql/rpl_rli_pdb.cc
    Changing the running status of the Worker before it releases assigned entries.
    That ensures that a Coordinator waiting in wait_for_workers_to_finish() exits the function with a negative result and therefore stops without attempting to apply the event for which it attempted synchronization.
    A couple of diagnostics are added to the error log. They may be removed shortly, but for now they might help provide details if the failure does not disappear after this push.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in which the Coordinator might fail to catch the not-running status of a Worker.
  That made the Coordinator skip only part of a group of the slave server's own events, so the slave stopped at a position that is not a group boundary.
  Fixed by marking the errored-out Worker as failed before its APH entries are released.
  TODO: notice that stopping at a non-boundary position may also be possible due to a graceful STOP SLAVE, if one is run while self-originated events are being skipped. However, this issue belongs to STS and might be similar to BUG@12604951 and BUG@12728160.
  @ mysql-test/suite/rpl/r/rpl_circular_for_4_hosts.result
    results are updated.
  @ mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.test
    test is updated with a new suppression text.
  @ sql/log_event.cc
    Adding clarifying text to an error message when parallel execution fails.
  @ sql/rpl_rli_pdb.cc
    Marking the errored-out Worker as failed before its APH entries are released. That ensures the Coordinator always finds the non-running status in cases where it has to know it (wait_for_workers_to_finish()).
  @ sql/share/errmsg-utf8.txt
    Adding a format specifier for a clarifying text.
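The ordering fix in revisions 3348 and 3347 (publish the failed status before releasing assigned entries) can be sketched with standard C++ primitives. This is a simplified model under stated assumptions: one Worker, a single counter standing in for the APH entries, and hypothetical names (`WorkerState`, `worker_fail_and_release`, `wait_for_worker_to_finish`) that are not the server's API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of one Worker as observed by the Coordinator.
enum class RunningStatus { RUNNING, ERROR_LEAVING };

struct WorkerState {
  std::atomic<RunningStatus> status{RunningStatus::RUNNING};
  std::mutex mtx;
  std::condition_variable cond;
  int assigned_entries = 1;  // stand-in for entries the Worker still holds
};

// Worker side on error. The status is stored *before* the entries are
// released under the mutex, so whoever observes the release (by acquiring
// the same mutex afterwards) is guaranteed to observe the failure too.
void worker_fail_and_release(WorkerState& w) {
  w.status.store(RunningStatus::ERROR_LEAVING);  // 1. mark as failed
  {
    std::lock_guard<std::mutex> lk(w.mtx);
    w.assigned_entries = 0;                      // 2. release entries
  }
  w.cond.notify_all();
}

// Coordinator side: wait until the Worker holds no entries, then report
// whether it is still healthy. A false result means the Coordinator must
// stop instead of applying the event it synchronized for.
bool wait_for_worker_to_finish(WorkerState& w) {
  std::unique_lock<std::mutex> lk(w.mtx);
  w.cond.wait(lk, [&] { return w.assigned_entries == 0; });
  return w.status.load() == RunningStatus::RUNNING;
}
```

With the two steps in the opposite order, the Coordinator could wake up between the release and the status update and read RUNNING, which is the window the commits describe.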
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569

  Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code:

    DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -                 DBUG_SUICIDE(););
  +                 sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +                 flush_info(TRUE);
  +                 DBUG_SUICIDE();

  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
  @ mysql-test/extra/rpl_tests/rpl_mts_crash_safe.inc
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/include/not_slave_worker_info_table.inc
    Removed this feature as option --slave-worker-info-repository was removed too.
  @ mysql-test/suite/rpl/t/rpl_mixed_mts_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_row_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_row_mts_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_row_mts_rec_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_stm_mixed_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_stm_mts_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/rpl/t/rpl_stm_mts_rec_crash_safe-slave.opt
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ mysql-test/suite/sys_vars/t/slave_worker_info_repository_basic.test
    Removed this test case as option --slave-worker-info-repository was removed too.
  @ sql/binlog.cc
    Improved code as requested by reviewers.
  @ sql/lock.cc
    Removed a mistake that got into sql/lock.cc after merging with trunk.
  @ sql/log_event.cc
    Introduced parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options.
  @ sql/rpl_info.h
    Updated doxygen comments and removed a change to avoid conflicts when merging with trunk.
  @ sql/rpl_info_factory.h
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
  @ sql/rpl_rli.cc
    Introduced parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options.
  @ sql/rpl_rli_pdb.cc
    Improved the code and introduced parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options.
  @ sql/rpl_rli_pdb.h
    Introduced parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options.
  @ sql/rpl_slave.cc
    Removed duplicated code.
  @ sql/sql_parse.cc
    Reintroduced a flag removed by mistake when merging with trunk.
    See also sql/lock.cc.
  @ sql/sys_vars.cc
    Removed option --slave-worker-info-repository as worker repositories are defined according to --relay-log-info-repository.
------------------------------------------------------------
revno: 3344
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS
  merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  adding a suppression for an expected warning to rpl_circular_for_4_hosts; decreasing a loop limit in rpl_parallel_switch_sequential in the case of statement format.
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test due to the rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, when there were two groups of events to execute, the first for a Worker and the second for the Coordinator, the Coordinator waited for the first group's completion but did not verify that synchronization succeeded. So, if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far).
  @ mysql-test/suite/rpl/r/rpl_circular_for_4_hosts.result
    results are updated.
  @ mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.test
    Test is updated to include a part specific to MTS. While all former conditions hold, the new section makes sure server B has two groups of events to send, which was neither guaranteed nor necessary before.
    Further, when the first of the two fails with Duplicate entry, the Coordinator, while applying the second, senses the first failure and gives up on the second. The first error remains visible in show-slave-status.
  @ sql/log_event.cc
    Checking the return code of wait_for_workers_to_finish() when the Coordinator executes a sequential-mode event. Comments are added in a few other places where that check is unnecessary.
  @ sql/rpl_rli_pdb.cc
    The Worker marks itself as failed to apply, and that fact is also reported to the Coordinator through wait_for_workers_to_finish(). The Coordinator shall check the error code in the branch that applies a sequential event.
  @ sql/rpl_rli_pdb.h
    Adding a new state that a Worker sets on itself to indicate its failure to apply.
  @ sql/rpl_slave.cc
    Refining an assert as a consequence of the new state and its actual setting by the Worker.
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running MTS recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for WL#5569.
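The `force` parameter that revision 3345 adds to commit_positions, and the crash-injection flush in revision 3346, both come down to "flush now, regardless of the sync period". A minimal sketch, assuming positions are made durable only every `sync_period` commits unless forced; `PositionsRepository` and its members are hypothetical and only loosely modeled on the description.

```cpp
#include <cassert>

// Hypothetical positions repository that is flushed only every
// `sync_period` commits (0 = never automatically) unless forced.
class PositionsRepository {
 public:
  explicit PositionsRepository(unsigned sync_period)
      : sync_period_(sync_period) {}

  // Records a new position. Flushes when `force` is true (e.g. right before
  // an injected crash) or when the sync period has been reached.
  void commit_positions(unsigned long long pos, bool force) {
    current_pos_ = pos;
    if (force || (sync_period_ != 0 && ++pending_ >= sync_period_)) flush();
  }

  unsigned long long flushed_pos() const { return flushed_pos_; }
  unsigned flush_count() const { return flush_count_; }

 private:
  void flush() {
    flushed_pos_ = current_pos_;  // stand-in for writing to durable storage
    pending_ = 0;
    ++flush_count_;
  }

  unsigned sync_period_;
  unsigned pending_ = 0;
  unsigned long long current_pos_ = 0;
  unsigned long long flushed_pos_ = 0;
  unsigned flush_count_ = 0;
};
```

A forced flush also resets the pending counter, so the regular sync cadence restarts from that point.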
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the applying phase of MTS recovery so that events intended for the Coordinator are always applied; fixes a few tests so that they pass on PB; makes the GAQ size equal to the checkpoint_group value.
  @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential.test
    attempting to decrease the execution time, which currently might be too long for some PB hosts.
  @ mysql-test/suite/rpl/t/rpl_row_crash_safe-slave.opt
    Making the test run in parallel mode with Workers using the table as their info storage.
  @ mysql-test/suite/sys_vars/r/slave_checkpoint_period_basic.result
    results updated.
  @ mysql-test/suite/sys_vars/t/slave_checkpoint_period_basic.test
    masking out the actual value of slave_checkpoint_period.
  @ sql/log_event.cc
    Never skip events that are intended for the Coordinator, as indicated by mts_execution_mode().
  @ sql/rpl_rli.h
    Improving comments.
  @ sql/rpl_slave.cc
    Simplifying the while condition of the GAQ-progress loop and deploying an assert ensuring that the checkpoint_group parameter and the GAQ state are combined correctly.
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
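Revision 3338 sizes the GAQ by checkpoint_group and refines the GAQ-progress loop. Below is a sketch of how such a queue might behave, under these assumptions: groups are queued in relay-log order, Workers mark them done in any order, and a checkpoint pops only the contiguous prefix of completed groups (the last popped one being the new low-water mark). The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Group Assigned Queue: a fixed-size ring of event groups in
// relay-log order, with capacity taken from a checkpoint_group-like setting.
class GroupAssignedQueue {
 public:
  explicit GroupAssignedQueue(std::size_t capacity) : done_(capacity, false) {}

  bool full() const { return size_ == done_.size(); }
  std::size_t size() const { return size_; }

  // The Coordinator assigns a new group; returns its slot, or -1 when the
  // queue is full (a checkpoint must free slots first).
  long assign() {
    if (full()) return -1;
    std::size_t slot = (head_ + size_) % done_.size();
    done_[slot] = false;
    ++size_;
    return static_cast<long>(slot);
  }

  // A Worker reports that the group in `slot` has been committed.
  void mark_done(std::size_t slot) { done_[slot] = true; }

  // Checkpoint: pop the longest prefix of completed groups. Stops at the
  // first unfinished group even if later ones are already done.
  std::size_t move_queue_head() {
    std::size_t removed = 0;
    while (size_ > 0 && done_[head_]) {
      head_ = (head_ + 1) % done_.size();
      --size_;
      ++removed;
    }
    return removed;
  }

 private:
  std::vector<bool> done_;
  std::size_t head_ = 0;
  std::size_t size_ = 0;
};
```

Tying the capacity to the checkpoint interval bounds how many groups can be outstanding between two checkpoints.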
------------------------------------------------------------
revno: 3336
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  refining the wait for db-hash entry release at event distribution.
  A graceful STOP is not accepted at this point, so the Coordinator continues to stay in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.

  MTS stopped with an error after failing to apply an event.
  It turned out that the event was scheduled incorrectly because an earlier stop by the Single-Threaded Slave happened not at a group boundary but in the middle of a group.
  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE-TABLE group is surrounded by its own BEGIN/COMMIT braces.
  @ mysql-test/suite/rpl/r/rpl_parallel_switch_sequential.result
    new results file is added.
  @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential-slave.opt
    transaction retry is not yet supported by MTS.
  @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential.test
    Regression test for bug#12719875 is added.
    Notice that the created tables use InnoDB, also because with MyISAM stop-slave can actually happen in the middle of a group of MyISAM table events, so the following restart fails with a dup-key error.
    CREATE-SELECT is not tested because of another bug, as commented.
  @ sql/log_event.cc
    changing the error-report style to make it actually effective: rli->report() does not make rli->info_thd->is_error() return true.
    The my_error() message eventually reaches the show-slave-status SQL error at the end of the slave SQL thread.
  @ sql/rpl_slave.cc
    fixing a possible hang that can happen due to an errored-out Worker when the GAQ is full and that Worker was the first to update it; refining asserts; moving the stop_workers() routine to a point where the slave SQL thread has not yet reset its errors, which satisfies a refined assert in slave_stop_workers(rli).
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
  @ .bzr-mysql/default.conf
    restoring commits@.
  @ mysql-test/collections/default.push
    adding a rule to run all suites in MTS with binlog-format ROW.
------------------------------------------------------------
revno: 3332
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:29:34 +0300
message:
  wl5569 MTS
  cleanup in one file.
  @ sql/rpl_rli.cc
    removing traces of a mutex that served in prototyping support for temporary tables.
------------------------------------------------------------
revno: 3331
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:16:02 +0300
message:
  wl5569 MTS
  bzr commit mail address changed; a minor cleanup to make mts_is_worker() take a const argument; releasing a test to run in MTS.
------------------------------------------------------------
revno: 3330
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-02 08:58:56 +0100
message:
  Fixed the use of the performance schema in the replication code and a concurrency issue in the IO thread. In particular, the IO thread was calling flush_master_info without grabbing locks.
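The IO-thread fix in revision 3330 is an instance of a general rule: persist shared state only while holding the lock that guards its updates, so the stored snapshot is consistent. A minimal sketch, assuming a single mutex protects both the log name and the position; `MasterInfo`, `data_lock_` and the string output format are hypothetical simplifications.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical master-info object shared by the IO thread and other threads.
class MasterInfo {
 public:
  // Called by the IO thread when it advances its read position.
  void set_position(const std::string& log_name, unsigned long long pos) {
    std::lock_guard<std::mutex> lk(data_lock_);
    log_name_ = log_name;
    pos_ = pos;
  }

  // Persists a consistent snapshot: holding the same lock guarantees that
  // the name and position come from one update, never from two.
  std::string flush_master_info() {
    std::lock_guard<std::mutex> lk(data_lock_);
    stored_ = log_name_ + ":" + std::to_string(pos_);
    return stored_;
  }

 private:
  std::mutex data_lock_;
  std::string log_name_;
  unsigned long long pos_ = 0;
  std::string stored_;  // stand-in for the on-disk or table repository
};
```

Without the lock in `flush_master_info()`, a concurrent `set_position()` could leave a new file name paired with an old offset in the repository.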
------------------------------------------------------------
revno: 3329
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 16:41:35 +0300
message:
  wl5569 MTS
  merging from the main repo.
------------------------------------------------------------
revno: 3328
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 15:48:25 +0300
message:
  wl#5569 MTS
  the final cleanup patch.
  There are a few glitches that were considered tolerable, at least while the code of the whole WL is being reviewed. They include:
  - no support for old load-data events
  - no support for FK
  To add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called.
  @ sql/log_event.h
    cleanup.
  @ sql/rpl_reporting.cc
    introducing a new public method so that it can be called from code executed by Slave_worker.
  @ sql/rpl_reporting.h
    the earlier do_report is renamed, and a new do_report() is introduced to let child classes redefine their own behaviour. The child class is supposed to call child->report() and get child::do_report()'s designed behaviour.
  @ sql/rpl_rli_pdb.cc
    addressing an OOM issue at delete of curr_group_exec_parts.
  @ sql/rpl_rli_pdb.h
    deploying the do_`method' pattern.
  @ sql/rpl_slave.cc
    cleanup.
------------------------------------------------------------
revno: 3327
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 13:16:52 +0300
message:
  wl#5569 MTS
  The patch cleans up a host of code.
  @ sql/log_event.cc
    cleanup, comments improved, the decision logic in Log_event::apply_event on the MTS execution mode is simplified.
    Moving flush_info() of Rotate_log_event::do_update_pos() into inc_group_relay_log_pos().
  @ sql/log_event.h
    cleanup, and merging the logic of the former mts_async_exec_by_coordinator() with mts_sequential_exec(), which is now called from a new mts_execution_mode(). Reducing the visibility of MTS members of the Log_event hierarchy to match the needs.
  @ sql/rpl_rli.cc
    cleanup, renames, and moving flush_info() inside inc_group_relay_log_pos().
  @ sql/rpl_rli.h
    Cleanup and comments improved.
  @ sql/rpl_rli_pdb.cc
    Cleanup; renames; comments; a new Slave_worker::init_worker() is defined, to be called for each worker when the Worker pool is started. Its initialization instructions are migrated from slave_start_single_worker().
  @ sql/rpl_rli_pdb.h
    Cleanup and comments improved.
  @ sql/rpl_slave.cc
    cleanup; replacing the collection of initializations for a Worker in slave_start_single_worker() with the new Slave_worker::init_worker().
  @ sql/sql_class.h
    cleanup.
------------------------------------------------------------
revno: 3326
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-28 11:30:18 +0300
message:
  wl#5569 MTS
  replacing views with regular tables for consistency verification in rpl_parallel_innodb.
  A minor cleanup in rpl_parallel is also done.
------------------------------------------------------------
revno: 3325
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 20:31:45 +0300
message:
  wl#5569 MTS
  Cleanup, and addressing the sporadic rpl_temp_table_mix_row failure in the post-execution mtr.check_testcase().
  The check-testcase failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While that is correct for a homogeneous, rows-event-only load, a mixed load can fail.
  Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in future.
  @ mysql-test/suite/rpl/t/rpl_temp_table_mix_row.test
    Adding slave synchronization.
  @ sql/log_event.cc
    cleanup to move circular_buffer-related definitions into rpl_rli_pdb, which specializes in objects dealing with the Worker, its assignment, etc.; improving comments; also, the former separate flag indicating that a T-event requires post-scheduling synchronization with the Worker is turned into a bit of the existing Log_event::flags, which also avoids the ugliness of #if/#endif:s.
  @ sql/log_event.h
    the former separate flag indicating that a T-event requires post-scheduling synchronization with the Worker is turned into a bit of the existing Log_event::flags;
  @ sql/rpl_rli.cc
    cleanup: renaming.
  @ sql/rpl_rli.h
    cleanup: renaming, more comments. The former mts_wqs_overrun is split into two: the statistics parameter mts_wq_overrun_cnt and the internal control parameter mts_wq_excess.
  @ sql/rpl_rli_pdb.cc
    Included rpl_slave.h, which holds two necessary declarations;
    Cleanup: accepting circular_buffer-related definitions migrated from log_event, improved comments, renaming, removing dead code.
  @ sql/rpl_rli_pdb.h
    Cleanup: renaming and more comments are added.
  @ sql/rpl_slave.cc
    Augmenting the print-out of statistics at the end of an MTS session; cleanup: renaming.
  @ sql/rpl_slave.h
    Introducing two constants to define the range of the worker_id domain and a magic value for an undefined worker.
  @ sql/sys_vars.cc
    replacing a literal int value with a symbolic constant.
------------------------------------------------------------
revno: 3324
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 13:12:52 +0100
message:
  Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case.
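The rpl_reporting.h note in revision 3328 describes what is commonly called the non-virtual interface (NVI) idiom: callers use a public, non-virtual report(), which delegates to a virtual do_report() that subclasses may redefine. A sketch under that reading; `ReportingCapability`, `WorkerReporter` and the message format are hypothetical, not the server's Slave_reporting_capability API.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical base class: report() is the fixed public entry point, while
// do_report() is the customization point for subclasses.
class ReportingCapability {
 public:
  virtual ~ReportingCapability() = default;

  void report(int err_code, const char* fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);  // format once, here
    va_end(args);
    do_report(err_code, buf);
  }

  const std::string& last_message() const { return last_; }

 protected:
  // Default behaviour: remember the formatted message with its code.
  virtual void do_report(int err_code, const char* msg) {
    last_ = "[" + std::to_string(err_code) + "] " + msg;
  }

  std::string last_;
};

// A Worker-like subclass changes how reports are handled (here it only
// prefixes its id) while callers keep calling report().
class WorkerReporter : public ReportingCapability {
 public:
  explicit WorkerReporter(unsigned id) : id_(id) {}

 protected:
  void do_report(int err_code, const char* msg) override {
    ReportingCapability::do_report(err_code, msg);
    last_ = "Worker " + std::to_string(id_) + ": " + last_;
  }

 private:
  unsigned id_;
};
```

Keeping formatting in the base class means every subclass receives an already formatted message and only decides where it goes.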
------------------------------------------------------------
revno: 3323
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-26 13:02:59 +0100
message:
  Fixed test case.
------------------------------------------------------------
revno: 3322
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-25 15:14:24 +0100
message:
  Introduced a test case for recovery with MTS and fixed bugs in recovery.
------------------------------------------------------------
revno: 3321
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 15:38:19 +0300
message:
  wl#5569 MTS
  This patch does a bit of cleanup, addresses one memory-allocation TODO, and completes fixing the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items.
------------------------------------------------------------
revno: 3320
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 12:38:34 +0300
message:
  wl#5569 MTS
  this patch completes the previous one: it fixes a result file and bases the InnoDB-specific test verification on tables rather than views.
------------------------------------------------------------
revno: 3319
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 00:11:22 +0300
message:
  wl#5569 MTS
  this is an exploratory patch to find out whether the view-based verification method has its own flaw, unrelated to MTS.
  The patch calls the verification macro on the tables, which required some adjustment.
------------------------------------------------------------
revno: 3318
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-23 07:56:15 +0300
message:
  wl#5569 MTS
  fixing results of mysqld--help-win.
------------------------------------------------------------
revno: 3317
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:20:40 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3316
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:17:43 +0100
message:
  On some platforms, such as Windows, a thread's wait time is stored in 100ns units. However, when computing the difference between two values, the result was not multiplied by 100. Besides, there was a casting problem when that result was assigned to a ulong.
------------------------------------------------------------
revno: 3315
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 18:54:23 +0100
message:
  Fixed how MTS copes with recovery.
------------------------------------------------------------
revno: 3314
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 19:10:54 +0300
message:
  wl#5569 MTS
  Fixing valgrind warnings.
  @ sql/log_event.cc
    w->running_status is checked to find out the actual running status of a Worker. The THD can be unavailable, which is what a valgrind report was about.
  @ sql/rpl_rli_pdb.cc
    commenting out an assert that valgrind does not like.
  @ sql/rpl_rli_pdb.h
    a new method is added, to be invoked at MTS shutdown.
  @ sql/rpl_slave.cc
    Invoking GAQ cleanup at the end of an MTS session.
------------------------------------------------------------
revno: 3313
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 18:15:43 +0300
message:
  wl#5569 MTS
  rpl_parallel_start_stop.test could fail sporadically with a timeout.
@ mysql-test/include/wait_for_slave_param.inc Correcting comments and the handling of the caller-passed $slave_timeout to make sure its unit of 1 second really holds. Introduced a symbolic default_timeout and sleep_freq(uency) to produce the time to sleep between two polls. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Since the default time to wait is less than InnoDB's lock wait timeout, the time to wait for the error is set explicitly. ------------------------------------------------------------ revno: 3312 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:21:56 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3311 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:19:06 +0100 message: Fixed an error when computing the Lower-Water-Mark. If two or more jobs were removed from the Group of assigned jobs, one of them had a non-empty group relay log and the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null. ------------------------------------------------------------ revno: 3310 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 11:52:44 +0100 message: Fixed valgrind errors. Slave_job_group was silently being cast to LOG_POS_COORD when calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp) and consequently mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems: . The first two members of Slave_job_group were not a char * pointer and a my_offset. . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems. To fix the problem, we avoid this casting.
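A simplified sketch of the Lower-Water-Mark update fixed in revno 3311, assuming completed jobs are popped from the front of a queue and that an empty relay-log name means the group did not switch relay logs. JobGroup, LowerWaterMark and advance_lwm() are hypothetical stand-ins for Slave_job_group and the server's LWM code; the point is only that a later empty name must not discard a name carried by an earlier job removed in the same pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical, simplified stand-ins for Slave_job_group and the LWM.
struct JobGroup {
  std::string group_relay_log_name;  // empty: group did not start a new relay log
  uint64_t group_relay_log_pos;
  bool done;                         // set once the Worker committed the group
};

struct LowerWaterMark {
  std::string group_relay_log_name;
  uint64_t group_relay_log_pos = 0;
};

// Pops the completed prefix of the queue of assigned groups and advances
// the LWM. The name is only overwritten by a non-empty one, so an empty
// name in the last removed job keeps the name of an earlier removed job.
std::size_t advance_lwm(std::deque<JobGroup>& assigned, LowerWaterMark& lwm) {
  std::size_t removed = 0;
  while (!assigned.empty() && assigned.front().done) {
    const JobGroup& g = assigned.front();
    if (!g.group_relay_log_name.empty())
      lwm.group_relay_log_name = g.group_relay_log_name;
    lwm.group_relay_log_pos = g.group_relay_log_pos;
    assigned.pop_front();  // g is not used after this point
    ++removed;
  }
  return removed;
}
```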
------------------------------------------------------------ revno: 3309 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 19:14:50 +0300 message: wl#5569 MTS fixing slave_transaction_retries_basic_64.result ------------------------------------------------------------ revno: 3308 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 16:11:25 +0300 message: wl#5569 MTS fixing tests. @ mysql-test/extra/rpl_tests/rpl_extra_col_master.test MTS-suppression is necessary because the test is supposed to stop the slave due to an error. @ mysql-test/extra/rpl_tests/rpl_relayrotate.test Decreasing the load to prove that a warning was caused by a slow environment, in which waiting for the SQL thread to accept the killed status ended with the 1 min timeout. @ mysql-test/suite/rpl/r/rpl_relayrotate.result results updated. @ mysql-test/suite/rpl/t/rpl_stm_000001.test A macro is expanded in order to isolate which of two activities a timeout failure belongs to. @ mysql-test/suite/sys_vars/r/slave_transaction_retries_basic_64.result Fixing results of the 64-bit version of the test that was edited in the previous push. ------------------------------------------------------------ revno: 3307 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 12:33:36 +0300 message: wl#5569 MTS Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events. Fixing the sensitivity of two other tests to mtr's invocation environment, which includes initial values of slave_parallel_workers and slave_transaction_retries. @ mysql-test/suite/sys_vars/inc/slave_transaction_retries_basic.inc made the test insensitive to the value of slave_transaction_retries in the mtr env.
@ mysql-test/suite/sys_vars/r/slave_parallel_workers_basic.result made the test insensitive to the value of slave_parallel_workers in the mtr env. @ mysql-test/suite/sys_vars/r/slave_transaction_retries_basic_32.result made the test insensitive to the value of slave_transaction_retries in the mtr env. @ mysql-test/suite/sys_vars/t/slave_parallel_workers_basic.test made the test insensitive to the value of slave_parallel_workers in the mtr env. @ sql/log_event.cc get_slave_worker() passes the need_temps argument as FALSE in case of rows-events. Correcting the actual value of `mts_in_group' of mts_async_exec_by_coordinator(). ------------------------------------------------------------ revno: 3306 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 09:04:19 +0100 message: Fixed some Windows failures. ------------------------------------------------------------ revno: 3305 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-18 19:58:21 +0100 message: Fixed some recovery issues. ------------------------------------------------------------ revno: 3304 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 21:01:58 +0300 message: wl#5569 MTS fixing tests and a segfault at the end of handle_slave_sql() that happened after worker initialization failed (e.g. rpl_row_log on win). @ mysql-test/extra/rpl_tests/rpl_loaddata.test MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_loaddata.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_stm_loaddata_concurrent.result MTS-suppression is added. @ mysql-test/suite/sys_vars/t/disabled.def a constant nuisance is disabled in the feature tree. Todo: do not merge it when pushing to the main tree. @ sql/rpl_slave.cc Moved the workers' initialization after that of the coordinator so that a failure in the former routine is handled with a proper coordinator state. .
This fix eliminates the segfault at the end of handle_slave_sql() for a few tests but does not address the reason for the worker initialization failure, as in rpl_row_log on win: 110616 7:37:57 [Note] Info file G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0 cannot be accessed (errno 13). Most likely this is a new slave or you are changing the repository type. 110616 7:37:57 [ERROR] G:/pb2/test/sb_1-3486364-1308189142.46/mysql-5.6.3-m5-win-x86_64-test/sql/Debug/mysqld.exe: File 'G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0' not found (Errcode: 13) 110616 7:37:57 [ERROR] Failed to create a new info file (file 'G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0', errno 13) 110616 7:37:57 [ERROR] Error reading slave worker configuration 110616 7:37:57 [ERROR] Failed during slave worker thread create 110616 7:37:57 [ERROR] Slave SQL: Failed during slave workers initialization, Error_code: 1593 ------------------------------------------------------------ revno: 3303 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 18:34:16 +0300 message: wl#5569 MTS fixing tests. @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test making aux file names unique to please mtr, pb. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test making aux file names unique to please mtr, pb. @ mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_mixed_binlog_max_cache_size.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result making aux file names unique to please mtr, pb. @ mysql-test/suite/rpl/r/rpl_parallel_innodb.result making aux file names unique to please mtr, pb.
@ mysql-test/suite/rpl/r/rpl_stm_binlog_max_cache_size.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_typeconv.result MTS-suppression is added. @ mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test MTS-suppression is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt cleanup. @ mysql-test/suite/rpl/t/rpl_typeconv.test MTS-suppression is added. @ mysql-test/suite/sys_vars/r/slave_parallel_workers_basic.result results updated. @ sql/sql_class.h Cleanup to remove early debug-related options. @ sql/sys_vars.cc Fixing slave_parallel_workers' maximum at 1024. Cleanup to remove early debug-related options. ------------------------------------------------------------ revno: 3302 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 14:00:41 +0300 message: wl#5569 MTS fixing rpl_row_basic_3innodb similarly to the previous patch. @ mysql-test/suite/rpl/r/rpl_row_basic_3innodb.result a suppression is added. ------------------------------------------------------------ revno: 3301 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 13:51:59 +0300 message: wl#5569 MTS fixing a few tests. 1. A policy is implemented to react with a warning when a failing Worker leaves the overall slave state with gaps and therefore inconsistent. 2. Two tests used to time out because reset master/slave was disabled in them. @ mysql-test/extra/rpl_tests/rpl_binlog_max_cache_size.test a suppression is added. @ mysql-test/extra/rpl_tests/rpl_row_basic.test a suppression is added. @ mysql-test/suite/rpl/r/rpl_known_bugs_detection.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_basic_2myisam.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_binlog_max_cache_size.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_event_max_size.result a suppression is added.
@ mysql-test/suite/rpl/r/rpl_row_idempotency.result a suppression is added. @ mysql-test/suite/rpl/t/rpl_known_bugs_detection.test a suppression is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt removing unnecessary options causing the test to fail. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test removing an erroneous assignment. The former disabling of reset was intended for benchmarking w/o binlog on the slave, to please master-slave.inc. @ mysql-test/suite/rpl/t/rpl_parallel_innodb-slave.opt removing unnecessary options causing the test to fail. @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test removing an erroneous assignment. The former disabling of reset was intended for benchmarking w/o binlog on the slave, to please master-slave.inc. @ mysql-test/suite/rpl/t/rpl_row_event_max_size.test a suppression is added. @ mysql-test/suite/rpl/t/rpl_row_idempotency.test a suppression is added. @ sql/rpl_slave.cc Downgrading the error to a warning in case the Coordinator fails due to a Worker error. Improving messages. Merging two if:s to have just one report(). @ sql/share/errmsg-utf8.txt Improved the text of an error; added a new error code. ------------------------------------------------------------ revno: 3300 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 02:24:59 +0100 message: Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renames worker-info-repository to slave-worker-info-repository in some tests' option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have been renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including 1. decreasing the number of system variables and renaming them. Command line options that were important for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. A small encapsulation in ha_blackhole.cc is done. @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test cleanup. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test cleanup. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test cleanup. @ mysql-test/r/mysqld--help-notwin.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_conf_limits.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_seconds_behind_master.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result cleanup. @ mysql-test/suite/rpl/t/rpl_parallel.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test cleanup.
@ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_ddl.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_seconds_behind_master.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test cleanup. @ mysql-test/suite/sys_vars/r/all_vars.result cleanup. @ mysql-test/suite/sys_vars/r/slave_checkpoint_group_basic.result cleanup. @ mysql-test/suite/sys_vars/r/slave_checkpoint_period_basic.result cleanup. @ mysql-test/suite/sys_vars/r/slave_worker_info_repository_basic.result cleanup. @ mysql-test/suite/sys_vars/t/slave_checkpoint_group_basic.test cleanup. @ mysql-test/suite/sys_vars/t/slave_checkpoint_period_basic.test cleanup. @ sql/log_event.cc removing the experimental (for benchmarking) mts_slave_local_timestamp option. @ sql/mysqld.cc a few debugging-time options are replaced with constants. Interface variables are not needed anymore. @ sql/mysqld.h a few debugging-time options are replaced with constants. Interface variables are not needed anymore. @ sql/rpl_rli_pdb.cc a few debugging-time options are replaced with constants. @ sql/rpl_slave.cc a few debugging-time options are replaced with constants. @ sql/sys_vars.cc a few debugging-time options are replaced with constants; renaming the rest that deal with MTS to be prefixed with `slave_'. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by MTS.
------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_seconds_behind_master.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results updated. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test moving mtr.add_supp to eliminate the possibility of a warning in the slave's error log; adding graceful termination lines to the test. @ mysql-test/suite/rpl/t/rpl_parallel_seconds_behind_master.test moving mtr.add_supp to eliminate the possibility of a warning in the slave's error log. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Suppressions are added for errors that are expected by the test logic; adding graceful termination lines to the test. @ sql/log_event.cc fixing the last argument to report(), which should be a C string; fixing the gathering of db:s on the master side. Because a query can be preceded in the binlog by an engineered BEGIN (the current pattern of logging from the trunk), the reset can no longer be done in Query::write(). However, another reset point exists at the end of the top-level query and that suffices. @ sql/rpl_rli.h is_mts_in_group() is added to mimic STS' is_in_group(), though the semantics are different. @ sql/rpl_slave.cc further cleanup in sql_slave_killed() as requested by reviewers.
------------------------------------------------------------ revno: 3288 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing failing tests due to a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event. An event that accesses more than the maximum number of databases, or an event from an old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and no more groups are scheduled until it is done. b. Notification to the Coordinator about an errored-out Worker is corrected. @ sql/log_event.cc isolation applies to a group of events rather than to an instance. The logic of isolation while the group is still executed by a Worker is refined through the use of `bool curr_group_isolated', which lasts for the group's scheduling time and is set and reset in Log_event::get_slave_worker_id(). An assert is added to monitor the correct migration of tmp tables. . get_slave_worker() is called with `need_temp_tables' set to TRUE. @ sql/log_event.h Renaming to indicate that isolation applies to a group of events. Adding more candidate events to the mts_do_isolate_group() assert. @ sql/rpl_rli.h Isolation mode related declaration. @ sql/rpl_rli_pdb.cc Refining notification logic. The Coordinator needs both its THD::KILLED and signalling to slave_worker_hash_cond. @ sql/rpl_slave.cc Isolation mode related initialization.
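A single-threaded sketch of the group isolation described in revno 3287, under stated assumptions. Event, Scheduler, wait_for_prior_groups() and the 16-database limit are illustrative inventions; the real logic lives in Log_event::get_slave_worker_id() around curr_group_isolated. Waiting is only recorded in a trace here, and the sketch glosses over events of the isolated group that were assigned before the group was marked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed per-event limit on tracked databases; illustrative value only.
constexpr std::size_t kMaxDbsPerEvent = 16;

// Hypothetical event description, not the server's Log_event.
struct Event {
  std::vector<std::string> dbs;  // databases the event accesses
  bool from_old_master;
  bool ends_group;
};

struct Scheduler {
  bool curr_group_isolated = false;   // lasts for the group's scheduling time
  std::vector<std::string> actions;   // trace of what the scheduler did

  // Stand-in for blocking until every previously assigned group committed.
  void wait_for_prior_groups() { actions.push_back("wait"); }

  void schedule(const Event& ev) {
    bool needs_isolation =
        ev.dbs.size() > kMaxDbsPerEvent || ev.from_old_master;
    // Isolation is a property of the whole group: once any event needs it,
    // the marking stays until the group's last event is scheduled.
    if (needs_isolation && !curr_group_isolated) {
      curr_group_isolated = true;
      wait_for_prior_groups();        // all prior groups must be done first
    }
    actions.push_back("assign");
    if (ev.ends_group && curr_group_isolated) {
      wait_for_prior_groups();        // nothing else is scheduled meanwhile
      curr_group_isolated = false;    // reset at the group boundary
    }
  }
};
```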
------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with a non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode and for the rpl suite with row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes for a segfault from the last merge from the trunk etc., and a compilation issue on embedded. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options. ------------------------------------------------------------ revno: 3283 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently w/o gaps; KILL shall be used for an urgent stop; an error case behaves like the killed one. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker running status to killed (which breaks the read-exec loop of a Worker). @ sql/log_event.cc Changing the style of computing the mts-in-group bool arg of mts_async_exec_by_coordinator().
@ sql/rpl_rli.cc Changing the style of computing the mts-in-group arg of an if in stmt_done(). @ sql/rpl_rli.h Adding more states to the Coordinator's MTS-group view. @ sql/rpl_rli_pdb.cc Relocating the notification of a Worker's failure by the Worker into the error branch of a function releasing common resources (entries of the APH hash). The failed Worker tries to awaken the Coordinator, which may be waiting for the signal. The latter's behaviour in its turn is refined to not enter the waiting phase when it has already been killed. @ sql/rpl_slave.cc sql_slave_killed() is made of two flavors of the error branches. A STOPped MTS Coordinator does not give up too early and waits until its MTS-group state allows that. Notification with kill to the Coordinator from the errored-out or killed Worker is moved into a function releasing common resources (entries of the APH hash). This case designates a hard stop. In case of the soft (STOP SLAVE) stop, the Coordinator is made to wait for full completion of the Workers' assignments before marking their running status for stopping. ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, incl. in stopping Workers: the routine now distinguishes between the killed and gracefully stopped cases, so that in the end STOP SLAVE will guarantee a consistent state (some todo still remains). @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test decreasing execution time. @ mysql-test/suite/rpl/t/rpl_begin_commit_rollback.test Marking the test as limited to Single-Thread-Slave. @ mysql-test/suite/rpl/t/rpl_deadlock_innodb.test Marking the test as limited to Single-Thread-Slave. @ mysql-test/suite/rpl/t/rpl_slave_skip.test Marking the test as limited to Single-Thread-Slave.
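The Worker-failure notification above can be pictured with a standard condition variable. This is a hedged sketch, not the server's code: CoordinatorState, wait_for_partition() and worker_failed() are hypothetical, cond only plays the role of slave_worker_hash_cond, and killed stands in for the Coordinator's THD killed state. Because the predicate is evaluated before blocking, an already-killed Coordinator does not enter the wait, and a Worker that fails later wakes it with notify_all().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of a Coordinator waiting for a partition (APH entry)
// to be released by a Worker.
struct CoordinatorState {
  std::mutex mtx;
  std::condition_variable cond;   // plays the role of slave_worker_hash_cond
  bool partition_free = false;
  bool killed = false;            // set by an errored-out Worker

  // Returns true if the partition became free, false if the wait was
  // abandoned because the Coordinator was killed.
  bool wait_for_partition() {
    std::unique_lock<std::mutex> lk(mtx);
    // The predicate is checked before blocking, so an already-killed
    // Coordinator returns at once instead of waiting forever.
    cond.wait(lk, [this] { return partition_free || killed; });
    return partition_free && !killed;
  }

  // Called by a Worker on its error branch, while releasing its APH entries.
  void worker_failed() {
    {
      std::lock_guard<std::mutex> lk(mtx);
      killed = true;              // analogous to KILL through THD::awake()
    }
    cond.notify_all();            // wake a Coordinator that may be waiting
  }
};
```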
@ sql/log_event.cc addressing a few review comments; asserting do_update_pos() can't be run by Workers; cleaning up and separating Slave_worker *Log_event::get_slave_worker_id() and its caller's interest in rli->last_assigned_worker; Deploying MTS group status marking in Log_event::apply_event(); Making the Worker's exec loop break on a new Worker running status too; Deploying mts_checkpoint_routine() in Rotate_log_event::do_update_pos() (a similar action is taken for the FD event's handler); Fixing relay-log update notification in Log_event::get_slave_worker_id(); @ sql/log_event.h renaming and re-typing of func:s as suggested by a reviewer; leaving a todo item for the final cleanup; correcting the logic of mts_async_exec_by_coordinator(); @ sql/rpl_rli.cc Initialization of a new MTS group status property: mts_group_status(MTS_NOT_IN_GROUP); asserting Relay_log_info::stmt_done() can't be run by Workers; deploying mts_checkpoint_routine(), as in Rotate_log_event::do_update_pos(), this time in Relay_log_info::stmt_done() to cover the FD-event case, and consulting mts_group_status in order to decide which branch to follow; @ sql/rpl_rli.h Augmenting Relay_log_info with mts_group_status to contain the MTS group status; @ sql/rpl_rli_pdb.cc Slave_worker::commit_positions() is fixed to carry the relay-log info update further to the following checkpoint routine action; Slave_worker *get_slave_worker() was cleaned up, interfaces improved, a few asserts corrected; Slave_worker::slave_worker_ends_group() is cleaned up a bit and now frees the extra memory of the CGEP dynarray. wait_for_workers_to_finish() is made to set the Coordinator's state as not in an MTS group after synchronization with all workers; @ sql/rpl_rli_pdb.h Slave_jobs_queue is augmented with a running_status member.
@ sql/rpl_slave.cc apply_event_and_update_pos(): corrects asserts, syncs with *all* Workers at the end of an event dynamically marked as End of group (mts_is_event_isolated() -> TRUE); exec_relay_log_event(): corrects the case of a NULL event read out; slave_stop_workers(): simplifying the logic of stopping Workers, which are now marked with w->running_status= Slave_worker::KILLED instead of killing the workers' thd. . slave_stop_workers() finalizes the reset of the Coordinator's state with rli->mts_group_status= Relay_log_info::MTS_NOT_IN_GROUP to make sure the next restart will proceed with the reset value. ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that are not pushed yet to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The declaration of the underlying server variable is changed from ulong to int32. While that might affect (shrink) the actual range, no range has ever been specified, and now that the number of bits is the same on all platforms the range can be set to [0, max(int32)] ****** wl#5569 MTS Wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and the realization of a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; Fixing temp tables manipulation. Asserting on a use case that is impossible to support: a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog in order to refuse to run it in parallel.
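A single-threaded sketch of the soft and hard stop described in revnos 3282 and 3281, assuming a three-valued running status. RunningStatus, Worker, exec_one() and stop_workers() are hypothetical; in the server the Workers are separate threads, so the draining that stop_workers() performs in place here corresponds, for the soft case, to the Coordinator waiting for the Workers to finish their assignments.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical status values modelled on the idea of a Worker's
// running_status; the real enum may differ.
enum class RunningStatus { RUNNING, STOP, KILLED };

struct Worker {
  RunningStatus running_status = RunningStatus::RUNNING;
  std::deque<std::string> jobs;       // assigned, not yet executed events
  std::vector<std::string> executed;

  // One iteration of the read-exec loop; returns false when the loop ends.
  bool exec_one() {
    if (running_status == RunningStatus::KILLED) return false;  // hard stop
    if (jobs.empty()) return running_status == RunningStatus::RUNNING;
    executed.push_back(jobs.front());
    jobs.pop_front();
    return true;
  }
};

// Soft stop (STOP SLAVE): every Worker first completes its assignments, so
// no gaps are left. Hard stop (KILL or a Worker error): mark immediately.
void stop_workers(std::vector<Worker>& workers, bool hard) {
  for (Worker& w : workers) {
    if (!hard)
      while (!w.jobs.empty()) w.exec_one();  // simulated drain
    w.running_status = hard ? RunningStatus::KILLED : RunningStatus::STOP;
  }
}
```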
****** wl#5569 MTS Implementation of handing over the applier role to Workers in all cases but those dealing with the Coordinator's state. That includes Query events with over-max db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes, e.g. for a Query event, switching to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic of handling some events sequentially by the Coordinator and adds a memory-allocation failure branch to the workers starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it's about cleanup and logic simplification of event distribution to Workers and the consequent actions. Some effort was put into supporting Old Master Begin-less groups of events. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test Elaborated version of the rpl_parallel_load generator, still narrowed down to test performance with InnoDB. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result results got updated. ****** results updated. @ mysql-test/suite/rpl/t/disabled.def Disabling a few tests that trigger the assert installed in log_event.cc by this commit. . ****** Restoring tree tests as this patch makes them runnable. @ mysql-test/suite/rpl/t/rpl_deadlock_innodb.test test can't run in MTS because of trans retry. @ mysql-test/suite/rpl/t/rpl_dual_pos_advance.test test can't run in MTS because the Until option of START SLAVE is not yet supported by MTS.
@ mysql-test/suite/rpl/t/rpl_parallel_ddl-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_innodb-master.opt new test opt file is added. @ mysql-test/suite/rpl/t/rpl_parallel_innodb-slave.opt new test opt file is added. ****** rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test Elaborated version of rpl_parallel narrowed down to test performance with InnoDB. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_temp_query-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test Adding logic to watch Slave_open_temp_tables in the face of its concurrent updating. @ sql/event_parse_data.cc Pleasing some tests. @ sql/field.cc Restoring asserts that existed before the changes to sql_base.cc. . ****** Old master binlog events can't be run in parallel for a few reasons. Therefore that particular branch of code is irrelevant for MTS. @ sql/handler.cc Restoring an assert that existed before the changes to sql_base.cc. @ sql/log_event.cc cleanup, incl. restoration of the trunk version of some pieces of code. Passing future_event_relay_log_pos to the Worker to strike out a todo in rpl_rli.cc. . ****** Asserting on the not-yet-implemented support of groups of events not braced with BEGIN/COMMIT(Xid). Such groups are possible in stored routine logging and when an old server binlog file is adopted by an MTS-aware slave. . ****** Making a group of events w/o B/C braces be handled by a Worker. Such a group can come from an old master or from the current master binlogging some SP queries. Also over-max db:s events are made to be handled by a Worker. The Coordinator only handles asynchronously events dealing with the Relay-log state, and synchronously events dealing with checkpoint changes (master-group coordinates). Also a few types of events from OM are left to the Coordinator to execute. .
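One way to read the division of labour in the comment above is as a small classification function. This is only a sketch: EventTraits and its two flags are hypothetical abstractions of "deals with the Relay-log state" and "changes master-group coordinates" (the log names no such fields), the old-master special cases are left out, and giving the synchronous case precedence when both flags are set is an assumption.

```cpp
#include <cassert>

// Hypothetical abstraction of what an event touches; not server fields.
struct EventTraits {
  bool touches_relay_log_state;      // e.g. relay-log bookkeeping
  bool changes_master_group_coords;  // i.e. affects the checkpoint
};

enum class Applier { Worker, CoordinatorAsync, CoordinatorSync };

Applier choose_applier(const EventTraits& t) {
  // Checkpoint-changing events: the Coordinator applies them only after
  // synchronizing with all Workers.
  if (t.changes_master_group_coords) return Applier::CoordinatorSync;
  // Relay-log state events: the Coordinator applies them without waiting.
  if (t.touches_relay_log_state) return Applier::CoordinatorAsync;
  // Everything else, incl. B/C-less groups and over-max db:s events.
  return Applier::Worker;
}
```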
. ****** Cleanup and simplification of the logic of handling some events sequentially by the Coordinator. An event is marked as parallel or sequential through C's rli, which affects the commit to the info table by C as well as the event's destruction. . ****** Cleanup and logic simplification in Log_event::get_slave_worker_id() and Log_event::apply_event(). The essence is: a. to return to apply_event_and_update_pos() the event associated either with the single-threaded sql-thread rli, or with one of Coord or Worker. b. while the beginning of a group and the corresponding actions are left to Log_event::get_slave_worker_id(), other actions, including passing the event to a Worker and the final closure of the current group, are moved into apply_event_and_update_pos(). . Correcting Query_log_event::attach_temp_tables() and detach_temp_tables() to expect the magic ""-empty string name db partition through which the applier thread receives temp tables. @ sql/log_event.h Leaving in mts_sequential_exec() only events that either deal with the Coordinator state or are from an old master. Making Query_log_event::mts_get_dbs return a list with a magic ""-empty string partition name in case of an over-max db:s query. The empty magic is converted into a record in the APH to indicate a lock of all hash records. ****** More members are added to Log_event a. to associate the event with its applier. b. to provide marking of a B-less group of events (old master, select sf()). @ sql/mysqld.cc Turning slave_open_temp_tables from ulong to int32 and adding an atomic lock declaration for updating the counter. @ sql/mysqld.h Externalizing slave_open_temp_tables_lock; @ sql/rpl_rli.cc Initializing/destroying the slave_open_temp_tables lock at the same time as Workers. ****** passing future_event_relay_log_pos is done via an assignment to a Worker's member in slave_worker_exec_job(). @ sql/rpl_rli.h restoring the original version of get_table_data(), though with no real changes. .
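The magic partition name can be sketched as follows, assuming a fixed per-event database limit (16 here, an illustrative value). mts_partitions() and needs_whole_hash_lock() are hypothetical names and not Query_log_event::mts_get_dbs() itself. The empty string can serve as a sentinel because no real database has an empty name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed per-event limit on tracked databases; illustrative value only.
constexpr std::size_t kMaxDbsInEvent = 16;

// Partitions a Query event maps to. An over-max db:s query is represented
// by a single magic "" partition, read by the scheduler as "lock the whole
// hash" rather than as a database name.
std::vector<std::string> mts_partitions(const std::vector<std::string>& dbs) {
  if (dbs.size() > kMaxDbsInEvent)
    return std::vector<std::string>(1, std::string());
  return dbs;
}

bool needs_whole_hash_lock(const std::vector<std::string>& parts) {
  return parts.size() == 1 && parts[0].empty();
}
```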
****** Simplified (curr_group_is_parallel + curr_group_split) into curr_*event*_is_parallel. ****** Removing rli members that are no longer necessary. @ sql/rpl_rli_pdb.cc Cleanup. ****** Removing a redundant my_hash_update; cleanup; fixing a temp-tables-related issue of leaving wait_for_worker before all APH entries had given out their temp tables. ****** Changes due to redefining the object responsible for holding assigned partitions, in a few methods incl. Slave_worker::slave_worker_ends_group(). Some cleanup in get_slave_worker(). ****** Cleanup, a new assert, and init of a debug-related member. @ sql/rpl_rli_pdb.h Redefining the object responsible for holding assigned partitions. Now it's a dynamic array holding *pointers* to records of the Assigned Partition Hash. That simplifies a few routines for the Worker, e.g. the search for the records (entries of APH) by a Worker at commit time. . ****** Adding a GAQ memory-allocation failure notification. ****** Memorizing the last deleted event for debugging purposes. @ sql/rpl_slave.cc Adding an info message to the error log; improving comments. ****** Restoring the original sequential-mode version of the assert in sql_slave_killed. A Worker is not supposed to run this function. Testing of the skipping logic is left to the rpl suite being run in parallel mode. Cleanup. Marking recovery-related todo items explicitly. Setting up guards to guarantee sequential mode at the required points of the code. . ****** Streamlining identification of the Workers' state with a boolean running_status; worker start and stop are controlled by means of the designator. . ****** Simplified (curr_group_is_parallel + curr_group_split) into curr_event_is_parallel; a GAQ memory-allocation failure branch is added to the workers' starting routine.
****** Cleanup; moving the append_item_to_jobs() invocation into apply_event_and_update_pos(), as well as the other actions mentioned in the log_event.cc comments; changing the signature of apply_event_and_update_pos() to return NULL in place of the referenced pointer when the event is handed over to a Worker; the pointer value is checked in the places dealing with update-pos and the event's destruction. @ sql/sql_base.cc Replacing the increment/decrement of the slave opened temp tables counter with a function performing atomic locking when a Worker runs it. ****** Removing an unnecessary return value in the incr_slave_open_temp_tables definition. ****** The function is renamed. Removing all traces of the previous idea to return a value out of modify_slave_open_temp_tables. @ sql/sql_parse.cc Cleanup. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS. Cleanup, fixing a few todo:s and a few potential issues found. @ mysql-test/suite/rpl/t/rpl_parallel.test Commenting a failure in /include/rpl_end.inc (todo: explore and fix). @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test The only Rows_query_log_event test case is no longer valid because the event is parallelizable now. The test is removed. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test Adding comments about a possible issue with the somewhat loose behaviour of sync_slave_with_master in parallel mode. TODO: investigate and fix. @ sql/binlog.cc Renaming only. @ sql/events.cc Renaming only. @ sql/log_event.cc Fixing the issues found, cleanup and temp tables support. . The assigned partition, as represented by an entry, is passed to the assigned Worker via Log_event::get_slave_worker().
The method attaches the entry to the Query event, whose do_exec_event() calls the new attach and detach methods; these grab the temp tables list of each involved db and return the possibly updated lists back to APH at the end of applying the Query event. @ sql/log_event.h Mostly renaming. @ sql/rpl_rli.cc Relocating the mts_get_coordinator_thd() definition. @ sql/rpl_rli.h Re-defining mts_is_worker() through SYSTEM_THREAD_SLAVE_WORKER. @ sql/rpl_rli_pdb.cc Changes mostly due to temp table support. The Coordinator disassociates the temporary tables of a db-partition being scheduled from its thd and attaches the list to the APH entry. Subsequently the Worker finds the list and adopts it, returning a possibly updated version back to the entry at the end of the query. Most of the time the list resides either in a passive (usage == 0) APH entry or in the Worker's thd->temporary_tables. It can be relocated back to the Coordinator's repository via wait_for_workers_to_finish(), which is called when an event requires sequential execution. . A few auxiliary functions are defined to deal with migrating and merging temp tables. @ sql/rpl_rli_pdb.h Adding a TABLE* pointer to the list of temp tables in an entry of the Assigned Partition Hash. The entry pointer carries temp tables from C to W and back. Changes in a few function signatures are motivated by temp table support. Adding auxiliary functions to help with temp tables manipulation. @ sql/rpl_slave.cc Renaming, cleanup and improving Worker identification. @ sql/rpl_slave.h Cleanup. @ sql/rpl_utility.h Cleanup. @ sql/sql_base.cc Removing a hack to access temp tables in MTS. @ sql/sql_class.cc Renaming only. @ sql/sql_class.h Renaming only. @ sql/sql_rename.cc Renaming only. @ sql/sql_table.cc Renaming only. @ sql/sql_view.cc Renaming only.
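The temp-table hand-off described above can be sketched as follows. Everything here is hypothetical: AphEntry, WorkerThd and the three functions are illustrative names, and a temp table list is reduced to a vector of table names instead of a TABLE* chain. The sketch shows the Coordinator parking a partition's list in a passive (usage == 0) entry, a Worker adopting it for the duration of a Query, and the possibly updated list being returned to the entry.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a temp table list is just a vector of names.
using TempTables = std::vector<std::string>;

// One Assigned Partition Hash record (sketch, not the server struct).
struct AphEntry {
  std::string db;
  int usage = 0;            // 0 means passive: no Worker holds it
  TempTables temp_tables;   // parked here between queries
};

struct WorkerThd {
  TempTables temporary_tables;
};

// Coordinator: move the partition's temp tables off its own thd and
// into the entry before the Query is scheduled.
void park_temp_tables(TempTables& coord_tables, AphEntry& e) {
  e.temp_tables = std::move(coord_tables);
  coord_tables.clear();     // leave the moved-from list in a known state
}

// Worker, start of the Query: adopt the partition's list.
void attach_temp_tables(AphEntry& e, WorkerThd& thd) {
  ++e.usage;
  thd.temporary_tables = std::move(e.temp_tables);
  e.temp_tables.clear();
}

// Worker, end of the Query: hand back the possibly updated list.
void detach_temp_tables(AphEntry& e, WorkerThd& thd) {
  e.temp_tables = std::move(thd.temporary_tables);
  thd.temporary_tables.clear();
  --e.usage;
}
```

Because the list is moved rather than shared, at any moment exactly one owner (the Coordinator, an entry, or a Worker) holds it, which is what lets the Worker use it without extra locking in this model.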
------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring its handling in the canonical sequential mode. The event's lifetime suggests behavior similar to the objects associated with Table_map; in particular, its destruction is to occur at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl. rpl_row_ignorable_event, in both sequential and parallel mode. @ sql/log_event.cc Cleanup of MTS code; relocating the handle_rows_query_log_event() logic into a. do_apply_event() and b. rli->cleanup_context(). @ sql/log_event.h Cleanup of MTS code. @ sql/rpl_rli.cc Deploying ROWS_QUERY_LOG_EVENT destruction in cleanup_context(). @ sql/rpl_rli.h Cleanup of MTS code. @ sql/rpl_slave.cc Cleanup of MTS code. @ sql/sql_binlog.cc Simplifying ROWS_QUERY_LOG_EVENT handling in the case of the BINLOG pseudo-query. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker : public Relay_log_info). The only exception is a Query referring to a temporary table; temp tables are maintained in the Coordinator's "central" rli. Removing some dead code; performing a lot of cleanup. There are a few todo items, incl.: 1. to implement several todo:s scattered across the MTS code and tests (e.g. to restore `protected' for a few members of RLI in rpl_rli.h); 2. to cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback); 3.
to sort out the names of classes based on Rpl_info, possibly removing Rpl_info_worker. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Like most of the rpl_parallel* bunch, the test can't yet stand `include/rpl_end.inc'. @ sql/log_event.cc Defining the default Log_event::do_apply_event_worker(), which simply executes the canonical do_apply_event() but supplies the Slave_worker instance reference; that is critical in order to execute the various rli->methods(), e.g. `report'. Xid_log_event::do_apply_event_worker() runs the Worker version of the Xid commit. Simplifying Rows event parallel applying by removing or elaborating a host of early prototype code, incl. rli->get_tables_to_lock() and the related logic. @ sql/log_event.h Adding virtual int do_apply_event_worker() to Log_event and specializing it for the Xid class. @ sql/rpl_reporting.cc Splitting report() into two methods so that the functional part, which takes a va_list argument, can be called from the Slave_worker class. @ sql/rpl_reporting.h A new va_list version of the report method is declared. @ sql/rpl_rli.cc Removing the early-prototype support for Rows-event parallel execution. The new applying scheme is almost equivalent to the standard sequential algorithm thanks to the Slave_worker : public Relay_log_info inheritance. @ sql/rpl_rli.h Removing unnecessary interfaces; TODO: restore `protected' for a few members. @ sql/rpl_rli_pdb.cc Some cleanup, and defining Slave_worker::report() to eventually call the Coordinator's rli->report(), exploiting the fact that the latter was designed for concurrent use. @ sql/rpl_rli_pdb.h Changing the base class of Slave_worker to make it behave as a Relay_log_info when needed; removing some dead code; adding report() methods to be run in do_apply_event().
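The report() split can be sketched with the standard <cstdarg> idiom: a variadic front end packs its arguments into a va_list and hands them to a worker method, which another class can then call with its own forwarded arguments. Reporter, WorkerReporter and va_report() are hypothetical names, not the server's classes, and the sketch omits the locking that makes the Coordinator's report() safe for concurrent use.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical Coordinator-side reporter.
class Reporter {
 public:
  std::string last;  // last formatted message, kept for inspection

  // Variadic front end: only packs the arguments.
  void report(int level, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_report(level, fmt, args);
    va_end(args);
  }

  // Functional part: does the actual formatting from a va_list.
  void va_report(int level, const char* fmt, va_list args) {
    char buf[256];
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    last = "[" + std::to_string(level) + "] " + buf;
  }
};

// Hypothetical Worker that forwards its own variadic call to the
// Coordinator's va_list method instead of re-implementing formatting.
class WorkerReporter {
 public:
  explicit WorkerReporter(Reporter& coord) : coord_(coord) {}

  void report(int level, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    coord_.va_report(level, fmt, args);
    va_end(args);
  }

 private:
  Reporter& coord_;
};
```

A variadic function cannot pass its `...` on to another variadic function, which is why the split into a va_list method is needed for the forwarding.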
@ sql/rpl_slave.cc Removed the UNTIL todo, as UNTIL is actually not supported (a warning is given); removed a todo for cleanup of an errored-out statement-format transaction because w->cleanup_context() indeed implements it; cleanup or transition from w->w_rli (of Relay_log_info) to w (of Slave_worker); adding a forgotten unlock_mutex; simplifying the definitions of a few functions (mts_is_worker() etc.). ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS Improving the benchmarking test. ------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, incl. waiting and sleeping times, are now reported into the error log at slave stop time. @ sql/log_event.cc Statistics added. @ sql/rpl_rli.h Statistics added. @ sql/rpl_slave.cc Printing out MTS statistics into the error log when the slave stops. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS Restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS A benchmarking-related patch makes rpl_parallel uniformly runnable with an arbitrary number of workers, db:s, tables, etc. TODO: restore the final consistency check, which is temporarily taken out since I could not find a way to leave it surrounded with a --dis/en-able* stanza.
@ mysql-test/extra/rpl_tests/rpl_parallel_load.test Making the load generator indifferent to all parameters, incl. the number of db:s. Had to comment out the final consistency check since I could not find a way to keep the verified table(s) line out of the results. @ mysql-test/suite/rpl/r/rpl_parallel.result Results updated. @ mysql-test/suite/rpl/r/rpl_sequential.result Results updated. @ mysql-test/suite/rpl/t/rpl_parallel-slave.opt The test caller has to supply --mysqld=--mts-slave-parallel-workers=[:num:]. With :num: == 0 the test is equivalent to rpl_sequential. @ mysql-test/suite/rpl/t/rpl_parallel.test Removed traces of the number of workers, which can vary in the [0 - ..] range. The test caller has to supply --mysqld=--mts-slave-parallel-workers=[:num:]. With :num: == 0 the test is equivalent to rpl_sequential. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS A test file for benchmarking is added. Benchmarking results can be obtained by extracting the master-side generating and slave-side applying times, as in the following loop:
workers=6
for n in `seq 1 3`; do
  echo; echo loop: $n; echo
  my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
    rpl_parallel_benchmark --mysqld=--binlog-format=statement \
    && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1
done
@ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test The load generator for the benchmarking test file is added. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result A new results file is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt slave does to log into binary log. The number of workers is supposed to be set via --mysqld at mtr invocation. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test A test file for benchmarking is added.
------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments as requested by the reviewer. @ sql/log_event.cc Comments cleanup only. @ sql/rpl_slave.cc Comments cleanup only. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: the earliest four revisions, which cut off the DML stage of the parallel query project from the following ones devoted to DDL. The four lay out a skeleton of parallel applying of Queries containing a temporary table, and implement the core of the design, that is, the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch in the bundle addresses the last review mail notes. @ mysql-test/r/mysqld--help-notwin.result Results updated. ****** Results updated. @ mysql-test/suite/rpl/r/rpl_packet.result Results updated. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result The new test results are added. . ****** Results updated. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result A new result file is added. ****** Results updated. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result Results updated. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result A new results file. ****** A new result file is added. @ mysql-test/suite/rpl/t/rpl_packet.test Making hashing fixes in order for the test to pass.
todo: refine the max_allowed_packet logic for master & slave. @ mysql-test/suite/rpl/t/rpl_parallel_ddl.test DDL specifics for parallelization tests are added. ****** Added an over-the-max updated db:s case through RENAME TABLE. ****** Added the remaining DDL set members to the test. @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test Marked a todo. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test A multi-db DML query test is added. todo: add triggers, sf(), SP. ****** Adding stored routines testing. ****** Increased the number of db:s. Notice that this forces a change of the default --thread-stack size; added an over-the-max updated db:s case through multi-updates. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Removed the explicit log pos from the results. @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test Testing queries with temporary tables. @ mysql-test/suite/sys_vars/r/all_vars.result Results updated. @ sql/binlog.cc Gathering the to-be-updated dbs in the DML case. An over-MAX_DBS_IN_QUERY_MTS-sized list won't be shipped to the slave. ****** Correcting memory allocation to be in thd's memroot. ****** Separating out multiple-db gathering into a THD method to be invoked for DML as well as for a few cases of DDL. ****** Changed the location of comparisons against MAX_DBS_IN_QUERY_MTS to be inside the method adding to the db list; refined the db-gathering logic in decide_(): *all* db:s are picked up whenever there is at least one table to update. ****** Comments are added; other changes are due to MAX_DBS_IN_QUERY_MTS + 1 ceasing to be the over-max mark. @ sql/events.cc Gathering updated dbs for create/drop events. @ sql/field.cc Relaxing an assert (todo: add to it a more specific claim that field->table is temp). ****** Adding comments to asserts. ****** Improved comments. @ sql/handler.cc Relaxing an assert (todo: add to it a more specific claim that the table is temp). ****** Adding comments to asserts. @ sql/log_event.cc Master and slave (Coord's and Worker's) handling of updated db:s.
The Coordinator's distribution is changed to involve a loop per db, and similarly for the Worker at applying time. ****** Adding comments and correcting the clearing of binlog_updated_db_names so that BEGIN and COMMIT in particular do not get the updated list. ****** Removed an extraneous assert. ****** Cleaned some parts of the code; improved comments; refined an assert; turned the Coordinator to use a specific new mem-root; other changes are due to MAX_DBS_IN_QUERY_MTS + 1 ceasing to be the over-max mark. @ sql/log_event.h Hardcoding the max number of updated db:s. Static allocation for updated db:s in the Query log event is motivated by the fact that the event is shared by both C and W, so the standard malloc/free can't be a reasonable choice either. Added a new status and changed the dependent info. Added a new method to return the *list* of updated db:s, which in all but the Query case is just a wrapper over get_db(). . ****** Adding comments, and interfaces to helper functions. ****** Updated some comments. ****** Added OVER_MAX_DBS_IN_QUERY_MTS to serve as the over-max db:s mark instead of the former MAX_DBS_IN_QUERY_MTS + 1; mts_get_dbs() receives a mem-root argument supplied by the Worker or the Coord. @ sql/mysqld.cc Removed opt_mts_master_updated_dbs_max. @ sql/mysqld.h Removed opt_mts_master_updated_dbs_max. @ sql/rpl_rli.cc A new temp table mutex init and destroy, and a set of helper functions providing access to C's and W's thd:s from an arbitrary place in the server code, are added. ****** Fixing an error. ****** Relocating helper functions to rpl_slave.cc. @ sql/rpl_rli.h A new temp table mutex is added to the RLI class. ****** Improving comments. ****** A memroot for the Coordinator is placed into rli. @ sql/rpl_slave.cc SLAVE_THD_WORKER turned out to be redundant. The Worker's thd->system_thread is set to the same constant as the Coordinator thread's. ****** Added a work-around/cleanup needed by the standard temp table closing algorithm. ****** Comments explaining that close_temp_tables() is not to be run by Workers.
Accepting relocated functions. ****** Init alloc and the final destruction of the Coord's rli->mts_coor_mem_root mem-root. @ sql/rpl_slave.h Declarations of auxiliary functions defined in rpl_slave.cc are moved here from log_event.h. @ sql/share/errmsg-utf8.txt Added a new error/warning on the master specific to Query parallel replication. @ sql/sp.cc Covering db gathering for CREATE/DROP SP. @ sql/sql_base.cc Replacing refs to thd->temporary with an appropriate one corresponding to the Coord's thd->t_t:s. Also surrounding the critical sections of code dealing with opening, finding, closing or changing the temporary_tables list with a specific mutex lock/unlock. ****** Correcting and simplifying the logic for the temp table parallel support. In particular, close_temporary_tables() does not need to know about the caller's thd. . ****** Simplified the temp-table-support-related add-ons. The double ref to thd->temporary_table is needed in only one place. @ sql/sql_class.cc Initializations of the new members for master-side gathering of updated db:s. ****** Correcting the logic of merging the updated db:s of a child into the parent's top level. ****** Removed dead comments. @ sql/sql_class.h Master-side gathering of the updated db:s list, and accessor members. ****** Adding a necessary cleanup method. ****** Adding two base methods of db gathering: one for queries that can update only one db, and the other for multiple db:s. . ****** Added more comments, removed dead code. @ sql/sql_db.cc CREATE/DROP DATABASE case of db gathering. @ sql/sql_rename.cc RENAME TABLE(s) case of db gathering. @ sql/sql_table.cc CREATE, DROP, ALTER cases of db gathering. ****** Changed the location of comparisons against MAX_DBS_IN_QUERY_MTS to be inside the method adding to the db list. @ sql/sql_trigger.cc CREATE/DROP TRIGGER case of db gathering. @ sql/sql_view.cc Support for CREATE/DROP VIEW is added. @ sql/sys_vars.cc Added a system variable (todo/fixme: may turn out to be unnecessary, though). . ****** Removed the earlier added variable.
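A minimal sketch of master-side db gathering with the over-max mark, together with the slave-side mapping that a later revision in this log describes for mts_get_dbs() (a magic "" partition standing for all hash records). The class and function names are hypothetical, and the values chosen for kMaxDbs and kOverMaxDbs are assumptions, picked only so that the mark differs from any valid count, as OVER_MAX_DBS_IN_QUERY_MTS is meant to replace MAX_DBS_IN_QUERY_MTS + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for MAX_DBS_IN_QUERY_MTS and
// OVER_MAX_DBS_IN_QUERY_MTS; the values are assumed.
constexpr std::size_t kMaxDbs = 16;
constexpr uint8_t kOverMaxDbs = 254;

// Master side: collects the databases a statement updates.
class UpdatedDbs {
 public:
  // The limit check lives inside the adding method, so callers
  // (DML, CREATE/DROP, RENAME, ...) need not repeat it.
  void add(const std::string& db) {
    if (over_max_) return;
    dbs_.insert(db);              // std::set keeps names unique
    if (dbs_.size() > kMaxDbs) {
      over_max_ = true;
      dbs_.clear();               // an over-max list is not shipped
    }
  }
  // Value for the event's status var: a count or the over-max mark.
  uint8_t status_value() const {
    return over_max_ ? kOverMaxDbs : static_cast<uint8_t>(dbs_.size());
  }
  const std::set<std::string>& names() const { return dbs_; }

 private:
  std::set<std::string> dbs_;
  bool over_max_ = false;
};

// Slave side: partitions to schedule by. The over-max mark becomes a
// single "" partition that stands for all hash records.
std::vector<std::string> partitions(const UpdatedDbs& u) {
  if (u.status_value() == kOverMaxDbs) return {std::string()};
  return std::vector<std::string>(u.names().begin(), u.names().end());
}
```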
------------------------------------------------------------ revno: 3269 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS Fixing the worker threads start/stop. @ sql/rpl_rli.h Adding RLI::opt_slave_parallel_workers to cache the server's namesake global var. @ sql/rpl_slave.cc Moving the resetting of rli->recovery_parallel_workers down to the exit point of the starting routine. ------------------------------------------------------------ revno: 3267 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS The timed-wait loop of the SQL thread required a break-through parameter in case the signal is missed and just a timeout would be reported. ------------------------------------------------------------ revno: 3265 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS Fixing corner cases that mtr testing with MTS workers against the standard suites reveals. @ sql/log_event.cc Removing the COMMIT Query event from the set of events containing the partition info.
@ sql/log_event.h ROLLBACK TO can be inside a replicated transaction. ------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: Refining another assert, since C can be forced to delete events that are skipped with the slave skip counter. ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. @ sql/rpl_rli_pdb.cc Indeed, the Coordinator can be awakened with the abort_slave flag UP without being killed. ------------------------------------------------------------ revno: 3261 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS Merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS Fixing failing tests. @ sql/rpl_slave.cc Fixing an issue where a Rotate event could appear in between the events of a group. That case should not force any rli->flush_info() but rather a normal increment of the relay log coordinates. ------------------------------------------------------------ revno: 3259 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS Merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS Fixing test failures when mtr runs with --mts_slave_parallel_workers != 0.
rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events, forcing parts of the group into different relay logs, then carefully running ev->update_pos() for it and destroying it. @ sql/log_event.cc Identifying and marking an event that can split a group of events, forcing parts into different relay logs. @ sql/log_event.h FD and Rotate can both be the group splitter, but only if they are "artificial". @ sql/rpl_rli.h A marker flag to be set when a group splitter such as FD is spotted. @ sql/rpl_slave.cc Identifying, marking, carefully running ev->update_pos() for, and destroying an event that can split a group of events. ------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the 2nd and subsequent tests because Worker tables are not removed at RESET SLAVE. @ sql/log_event.cc Adding a special mts-recovery branch to the event scheduling routine located in Log_event::apply_event(). todo: think about rli->flush_info() at the end of gap-filling. @ sql/rpl_rli.cc The to-be-recovered group counter and a running index on the recovery bitmap are initialized; also renaming. In the recovery phase the Coordinator can execute rows events now. @ sql/rpl_rli.h The to-be-recovered group counter and a running index on the recovery bitmap are added. @ sql/rpl_slave.cc Engaging the to-be-recovered group counter in mts_recovery_groups(), at the end of which the recovery bitmap is ready and rli->mts_recovery_group_cnt holds how many bits of interest are in there. . A no-actual-recovery case is followed by rli->recovery_parallel_workers= rli->slave_parallel_workers at Workers startup time.
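One way to picture the group recovery bitmap, under simplifying assumptions: every Worker's bitmap is taken to be indexed from the same checkpoint (the described code tracks a per-Worker checkpoint_seqno), and the groups of interest are taken to be those up to the highest one any Worker committed. Unset bits below that point are the gaps that recovery has to fill. All names and the 64-group width are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bitmap width; group i is the i-th group after the
// (assumed common) checkpoint.
constexpr std::size_t kGroups = 64;
using GroupBitmap = std::bitset<kGroups>;

// OR together the per-Worker "committed" bitmaps.
GroupBitmap merge_worker_bitmaps(const std::vector<GroupBitmap>& workers) {
  GroupBitmap all;
  for (const GroupBitmap& b : workers) all |= b;
  return all;
}

// Number of groups to scan: up to and including the highest committed one.
std::size_t recovery_group_count(const GroupBitmap& all) {
  for (std::size_t i = kGroups; i > 0; --i)
    if (all.test(i - 1)) return i;
  return 0;
}

// Gaps: groups below that count which no Worker committed.
std::vector<std::size_t> gaps(const GroupBitmap& all) {
  std::vector<std::size_t> out;
  const std::size_t n = recovery_group_count(all);
  for (std::size_t i = 0; i < n; ++i)
    if (!all.test(i)) out.push_back(i);
  return out;
}
```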
------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS The slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. @ sql/log_event.h Introducing a typedef for a frequently used struct. @ sql/rpl_rli_pdb.cc checkpoint_seqno, the index of the last committed group in the bitmap, is added to the Worker table; init, read, write and propagation of its value are addressed. @ sql/rpl_rli_pdb.h The Worker class gets checkpoint_seqno members. @ sql/rpl_slave.cc mts_recovery_groups() is refined to follow a simpler design scheme. The checkpoint info that a Worker must have at recovery consists of the seqno, the bitmap and the master binlog coordinates. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine to compute the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS Adding checkpoint relay_log_name,pos as a necessary part of locating a relay log for recovery. Tested with rpl_parallel.
@ mysql-test/extra/rpl_tests/rpl_parallel_load.test Restoring $iter to 1k as it used to be. @ sql/log_event.cc Adding checkpoint relay_log_name,pos. @ sql/rpl_rli_pdb.cc Adding checkpoint relay_log_name,pos. @ sql/rpl_rli_pdb.h Adding checkpoint relay_log_name,pos. ------------------------------------------------------------ revno: 3251 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS Manual merging from the repo, and correcting GAQ processing by introducing a volatile byte to indicate whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS Fixing the --mts-exp-slave-run-query-in-parallel=1 case, in which Query-log-event can be run in parallel, incl. DML and DDL. The feature is still `exp'erimental; it can be tried as long as no temp tables are involved and no db other than the session's default is modified by the query. Tested: the changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement @ sql/log_event.cc Making a single-query group such as DDL be distributed to Workers. ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS Fixing PB2 failures, incl. valgrind issues, long exec time and an assert firing in a test. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test Lessening the load to make slow Windows machines on PB2 happy. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Adding an assert and a printout in case it fires. @ sql/rpl_rli_pdb.cc The \0 terminating char was not allocated. @ sql/rpl_slave.cc A missed initialization is added.
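A sketch of GAQ processing with a per-item busy/released flag, assuming a fixed-size ring. It uses std::atomic<unsigned char> where the log mentions a volatile byte, because in standard C++ volatile does not make cross-thread access safe. Only the flags are meant to be shared; the head and length are assumed to be owned by the Coordinator. Gaq, enqueue() and release() are hypothetical names, and move_queue_head() borrows its name from this log but is not the server's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Group Assigned Queue: a ring of group slots, each
// flagged busy while a Worker executes it and released once committed.
class Gaq {
 public:
  explicit Gaq(std::size_t size) : slots_(size) {}

  // Coordinator: reserve the next slot for a newly scheduled group.
  bool enqueue(std::size_t* index) {
    if (len_ == slots_.size()) return false;  // queue is full
    const std::size_t i = (head_ + len_) % slots_.size();
    slots_[i].store(kBusy);
    ++len_;
    *index = i;
    return true;
  }

  // Worker: mark its group as committed.
  void release(std::size_t i) { slots_[i].store(kReleased); }

  // Coordinator checkpoint: advance the head over the contiguous run of
  // released groups; returns how many groups were retired.
  std::size_t move_queue_head() {
    std::size_t n = 0;
    while (len_ > 0 && slots_[head_].load() == kReleased) {
      head_ = (head_ + 1) % slots_.size();
      --len_;
      ++n;
    }
    return n;
  }

  std::size_t length() const { return len_; }

 private:
  static constexpr unsigned char kBusy = 1;
  static constexpr unsigned char kReleased = 0;
  std::vector<std::atomic<unsigned char>> slots_;
  std::size_t head_ = 0;
  std::size_t len_ = 0;
};
```

An out-of-order commit leaves the head in place until every earlier group is released, which is what keeps the checkpoint position consistent.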
------------------------------------------------------------ revno: 3248 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from the wl#5569 repo to the local branch. rpl_sequential opt files are added to avoid mtr giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to sequential execution for the cases of 1. Query-log-event, 2. the Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by a Worker. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test Edited to compare all involved tables, add an explicit multi-statement transaction, and allow verification even in the case of statement-format events. @ mysql-test/suite/rpl/r/rpl_parallel_fallback.result A new results file is added. @ mysql-test/suite/rpl/t/rpl_parallel.test is changed to run with all formats because it starts verifying the transparent fallback to sequential for Query-log-event and related events. @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test A new test file is added to contain cases of transparent fallback to sequential execution. The Rows_query_log_event case is placed there. Notice that the Query-log-event fallback is largely tested by rpl_parallel. @ sql/log_event.cc Refining the event distribution logic in order to support fallback to sequential execution. The following definitions are framed out: . curr_group_is_parallel - indicates that a Worker is engaged in all operations, incl. destruction, for all events of the group.
The value lasts until a subsequent group is decided to be "pure" sequential, so that C executes it, updates rli synchronously and destroys the events. curr_group_seen_begin - indicates whether the current group started with a B-event (BEGIN query). The value lasts until the T-event is distributed. . Deploying a w/a for Rows_query_log_event that involves a nap, to protect against the case of multiple Rows_query_log events in one group. Notice the specific (also w/a) rule for destroying the event. @ sql/log_event.h Support for the Rows_query_log_event fallback to sequential execution is added. @ sql/rpl_rli.cc Support for the Rows_query_log_event fallback to sequential execution is added. @ sql/rpl_rli.h curr_group_isolated is defined to mark a parallel group that is executed in isolation from any Workers ahead of or behind it. The Coordinator is supposed to provide such an environment; the new member is a facility to control it. @ sql/rpl_rli_pdb.cc Fixing the usage of circular_buffer_queue::gt() to deploy an assert suggested by the heading comments. Refining the logic of finding a gap in GAQ. Adding a 2nd arg to wait_for_workers...() to cover the 2nd use case of C waiting for Workers. The two are: wait for all, and wait for all but the one currently being scheduled. @ sql/rpl_slave.cc Refining the logic of C's commit to the main rli due to a pure sequential event (e.g. FD, Rotate), and similarly refining the logic of freeing. Deploying a w/a for Rows_query_log_event.
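Revisions 3247 and 3250 describe when a group may go to a Worker and when it falls back to sequential execution by the Coordinator. The function below is a hypothetical C++ condensation of those rules as stated in this log: Rows-event groups run in parallel; a Query-log-event group falls back to sequential unless the experimental query-in-parallel option is on; a Rows_query_log_event forces the fallback; FD and Rotate are pure sequential events. The Ev enum and group_is_parallel() are illustrative names only; the real decider in sql/log_event.cc works event by event rather than on a whole buffered group, and the temp-table and non-default-db restrictions from revision 3250 are not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event kinds, reduced to what the fallback rules need.
enum class Ev { Begin, Rows, Query, RowsQuery, Commit, FormatDescription, Rotate };

// Returns true if the whole group may be handed to a Worker, false if the
// Coordinator must execute it sequentially.
bool group_is_parallel(const std::vector<Ev>& group, bool query_in_parallel) {
  bool has_body = false;
  for (Ev e : group) {
    switch (e) {
      case Ev::Begin:
      case Ev::Commit:
        break;  // group delimiters do not decide the mode
      case Ev::Rows:
        has_body = true;
        break;
      case Ev::Query:
        if (!query_in_parallel) return false;  // sequential fallback
        has_body = true;
        break;
      case Ev::RowsQuery:          // falls back to sequential at this stage
      case Ev::FormatDescription:  // pure sequential events
      case Ev::Rotate:
        return false;
    }
  }
  return has_body;
}
```

A {B, Q, T} group is therefore sequential by default and becomes parallel only with the experimental option, while a Rows-only transaction is always eligible.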
------------------------------------------------------------ revno: 3246 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to name the option/system vars properly: mts_ prefixing, and _exp prefixing for experimental features that are needed for benchmarking (mts_exp_slave_local_timestamp) or supported only in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing the GAQ size, which could be too tight, e.g. in the case of a max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. @ sql/rpl_slave.cc Fixing the GAQ size, which could be too tight, e.g. in the case of a max WQ length of 1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery-related failure of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ed event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning that applies when the number of parallel workers is non-zero and a feature that parallelization can't work with is used. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results are updated. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Extending the test to cover UNTIL, SKIP, and the escalation of a temporary error to a regular one. ------------------------------------------------------------ revno: 3240 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limiting its size so that it can hold items while all WQ:s are full b) move_queue_head() contained a flaw that falsely made it stop progressing c) never enqueuing into GAQ while it's full @ sql/log_event.cc Fixing an impossible gaq_idx == -1. GAQ may not be full at this point. The total counter of executed groups starts from 1, i.e. nothing has been done yet when it is 0. @ sql/rpl_rli_pdb.cc move_queue_head() contained a flaw that falsely broke the progress loop.
Fixed by comparing the current index with Worker::last_group_done_index instead of this->last_done. The latter now has a purely statistical role and holds the total seqno, which is guaranteed to grow monotonically given its ulonglong size. @ sql/rpl_rli_pdb.h changes due to last_done turning into a statistics holder. @ sql/rpl_slave.cc Improving GAQ by limiting its size so that it can hold items while all WQ:s are full. Waiting at checkpoint() for an item to be released when GAQ is full. @ sql/sys_vars.cc opt_mts_coordinator_basic_nap is given a non-zero default value of 5 msecs. ------------------------------------------------------------ revno: 3238 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS Integration with wl#5599 recovery for MTS, and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized with one partition at Worker startup, which it should not be. @ sql/log_event.cc Propagating CP-related info from C to W. @ sql/rpl_rli.cc Added part of the CP info propagation from C to W. @ sql/rpl_rli.h New RLI members due to the CP info propagation from C to W. @ sql/rpl_rli_pdb.cc The Worker stores the new CP in order to record it in flush_info() along with (todo) a bitmap of the groups executed within the checkpoint interval. @ sql/rpl_rli_pdb.h New members of a transport and of the Worker class due to CP info. @ sql/rpl_slave.cc missed cleanup of errored-out rows-events; work-around for w->curr_group_exec_parts->dynamic_ids being initialized with one partition at Worker startup, which it should not be.
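Revision 3237 plans for each Worker to record, next to the checkpoint, a bitmap of the groups it executed within the checkpoint interval. One way such bitmaps could be used at recovery is sketched below: OR them together and report the unset positions below the highest set one as gaps that still have to be executed. This is a hypothetical C++ illustration; recovery_gaps() is an invented name, and it assumes bit k stands for the k-th group assigned after the checkpoint, in one numbering shared by all Workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: worker_bitmaps[w][k] is true if Worker w executed the
// k-th group assigned after the last checkpoint (one shared numbering
// for all Workers). Returns the positions below the highest executed
// group that no Worker executed, i.e. the gaps recovery must fill.
std::vector<std::size_t> recovery_gaps(
    const std::vector<std::vector<bool>>& worker_bitmaps) {
  std::size_t width = 0;
  for (const auto& bits : worker_bitmaps) width = std::max(width, bits.size());

  std::vector<bool> executed(width, false);  // OR of all Workers' bitmaps
  for (const auto& bits : worker_bitmaps)
    for (std::size_t k = 0; k < bits.size(); ++k)
      if (bits[k]) executed[k] = true;

  std::size_t end = 0;  // one past the highest executed group
  for (std::size_t k = 0; k < width; ++k)
    if (executed[k]) end = k + 1;

  std::vector<std::size_t> gaps;
  for (std::size_t k = 0; k < end; ++k)
    if (!executed[k]) gaps.push_back(k);
  return gaps;
}
```

In this model, groups beyond the highest set bit are not reported, since nothing after them was executed and they simply follow in order.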
------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc . sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support on repo. 
rpl_parallel hits an assert that Alfranio is fixing. ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing-related fixes, incl. master_pos_wait() support, which in turn allows replacing sleeps with a functioning sync_slave_with_master; Fixing the limited Q-log-event parallelization. After the fix, a mixture of rows- and Q- transactions can run concurrently. A Q-transaction is treated sequentially by default. @ mysql-test/suite/rpl/r/rpl_parallel.result results updated. @ mysql-test/suite/rpl/r/rpl_sequential.result results updated. @ mysql-test/suite/rpl/t/disabled.def a nuisance test gets disabled. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test sleeps go away. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test sleeps go away. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test sleeps go away. @ sql/log_event.cc Fulfilling long-pending todo:s wrt update_pos and delete ev: update_pos() is redundant, being superseded by a special commit of the Worker; Addressing the not-parallel {B, Q, T} case. The issue was due to the inability to support Q-log-event as quickly as Rows- parallelization. @ sql/rpl_rli_pdb.cc circular_buffer_queue::de_tail(), a very specific method, is motivated by the limited support for Q-log-ev parallelization. It may turn out to be unnecessary once Q has become parallel. @ sql/rpl_slave.cc Implementing CP in the successful read branch. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting the integration of the two wl:s. The main achievement is that the event execution status is consistently recorded into the Worker and the central RL recovery tables.
That was tested manually in a rather aggressive environment, where the IO thread was made to reconnect randomly and the load from the Master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass. To address: the multi-stmt Query-log-event transaction case (see todo in sources); Workers destroying their executed events (deferred until ev->update_pos started working); (Alfranio) deploying the mts_checkpoint_routine() call inside the successful event-read branch of next_event(). Otherwise no call happens when Coord is constantly busy with read/distribute. ------------------------------------------------------------ revno: 3229 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 19:14:50 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3228 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 15:45:02 +0000 message: Added a mutex to the checkpoint_routine. ------------------------------------------------------------ revno: 3227 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 16:56:11 +0000 message: Implemented a periodic checkpoint when the parallel slave is enabled. ------------------------------------------------------------ revno: 3226 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 10:15:45 +0000 message: Fixed commit_positions() and removed the unnecessary checkpoint thread.
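Revision 3230 notes that mts_checkpoint_routine() must also be called from the successful event-read branch of next_event(), because otherwise a Coordinator that is constantly busy reading and distributing never checkpoints; revision 3241 adds that the checkpoint period should normally be non-zero. A hypothetical C++ sketch of such a time-based trigger follows. CheckpointTimer is an invented name, and the period is a free parameter here, not the server default.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical periodic-checkpoint trigger. The Coordinator would call
// due() after every successfully read event as well as when idle, so a
// constantly busy Coordinator still checkpoints.
class CheckpointTimer {
 public:
  CheckpointTimer(std::chrono::milliseconds period, Clock::time_point start)
      : period_(period), last_(start) {
    assert(period.count() > 0);  // a zero period only makes sense for debugging
  }

  // Returns true (and restarts the interval) once at least one period has
  // elapsed since the last checkpoint.
  bool due(Clock::time_point now) {
    if (now - last_ < period_) return false;
    last_ = now;
    return true;
  }

 private:
  std::chrono::milliseconds period_;
  Clock::time_point last_;
};
```

Because the check is a cheap time comparison, calling it on every read adds little cost, while the actual checkpoint work runs at most once per period.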
------------------------------------------------------------ revno: 3225 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 20:13:12 +0200 message: manual merge to wl#5569 tree ------------------------------------------------------------ revno: 3224 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 19:46:46 +0200 message: wl#5569 MTS User interface related: set @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is withdrawn. Fixes the rli->mts_pending_jobs_size statistics, which might otherwise cause an assert-crash. Fixes a silly c&p mistake in the relay-log name change notification. Made a little clean-up, incl. relocating the initialization of Worker-related stuff into start_slave_workers(). Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things. @ sql/log_event.cc a few asserts are motivated by a silly c&p mistake in the relay-log name change notification. Fixing the rli->mts_pending_jobs_size statistics, which might otherwise cause an assert-crash. @ sql/rpl_rli.cc relocating Worker-related stuff from initialization in the RLI constructor to the start of the slave workers; Introduced a public Relay_log_info::reset_notified_relay_log_change() to be called when C discovers a relay log name change (next_event). @ sql/rpl_rli.h setting the server's @@global.slave_parallel_workers has an effect only while the slave is stopped. At start, the var's value is copied to rli::slave_parallel_workers, and this value is used for the whole slave session.
Refining is_parallel_exec() to be based on the rli's value; @ sql/rpl_rli_pdb.cc Fixing a c&p bug for the relay-log name; @ sql/rpl_slave.cc removing the earlier introduced extra rli->slave_worker_is_error; relocating Worker-related stuff from initialization in the RLI constructor to the start of the slave workers; @ sql/sql_class.h removing the explicit slave exec parallel mode. @ sql/sys_vars.cc changing the default for slave_parallel_workers from 4 to 0. A non-zero value means that many Worker threads are launched; conversely, zero selects the sequential slave execution mode. Fixing the name of the server var: mts_partition_hash_soft_max. ------------------------------------------------------------ revno: 3223 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-01 19:08:21 +0200 message: wl#5569 MTS Changes related to limit conditions such as the WQ length and the total size of all WQ:s. Also a new test file is added. @ mysql-test/suite/rpl/r/rpl_parallel_conf_limits.result new results file. @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result results updated. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test Testing two limit parameters on RAM usage by Workers. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test Converting an assert into a wait for that condition. Todo: improve the test to let it run with slave_run_query_in_parallel. @ sql/log_event.cc changes related to the limit conditions (wq len, total wq sizes). Fixing a compilation warning. @ sql/mysqld.cc renaming. @ sql/mysqld.h renaming. @ sql/rpl_rli.cc renaming. @ sql/rpl_rli.h s / slave_max_pending_jobs / opt_mts_slave_worker_queue_len_max / the new name is supposed to indicate the purpose of the entity more clearly. @ sql/rpl_slave.cc renaming. @ sql/sys_vars.cc renaming.
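Revision 3224 makes @@global.slave_parallel_workers take effect at START SLAVE: the value is copied into rli::slave_parallel_workers and is_parallel_exec() is based on that copy, so changing the global while the slave runs does not affect the running session. The C++ model below illustrates those snapshot semantics; global_slave_parallel_workers and RliModel are hypothetical stand-ins, not the server's variables or the Relay_log_info class.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the server variable; it may change at any time.
// The default of 0 mirrors the sequential-mode default set in revision 3224.
std::atomic<unsigned long> global_slave_parallel_workers{0};

// Hypothetical stand-in for the relevant part of Relay_log_info.
struct RliModel {
  unsigned long slave_parallel_workers = 0;  // per-session snapshot

  // START SLAVE: copy the global once; later changes to the global are
  // ignored until the next start.
  void start_slave() { slave_parallel_workers = global_slave_parallel_workers.load(); }

  // A non-zero snapshot means parallel (MTS) execution with that many
  // Workers; zero means the sequential mode.
  bool is_parallel_exec() const { return slave_parallel_workers > 0; }
};
```

Basing the mode on the snapshot means the Worker pool size cannot change under a running Coordinator.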
------------------------------------------------------------ revno: 3222 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:39:40 +0200 message: merging from the wl#5569 repo containing the wl#5599 integration ------------------------------------------------------------ revno: 3221 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:02:15 +0200 message: wl#5569 MTS Fixing group_relay_log_name change propagation from C to W; Garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records the hash tolerates without checking the usage counter. Adding C-W synchronization due to: - the overall max of WQ:s data - hitting the limit of a WQ length. Adding Flow Control infrastructure with - a hungry level: a Worker below it forces the Coordinator to distribute eagerly; symmetrically, a Worker whose load is above 100 % minus the hungry level is considered fed-up. - a nap time for C in case all WQ:s lengths are above the level. - a weight param to the base nap as a function of the number of fed-up W:s. TODO: UNTIL to force sequential exec; to fix the ROWS_QUERY_LOG_EVENT corner case; to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done). @ sql/log_event.cc changes due to FC and to the WQ:s data size and WQ-length synchronizations; @ sql/mysqld.cc placeholders for a few mts user interface variables are added. @ sql/mysqld.h the mts user interface variables are declared. @ sql/rpl_rli.cc Correcting a cast that otherwise would not let a relay log change be seen by a Worker. @ sql/rpl_rli.h A set of user options is reflected by new members of the central RLI. A user var propagates its value into the RLI at slave startup and can't affect the running slave until the slave is stopped. @ sql/rpl_rli_pdb.cc Garbage collection in the Partition-to-Worker hash. @ sql/rpl_rli_pdb.h Extending Slave_jobs_queue::waited_overfill.
and Slave_worker::wq_overrun_set. . Overfill is seen as a property of the queue, whereas wq_overrun_set is about C-W flow control. @ sql/rpl_slave.cc Initialization of the mts user option in the central RLI is added. Fixing a cast; Todo about ROWS_QUERY_LOG_EVENT; Comments on UNTIL forcing sequential exec; @ sql/sys_vars.cc A set of mts-related user options is added. ------------------------------------------------------------ revno: 3220 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-27 17:36:50 +0200 message: wl#5569 Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h. Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT. (todo: the native sequential mode for the event needs some revision; in particular `delete ev' shall *always* happen in rli->cleanup_context, not in two places as currently). @ sql/log_event.cc Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT; Deploying the C role of handling a relay-log name change; @ sql/rpl_rli_pdb.cc Providing the relay-log name for wl#5599. Freeing the memory allocated for the relay-log name at the end of the group execution by the Worker. @ sql/rpl_rli_pdb.h The protocol of actions on the C and W sides is here. Removing current_binlog; Adding a pointer member group_relay_log_name to st_slave_job_group. ------------------------------------------------------------ revno: 3219 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-26 23:08:30 +0200 message: wl#5569 MTS Partitioning conflict detection and handling is implemented. A new option to run Query in parallel, though incompatibly with the Rows- case in that the default db, not the actual db:s, is used as the partition key. The user interface gained the global var and the cmd line opt: slave_run_query_in_parallel (Welcome to the set!
:-) @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result a new test results file is added. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test A basic initial test of partitioning conflict detection and handling is added. @ sql/log_event.cc Refining the parallel vs sequential decider to cover optional support for Query parallelization. @ sql/log_event.h Refining only_serial_exec() by providing hints through two new args. @ sql/mysqld.cc related to the new limited Query parallelization support. @ sql/mysqld.h related to the new limited Query parallelization support. @ sql/rpl_rli.h changes are due to the new limited Query parallelization support. @ sql/rpl_rli_pdb.cc Conflict detection, waiting and partition release are implemented. ------------------------------------------------------------ revno: 3218 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-andrei timestamp: Fri 2010-11-26 16:15:37 +0000 message: There was a mismatch between the number of fields read and written, and as a consequence the read was failing for the Slave_worker. ------------------------------------------------------------ revno: 3217 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 11:03:54 +0200 message: wl#5569 merging with a piece of wl#5599 code ------------------------------------------------------------ revno: 3216 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 10:47:39 +0200 message: wl#5569 Converting the prototype-time db2w hash to be concurrent; Necessary introduction of the least occupied Worker notion. It's currently computed as the Worker having the least number of distributed partitions. Adding parallel support for Query_log_event; caution: 1. the session/default db, not the actual db, is used as the key 2. it may not have been tested against all use cases (e.g. int vars) Fixing slave stop issues.
@ sql/log_event.cc Adding parallel support for Query_log_event, which forces changes in both the Coord and Worker scopes; a query can have both the B and g parallel properties. @ sql/rpl_rli.h Changes necessary for the concurrent hash. The least occupied Worker is defined atm as the one having the least number of partitions; that may be too coarse, so a method based on distributed jobs may be deployed later. @ sql/rpl_rli_pdb.cc The least occupied Worker is defined atm as the one having the least number of partitions (this may be too coarse, so a method based on distributed jobs may be deployed later). @ sql/rpl_rli_pdb.h Changes necessary for the concurrent hash and the parallelizable query-log-event. @ sql/rpl_slave.cc rli->least_occupied_workers is prepared for use in the least-occupied calculation as a finer option. Improving Workers stop. ------------------------------------------------------------ revno: 3215 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-11-22 20:57:13 +0200 message: wl#5569 further extending the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context. @ sql/log_event.cc extracting the stored future_event_relay_log_pos to copy it to the worker rli. @ sql/rpl_slave.cc Storing future_event_relay_log_pos into an event member. ------------------------------------------------------------ revno: 3214 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-20 19:23:42 +0200 message: wl#5569 MTS Worker pool start, stop, kill and error-out implementation. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test increasing the load param to get more reliable benchmarking data out of the test. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result new test results. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test worker pool start, stop, kills, errors testing.
@ sql/log_event.cc removing a false and unnecessary extension-arg to exit_cond(); Refining the start-stop algorithm to be based on the Worker's private info, not the common info. In particular, handshakes are organized through a magic value of the Worker private queue length, which is set by an initiator. @ sql/rpl_slave.cc Starting a worker thread by passing its Slave_worker * pointer. Simplifying and refining start-stop. @ sql/sql_class.h removing a false and unnecessary extension-arg to exit_cond(); @ sql/sys_vars.cc Accounting for a magic value outside of the valid range for pending_jobs. ------------------------------------------------------------ revno: 3213 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-19 16:51:58 +0200 message: wl#5569 recovery interfaces for the wl#5599 implementation. The essence of this patch is to provide the GAQ object implementation and a valid life cycle. Before calling the store methods of wl#5599, the checkpoint handler is supposed to invoke rli->gaq->move_queue_head(&rli->workers). See a simulation of that near ev->update_pos() of the main sql thread loop. The checkpoint info is composed as an instance of Slave_job_group that resides as rli->gap->lwm. Todo: uncomment + // delete ev; // after ev->update_pos(): the event is garbage once the real checkpoint has been done. Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods. + // experimental checkpoint per each scheduling attempt + // logics of next_event() + + rli->gaq->move_queue_head(&rli->workers); @ sql/log_event.cc Log_event::get_slave_worker_id() got shaped closer to its final version, with the elements necessary for the rli->gaq life cycle. @ sql/log_event.h Log_event::mts_group_cnt is added as part of the GAQ index propagation path from C to W. @ sql/rpl_rli.h Further extension to RLI necessary for the distribution hash function (APH).
@ sql/rpl_rli_pdb.cc Implementing circular_buffer_queue::*queue and a few other methods, incl. ulong Slave_committed_queue::move_queue_head(), the main concern for checkpoint. @ sql/rpl_rli_pdb.h Extending classes with a few new member definitions necessary for the GAQ interface / checkpoint / recovery. @ sql/rpl_slave.cc Simulation of the lwm-checkpoint, and changes due to the rpl_rli_pdb class extensions. ------------------------------------------------------------ revno: 3212 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:50:54 +0200 message: wl#5569 wl#5599 Recovery related. Prototyping the worker RLI instantiation, to be elaborated on. ------------------------------------------------------------ revno: 3211 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:00:52 +0200 message: wl#5569 MTS Extending the wl#5563 prototype gradually. This commit addresses: 1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for the checkpoint to update GAQ and the central RLI recovery table). Wrt rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli). others: - decreased processing time for rpl_parallel, serial. @ sql/log_event.cc Enhancing Log_event::get_slave_worker_id() to classify events by a set of parallelization properties; the presence of a property in an event forces some actions on both the C and W sides. en_queue etc. are prepared to turn into circular_buffer_queue methods. Pseudo-coded numerous todo:s wrt the low-level-design implementation. Deployed changes due to the Worker private rli. Annotated the Deferred Array for B,p,r property events. . delete ev is moved from C to W, which is fault-prone, but it could no longer be kept as part of de_queue(), which moves into the cir_buf_queue class. @ sql/log_event.h removed `soiled', which was used to make delete ev run safely.
Added Log_event methods identifying the parallelization properties, incl. - contains_partition_info() to identify events containing info to be processed by the partition hash func - starts,ends_group() - also updated the list of only_serial(). @ sql/rpl_rli.cc Only the Coordinator can destroy the Workers dynarray; Relay_log_info::get_current_worker() turned out to be more complicated, see comments; Reminder to migrate rli->future... into ev->future_event_relay_log_pos, which would let a Worker find the value out of the event's context; Prototyped // w->flush_info() in stmt_done; @ sql/rpl_rli.h The worker RLI has `this_worker' pointing to the actual worker instance. @ sql/rpl_rli_pdb.cc Annotated the APH etc. implementation with fine details. @ sql/rpl_rli_pdb.h Transformed the earlier queue struct into a family of classes. Recovery interface: last_group_done_index of Slave_worker is to be filled in by W with an index into the GAQ queue, for C to poll at checkpoint. Added CGEP to the W context (similar to CGAP of C). @ sql/rpl_slave.cc Simplified the Worker pool. Deployed worker rli initialization. Recovery: rli->gaq is instantiated by C at Worker pool activation. Recovery: pseudo-code for checkpoint in next_event(). @ sql/sys_vars.cc edited help lines for slave_max_pending_jobs. ------------------------------------------------------------ revno: 3210 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-11-14 11:55:32 +0000 message: Post-push fix for WL#5599. ------------------------------------------------------------ revno: 3209 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-12 17:58:12 +0000 message: Post-push fix for WL#5599.
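Revisions 3216, 3219 and 3221 describe the Partition-to-Worker hash (APH): a db name maps to the Worker that owns it; a group touching partitions held by different busy Workers is a conflict the Coordinator must wait out; a new partition goes to the least occupied Worker (the one with the fewest distributed partitions); and entries whose usage counter has dropped to zero are garbage-collected once the hash grows past a soft maximum (cf. mts_partition_hash_soft_max). The C++ class below is a hypothetical, single-threaded model of those rules. PartitionMap, assign(), release() and the Entry layout are illustrative names; the real hash is concurrent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of the Partition-to-Worker hash (db name -> Worker).
class PartitionMap {
 public:
  PartitionMap(std::size_t workers, std::size_t soft_max)
      : partitions_per_worker_(workers, 0), soft_max_(soft_max) {
    assert(workers > 0);
  }

  // Picks the Worker for a group touching `dbs`. Returns -1 on a conflict,
  // i.e. when busy partitions belong to different Workers; the Coordinator
  // would then wait for a release and retry.
  int assign(const std::vector<std::string>& dbs) {
    int owner = -1;
    for (const auto& db : dbs) {
      auto it = map_.find(db);
      if (it == map_.end() || it->second.usage == 0) continue;  // not busy
      if (owner != -1 && owner != it->second.worker) return -1;
      owner = it->second.worker;
    }
    if (owner == -1) owner = least_occupied();
    for (const auto& db : dbs) {
      auto it = map_.find(db);
      if (it == map_.end()) {
        map_.emplace(db, Entry{owner, 1});
        ++partitions_per_worker_[static_cast<std::size_t>(owner)];
        continue;
      }
      if (it->second.worker != owner) {  // idle entry moves to the new owner
        --partitions_per_worker_[static_cast<std::size_t>(it->second.worker)];
        ++partitions_per_worker_[static_cast<std::size_t>(owner)];
        it->second.worker = owner;
      }
      ++it->second.usage;
    }
    if (map_.size() > soft_max_) collect_garbage();
    return owner;
  }

  // Called when the group that used `dbs` has been executed.
  void release(const std::vector<std::string>& dbs) {
    for (const auto& db : dbs) {
      auto it = map_.find(db);
      assert(it != map_.end() && it->second.usage > 0);
      --it->second.usage;
    }
  }

  std::size_t size() const { return map_.size(); }

 private:
  struct Entry {
    int worker;      // owning Worker
    unsigned usage;  // groups in flight that use this partition
  };

  // Least occupied = fewest partitions currently mapped to the Worker.
  int least_occupied() const {
    std::size_t best = 0;
    for (std::size_t w = 1; w < partitions_per_worker_.size(); ++w)
      if (partitions_per_worker_[w] < partitions_per_worker_[best]) best = w;
    return static_cast<int>(best);
  }

  // Drops entries that no group is using any more.
  void collect_garbage() {
    for (auto it = map_.begin(); it != map_.end();) {
      if (it->second.usage == 0) {
        --partitions_per_worker_[static_cast<std::size_t>(it->second.worker)];
        it = map_.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::unordered_map<std::string, Entry> map_;
  std::vector<std::size_t> partitions_per_worker_;
  std::size_t soft_max_;
};
```

Keeping idle entries until the soft maximum is exceeded avoids rehashing a db on every group, at the cost of some stale records, which is the trade-off the soft-max parameter controls.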
------------------------------------------------------------ revno: 3208 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-11 11:53:01 +0000 message: WL#5599 The patch changed the handler's functions, i.e. init_info, check_info, flush_info, remove_info and end_info and the related private member functions, in both file and table handlers, to accept an index that identifies the information that will be read or written. This is necessary now because the handlers will be used by the workers to read and write information from file(s) and tables; since several workers may be running at the same time, an index is used to identify the worker that is accessing the information. This change is also necessary for multi-master replication, as information from each master must be uniquely identified. @ sql/binlog.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/log_event.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_constants.h Introduced an array, and a variable stating the array's size, which are used as parameters to init_info, check_info, flush_info, remove_info and end_info. . This is ok for now, as we assume a single master and use the slave's id to identify entries in a system table, if there is any. However, this code needs to be changed when we start handling multi-master replication. @ sql/rpl_info.cc Introduced an array, and a variable stating the array's size, which are used as parameters to init_info, check_info, flush_info, remove_info and end_info. . This is ok for now, as we assume a single master and use the slave's id to identify entries in a system table, if there is any. However, this code needs to be changed when we start handling multi-master replication.
@ sql/rpl_info.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_factory.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Removed static references to field indexes used as the primary key. @ sql/rpl_info_factory.h Removed static references to field indexes used as the primary key. @ sql/rpl_info_file.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_file.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_handler.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_table.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Changed the calls to find_info_for_server_id. @ sql/rpl_info_table.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_table_access.cc Changed the function find_info_server_id in order to put the cursor at a row identified by uidx, which is an array of fields that compose a primary key. . The name of the function was also changed to reflect the new behavior. @ sql/rpl_info_table_access.h Changed the function find_info_server_id in order to put the cursor at a row identified by uidx, which is an array of fields that compose a primary key. . The name of the function was also changed to reflect the new behavior. @ sql/rpl_mi.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Moved the call to handler->flush_info from write_info to flush_info in order to avoid passing uidx and idx as parameters.
  @ sql/rpl_mi.h
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
  @ sql/rpl_rli.cc
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
  @ sql/rpl_rli.h
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
  @ sql/rpl_rli_pdb.cc
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
    . Moved the call to handler->flush_info from write_info to flush_info in order to avoid passing uidx and idx as parameters.
  @ sql/rpl_rli_pdb.h
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
  @ sql/rpl_slave.cc
    Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions.
------------------------------------------------------------
revno: 3207
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-11-10 10:57:13 +0000
message:
  Refactoring to start work on WL#5599.
------------------------------------------------------------
revno: 3206
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:34:18 +0000
message:
  Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569.
------------------------------------------------------------
revno: 3205
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:04:14 +0000
message:
  merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569
  Conflicts:
  . sql/CMakeLists.txt
  . sql/Makefile.am
  . sql/sql_class.h
  . sql/rpl_slave.cc
------------------------------------------------------------
revno: 3204
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 11:39:37 +0000
message:
  merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3203
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 23:33:37 +0300
message:
  wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes
------------------------------------------------------------
revno: 3202
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 21:19:56 +0300
message:
  wl#5563 leaving out fine-grained garbage collection. That task is unnecessary to solve at prototyping time. The update-pos routine, still to be implemented, is going to eliminate that piece of code.
------------------------------------------------------------
revno: 3201
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 20:38:35 +0300
message:
  wl#5563 Extending the test base by splitting the former rpl_parallel into two, so that it runs in serial execution mode as well.
  @ sql/log_event.cc
    Conditioned out a few debug-purpose prints.
------------------------------------------------------------
revno: 3200
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 11:49:00 +0300
message:
  wl#5563 improved test; fixed a delete issue that used to cause a crash; added @@global.slave_local_timestamp to fill in the timestamp column with the slave clock value. Performance growth can be seen through the test.
  todo: merge with Alfranio's work on hashing and dynamic allocation of PFS objects.
------------------------------------------------------------
revno: 3199
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Wed 2010-09-15 14:51:49 +0300
message:
  wl#5563 tests for the wl. The number of workers and iterations can be tuned.
  todo: convert them into parameters passed to the test through mtr
------------------------------------------------------------
revno: 3198
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 18:22:41 +0300
message:
  wl#5563 adding a simple test, not yet attempting any stress, that nevertheless fired an assert.
  Refined the computation of the Worker instance reference, because cleanup_context() is executed by the SQL thread (the Coordinator) as well.
------------------------------------------------------------
revno: 3197
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 13:15:38 +0300
message:
  wl#5563 Rows-event parallelization is basically implemented, although only shallowly tested.
  Write access by Workers to the central rli struct may not be fully eliminated at this phase; e.g. that relates to errors.
  todo: prove that rli gets out of Worker scope
  todo: provide a stress test
  @ sql/log_event.cc
    changing from direct to API-based access to the RBR-applying context.
  @ sql/rpl_rli.cc
    implementation of the RBR-applying context API.
  @ sql/rpl_rli.h
    copying the RBR-applying part of the context info from rli to the Worker class; RLI gets accessor methods to the RBR-applying context that choose the right object from either the central (RLI) or the Worker repository.
------------------------------------------------------------
revno: 3196
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Sat 2010-09-11 17:00:08 +0300
message:
  wl#5563 adding support for Rows-events limited to one Worker.
  Deferred deletion did not check the emptiness of the list.
------------------------------------------------------------
revno: 3195
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:36:07 +0300
message:
  wl#5563 correcting comments to indicate fewer limitations
------------------------------------------------------------
revno: 3194
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:32:39 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  More progress on the WL: the STMT binlog format works as long as the conceptual limits hold, that is, no query/transaction is allowed to deal with more than one db.
  Addressed a complication: the update-pos method that is run by the Coordinator belongs to the Log_event hierarchy, and therefore event deletion, now done by a Worker, must be careful.
  Todo:
  1. (High priority) fix Row-format complications
  2. (High priority) Elaborate on the hash function to make it a function of the db text name
  3. (Optional) Consider moving update_pos to the RLI class to get rid of the delete logic complication.
  How-to-use:
  The instructions can be found in the comments of the previous commit; see there for more details. In brief, though, the db names have to follow a pattern: `test[0-9]'. E.g. test0, test1, test2, test3 for the default four Worker threads.
  The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE.
  @ sql/log_event.cc
    The hashing function is refined to circumvent the lack of db info in the internal events associated with a query event;
    an executed event can't simply be deleted by a Worker, since the SQL thread needs it to update positions. Hence a piece of code is added to defer the delete until the SQL thread has marked the event as `soiled'.
  @ sql/log_event.h
    A new member to allow deletion of an executed event.
  @ sql/rpl_rli.h
    A new member in the Worker class for deferred delete of the executed event.
    A new member in the RLI class to memorize the most recently assigned worker.
  @ sql/rpl_slave.cc
    Setting marks on events by the SQL Coordinator thread.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and shows signs of scalable performance.
  How to test:
    connection master;
    set @@global.binlog_format=statement;
    connection slave;
    set @@global.slave_exec_mode=PARALLEL;
    set @@global.binlog_format=mixed;
    show processlist; => IO, SQL threads + 4 workers by default
    change master to ...
    connection master;
    # create databases with magic names "test[0-9]+", where the number will index
    # a worker.
    create database test0; create database test1; create database test2; create database test3;
    # create tables. they are only of MyISAM type for now
    use test0; create table tm_1(a int, b int) engine=myisam;
    use ...
    # DML on tables:
    use test[0-3]; insert into tm_1 values (1,0);
    ...
    ...
    connection slave;
    # monitor CPU (visually this time: top etc)
    # check correctness e.g.
    select count(*) from test[0-3].tm_1;
    connection master;
    select count(*) from test[0-3].tm_1;
  @ sql/log_event.cc
    MTS Coordinator (C) and Worker (W) code is added.
    C - fills in a job assignment and queues it to the private queue of a W selected via get_slave_worker_id() (todo: elaborate on it);
    W - spins in the wait-extract-exec loop of slave_worker_exec_job();
    Redefining Log_event::apply_event() so that it continues to serve as the usual serial applier and also distributes events between Workers.
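    The Coordinator/Worker hand-off described in the log_event.cc note above (C queues a job to a selected Worker's private queue, W runs a wait-extract-exec loop) can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not the server code: `Job`, `WorkerQueue`, `worker_loop`, `pick_worker` and `run_pool` are hypothetical names, and hashing the db name with `std::hash` merely stands in for get_slave_worker_id().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unit of work handed from the Coordinator (C) to a Worker (W).
struct Job {
  std::string db;               // database the event belongs to
  std::function<void()> apply;  // stands in for applying one event
};

// Private per-Worker queue: C pushes jobs, W blocks until a job or stop arrives.
class WorkerQueue {
 public:
  void push(Job job) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      jobs_.push_back(std::move(job));
    }
    cv_.notify_one();
  }

  void stop() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
  }

  // Wait-extract step; returns false only once stopped and fully drained.
  bool pop(Job &out) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
    if (jobs_.empty()) return false;
    out = std::move(jobs_.front());
    jobs_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Job> jobs_;
  bool stopping_ = false;
};

// W: wait-extract-exec loop.
void worker_loop(WorkerQueue &q) {
  Job job;
  while (q.pop(job)) job.apply();
}

// C: choose a Worker from the db name (assumed hash, not get_slave_worker_id()).
std::size_t pick_worker(const std::string &db, std::size_t n_workers) {
  return std::hash<std::string>{}(db) % n_workers;
}

// Distributes n_jobs over the given dbs and returns how many were applied.
int run_pool(const std::vector<std::string> &dbs, int n_jobs,
             std::size_t n_workers) {
  std::vector<WorkerQueue> queues(n_workers);
  std::vector<std::thread> threads;
  for (auto &q : queues) threads.emplace_back(worker_loop, std::ref(q));

  std::mutex count_mu;
  int applied = 0;
  for (int i = 0; i < n_jobs; ++i) {
    const std::string &db = dbs[i % dbs.size()];
    queues[pick_worker(db, n_workers)].push(Job{db, [&count_mu, &applied] {
      std::lock_guard<std::mutex> lk(count_mu);
      ++applied;
    }});
  }
  for (auto &q : queues) q.stop();
  for (auto &t : threads) t.join();
  return applied;
}
```

    Because every job of a given db lands in the same FIFO queue, per-db ordering is kept while different dbs are applied in parallel, which is the premise of the db-partitioned prototype.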
  @ sql/log_event.h
    Log_event gained get_slave_worker_id() to assign a worker based on the hashing function implemented in the method, and only_serial_exec() to filter out non-parallelizable events; do_apply_event() is made public.
  @ sql/mysqld.cc
    PSI interfaces required adding keys for all mutexes and condition variables that MTS introduces to the sources. For the prototype implementation only, the declarations contain an explicit max index in arrays (16), to be elaborated on in following patches.
    `slave_parallel_workers' is added as the placeholder of the value of a new global sysvar.
  @ sql/mysqld.h
    externalization to access `slave_parallel_workers' and the PSI keys in other parts of the code.
  @ sql/rpl_rli.cc
    Instantiation, initial allocation and destruction of the RLI members added by MTS;
  @ sql/rpl_rli.h
    Definitions for the Worker, its communication with the Coordinator and statistics gathering;
    is_parallel_exec() is a compromise because of a not-yet-reported bug (see comments).
  @ sql/rpl_slave.cc
    Added worker pool initialization and termination.
    Added the thread handler for the Worker.
  @ sql/rpl_utility.h
    macros that can be used in the near future are added.
  @ sql/sql_class.h
    More values for the slave_exec_mode set (which is mistakenly defined as sys_var_enum);
    refining exit_cond() to allow the mutex not to be released. The default behaviour when one argument is supplied is not changed.
  @ sql/sys_vars.cc
    A new global sysvar for the number of workers. It is supposed to be updatable at run time.
    todo: (bug report) notice static enum Slave_exec_mode - it should be Sys_var_*set*.
******
wl#5569 MTS
fixing an explicit error code in rpl_parallel_start_stop that changed due to the merge with trunk.
******
WL#5596 MTS
Here is the total cset combining all revisions done since Sep 2010. Comments from the original commits are pasted in reverse chronological order.
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could not produce deterministic output. The list of binlogs contained one binlog more than expected.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active.
  The failure was caused by the relay-log parsing done for MTS recovery gap computing, which did not make sure to use the relay log's own FormatDescriptor events, which carry the checksum info for all events in the log.
  Fixed by finding out the checksum algorithm of every relay log as the first step of MTS recovery gap computing.
------------------------------------------------------------
revno: 3361 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  merge from trunk, which required resolving a few semantic conflicts caused by changes to THD::enter_cond() in trunk.
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed failure in test rpl_mts_check_concurrency when running in the mts collection.
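The checksum fix in revno 3362 boils down to inspecting the head of each relay log before parsing it. Below is a minimal sketch under stated assumptions: the `Event`, `EventType` and `ChecksumAlg` types and `find_checksum_alg` are hypothetical, the window of three events follows the original rpl_slave.cc note ("the first three events to read is enough"), and taking the last FormatDescriptor in that window as the master-side one is an assumption about relay-log layout, not a statement of how the server does it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified view of relay-log events; not the server's types.
enum class EventType { FormatDescription, Rotate, Query, Other };
enum class ChecksumAlg { Off, Crc32 };

struct Event {
  EventType type;
  ChecksumAlg checksum_alg;  // only meaningful for FormatDescription events
};

// Scans at most `max_events` leading events of one relay log and returns the
// checksum algorithm of the last FormatDescription event seen among them.
// Assumption: the relay log's own FD comes first and the master's FD follows
// within the window, so the last FD found describes the master's events.
std::optional<ChecksumAlg> find_checksum_alg(const std::vector<Event> &log,
                                             std::size_t max_events = 3) {
  std::optional<ChecksumAlg> alg;
  for (std::size_t i = 0; i < log.size() && i < max_events; ++i) {
    if (log[i].type == EventType::FormatDescription) alg = log[i].checksum_alg;
  }
  return alg;  // empty if no FD event was found in the window
}
```

Running this once per relay log, before any gap computing touches the events, mirrors the "first step" ordering the commit message describes.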
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages on unsafe execution.
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Removed a warning suppression introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with several warning messages on unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug outputs were removed from the error log. However, this is a temporary solution, as it prevents enabling traces.
------------------------------------------------------------
revno: 3355 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported a stack trace in rpl_savepoint.
  The problem appears to be that, when slave_sql_running_state is computed in show_master_info(), the SQL thread's proc_info pointer could refer to a value on a stack frame that is already gone.
  Fixed by making proc_info point to a string literal.
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running MTS crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch source code.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to please rpl_*_mts_crash_safe.
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  fixing the build issue for embedded.
  Public visibility for Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP
  MTS stopped with an error in the middle of the test.
  The reason is that a group of events originating from the slave itself was processed only partly, and that partial processing modified the group position. On the following restart, the wrong group boundary made the slave either error out or assert.
  Fixed by addressing a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So in the case of the test, the slave server group can't be started.
  Notice, this is a trial patch, since I can't reproduce the failure on the hosts available to me at all.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition: the Coordinator might miss the not-running status of a Worker. That made the Coordinator skip only part of a group of the slave server's own events, so the slave stopped at a point that is not a group boundary.
  Fixed by marking the errored-out Worker as failed before its APH entries are released.
  TODO: notice there is a possibility of stopping not at a group boundary due to a graceful STOP SLAVE, if one is run while self-originated events are being skipped.
  However, this issue belongs to STS and might be similar to BUG@12604951 and BUG@12728160.
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569
  Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code:
    DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -   DBUG_SUICIDE(););
  +   sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +   flush_info(TRUE);
  +   DBUG_SUICIDE();
  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
------------------------------------------------------------
revno: 3344 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS
  merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  adding a suppression for an expected warning to rpl_circular_for_4_hosts;
  decreasing a loop limit in rpl_parallel_switch_sequential in the case of statement format.
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test due to an rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, when there were two groups of events to execute, the first for a Worker and the second for the Coordinator, the Coordinator waited for the first group's completion but did not verify that the synchronization succeeded. So if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far).
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running MTS recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for WL#5569.
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the applying phase of MTS recovery so that events meant for the Coordinator are always applied; fixes a few tests to make them pass on PB; sets the GAQ size to the checkpoint_group value.
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
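The synchronization gap fixed in revno 3342 (waiting for a Worker's group without checking its outcome) can be illustrated with a small standard-C++ sketch. `GroupCompletion`, `coordinator_apply_after` and `simulate` are hypothetical names; the point shown is only that the wait reports success or failure, and the Coordinator refuses to apply its own group on top of a failed one.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical completion record for one event group handed to a Worker.
class GroupCompletion {
 public:
  // Called by the Worker once the group has been applied or has failed.
  void finish(bool ok) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      done_ = true;
      ok_ = ok;
    }
    cv_.notify_all();
  }

  // Called by the Coordinator: waits for completion and returns the outcome,
  // instead of only waiting as the pre-fix code did.
  bool wait_and_check() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return done_; });
    return ok_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
  bool ok_ = false;
};

// Coordinator step: apply its own group only if the preceding Worker group
// succeeded. Returns true if the Coordinator's group was applied.
bool coordinator_apply_after(GroupCompletion &prev, bool &applied_flag) {
  if (!prev.wait_and_check()) return false;  // stop instead of applying
  applied_flag = true;                       // stands in for applying the group
  return true;
}

// Simulates a Worker finishing its group with `worker_ok`, then the
// Coordinator trying to apply the next group.
bool simulate(bool worker_ok) {
  GroupCompletion prev;
  std::thread worker([&prev, worker_ok] { prev.finish(worker_ok); });
  bool applied = false;
  bool result = coordinator_apply_after(prev, applied);
  worker.join();
  return result && applied;
}
```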
------------------------------------------------------------
revno: 3336 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  refining the wait for db-hash entry release at event distribution. A graceful STOP is not accepted at this point, so the Coordinator stays in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.
  MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly because the Single-Threaded Slave had earlier stopped not at a group boundary but in the middle of a group.
  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE-TABLE group is surrounded by its own BEGIN/COMMIT braces.
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
------------------------------------------------------------
revno: 3332
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:29:34 +0300
message:
  wl5569 MTS
  cleanup in one file.
------------------------------------------------------------
revno: 3331
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:16:02 +0300
message:
  wl5569 MTS
  bzr commit mail address changed;
  a minor cleanup to make mts_is_worker() take a const argument;
  releasing a test to run in MTS.
------------------------------------------------------------
revno: 3330
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-02 08:58:56 +0100
message:
  Fixed the use of the performance schema in the replication code and a concurrency issue in the IO Thread. In particular, the IO Thread was calling flush_master_info without grabbing locks.
------------------------------------------------------------
revno: 3329 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 16:41:35 +0300
message:
  wl5569 MTS
  merging from the main repo.
------------------------------------------------------------
revno: 3328
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 15:48:25 +0300
message:
  wl#5569 MTS
  the final cleanup patch.
  There are a few glitches that were considered tolerable, at least while the wl's total code is being reviewed. That includes:
  - no support for old load-data events
  - no support for FK
  To add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called.
------------------------------------------------------------
revno: 3327
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 13:16:52 +0300
message:
  wl#5569 MTS
  The patch cleans up a good deal of code.
------------------------------------------------------------
revno: 3326
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-28 11:30:18 +0300
message:
  wl#5569 MTS
  replacing views with regular tables for consistency verification in rpl_parallel_innodb. Also a minor cleanup in rpl_parallel is done.
------------------------------------------------------------
revno: 3325
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 20:31:45 +0300
message:
  wl#5569 MTS
  Cleanup, and addressing a sporadic rpl_temp_table_mix_row failure in the post-execution mtr.check_testcase().
  The check-testcase failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While that is correct with a homogeneous, rows-event-only load, a mixed load can fail.
  Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in the future.
------------------------------------------------------------
revno: 3324
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 13:12:52 +0100
message:
  Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case.
------------------------------------------------------------
revno: 3323
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-26 13:02:59 +0100
message:
  Fixed test case.
------------------------------------------------------------
revno: 3322
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-25 15:14:24 +0100
message:
  Introduced a test case for recovery with MTS and fixed bugs in recovery.
------------------------------------------------------------
revno: 3321
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 15:38:19 +0300
message:
  wl#5569 MTS
  This patch does a bit of cleanup, addresses one memory-allocation todo and completes fixing the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items.
------------------------------------------------------------
revno: 3320
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 12:38:34 +0300
message:
  wl#5569 MTS
  this patch completes the previous one: it fixes a result file and makes the InnoDB-specific test verification use tables, not views.
------------------------------------------------------------
revno: 3319
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 00:11:22 +0300
message:
  wl#5569 MTS
  this is an exploratory patch to sort out whether the verification method that was based on views has a flaw of its own, unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment.
------------------------------------------------------------
revno: 3318
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-23 07:56:15 +0300
message:
  wl#5569 MTS
  fixing results of mysqld--help-win.
------------------------------------------------------------
revno: 3317 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:20:40 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3316
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:17:43 +0100
message:
  On some platforms, such as Windows, a thread's wait time is stored in 100ns units. However, when computing the difference between two values, the result was not multiplied by 100. In addition, there was a casting problem when that result was assigned to an ulong.
------------------------------------------------------------
revno: 3315
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 18:54:23 +0100
message:
  Fixed how MTS copes with recovery.
------------------------------------------------------------
revno: 3314 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 19:10:54 +0300
message:
  wl#5569 MTS
  Fixing valgrind warnings.
------------------------------------------------------------
revno: 3313
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 18:15:43 +0300
message:
  wl#5569 MTS
  rpl_parallel_start_stop.test could fail sporadically with a timeout.
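The unit and casting problem described in revno 3316 can be shown with a tiny helper. This is a sketch only: `wait_diff_ns` and `kNsPerTick` are hypothetical names, and the 100 ns tick size is taken from the commit message. It scales the tick difference to nanoseconds and keeps the result in a 64-bit unsigned type, since `unsigned long` is only 32 bits wide on Windows and would truncate long waits.

```cpp
#include <cassert>
#include <cstdint>

// Tick size taken from the commit message: 100 ns per unit on such platforms.
constexpr std::uint64_t kNsPerTick = 100;

// Hypothetical helper: converts the difference between two wait-time readings
// (in 100 ns ticks) to nanoseconds, keeping the result 64-bit wide instead of
// assigning it to a possibly 32-bit ulong.
std::uint64_t wait_diff_ns(std::uint64_t start_ticks, std::uint64_t end_ticks) {
  if (end_ticks < start_ticks) return 0;  // guard against a non-monotonic read
  return (end_ticks - start_ticks) * kNsPerTick;
}
```

For example, 50,000,000 ticks correspond to 5,000,000,000 ns, a value that no longer fits in 32 bits.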
------------------------------------------------------------
revno: 3312 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:21:56 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3311
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:19:06 +0100
message:
  Fixed an error when computing the Lower-Water-Mark.
  If two or more jobs were removed from the Group of assigned jobs, and one of the jobs had a non-empty group relay log but the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null.
------------------------------------------------------------
revno: 3310
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 11:52:44 +0100
message:
  Fixed valgrind errors.
  Slave_job_group was silently being cast to LOG_POS_COORD when calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp), and consequently passed to mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems:
  . The first two members of Slave_job_group were not a char * and a my_offset.
  . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems.
  To fix the problem, we avoid this casting.
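The casting problem in revno 3310 is avoided when the comparator works on the real element type and builds the coordinates from its fields. The sketch below uses simplified, hypothetical stand-ins (`LogPosCoord`, `JobGroup`, `coord_cmp`, `job_group_less`, `sort_jobs`) and `std::sort` rather than sort_dynamic(); the member layout is illustrative only and is not the server's Slave_job_group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for a (file name, position) coordinate.
struct LogPosCoord {
  const char *file_name;
  std::uint64_t pos;
};

// Hypothetical analogue of a job group: its leading members are NOT a
// (char *, position) pair, so reinterpreting it as LogPosCoord reads garbage.
struct JobGroup {
  int worker_id;
  bool done;
  const char *group_master_log_name;
  std::uint64_t group_master_log_pos;
};

// Compares coordinates: file name first, then position.
int coord_cmp(const LogPosCoord &a, const LogPosCoord &b) {
  int c = std::strcmp(a.file_name, b.file_name);
  if (c != 0) return c;
  return (a.pos > b.pos) - (a.pos < b.pos);
}

// Type-safe comparator: builds coordinates from JobGroup fields instead of
// casting the JobGroup object itself.
bool job_group_less(const JobGroup &a, const JobGroup &b) {
  LogPosCoord ca{a.group_master_log_name, a.group_master_log_pos};
  LogPosCoord cb{b.group_master_log_name, b.group_master_log_pos};
  return coord_cmp(ca, cb) < 0;
}

void sort_jobs(std::vector<JobGroup> &jobs) {
  std::sort(jobs.begin(), jobs.end(), job_group_less);
}
```

Lexicographic comparison of the file names works here only because binlog names are zero-padded (e.g. master-bin.000002 < master-bin.000010).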
------------------------------------------------------------
revno: 3309
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 19:14:50 +0300
message:
  wl#5569 MTS
  fixing slave_transaction_retries_basic_64.result
------------------------------------------------------------
revno: 3308
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 16:11:25 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3307
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 12:33:36 +0300
message:
  wl#5569 MTS
  Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect
  asynchronous handling of a Rotate event that does not split the current
  group and therefore has to be executed after all previously scheduled
  events.
  Fixing the sensitivity of two other tests to mtr's invocation
  environment, which includes the initial values of slave_parallel_workers
  and slave_transaction_retries.
------------------------------------------------------------
revno: 3306
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 09:04:19 +0100
message:
  Fixed some Windows failures.
------------------------------------------------------------
revno: 3305
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-18 19:58:21 +0100
message:
  Fixed some recovery issues.
------------------------------------------------------------
revno: 3304
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 21:01:58 +0300
message:
  wl#5569 MTS
  fixing tests and a segfault at the end of handle_slave_sql() that
  happened after worker initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------
revno: 3303
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 18:34:16 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3302
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 14:00:41 +0300
message:
  wl#5569 MTS
  fixing rpl_row_basic_3innodb similarly to the previous patch.
------------------------------------------------------------
revno: 3301
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 13:51:59 +0300
message:
  wl#5569 MTS
  fixing a few tests.
  1. A policy is implemented for reacting with a warning when a failing
     worker leaves the overall slave state with gaps, and therefore
     inconsistent.
  2. Two tests used to time out due to reset master/slave, which was
     disabled in them.
------------------------------------------------------------
revno: 3300
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 02:24:59 +0100
message:
  Removed unnecessary test cases and augmented others in order to test
  recovery.
------------------------------------------------------------
revno: 3299
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-16 19:46:22 +0300
message:
  wl#5569 MTS
  fixing slave_parallel_workers_basic and rpl_stop_middle_group, which
  can't run in MTS
------------------------------------------------------------
revno: 3298
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-16 11:29:53 +0300
message:
  wl#5569 MTS
  adding new tests to sys_vars.
------------------------------------------------------------
revno: 3297
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:41:32 +0100
message:
  WL#5569
  Adding a global suppression for the warning that may appear when
  stopping the slave SQL thread in the middle of a group. This should
  affect MTS mode only.
------------------------------------------------------------
revno: 3296
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:40:41 +0100
message:
  WL#5569
  Renames worker-info-repository to slave-worker-info-repository in some
  tests' option files.
------------------------------------------------------------
revno: 3295
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:32:37 +0100
message:
  WL#5569
  More test fixes. Removing the remaining 'mts' prefixes from MTS
  variables, which have been renamed recently.
------------------------------------------------------------
revno: 3294
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 00:27:20 +0100
message:
  WL#5569
  Fixing rpl_parallel result file.
------------------------------------------------------------
revno: 3293
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:41:33 +0300
message:
  wl#5569 MTS
  correcting --slave-parallel-workers in a few more files
------------------------------------------------------------
revno: 3292
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:31:46 +0300
message:
  wl#5569 MTS
  correcting --slave-parallel-workers in collections/default.push
------------------------------------------------------------
revno: 3291
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:12:11 +0300
message:
  wl#5569 MTS
  Cleanup, including
  1. decreasing the number of system variables and renaming them.
     Command-line options important only for debugging are replaced with
     reasonable constant values, and only the necessary ones are retained.
  2. Small encapsulation in ha_blackhole.cc is done.
------------------------------------------------------------
revno: 3290
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 15:59:23 +0100
message:
  Fixed replication valgrind failures caused by MTS.
------------------------------------------------------------
revno: 3289
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 21:23:13 +0300
message:
  wl#5569 MTS
  wl#5754 Query event parallel execution
  Fixing failing tests and a failure in gathering accessed databases that
  was caused by a recent merge from trunk.
------------------------------------------------------------
revno: 3288 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 13:35:20 +0300
message:
  merge from trunk
------------------------------------------------------------
revno: 3287
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 12:27:38 +0300
message:
  wl#5569 MTS
  Fixing failing tests due to
  a. a flaw in the `isolated parallel' mode implementation. Isolation
     applies to a group of events rather than to an individual event. An
     event that accesses more than the maximum number of databases, or an
     event from an old master, triggers marking of the group currently
     being scheduled. Such a group is executed only after all previously
     scheduled groups are done, and no more groups are scheduled until it
     is done.
  b. Notification to the Coordinator about an errored-out Worker is
     corrected.
------------------------------------------------------------
revno: 3286
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-12 22:33:32 +0300
message:
  wl#5569 MTS
  making default.push run the rpl suite with non-default
  --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt,
  mixed). The default is run for all suites in mixed mode and for rpl
  suites with the row+ps and stmt formats.
------------------------------------------------------------
revno: 3285 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-12 22:05:05 +0300
message:
  wl#5569 MTS
  manual merge with a few fixes for a segfault from the last merge from
  trunk, etc., and a compilation issue on embedded.
------------------------------------------------------------
revno: 3284
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-09 18:35:59 +0100
message:
  Post-fixes for merge. Fixed compilation on Windows and removed unused
  options.
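The isolation rule in revno 3287 (an isolated group starts only when nothing else is in flight, and nothing else starts while it runs) can be modeled as a small C++ sketch. `Isolation_gate` is a hypothetical, single-threaded model that only tracks counts; a real Coordinator would block on a condition variable instead of returning `false`, and this is not the server's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the Coordinator's decision for "isolated" groups
// (e.g. groups touching more databases than can be tracked, or groups from
// an old master).
class Isolation_gate {
 public:
  // May a group be dispatched now?
  bool can_schedule(bool group_is_isolated) const {
    if (isolated_running_) return false;            // nothing runs beside it
    if (group_is_isolated) return in_flight_ == 0;  // wait for prior groups
    return true;
  }

  void on_scheduled(bool group_is_isolated) {
    assert(can_schedule(group_is_isolated));
    ++in_flight_;
    if (group_is_isolated) isolated_running_ = true;
  }

  void on_done(bool group_is_isolated) {
    assert(in_flight_ > 0);
    --in_flight_;
    if (group_is_isolated) isolated_running_ = false;
  }

 private:
  std::size_t in_flight_ = 0;      // groups dispatched but not finished
  bool isolated_running_ = false;  // an isolated group is executing
};
```

Keeping the rule at the group level (rather than per event) matches the commit's point that isolation applies to the whole group once any event in it triggers the marking.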
------------------------------------------------------------
revno: 3283 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-09 16:27:47 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3282
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-06 13:51:19 +0300
message:
  wl#5569 MTS
  STOP SLAVE now stops consistently without gaps; KILL shall be used for
  an urgent stop; an error case behaves like the killed one.
  For instance, when a Worker errors out, it sends KILL to the Coordinator
  through THD::awake(), and the Coordinator kills the rest by setting a
  special Worker-running status to killed (which breaks the read-exec loop
  of a Worker).
------------------------------------------------------------
revno: 3281
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-05 20:01:51 +0300
message:
  wl#5569 MTS
  More cleanup, fixes for issues found when running tests, and some
  improvements, including in stopping Workers: the routine now
  distinguishes between the killed and the gracefully stopped cases, so
  that in the end STOP SLAVE will guarantee a consistent state (some todo
  still remains).
------------------------------------------------------------
revno: 3280
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-05-30 13:05:07 +0300
message:
  WL#5569 MTS
  WL#5754 Query event parallel applying
  -----------------------------------------------------------------
  Aggregating 7 commits that are not pushed yet to the wl5569 repo.
  Find comments for each cset below.
  ------------------------------------------------------------------
  The current patch addresses concurrent updating of the
  slave_open_temp_tables status counter.
  The former declaration of the underlying server variable is changed
  from ulong to int32. While that might affect (shrink) the actual range,
  no range has been specified, and now that the number of bits is the
  same on all platforms the range can be set to [0, max(int32)].
  ******
  wl#5569 MTS Wl#5754 Query event parallel applying
  wl#5599 MTS recovery
  The patch includes some cleanup, including one for temp tables support,
  and realization of a few todo:s.
  ******
  wl#5569 MTS wl#5754 Query event parallel applying
  More cleanup is done; fixing temp tables manipulation.
  Asserting an impossible-to-support use case of a group of events not
  wrapped with BEGIN/COMMIT.
  Todo: recognize old master binlog to refuse to run in parallel.
  ******
  wl#5569 MTS
  Implementation of giving out the applier role to a Worker for all cases
  but the ones dealing with the Coordinator's state. That includes Query
  events with over-max db:s and Load-data related events.
  The current patch also makes old master binlog be handled by MTS,
  though sometimes, e.g. for a Query event, by switching to the
  sequential mode.
  Fixing a race condition making C wait endlessly if a Worker has exited
  due to its applying error.
  ******
  wl#5569 MTS
  correcting an assert that used to fire, as warned in the previous
  commit. Parallel feature tests pass now.
  ******
  wl#5569 MTS
  This patch contains cleanup and simplification of the logic of handling
  some events sequentially by the Coordinator, and adds a
  memory-allocation failure branch to the workers starting routine.
  ******
  wl#5569 MTS
  An intermediate patch to address a few issues raised by reviewers.
  To sum up, it's about cleanup and simplification of the logic of event
  distribution to Workers and the consequent actions. Some effort was
  spent to support Old Master BEGIN-less groups of events.
------------------------------------------------------------
revno: 3279
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-05-24 17:29:35 +0300
message:
  WL#5569 MTS
  WL#5754 Query parallel applying
  Changing the implementation of temporary tables support in MTS.
  Cleanup, fixing a few todo:s and a few potential issues found.
------------------------------------------------------------
revno: 3278
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-05-19 12:36:28 +0300
message:
  wl#5569 MTS
  Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring of
  its handling in the canonical sequential mode. The event's life cycle
  suggests behavior similar to objects associated with Table_map; in
  particular, its destruction occurs at end-of-statement time.
  Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl.
  rpl_row_ignorable_event, in both sequential and parallel mode.
------------------------------------------------------------
revno: 3277
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-05-16 22:43:58 +0300
message:
  wl#5569 MTS
  Simplifying Coordinator-Worker interfaces.
  In essence, after this patch a Worker executes events in its private
  context (class Slave_worker : public Relay_log_info). The only exception
  is a Query referring to a temporary table. The temp:s are maintained in
  the Coordinator's "central" rli; removing some dead code; performing a
  lot of cleanup.
  There are a few todo items, incl.:
  1. To implement several todo:s scattered across MTS' code and tests
     (e.g. to restore protected for a few members of RLI in rpl_rli.h);
  2. to cover Rows_query_log_event, which currently can cause hanging
     (e.g. rpl_parallel_fallback)
  3. To sort out names of classes based on Rpl_info, possibly remove
     Rpl_info_worker
------------------------------------------------------------
revno: 3276
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-05-06 21:33:32 +0300
message:
  wl#5569 MTS
  improving benchmarking test.
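Revno 3280 above changes the slave_open_temp_tables counter to a fixed-width 32-bit type so that concurrent Workers can update it safely with the same range on every platform. A minimal C++ sketch of that idea uses `std::atomic<std::int32_t>`; the variable mirrors the counter's name only for readability, and the thread driver is hypothetical test scaffolding, not the server's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Fixed-width atomic counter: same range on all platforms, race-free
// increments and decrements from several applier threads.
std::atomic<std::int32_t> open_temp_tables{0};

// Hypothetical Worker body: each iteration opens and then drops a
// temporary table, updating the shared counter both times.
void worker_creates_and_drops(int n) {
  for (int i = 0; i < n; ++i) {
    open_temp_tables.fetch_add(1, std::memory_order_relaxed);  // CREATE TEMPORARY TABLE
    open_temp_tables.fetch_sub(1, std::memory_order_relaxed);  // DROP TEMPORARY TABLE
  }
}

// Runs several Workers concurrently and returns the final counter value.
std::int32_t run_workers(int threads, int n) {
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) pool.emplace_back(worker_creates_and_drops, n);
  for (auto &th : pool) th.join();
  int32_t final_value = open_temp_tables.load();
  assert(final_value >= 0);  // counter range is [0, max(int32)]
  return final_value;
}
```

Relaxed ordering is enough for a pure statistics counter; the `join()` calls make every update visible before the final load.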
------------------------------------------------------------
revno: 3275
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-04-06 15:51:58 +0300
message:
  wl#5569 MTS
  Statistics for Workers and the Coordinator, incl. waiting and sleeping
  times, are now reported into the error log at slave stopping time.
------------------------------------------------------------
revno: 3274
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-04-05 19:26:37 +0300
message:
  wl#5569 MTS
  restoring the previous default of 4 workers that rpl_parallel works with.
------------------------------------------------------------
revno: 3273
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-04-03 13:07:30 +0300
message:
  wl#5569 MTS
  A benchmarking-related patch makes rpl_parallel uniformly runnable with
  an arbitrary number of workers, db:s, tables, etc.
  TODO: to restore the final consistency check, which is temporarily taken
  out because I could not find a way to leave it surrounded with a
  --dis/en-able* stanza.
------------------------------------------------------------
revno: 3272
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-04-02 14:32:02 +0300
message:
  wl#5569 MTS
  a test file for benchmarking is added.
  Benchmarking results can be obtained by extracting the master-side
  generating and the slave-side applying times, as in the following loop:

      workers=6;
      for n in `seq 1 3`; do
        echo; echo loop: $n; echo;
        my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
          rpl_parallel_benchmark --mysqld=--binlog-format=statement \
          && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1;
      done
------------------------------------------------------------
revno: 3271
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-03-30 17:11:24 +0300
message:
  wl#5754 Query event parallel execution
  Small cleanup of comments as requested by reviewer.
------------------------------------------------------------
revno: 3270
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-02-27 19:35:25 +0200
message:
  WL#5754 Query event parallel execution
  Bundling together the implementation of the whole DML+DDL Query
  parallel support. That includes:
  The earliest four revisions, which cut off the DML stage of the parallel
  query project from the following ones devoted to DDL. Those four lay out
  the skeleton of parallel applying of Queries containing a temporary
  table and implement the core of the design, that is, the DML queries.
  Queries can contain arbitrary features, including temp tables.
  The DDL part also refined a few items related to the general low-level
  design. In particular, the mark of over-max db:s in the updated-db:s
  status var is turned into another new constant value.
  The very last patch in the bundle addresses the last review mail notes.
------------------------------------------------------------
revno: 3269 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-01-12 01:01:02 +0200
message:
  merging from mysql-trunk
------------------------------------------------------------
revno: 3268
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-01-12 00:54:12 +0200
message:
  wl#5569 MTS
  fixing the worker threads start/stop.
------------------------------------------------------------
revno: 3267 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-27 18:54:41 +0000
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3266
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-24 01:57:03 +0200
message:
  wl#5569 MTS
  the timed-wait loop of the SQL thread required a break-through parameter
  for the case where the signal is missed and just a timeout would
  otherwise be reported
------------------------------------------------------------
revno: 3265 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 19:03:42 +0200
message:
  merging from the repo wl5569
------------------------------------------------------------
revno: 3264
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 17:49:19 +0200
message:
  wl#5569 MTS
  fixing corner cases that mtr testing with MTS workers against the
  standard suites revealed.
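Revno 3266 describes a timed wait that could miss a signal and report only a timeout. A common remedy, sketched below in C++ under that interpretation, is to pair the condition variable with a shared flag that the waiter checks under the mutex; `std::condition_variable::wait_for` with a predicate does exactly that. The `Wakeup` type is hypothetical and not the server's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// A wake-up that cannot be lost: the notifier sets `signalled` under the
// mutex before notifying, and the waiter re-checks it under the same mutex.
// If the signal arrives before the wait starts, the wait returns at once
// instead of running into the timeout.
struct Wakeup {
  std::mutex m;
  std::condition_variable cv;
  bool signalled = false;  // the "break-through" condition

  void notify() {
    {
      std::lock_guard<std::mutex> lk(m);
      signalled = true;
    }
    cv.notify_all();
  }

  // Returns true if woken by notify(), false if the timeout expired.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m);
    bool woken = cv.wait_for(lk, timeout, [this] { return signalled; });
    assert(!woken || signalled);
    return woken;
  }
};
```

The predicate form also absorbs spurious wake-ups, so the loop does not need a separate retry counter.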
------------------------------------------------------------
revno: 3263
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 16:00:28 +0200
message:
  wl#5569 MTS:
  refining another assert that can force C to delete events that are
  skipped with the slave skip counter
------------------------------------------------------------
revno: 3262
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 15:34:02 +0200
message:
  wl#5569 MTS
  Correcting an assert that is hit by a few tests.
------------------------------------------------------------
revno: 3261 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 13:27:15 +0200
message:
  wl#5569 MTS
  merging from the repo.
------------------------------------------------------------
revno: 3260
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-23 13:25:31 +0200
message:
  wl#5569 MTS
  fixing failing tests.
------------------------------------------------------------
revno: 3259 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-22 20:34:26 +0200
message:
  wl#5569 MTS
  merging from the repo.
------------------------------------------------------------
revno: 3258
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-22 20:31:13 +0200
message:
  wl#5569 MTS
  fixing test failures when mtr runs with --mts_slave_parallel_workers != 0.
  rpl000010 is a representative.
  Fixed by identifying, marking, carefully running ev->update_pos() for,
  and destroying an event that can split a group of events, forcing parts
  of the group into different relay logs.
------------------------------------------------------------
revno: 3257
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-22 13:57:18 +0200
message:
  wl#5569 MTS and wl#5599 MTS recovery
  The general recovery implementation is finished by this patch.
  Tested against ./mtr rpl_parallel_conf_limits.
  Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ...
  can fail at the 2nd and later runs because Worker tables are not removed
  at RESET SLAVE.
------------------------------------------------------------
revno: 3256
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-21 22:12:30 +0200
message:
  wl#5569 MTS
  the slave_worker_info definition is updated in the system db.
------------------------------------------------------------
revno: 3255 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-21 21:34:58 +0200
message:
  merging with repo
------------------------------------------------------------
revno: 3254
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-21 21:31:29 +0200
message:
  wl#5569 MTS
  Recovery routine part I: gathering the group recovery bitmap.
------------------------------------------------------------
revno: 3253
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-20 22:18:33 +0000
message:
  WL#5599
  Fixed the routine that computes the bitmap of executed events.
------------------------------------------------------------
revno: 3252
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-20 21:37:48 +0200
message:
  wl#5569 MTS
  adding the checkpoint relay_log_name,pos as a necessary part to locate a
  relay log for recovery.
  Tested with rpl_parallel.
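Revnos 3254 and 3253 gather a bitmap of executed groups for recovery. One plausible reading, sketched in C++ below as an assumption rather than the server's algorithm: each Worker records which groups after the last checkpoint it committed, the bitmaps are merged with a bitwise OR, and any unset bit below the highest set bit is a gap that recovery must re-apply. The names `merge_worker_bitmaps` and `find_gaps` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bit i set = group number i (counted from the last checkpoint) committed.
using Bitmap = std::vector<bool>;

// OR together the per-Worker bitmaps into one "executed" bitmap.
Bitmap merge_worker_bitmaps(const std::vector<Bitmap> &per_worker) {
  std::size_t n = 0;
  for (const auto &b : per_worker) n = std::max(n, b.size());
  Bitmap all(n, false);
  for (const auto &b : per_worker)
    for (std::size_t i = 0; i < b.size(); ++i)
      if (b[i]) all[i] = true;
  return all;
}

// Groups that were not executed although some later group was.
std::vector<std::size_t> find_gaps(const Bitmap &executed) {
  std::size_t last = executed.size();
  while (last > 0 && !executed[last - 1]) --last;  // trailing groups never ran
  std::vector<std::size_t> gaps;
  for (std::size_t i = 0; i < last; ++i)
    if (!executed[i]) gaps.push_back(i);
  assert(gaps.size() <= last);
  return gaps;
}
```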
------------------------------------------------------------
revno: 3251 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-20 17:58:58 +0200
message:
  wl#5569 MTS
  manual merging from the repo and correcting GAQ processing by
  introducing a volatile byte to indicate whether an item is busy or
  released.
------------------------------------------------------------
revno: 3250
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-18 21:00:23 +0200
message:
  wl#5569 MTS
  fixing the --mts-exp-slave-run-query-in-parallel=1 case, in which a
  Query-log-event can be run in parallel, incl. DML and DDL.
  The feature is still `exp'erimental; it can be tried as long as no temp
  tables are involved and no db other than the session's default is
  modified by the query.
  Tested: the changes sustain
    mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement
------------------------------------------------------------
revno: 3249
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-17 14:46:15 +0200
message:
  wl#5569 MTS
  fixing PB2 failures, incl. valgrind issues, long exec time and an assert
  in a test.
------------------------------------------------------------
revno: 3248 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-17 00:00:47 +0200
message:
  merge from wl#5569 repo to local branch
  rpl_sequential opt files are added to prevent mtr from giving up while
  processing a bulk of unsafe warnings.
------------------------------------------------------------
revno: 3247
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-16 23:41:45 +0200
message:
  wl#5569 MTS
  Adding transparent support/fallback to sequential execution for the
  cases of
  1. Query-log-event
  2. Rows_query_log_event info event
  Both cases can be fully parallelized in a future project.
  Fixing an issue in move_queue_head() that surfaced as an assert in
  Slave_worker::slave_worker_group_ends().
  Fixing the destruction of an event by a Worker.
------------------------------------------------------------
revno: 3246 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-14 16:46:20 +0200
message:
  merge from wl5569 repo
------------------------------------------------------------
revno: 3245
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-14 10:57:16 +0200
message:
  wl#5569 MTS
  a light cleanup to arrange the option/system var names properly:
  mts_-prefixing, and _exp prefixing for experimental features needed for
  benchmarking (mts_exp_slave_local_timestamp) or supported only in a
  limited way (mts_exp_slave_run_query_in_parallel for Query-log-event).
  Fixing the GAQ size. It might be too tight, e.g. in the case of a max WQ
  length of 1; tested by running rpl_parallel with
  --mts-slave-worker-queue-len-max=1.
------------------------------------------------------------
revno: 3244
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-13 18:53:32 +0200
message:
  wl#5569 MTS
  fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are
  removed with Alfranio's consent
------------------------------------------------------------
revno: 3243
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-13 17:57:01 +0200
message:
  wl#5569 MTS
  fixing a set of valgrind warnings caused by a c&p
------------------------------------------------------------
revno: 3242
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-12-13 16:52:50 +0200
message:
  wl#5569 MTS
  updating results for a few tests.
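Revnos 3251 and 3247 concern the GAQ (the Coordinator's queue of assigned groups), a per-item busy/released flag, and `move_queue_head()`. The single-threaded C++ sketch below shows one way such a ring buffer can behave: the Coordinator enqueues at the tail, Workers mark items done in any order, and the head advances only over a contiguous prefix of done items, refusing to enqueue when full. `Ring_gaq` is a hypothetical illustration; the real queue is shared between threads and this sketch omits all synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Ring_gaq {
 public:
  explicit Ring_gaq(std::size_t capacity) : done_(capacity, false) {}

  bool full() const { return size_ == done_.size(); }
  bool empty() const { return size_ == 0; }
  std::size_t size() const { return size_; }

  // Returns the slot index, or -1 when the queue is full.
  long enqueue() {
    if (full()) return -1;
    std::size_t slot = (head_ + size_) % done_.size();
    done_[slot] = false;  // busy until a Worker releases it
    ++size_;
    return static_cast<long>(slot);
  }

  // A Worker releases the slot of a group it has committed.
  void mark_done(std::size_t slot) {
    assert(slot < done_.size());
    done_[slot] = true;
  }

  // Advances the head past consecutive released items and returns how many
  // were removed; zero means the oldest group is still running.
  std::size_t move_queue_head() {
    std::size_t removed = 0;
    while (!empty() && done_[head_]) {
      head_ = (head_ + 1) % done_.size();
      --size_;
      ++removed;
    }
    return removed;
  }

 private:
  std::vector<bool> done_;  // per-slot busy (false) / released (true)
  std::size_t head_ = 0;    // oldest not-yet-removed group
  std::size_t size_ = 0;
};
```

Because the head stops at the first busy item, it always identifies the oldest uncommitted group, which is what a checkpoint or low-water-mark computation needs.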
------------------------------------------------------------
revno: 3241
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-11 21:00:47 +0200
message:
  wl#5569 MTS
  1. Fixing a recovery-related issue with
     DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE);
     at slave start by moving mts_recovery_routine() in front of the
     assert.
  2. Making a SKIP-ed event commit to the central RLI. That is correct
     since Workers are not executing anything at this time.
  3. Fixing the default for mts_checkpoint_period, which normally should
     not be zero. Zero makes sense solely for debugging (so we may stress
     that through VALID_RANGE(1,...)).
  4. Introduced a general mts-unsupported error/warning that applies to
     cases of non-zero parallel workers combined with a feature that
     parallelization can't work with.
------------------------------------------------------------
revno: 3240 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-10 18:25:27 +0200
message:
  merge from wl5569 repo to a local branch
------------------------------------------------------------
revno: 3239
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-10 17:50:03 +0200
message:
  wl#5569 MTS
  Improving GAQ in
  a) limiting its size so that it can hold items while all WQ:s are full
  b) move_queue_head() contained a flaw that made it falsely report no
     progress
  c) never allowing enqueueing into GAQ while it's full
------------------------------------------------------------
revno: 3238 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-09 19:46:27 +0200
message:
  merge from wl5569 repo to a local branch
------------------------------------------------------------
revno: 3237
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-09 19:45:02 +0200
message:
  wl#5569 MTS
  Integration with wl#5599 recovery for MTS and fixing two asserts.
  One is due to missed cleanup of errored-out rows-events; the other is a
  work-around for w->curr_group_exec_parts->dynamic_ids being initialized
  to have one partition at Worker startup, which it should not.
------------------------------------------------------------
revno: 3236
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 13:59:07 +0000
message:
  WL#5599
  Fixed warning messages.
------------------------------------------------------------
revno: 3235
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 12:59:07 +0000
message:
  WL#5599
  Fixed test cases.
------------------------------------------------------------
revno: 3234
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 01:30:32 +0000
message:
  WL#5599
  Fixed failures in test cases.
------------------------------------------------------------
revno: 3233 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 00:33:48 +0000
message:
  merge mysql-trunk --> mysql-next-mr-wl5569

  Conflicts:
  . mysql-test/r/log_tables_upgrade.result
  . mysql-test/r/mysql_upgrade.result
  . mysql-test/r/mysql_upgrade_ssl.result
  . mysql-test/r/mysqlcheck.result
  . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result
  . mysql-test/suite/rpl/t/disabled.def
  . mysql-test/suite/sys_vars/r/all_vars.result
  . mysql-test/t/system_mysql_db_fix40123.test
  . mysql-test/t/system_mysql_db_fix50030.test
  . mysql-test/t/system_mysql_db_fix50117.test
  . sql/log_event.cc
  . sql/log_event.h
  . sql/rpl_mi.h
  . sql/rpl_slave.cc
  . sql/share/errmsg-utf8.txt
------------------------------------------------------------
revno: 3232 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-07 20:01:39 +0200
message:
  manual merge with a piece of recovery support on repo.
  rpl_parallel hits an assert that Alfranio is fixing
------------------------------------------------------------
revno: 3231
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-07 19:35:16 +0200
message:
  wl#5569 MTS
  Testing-related fixes, incl. master_pos_wait() support and, thereafter,
  replacing sleeps with the now-functioning sync_slave_with_master;
  Fixing the limited Q-log-event parallelization. After the fix, a mixture
  of rows- and Q-transactions can run concurrently. Q-transactions are
  treated sequentially by default.
------------------------------------------------------------
revno: 3230
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2010-12-05 22:04:17 +0200
message:
  wl#5569 WL#5599 MTS & recovery
  Refining and correcting the integration of the two WLs.
  The main achievement is that the events' execution status is
  consistently recorded into the Worker and the central RL recovery
  tables. That was tested manually in a rather aggressive environment
  where the IO thread was made to reconnect randomly and the load from
  Master contained Rotate events.

  TODO:
  to fix: rpl.rpl_parallel_conf_limits may not pass
  to address: Multi-stmt Query-log-event transaction case (see todo in
  sources).
  to destruct by Workers their executed events (was deferred until
  ev->update_pos started working).
  (Alfranio) to deploy the mts_checkpoint_routine() call inside the
  successful event read branch of next_event(). Otherwise no call happens
  when Coord is constantly busy with read/distribute.
------------------------------------------------------------
revno: 3229
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 19:14:50 +0200
message:
  merging from the repo wl5569
------------------------------------------------------------
revno: 3228
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 15:45:02 +0000
message:
  Added a mutex to the checkpoint_routine.
------------------------------------------------------------
revno: 3227
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 16:56:11 +0000
message:
  Implemented a periodic checkpoint when the parallel slave is enabled.
------------------------------------------------------------
revno: 3226
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 10:15:45 +0000
message:
  Fixed commit_positions() and removed the unnecessary checkpoint thread.
------------------------------------------------------------
revno: 3225 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 20:13:12 +0200
message:
  manual merge into the wl#5569 tree
------------------------------------------------------------
revno: 3224
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 19:46:46 +0200
message:
  wl#5569 MTS
  User-interface related: setting @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is withdrawn.
  Fixes:
  rli->mts_pending_jobs_size statistics, which could otherwise cause an assert-crash;
  a silly c&p mistake in the relay-log name change notification.
  Made a little clean-up, including relocating the initialization of Worker-related state into start_slave_workers(). Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things.
------------------------------------------------------------
revno: 3223
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-01 19:08:21 +0200
message:
  wl#5569 MTS
  Changes related to the limit conditions, such as the WQ length and the total size of the WQs. Also, a new test file is added.
------------------------------------------------------------
revno: 3222 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:39:40 +0200
message:
  merging from the wl#5569 repo containing the wl#5599 integration
------------------------------------------------------------
revno: 3221
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:02:15 +0200
message:
  wl#5569 MTS
  Fixing group_relay_log_name change propagation from C to W;
  garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated without checking the usage counter.
  Adding C-W synchronization due to:
  - the overall maximum of data in the WQs
  - hitting the limit of a WQ length
  Adding Flow Control infrastructure with
  - a hungry-Worker level that forces the Coordinator to distribute eagerly; symmetrically, a Worker whose load is more than (100% - hungry level) is considered fed-up.
  - a nap time for C in case all WQ lengths are above the level.
  - a weight param on the base nap as a function of the number of fed-up W:s.
  TODO:
  UNTIL to force sequential exec;
  to fix the ROWS_QUERY_LOG_EVENT corner case;
  to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done).
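The WQ limits of revno 3223 and the flow-control rules of revno 3221 can be sketched as two small Python functions. This is a hedged illustration, not the patch's code: the function names, the percentage-based load, and the nap formula `base_nap * (1 + weight * fed_up)` are assumptions chosen to match the description (hungry level, fed-up Workers, base nap weighted by the fed-up count).

```python
def must_wait_to_enqueue(wq_len, wq_max_len, total_bytes, ev_size, total_max_bytes):
    """True if the Coordinator has to wait before appending an event: either
    the target WQ is at its length limit, or adding the event would exceed
    the overall size limit of all WQs (hypothetical parameters)."""
    return wq_len >= wq_max_len or total_bytes + ev_size > total_max_bytes


def coordinator_nap(queue_lengths, capacity, hungry_pct, base_nap, weight):
    """Nap time before the next distribution attempt; 0.0 means no nap."""
    loads = [100.0 * n / capacity for n in queue_lengths]
    if any(load < hungry_pct for load in loads):
        return 0.0  # a hungry Worker exists: keep distributing eagerly
    # Symmetric to "hungry": load above (100% - hungry level) is fed-up.
    fed_up = sum(1 for load in loads if load > 100 - hungry_pct)
    return base_nap * (1 + weight * fed_up)
```

For example, with a capacity of 100, a hungry level of 10%, a base nap of 10 and a weight of 0.5, queues of length [50, 95, 92] have no hungry Worker and two fed-up ones, so the Coordinator would nap for 20.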
------------------------------------------------------------
revno: 3220
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-27 17:36:50 +0200
message:
  wl#5569
  Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h.
  Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT. (todo: the native sequential mode for the event needs some revision; in particular, `delete ev' shall *always* happen in rli->cleanup_context, not in two places as it does currently).
------------------------------------------------------------
revno: 3219
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-26 23:08:30 +0200
message:
  wl#5569 MTS
  Partitioning conflict detection and handling is implemented.
  A new option runs Query events in parallel, though, unlike the Rows- case, the default db rather than the actual dbs is used as the partition key. The user interface gained the global var and the command-line opt: slave_run_query_in_parallel (Welcome to the set! :-)
------------------------------------------------------------
revno: 3218
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-andrei
timestamp: Fri 2010-11-26 16:15:37 +0000
message:
  There was a mismatch between the number of fields read and written, and consequently the read was failing for the Slave_worker.
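Partitioning conflict detection (revno 3219), together with the least-occupied Worker notion of revno 3216, can be modelled as a db-to-Worker map with usage counters. This is a minimal sketch under stated assumptions: the class `DbToWorkerMap`, its fields and the `current` parameter are hypothetical, and only the name `map_db_to_worker()` is borrowed from the log. A db that is still in use by another Worker yields a conflict that the Coordinator must wait out; an idle entry is relocated.

```python
class DbToWorkerMap:
    """Hypothetical db2w hash: db name -> Worker index, with usage counters."""

    def __init__(self, n_workers):
        self.owner = {}               # db -> worker index
        self.usage = {}               # db -> in-flight groups using the db
        self.parts = [0] * n_workers  # partitions currently held per Worker

    def least_occupied(self):
        # Worker holding the fewest partitions (ties go to the lowest index).
        return min(range(len(self.parts)), key=lambda w: self.parts[w])

    def map_db_to_worker(self, db, current=None):
        """Return the Worker for `db`, or None on a partitioning conflict.

        `current` is the Worker already chosen for the group, if any."""
        if db not in self.owner:
            w = self.least_occupied() if current is None else current
            self.owner[db] = w
            self.parts[w] += 1
            self.usage[db] = 0
        else:
            w = self.owner[db]
            if current is not None and w != current:
                if self.usage[db] > 0:
                    return None  # conflict: Coordinator must wait for release
                # Idle entry owned by another Worker: move it over.
                self.parts[w] -= 1
                self.parts[current] += 1
                self.owner[db] = w = current
        self.usage[db] += 1
        return w

    def release(self, db):
        # Called when a group that used `db` has been applied.
        self.usage[db] -= 1
```

Counting partitions per Worker, rather than queued events, follows the log's description of "least occupied" as the Worker with the fewest distributed partitions.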
------------------------------------------------------------
revno: 3217 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 11:03:54 +0200
message:
  wl#5569 merging with a wl#5599 piece of code
------------------------------------------------------------
revno: 3216
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 10:47:39 +0200
message:
  wl#5569
  Converting the prototype-time db2w hash to be concurrent;
  the necessary introduction of the least-occupied Worker notion. It is currently computed as the Worker having the least number of distributed partitions.
  Adding parallel support for Query_log_event; caution:
  1. the session/default db, not the actual db, is the key
  2. may not have been tested against all use cases (e.g. int vars)
  Fixing slave stop issues.
------------------------------------------------------------
revno: 3215
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-11-22 20:57:13 +0200
message:
  wl#5569 extending further the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context.
------------------------------------------------------------
revno: 3214
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-20 19:23:42 +0200
message:
  wl#5569 MTS
  Implementation of Worker pool start, stop, kill and error-out.
------------------------------------------------------------
revno: 3213
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-19 16:51:58 +0200
message:
  wl#5569 recovery interfaces for the wl#5599 implementation.
  The essence of this patch is to provide the GAQ object implementation and a valid life cycle. The checkpoint handler, prior to calling the store methods of wl#5599, is supposed to invoke rli->gaq->move_queue_head(&rli->workers).
  See a simulation of that near ev->update_pos() in the main sql thread loop. The checkpoint info is composed as an instance of Slave_job_group to reside as rli->gap->lwm.
  Todo: uncomment
    +  // delete ev;  // after ev->update_pos() event is garbage
  once the real checkpoint has been done.
  Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods.
    +  // experimental checkpoint per each scheduling attempt
    +  // logics of next_event()
    +
    +  rli->gaq->move_queue_head(&rli->workers);
------------------------------------------------------------
revno: 3212
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:50:54 +0200
message:
  wl#5569 wl#5599 Recovery related.
  Prototyping the worker RLI instantiation, to be elaborated on.
------------------------------------------------------------
revno: 3211
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:00:52 +0200
message:
  wl#5569 MTS
  Extending the wl#5563 prototype gradually. This commit addresses:
  1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for a checkpoint to update the GAQ and the central RLI recovery table). Wrt rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli).
  others:
  - decreased processing time for rpl_parallel, serial.
------------------------------------------------------------
revno: 3210
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2010-11-14 11:55:32 +0000
message:
  Post-push fix for WL#5599.
------------------------------------------------------------
revno: 3209
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-12 17:58:12 +0000
message:
  Post-push fix for WL#5599.
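The GAQ life cycle of revno 3213 can be sketched in Python. This assumes that move_queue_head() pops groups off the head of the queue while they are done, so the low-water mark (lwm) is the last group whose predecessors have all completed. The dataclass fields, `enqueue()`, and the OverflowError are hypothetical. The real call takes `&rli->workers`, and revno 3338 later sizes the GAQ by checkpoint_group.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SlaveJobGroup:
    group_relay_log_pos: int  # hypothetical: position the group ends at
    worker_id: int
    done: bool = False


class GroupAssignedQueue:
    """Hypothetical GAQ: groups in relay-log order, consumed at checkpoint."""

    def __init__(self, capacity):
        self.capacity = capacity  # cf. GAQ size = checkpoint_group (rev 3338)
        self.queue = deque()
        self.lwm = None  # last group known done together with all predecessors

    def enqueue(self, group):
        if len(self.queue) >= self.capacity:
            raise OverflowError("GAQ is full; Coordinator must wait for a checkpoint")
        self.queue.append(group)

    def move_queue_head(self):
        # Advance past the contiguous run of completed groups at the head;
        # a done group behind an unfinished one must stay queued.
        moved = 0
        while self.queue and self.queue[0].done:
            self.lwm = self.queue.popleft()
            moved += 1
        return moved
```

Stopping at the first unfinished group is what keeps the checkpoint consistent: groups applied out of order by other Workers are not covered by the lwm until every earlier group is done.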
------------------------------------------------------------
revno: 3208
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-11 11:53:01 +0000
message:
  WL#5599
  The patch changed the handler functions, i.e. init_info, check_info, flush_info, remove_info and end_info, and the related private member functions, in both the file and table handlers, to accept an index that identifies the information to be read or written. This is necessary because the handlers will be used by the workers to read and write information from file(s) and tables; several workers may be running at the same time, so an index identifies the worker that is accessing the information. This change is also necessary for multi-master replication, as information from each master must be uniquely identified.
------------------------------------------------------------
revno: 3207
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-11-10 10:57:13 +0000
message:
  Refactoring to start work on WL#5599.
------------------------------------------------------------
revno: 3206
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:34:18 +0000
message:
  Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569.
------------------------------------------------------------
revno: 3205 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:04:14 +0000
message:
  merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569
  Conflicts:
  . sql/CMakeLists.txt
  . sql/Makefile.am
  . sql/sql_class.h
  . sql/rpl_slave.cc
------------------------------------------------------------
revno: 3204 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 11:39:37 +0000
message:
  merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3203
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 23:33:37 +0300
message:
  wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes
------------------------------------------------------------
revno: 3202
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 21:19:56 +0300
message:
  wl#5563 leaving out a fine-grained garbage collection. That task is unnecessary to solve at prototyping time. The update-pos routine to be implemented is going to eliminate that piece of code.
------------------------------------------------------------
revno: 3201
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 20:38:35 +0300
message:
  wl#5563 Extending the test base by splitting the former rpl_parallel into two, to run in serial exec mode as well.
------------------------------------------------------------
revno: 3200
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 11:49:00 +0300
message:
  wl#5563 improved test; fixed a delete issue that used to crash; added @@global.slave_local_timestamp to fill in the timestamp col with the slave clock value. Performance growth can be seen through the test.
  todo: merge with Alfranio's work on hashing and dyn alloc of PFS obj:s.
------------------------------------------------------------
revno: 3199
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Wed 2010-09-15 14:51:49 +0300
message:
  wl#5563 tests for the wl.
  The number of workers and iterations can be tuned.
  todo: convert them into param:s to pass to the test through mtr.
------------------------------------------------------------
revno: 3198
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 18:22:41 +0300
message:
  wl#5563 adding an ingenuous, not yet stress-attempting test that also fired an assert. Refined the Worker instance ref computation because cleanup_context() is executed by the sql-thread (the coordinator) as well.
------------------------------------------------------------
revno: 3197
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 13:15:38 +0300
message:
  wl#5563 Rows-event parallelization is basically implemented, although only shallowly tested. Write access by workers to the central rli struct may not be fully eliminated at this phase, e.g. where errors are concerned.
  todo: to prove rli gets out of Worker scope
  todo: to provide a stress test
------------------------------------------------------------
revno: 3196
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Sat 2010-09-11 17:00:08 +0300
message:
  wl#5563 adding Rows-event support limited to one Worker. Deferred deletion did not check the emptiness of the list.
------------------------------------------------------------
revno: 3195
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:36:07 +0300
message:
  wl#5563 correcting comments to indicate fewer limitations
------------------------------------------------------------
revno: 3194
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:32:39 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  More progress on the WL in that the STMT binlog-format works while the conceptual limits hold. That is, no query/transaction is allowed to deal with more than one db.
  Addressed a complication in that the update-pos method, which is run by the Coordinator, belongs to the Log_event hierarchy, and therefore event deletion, now done by a Worker, must be careful.
  Todo:
  1. (High prio) fix Row-format complications
  2. (High prio) elaborate on the hash function to be a function of the db text name
  3. (Optional) consider moving update_pos to the RLI class to get rid of the delete-logic complication.
  How-to-use:
  The instructions can be found in the comments of the previous commit; see there for more details. In brief, though, the db names have to follow the pattern `test[0-9]', e.g. test0, test1, test2, test3 for the default four Worker threads. The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and with signs of scalable performance.
  How to test:
    connection master;
    set @@global.binlog_format=statement;
    connection slave;
    set @@global.slave_exec_mode=PARALLEL;
    set @@global.binlog_format=mixed;
    show processlist; => IO, SQL threads + 4 workers by default
    change master to ...
    connection master;
    # create databases with magic names "test[0-9]+", where the number will index
    # a worker.
    create database test0;
    create database test1;
    create database test2;
    create database test3;
    # create tables; they are only of MyISAM type for now
    use test0;
    create table tm_1(a int, b int) engine=myisam;
    use ...
    # DML on tables:
    use test[0-3];
    insert into tm_1 values (1,0);
    ...
    ...
    connection slave;
    # monitor CPU (visually this time: top etc.)
    # check correctness, e.g.
    select count(*) from test[0-3].tm_1;
    connection master;
    select count(*) from test[0-3].tm_1;
******
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could not produce deterministic output. The list of binlogs contained one more binlog than expected.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active. The reason for the failure was that parsing of the relay log by the MTS recovery gaps computation did not make sure to use the relay log's own FormatDescriptor events, which contain the checksumming info for all events in the log. Fixed by finding out the checksum algorithm of every relay log as the first step of the MTS recovery gaps computation.
------------------------------------------------------------
revno: 3361 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  the merge from trunk required resolving a few semantic conflicts caused by changes in THD::enter_cond() in the trunk.
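The revno 3362 fix determines the checksum algorithm of each relay log before the recovery gaps are computed, and the accompanying note says the first three events are enough. A minimal Python sketch follows, assuming a relay log begins with its own Format_description event, then a Rotate, then the master's Format_description event. The assumption also includes that the last FD seen among those first events is the one to use. The `Event` type, the type-code strings and the helper names are hypothetical; the algorithm values mirror the binlog_checksum settings NONE and CRC32.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

FD_SCAN_LIMIT = 3  # per the note: the first three events are enough


@dataclass
class Event:
    type_code: str                      # hypothetical, e.g. "FORMAT_DESCRIPTION"
    checksum_alg: Optional[str] = None  # only meaningful for FD events


def find_master_checksum_alg(relay_log: List[Event]) -> Optional[str]:
    """Checksum alg of the last FD among the first FD_SCAN_LIMIT events.

    Assumed layout: own FD, Rotate, master's FD, then the payload events."""
    alg = None
    for ev in relay_log[:FD_SCAN_LIMIT]:
        if ev.type_code == "FORMAT_DESCRIPTION":
            alg = ev.checksum_alg
    return alg


def algs_for_gap_computation(relay_logs: Dict[str, List[Event]]) -> Dict[str, Optional[str]]:
    # First step of the gaps computation: resolve the alg for every relay log,
    # so each one is later parsed with its own checksum setting.
    return {name: find_master_checksum_alg(evs) for name, evs in relay_logs.items()}
```

Resolving the algorithm per relay log, instead of reusing one global setting, matches the test scenario in which the master checksums and the slave does not.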
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed a failure in test rpl_mts_check_concurrency when running in the mts collection.
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages on unsafe execution.
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Removed a warning suppression introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with repeated warning messages on unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug output was removed from the error log.
    However, this is a temporary solution, as it prevents enabling traces.
------------------------------------------------------------
revno: 3355 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported a stack trace on rpl_savepoint. The problem appears to be that, when computing slave_sql_running_state in show_master_info(), the sql thread's proc_info pointer could refer to a value on a stack frame that is already gone. Fixed by making proc_info point to a string literal.
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running the mts crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch source code.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to please rpl_*_mts_crash_safe.
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  fixing a build issue for embedded. Public visibility of Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP
  MTS stopped with an error in the middle of the test. The reason is that a group of events from the slave itself was only partly processed, which modified the group position. On the following restart, the wrong group boundary made the slave either error out or assert.
  Fixed by locating a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the slave server group can't be started.
  Notice, this is a trial patch, since I can't reproduce the failure at all on the hosts available to me.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  Further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in which the Coordinator might fail to catch the not-running status of a Worker. That made the Coordinator skip only part of a group of the slave server's own events, so the slave stopped somewhere other than a group boundary.
  Fixed by moving the marking of the errored-out Worker as failed to before the release of its APH entries.
  TODO: notice there is a possibility of stopping somewhere other than a boundary due to a graceful STOP SLAVE, if one is run while self-originated events are being skipped. However, this issue belongs to STS and might be similar to BUG@12604951 and BUG@12728160.
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569
  Injecting faults while updating a myisam table requires flushing the changes before committing suicide. So we have introduced the following code:
      DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -                   DBUG_SUICIDE(););
  +                   sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +                   flush_info(TRUE);
  +                   DBUG_SUICIDE();
  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
------------------------------------------------------------
revno: 3344 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  adding a suppression for an expected warning to rpl_circular_for_4_hosts;
  decreasing a loop limit in rpl_parallel_switch_sequential in case of statement format.
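The ordering fix of revno 3347 (mark the errored-out Worker as failed before releasing its APH entries) can be demonstrated with Python threads. This is a minimal sketch, not the server's code: the `Worker` and `AssignedPartitionHash` classes and `coordinator_wait()` are hypothetical. Because the failure flag is published before the lock-protected release, a Coordinator that wakes up and finds the partition free is guaranteed to also see the failed status.

```python
import threading


class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.failed = False


class AssignedPartitionHash:
    """Hypothetical APH: db name -> owning Worker, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._released = threading.Condition(self._lock)
        self.owner = {}

    def assign(self, db, worker):
        with self._lock:
            self.owner[db] = worker

    def error_out(self, worker):
        # Order per revno 3347: first publish the failure ...
        worker.failed = True
        # ... and only then release the Worker's entries and wake waiters.
        with self._lock:
            for db in [d for d, w in self.owner.items() if w is worker]:
                del self.owner[db]
            self._released.notify_all()

    def coordinator_wait(self, db, workers):
        """Wait until `db` is free; return True if any Worker has failed."""
        with self._lock:
            while db in self.owner:
                self._released.wait()
            return any(w.failed for w in workers)
```

With the two steps in `error_out()` swapped, the Coordinator could acquire the freed partition, see `failed` still False, and go on skipping a partial group, which is the race the log describes.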
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test due to the rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, when there were two groups of events to execute, the first for a Worker and the 2nd for the Coordinator, the Coordinator waited for the 1st group's completion but did not verify that the synchronization succeeded. So, if applying the 1st group failed, processing of the 2nd could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far).
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running mts recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for WL#5569.
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the applying phase of MTS recovery to always apply events that are for the Coordinator;
  fixes a few tests to make them pass on PB;
  sets the GAQ size to the checkpoint_group value.
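The revno 3342 fix (after waiting for the Worker's group, verify that it succeeded before the Coordinator applies its own group) can be shown with `concurrent.futures`. This is a hypothetical sketch: `apply_after_worker_group()` and the callables standing in for event groups are illustrative, not the server's interfaces.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def apply_after_worker_group(pool, worker_group, coordinator_group):
    """Run a Worker's group, then the Coordinator's group only on success."""
    future = pool.submit(worker_group)
    wait([future])  # synchronization with the Worker (this alone was not enough)
    if future.exception() is not None:
        # The check the fix adds: do not proceed on top of a failed group,
        # which could leave an inconsistent state.
        return False
    coordinator_group()
    return True
```

Returning a status, rather than only waiting, lets the caller stop the slave at a clean group boundary instead of executing the 2nd group.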
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
------------------------------------------------------------
revno: 3336 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  refining the wait for db-hash entry release at event distribution. A graceful STOP is not accepted at this point, so the Coordinator continues to stay in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.
  MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly due to an earlier stop of the Single-Threaded Slave not at a group boundary but in the middle of a group.
  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE-TABLE group is surrounded by its own BEGIN/COMMIT braces.
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
------------------------------------------------------------
revno: 3332
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:29:34 +0300
message:
  wl5569 MTS cleanup in one file.
------------------------------------------------------------
revno: 3331
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:16:02 +0300
message:
  wl5569 MTS
  bzr commit mail address changed;
  a minor cleanup to make mts_is_worker() take a const argument;
  releasing a test to run in MTS.
------------------------------------------------------------
revno: 3330
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-02 08:58:56 +0100
message:
  Fixed the use of the performance schema in the replication code and a concurrency issue in the IO Thread. In particular, the IO Thread was calling flush_master_info without grabbing locks.
------------------------------------------------------------
revno: 3329 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 16:41:35 +0300
message:
  wl5569 MTS merging from the main repo.
------------------------------------------------------------
revno: 3328
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 15:48:25 +0300
message:
  wl#5569 MTS
  the final cleanup patch. There are a few glitches that were considered tolerable, at least while the whole WL's code is being reviewed. These include:
  - no support for old load-data events
  - no support for FK
  To add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called.
------------------------------------------------------------
revno: 3327
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 13:16:52 +0300
message:
  wl#5569 MTS
  The patch cleans up a host of code.
------------------------------------------------------------
revno: 3326
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-28 11:30:18 +0300
message:
  wl#5569 MTS
  replacing views with regular tables for consistency verification in rpl_parallel_innodb. Also, a minor cleanup in rpl_parallel is done.
------------------------------------------------------------
revno: 3325
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 20:31:45 +0300
message:
  wl#5569 MTS
  Cleanup, and addressing the sporadic rpl_temp_table_mix_row failure in post-execution mtr.check_testcase().
  The check-testcase failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in case of rows-event assignment. While it is correct with a homogeneous rows-event-only load, a mixture can fail.
  Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in the future.
------------------------------------------------------------
revno: 3324
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 13:12:52 +0100
message:
  Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case.
------------------------------------------------------------
revno: 3323
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-26 13:02:59 +0100
message:
  Fixed a test case.
------------------------------------------------------------ revno: 3322 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-25 15:14:24 +0100 message: Introduced a test case for recovery with MTS and fixed bugs in recovery. ------------------------------------------------------------ revno: 3321 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 15:38:19 +0300 message: wl#5569 MTS This patch does a bit of cleanup, addresses one memory-allocation todo and completes fixing the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items. ------------------------------------------------------------ revno: 3320 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 12:38:34 +0300 message: wl#5569 MTS this patch completes the previous one: it fixes a result file and makes the InnoDB-specific test verification rely on tables, not views. ------------------------------------------------------------ revno: 3319 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 00:11:22 +0300 message: wl#5569 MTS this is an exploratory patch to find out whether the verification method based on views has a flaw of its own, unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment. ------------------------------------------------------------ revno: 3318 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-23 07:56:15 +0300 message: wl#5569 MTS fixing results of mysqld--help-win.
------------------------------------------------------------ revno: 3317 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:20:40 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3316 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:17:43 +0100 message: On some platforms, such as Windows, a thread's wait time is stored in 100ns units. However, when computing the difference between two values, the result was not multiplied by 100 to convert it to nanoseconds. Besides, there was a casting problem when that result was assigned to a ulong. ------------------------------------------------------------ revno: 3315 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 18:54:23 +0100 message: Fixed how MTS copes with recovery. ------------------------------------------------------------ revno: 3314 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 19:10:54 +0300 message: wl#5569 MTS Fixing valgrind warnings. ------------------------------------------------------------ revno: 3313 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 18:15:43 +0300 message: wl#5569 MTS rpl_parallel_start_stop.test could fail sporadically with a timeout.
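For revno 3316, a minimal sketch of the corrected arithmetic, assuming both readings are 64-bit counters in 100 ns units; `wait_diff_ns` is a hypothetical helper, not the server's function. Keeping the result in a 64-bit type matters because `unsigned long` is only 32 bits wide on Windows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: difference between two wait-time readings taken in
// 100 ns ticks (as on Windows), converted to nanoseconds.
std::uint64_t wait_diff_ns(std::uint64_t start_ticks_100ns,
                           std::uint64_t end_ticks_100ns) {
  assert(end_ticks_100ns >= start_ticks_100ns);
  // Subtract first, then scale: 1 tick = 100 ns. Everything stays 64-bit,
  // so a long wait is not truncated the way a 32-bit ulong would truncate it.
  const std::uint64_t diff_ticks = end_ticks_100ns - start_ticks_100ns;
  return diff_ticks * 100;
}
```

A 5-second wait is 5,000,000,000 ns, which already exceeds what a 32-bit unsigned integer can hold.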
------------------------------------------------------------ revno: 3312 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:21:56 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3311 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:19:06 +0100 message: Fixed an error when computing the Lower-Water-Mark. If two or more jobs were removed from the Group of assigned jobs, and one of them had a non-empty group relay log while the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null. ------------------------------------------------------------ revno: 3310 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 11:52:44 +0100 message: Fixed valgrind errors. Slave_job_group was silently being cast to LOG_POS_COORD when calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp), and consequently in mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems: . The first two members of Slave_job_group were not a char * and a my_offset. . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems. To fix the problem, we avoid this casting.
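Revnos 3311 and 3310 can be illustrated with a simplified model. This is a sketch under assumptions: `JobGroup`, `LowWaterMark`, `advance_lwm` and `job_coord_less` are hypothetical stand-ins for Slave_job_group, the Lower-Water-Mark and mts_event_coord_cmp; an empty relay log name is assumed to mean "same relay log as the previous group"; and `std::sort` with a typed comparator is swapped in for sort_dynamic with a cast `qsort_cmp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Simplified stand-in for a Slave_job_group entry (hypothetical fields).
struct JobGroup {
  std::string group_relay_log_name;  // empty: same relay log as predecessor (assumption)
  std::uint64_t group_relay_log_pos;
  bool done;
};

struct LowWaterMark {
  std::string relay_log_name;
  std::uint64_t relay_log_pos;
};

// Removes the leading run of completed groups and advances the LWM.
// The relay log name comes from the last *non-empty* name seen, so an empty
// name on the final removed group does not discard a name set earlier in
// the same batch.
std::size_t advance_lwm(std::deque<JobGroup>& gaq, LowWaterMark& lwm) {
  std::size_t removed = 0;
  while (!gaq.empty() && gaq.front().done) {
    const JobGroup& g = gaq.front();
    if (!g.group_relay_log_name.empty())
      lwm.relay_log_name = g.group_relay_log_name;
    lwm.relay_log_pos = g.group_relay_log_pos;
    gaq.pop_front();  // g is not used after this point
    ++removed;
  }
  return removed;
}

// Typed comparator: orders groups by (relay log name, position) without
// reinterpreting JobGroup as a differently laid-out struct.
bool job_coord_less(const JobGroup& a, const JobGroup& b) {
  if (a.group_relay_log_name != b.group_relay_log_name)
    return a.group_relay_log_name < b.group_relay_log_name;
  return a.group_relay_log_pos < b.group_relay_log_pos;
}

void sort_above_lwm(std::vector<JobGroup>& jobs) {
  std::sort(jobs.begin(), jobs.end(), job_coord_less);
}
```

Because the comparator takes the element type itself, the compiler rejects passing it anything with a different layout, which is the class of error revno 3310 removed. The name comparison assumes zero-padded relay log indexes, so lexicographic order matches file order.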
------------------------------------------------------------ revno: 3309 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 19:14:50 +0300 message: wl#5569 MTS fixing slave_transaction_retries_basic_64.result ------------------------------------------------------------ revno: 3308 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 16:11:25 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3307 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 12:33:36 +0300 message: wl#5569 MTS Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events. Fixing the sensitivity of two other tests to mtr's invocation environment, which includes the initial values of slave_parallel_workers and slave_transaction_retries. ------------------------------------------------------------ revno: 3306 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 09:04:19 +0100 message: Fixed some Windows failures. ------------------------------------------------------------ revno: 3305 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-18 19:58:21 +0100 message: Fixed some recovery issues. ------------------------------------------------------------ revno: 3304 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 21:01:58 +0300 message: wl#5569 MTS fixing tests and a segfault at the end of handle_slave_sql() that happened after Worker initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------ revno: 3303 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 18:34:16 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3302 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 14:00:41 +0300 message: wl#5569 MTS fixing rpl_row_basic_3innodb similarly to the previous patch. ------------------------------------------------------------ revno: 3301 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 13:51:59 +0300 message: wl#5569 MTS fixing a few tests. 1. A policy is implemented to react with a warning when a failing Worker leaves the overall slave state with gaps, and thereby inconsistent. 2. In two tests that used to time out due to RESET MASTER/SLAVE, that step was disabled. ------------------------------------------------------------ revno: 3300 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 02:24:59 +0100 message: Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renames worker-info-repository to slave-worker-info-repository in some tests' option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have been renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including: 1. reducing the number of system variables and renaming them. Command-line options that mattered only for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. A small encapsulation in ha_blackhole.cc is done. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by MTS. ------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk.
------------------------------------------------------------ revno: 3288 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing failing tests due to: a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event. An event that accesses over-max db:s, or an event from an Old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and nothing more is scheduled until the group is done. b. Notification of the Coordinator about an errored-out Worker is corrected. ------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with a non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode and for the rpl suites in row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes: a segfault from the last merge from trunk, a compilation issue on embedded, etc. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options.
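The `isolated parallel' rule from revno 3287 amounts to a barrier around a marked group. A minimal Coordinator-side sketch, assuming a single counter of in-flight groups; `IsolationGate` and its methods are hypothetical names, not the server's API:

```cpp
#include <cassert>

// Hypothetical Coordinator-side gate for `isolated parallel' groups.
// A group marked isolated (e.g. one touching over-max db:s, or one from an
// old master) may start only after every previously scheduled group is
// done, and no further group may start until the isolated one is done.
class IsolationGate {
 public:
  // Returns true if the group may be handed to a Worker now; false means
  // the Coordinator has to wait for completions and retry.
  bool try_schedule(bool isolated) {
    if (isolated_running_) return false;           // nothing runs beside an isolated group
    if (isolated && in_flight_ > 0) return false;  // drain prior groups first
    ++in_flight_;
    isolated_running_ = isolated;
    return true;
  }

  // Called when a Worker reports that a group has been applied.
  void complete() {
    assert(in_flight_ > 0);
    --in_flight_;
    if (in_flight_ == 0) isolated_running_ = false;
  }

  int in_flight() const { return in_flight_; }

 private:
  int in_flight_ = 0;
  bool isolated_running_ = false;
};
```

While an isolated group runs it is the only group in flight, so clearing the flag when the counter drops to zero is enough to reopen the gate.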
------------------------------------------------------------ revno: 3283 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop, and an error case behaves like the killed one. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker-running status to killed (which breaks the read-exec loop of a Worker). ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, including in stopping Workers: the routine now distinguishes between the killed and gracefully stopped cases, so that in the end STOP SLAVE will guarantee a consistent state (some todo items still remain). ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that are not yet pushed to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The declaration of the underlying server variable is changed from ulong to int32.
While that might shrink the actual range, no range had been specified, and now that the number of bits is the same on all platforms the range can be set to [0, max(int32)]. ****** wl#5569 MTS Wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and the implementation of a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done. Fixing temp table manipulation. Asserting on an impossible-to-support use case: a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog and refuse to run it in parallel. ****** wl#5569 MTS Implementation of giving the applier role to Workers in all cases but the ones dealing with the Coordinator's state. That includes Query events with over-max db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes, e.g. for a Query event, by switching to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic for handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the Workers' starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it is about cleanup and simplification of the logic of event distribution to Workers and the consequent actions. Some effort was spent to support Old Master BEGIN-less groups of events. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS.
Cleanup, fixing a few todo:s and a few potential issues found. ------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring its handling in the canonical sequential mode. The event's life cycle suggests behavior similar to objects associated with Table_map; in particular, it is destroyed at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, including rpl_row_ignorable_event, in both sequential and parallel modes. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker : public Relay_log_info). The only exception is a Query referring to a temporary table: temp tables are maintained in the Coordinator's "central" rli. Removing some dead code; performing a lot of cleanup. There are a few todo items, including: 1. to implement several todo:s scattered across MTS' code and tests (e.g. to restore protected access for a few members of RLI in rpl_rli.h); 2. to cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback); 3. to sort out names of classes based on Rpl_info, possibly removing Rpl_info_worker. ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving benchmarking test.
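The STOP/KILL split described in revno 3282 can be sketched as a check in a Worker's read-exec loop. This is a simplification under assumptions: `WorkerRunning`, `worker_should_continue` and `kill_all_workers` are hypothetical, and "finish the group in progress" stands in for whatever consistent stopping point the real STOP SLAVE reaches.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical run states for a Worker (not the server's actual enum).
enum class WorkerRunning { RUNNING, STOP_REQUESTED, KILLED };

// Decides whether a Worker's read-exec loop keeps going.
// STOP_REQUESTED (graceful STOP SLAVE): finish the group in progress so no
// gap is left, then stop. KILLED (urgent KILL or error propagation): break
// out of the loop immediately.
bool worker_should_continue(const std::atomic<WorkerRunning>& status,
                            bool inside_group) {
  switch (status.load()) {
    case WorkerRunning::RUNNING:
      return true;
    case WorkerRunning::STOP_REQUESTED:
      return inside_group;
    case WorkerRunning::KILLED:
      return false;
  }
  return false;
}

// Coordinator's reaction to an errored-out Worker: set every Worker's
// running status to KILLED, which ends each read-exec loop.
void kill_all_workers(std::atomic<WorkerRunning>* statuses, int n) {
  for (int i = 0; i < n; ++i) statuses[i].store(WorkerRunning::KILLED);
}
```

Storing the status in an atomic lets the Coordinator change it while Workers poll it without a lock; a real implementation would also have to wake Workers blocked on an empty queue.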
------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, including waiting and sleeping times, are now reported into the error log at slave stop time. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS A benchmarking-related patch generalizes rpl_parallel to run with an arbitrary number of workers, db:s, tables, etc. TODO: restore the final consistency check, which is temporarily removed because I could not find a way to leave it surrounded with a --dis/en-able* stanza. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added.
Benchmarking results can be obtained by extracting the master-side generation time and the slave-side applying time, as in the following loop:

  workers=6
  for n in `seq 1 3`; do
    echo; echo loop: $n; echo
    my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
      rpl_parallel_benchmark --mysqld=--binlog-format=statement \
      && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1
  done

------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments, as requested by a reviewer. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: the earliest four revisions, which separate the DML stage of the parallel query project from the following ones devoted to DDL. Those four implement skeleton parallel applying of Queries containing a temporary table, and the core of the design, namely the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design; in particular, the mark of the over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch in the bundle addresses the last review mail notes.
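Revno 3270 turns the over-max mark into a dedicated constant value. A sketch of such an encoding, assuming a one-byte status value; the constant names and the values 16 and 254 are placeholders chosen for illustration, not taken from this log:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical limits (placeholder values).
constexpr std::uint8_t kMaxDbsInEvent = 16;
constexpr std::uint8_t kOverMaxDbsMarker = 254;

// Encodes the number of databases a Query updates into a one-byte status
// value. An event touching more than kMaxDbsInEvent databases carries the
// dedicated marker instead of a count, so the applier can recognise it and
// fall back to isolated execution.
std::uint8_t encode_updated_db_count(const std::vector<std::string>& dbs) {
  if (dbs.size() > static_cast<std::size_t>(kMaxDbsInEvent))
    return kOverMaxDbsMarker;
  return static_cast<std::uint8_t>(dbs.size());
}

bool needs_isolation(std::uint8_t encoded) {
  return encoded == kOverMaxDbsMarker;
}
```

Choosing a marker value above the maximum count keeps the two meanings (a real count vs. "too many") unambiguous within a single byte.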
------------------------------------------------------------ revno: 3269 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. ------------------------------------------------------------ revno: 3267 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter in case the signal went missing and only a timeout would be reported ------------------------------------------------------------ revno: 3265 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases revealed by mtr testing with MTS workers against the standard suites.
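The break-through parameter of revno 3266 is the classic predicate-guarded timed wait. The sketch below uses `std::condition_variable` rather than the server's own condition-variable wrappers; `SignalWithFlag` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Timed wait that cannot miss a signal: the waiter checks a flag under the
// mutex, so a notification sent before the wait started (or lost in a race)
// is still observed, and a timeout is reported only if the flag was never set.
struct SignalWithFlag {
  std::mutex mtx;
  std::condition_variable cv;
  bool signaled = false;  // the break-through parameter

  void signal() {
    {
      std::lock_guard<std::mutex> lock(mtx);
      signaled = true;
    }
    cv.notify_all();
  }

  // Returns true if signaled, false on a genuine timeout.
  bool timed_wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mtx);
    // wait_for with a predicate re-checks the flag after every wake-up,
    // including spurious ones, and returns the predicate's final value.
    return cv.wait_for(lock, timeout, [this] { return signaled; });
  }
};
```

Without the flag, a notify that arrives just before the waiter blocks is simply lost, and the waiter can only report a timeout.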
------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert, one concerning C (the Coordinator) deleting events that are skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. ------------------------------------------------------------ revno: 3261 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. ------------------------------------------------------------ revno: 3259 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0; rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events so that its parts end up in different relay logs, carefully running ev->update_pos() for it, and destroying it.
------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the second and subsequent runs because Worker tables are not removed at RESET SLAVE. ------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine that computes the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding the checkpoint relay_log_name, pos as a necessary part for locating a relay log for recovery. Tested with rpl_parallel.
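Revnos 3254 and 3253 gather a bitmap of executed groups. A simplified model, assuming groups after the Coordinator's checkpoint are numbered from 0 and each Worker reports the indices of the groups it committed; `merge_worker_bitmaps` and `find_gaps` are hypothetical helpers, not the server's recovery routine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// OR the per-Worker reports into one group recovery bitmap.
std::vector<bool> merge_worker_bitmaps(
    std::size_t n_groups,
    const std::vector<std::vector<std::size_t>>& per_worker) {
  std::vector<bool> bitmap(n_groups, false);
  for (const auto& done : per_worker)
    for (std::size_t idx : done) {
      assert(idx < n_groups);
      bitmap[idx] = true;
    }
  return bitmap;
}

// A zero bit below the highest set bit is a gap: a group that later groups
// overtook and that recovery still has to apply.
std::vector<std::size_t> find_gaps(const std::vector<bool>& bitmap) {
  std::size_t last_set = bitmap.size();  // sentinel: no bit set
  for (std::size_t i = 0; i < bitmap.size(); ++i)
    if (bitmap[i]) last_set = i;
  std::vector<std::size_t> gaps;
  if (last_set == bitmap.size()) return gaps;
  for (std::size_t i = 0; i < last_set; ++i)
    if (!bitmap[i]) gaps.push_back(i);
  return gaps;
}
```

For example, if one Worker committed groups 0 and 3 and another committed 1 and 4, group 2 is the only gap; group 5 is simply not executed yet.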
------------------------------------------------------------ revno: 3251 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing by introducing a volatile byte that indicates whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing the --mts-exp-slave-run-query-in-parallel=1 case, in which a Query-log-event, including DML and DDL, can be run in parallel. The feature is still `exp'erimental; it can be tried as long as no temp tables are involved and the query modifies no db other than the session's default. Tested: the changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, including valgrind issues, long execution time and an assert in a test. ------------------------------------------------------------ revno: 3248 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from the wl#5569 repo to the local branch. rpl_sequential opt files are added to keep mtr from giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to sequential execution for the cases of 1. Query-log-event 2. Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by a Worker. ------------------------------------------------------------ revno: 3246 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to arrange the option/system var names properly: mts_-prefixing, and _exp prefixing for experimental features needed for benchmarking (mts_exp_slave_local_timestamp) or supported in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing GAQ size. It might be too tight, e.g. in case of a max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack trace caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent. ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a copy-and-paste error. ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
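The GAQ fixes in revnos 3245, 3247 and 3239 (size, progress of move_queue_head(), no enqueue while full) can be illustrated with a fixed-capacity circular queue. Everything here is a hypothetical stand-in: `AssignedGroupQueue`, `gaq_capacity`, and in particular the one extra slot in the capacity formula, which is an assumption rather than the server's rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity circular queue of assigned groups (simplified GAQ model).
class AssignedGroupQueue {
 public:
  explicit AssignedGroupQueue(std::size_t capacity) : done_(capacity, false) {
    assert(capacity > 0);
  }

  bool full() const { return size_ == done_.size(); }
  std::size_t size() const { return size_; }

  // Never enqueue while full: returns the slot index, or -1 so the caller
  // waits for the head to move.
  long enqueue() {
    if (full()) return -1;
    std::size_t slot = (head_ + size_) % done_.size();
    done_[slot] = false;
    ++size_;
    return static_cast<long>(slot);
  }

  void mark_done(std::size_t slot) { done_[slot] = true; }

  // Advance the head over every completed group at the front, stopping at
  // the first group still in progress; returns how many were released.
  std::size_t move_queue_head() {
    std::size_t released = 0;
    while (size_ > 0 && done_[head_]) {
      head_ = (head_ + 1) % done_.size();
      --size_;
      ++released;
    }
    return released;
  }

 private:
  std::vector<bool> done_;
  std::size_t head_ = 0;
  std::size_t size_ = 0;
};

// Room for every group while all Worker queues are full, plus one slot for
// the group the Coordinator is assigning (assumption).
std::size_t gaq_capacity(std::size_t n_workers, std::size_t wq_len_max) {
  return n_workers * wq_len_max + 1;
}
```

A group finishing out of order (slot 1 before slot 0) releases nothing until the head group is done; then both leave at once.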
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery-related issue of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ed event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning that applies when parallel workers are non-zero and a feature is used that parallelization can't work with. ------------------------------------------------------------ revno: 3240 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limit its size so that it can hold items while all WQ:s are full; b) move_queue_head() contained a flaw that made it falsely make no progress; c) never enqueue into GAQ while it is full. ------------------------------------------------------------ revno: 3238 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS
  Integration with wl#5599 recovery for MTS, and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized with one partition at Worker startup, which it should not be.
------------------------------------------------------------
revno: 3236
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 13:59:07 +0000
message:
  WL#5599 Fixed warning messages.
------------------------------------------------------------
revno: 3235
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 12:59:07 +0000
message:
  WL#5599 Fixed test cases.
------------------------------------------------------------
revno: 3234
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 01:30:32 +0000
message:
  WL#5599 Fixed failures in test cases.
------------------------------------------------------------
revno: 3233 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-08 00:33:48 +0000
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
  Conflicts:
  . mysql-test/r/log_tables_upgrade.result
  . mysql-test/r/mysql_upgrade.result
  . mysql-test/r/mysql_upgrade_ssl.result
  . mysql-test/r/mysqlcheck.result
  . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result
  . mysql-test/suite/rpl/t/disabled.def
  . mysql-test/suite/sys_vars/r/all_vars.result
  . mysql-test/t/system_mysql_db_fix40123.test
  . mysql-test/t/system_mysql_db_fix50030.test
  . mysql-test/t/system_mysql_db_fix50117.test
  . sql/log_event.cc
  . sql/log_event.h
  . sql/rpl_mi.h
  . sql/rpl_slave.cc
  . sql/share/errmsg-utf8.txt
------------------------------------------------------------
revno: 3232 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-07 20:01:39 +0200
message:
  Manual merge with a piece of recovery support from the repo. rpl_parallel hits an assert that Alfranio is fixing.
------------------------------------------------------------
revno: 3231
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-12-07 19:35:16 +0200
message:
  wl#5569 MTS
  Testing-related fixes, including master_pos_wait() support and, thereafter, replacing sleeps with the now-functioning sync_slave_with_master.
  Fixing the limited Query_log_event parallelization. After the fix, a mixture of rows- and Query-transactions can run concurrently. Query-transactions are treated sequentially by default.
------------------------------------------------------------
revno: 3230
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2010-12-05 22:04:17 +0200
message:
  wl#5569 WL#5599 MTS & recovery
  Refining and correcting the integration of the two WLs. The main achievement is that event execution status is consistently recorded into the Worker and the central RLI recovery tables. That was tested manually in a rather aggressive environment where the IO thread was made to reconnect randomly and the load from the master contained Rotate events.
  TODO:
  to fix:
    rpl.rpl_parallel_conf_limits may not pass
  to address:
    the multi-statement Query_log_event transaction case (see the todo in the sources);
    destruction by Workers of their executed events (deferred until ev->update_pos started working);
    (Alfranio) deploying the mts_checkpoint_routine() call inside the successful event-read branch of next_event(). Otherwise no call happens when the Coordinator is constantly busy reading/distributing.
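The GAQ rules from revisions 3239 and 3213 (a bounded queue, no enqueueing while full, and a head that moves only past completed groups so that the last removed group can serve as the checkpoint low-water mark) can be modelled with a small ring buffer. This is a minimal sketch under those assumptions, not the server's code; GroupAssignedQueue, Group and their members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a Group Assigned Queue (GAQ): a fixed-capacity ring
// buffer of event groups kept in the order the Coordinator scheduled them.
// Group, GroupAssignedQueue and their members are illustrative names only.
struct Group {
  unsigned long long relay_log_pos = 0;  // position a checkpoint would store
  bool done = false;                     // set once a Worker commits the group
};

class GroupAssignedQueue {
 public:
  explicit GroupAssignedQueue(std::size_t capacity) : buf_(capacity) {}

  bool full() const { return len_ == buf_.size(); }
  bool empty() const { return len_ == 0; }

  // Never enqueue while full. Returns the slot index of the new group,
  // or -1 if the queue is full and the caller has to wait.
  long enqueue(const Group &g) {
    if (full()) return -1;
    std::size_t idx = (head_ + len_) % buf_.size();
    buf_[idx] = g;
    ++len_;
    return static_cast<long>(idx);
  }

  void mark_done(std::size_t idx) { buf_[idx].done = true; }

  // Remove the longest prefix of completed groups and report the last one
  // removed through *lwm (the low-water mark). Groups that completed out of
  // order stay queued until every earlier group is done as well.
  std::size_t move_queue_head(Group *lwm) {
    std::size_t removed = 0;
    while (!empty() && buf_[head_].done) {
      *lwm = buf_[head_];
      head_ = (head_ + 1) % buf_.size();
      --len_;
      ++removed;
    }
    return removed;
  }

 private:
  std::vector<Group> buf_;
  std::size_t head_ = 0;  // index of the oldest queued group
  std::size_t len_ = 0;   // number of queued groups
};
```

Because move_queue_head() stops at the first unfinished group, a group that a fast Worker completes early is not checkpointed until all groups scheduled before it have also completed.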
------------------------------------------------------------
revno: 3229
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 19:14:50 +0200
message:
  merging from the repo wl5569
------------------------------------------------------------
revno: 3228
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 15:45:02 +0000
message:
  Added a mutex to the checkpoint_routine.
------------------------------------------------------------
revno: 3227
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 16:56:11 +0000
message:
  Implemented a periodic checkpoint when the parallel slave is enabled.
------------------------------------------------------------
revno: 3226
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 10:15:45 +0000
message:
  Fixed commit_positions() and removed the unnecessary checkpoint thread.
------------------------------------------------------------
revno: 3225 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 20:13:12 +0200
message:
  manual merge to wl#5569 tree
------------------------------------------------------------
revno: 3224
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 19:46:46 +0200
message:
  wl#5569 MTS
  User interface: setting @@global.slave_parallel_workers to a non-zero value and then running START SLAVE starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The previously introduced SLAVE_EXEC_MODE_PARALLEL value of enum enum_slave_exec_mode is withdrawn.
  Fixes the rli->mts_pending_jobs_size statistics, which could otherwise cause an assert crash.
  Fixes a silly copy-and-paste mistake in the relay-log name change notification.
  Made a little clean-up, including moving the initialization of Worker-related data into start_slave_workers().
  Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things.
------------------------------------------------------------
revno: 3223
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-01 19:08:21 +0200
message:
  wl#5569 MTS
  Changes related to limit conditions such as the WQ length and the total size of all WQs. Also a new test file is added.
------------------------------------------------------------
revno: 3222 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:39:40 +0200
message:
  merging from wl#5569 repo containing wl#5599 integration
------------------------------------------------------------
revno: 3221
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:02:15 +0200
message:
  wl#5569 MTS
  Fixing group_relay_log_name change propagation from C to W.
  Garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated without checking the usage counter.
  Adding C-W synchronization due to:
  - the overall WQs data maximum
  - hitting the limit of a WQ length
  Adding Flow Control infrastructure with
  - a hungry-Worker level: a Worker below it forces the Coordinator to distribute eagerly; symmetrically, a Worker whose load is above (100% - hungry level) is considered fed-up;
  - a nap time for C in case all WQ lengths are above the level;
  - a weight parameter applied to the base nap as a function of the number of fed-up Ws.
  TODO:
  UNTIL to force sequential exec;
  to fix the ROWS_QUERY_LOG_EVENT corner case;
  to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done).
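The flow-control knobs listed for revision 3221 can be read as a small rule deciding how long the Coordinator naps before distributing the next event. The sketch below is one possible reading, not the server's implementation: FlowControl, nap(), hungry_pct and weight are hypothetical names, and the exact thresholds and the linear weighting by the number of fed-up Workers are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical flow-control rule for the Coordinator. Assumptions made for
// this sketch only: a Worker is "hungry" when its queue fill is below
// hungry_pct percent and "fed-up" when the fill is above 100 - hungry_pct
// percent; the nap is base_nap * (1 + weight * number_of_fed_up_workers).
struct FlowControl {
  unsigned hungry_pct;                 // e.g. 10 (%)
  std::chrono::microseconds base_nap;  // basic Coordinator nap
  unsigned weight;                     // extra naps per fed-up Worker

  // Time the Coordinator should sleep before distributing the next event.
  // Zero means "distribute eagerly" because at least one Worker is hungry.
  std::chrono::microseconds nap(const std::vector<std::size_t> &wq_len,
                                std::size_t wq_capacity) const {
    assert(wq_capacity > 0);
    std::size_t fed_up = 0;
    for (std::size_t len : wq_len) {
      std::size_t fill_pct = 100 * len / wq_capacity;
      if (fill_pct < hungry_pct) return std::chrono::microseconds(0);
      if (fill_pct > 100 - hungry_pct) ++fed_up;
    }
    return base_nap *
           static_cast<std::chrono::microseconds::rep>(1 + weight * fed_up);
  }
};
```

With hungry_pct = 10, a Worker whose queue is 95% full counts as fed-up, while any single Worker below 10% cancels the nap entirely.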
------------------------------------------------------------
revno: 3220
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-27 17:36:50 +0200
message:
  wl#5569
  Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h.
  Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT.
  (todo: the native sequential mode for the event needs some revision; in particular, `delete ev' shall *always* happen in rli->cleanup_context, not in two places as it does currently).
------------------------------------------------------------
revno: 3219
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-26 23:08:30 +0200
message:
  wl#5569 MTS
  Partitioning conflict detection and handling is implemented.
  A new option to run Query events in parallel, though incompatibly with the Rows case in that the default db, not the actually used dbs, serves as the partition key.
  The user interface gained the global var and the command-line option: slave_run_query_in_parallel (Welcome to the set! :-)
------------------------------------------------------------
revno: 3218
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-andrei
timestamp: Fri 2010-11-26 16:15:37 +0000
message:
  There was a mismatch between the number of fields read and written, and consequently the read was failing for the Slave_worker.
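Revision 3219's partitioning conflict detection can be pictured as follows, assuming a partition is a database name and each hash entry records the owning Worker and the number of in-flight groups that use it. PartitionMap, PartitionEntry and Assign are hypothetical names; the server's Partition-to-Worker hash is concurrent and also garbage-collected, which this sketch omits.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of partition (database) to Worker assignment with
// conflict detection. A group touching a partition that is still in use by
// another Worker cannot be scheduled until that partition is released.
struct PartitionEntry {
  int worker = -1;  // Worker currently owning the partition
  int usage = 0;    // groups in flight that reference the partition
};

enum class Assign { kOk, kConflict };

class PartitionMap {
 public:
  // Try to claim partition `db` for a group that is going to `worker`.
  Assign assign(const std::string &db, int worker) {
    PartitionEntry &e = map_[db];
    if (e.usage > 0 && e.worker != worker)
      return Assign::kConflict;  // caller must wait for the release
    e.worker = worker;           // new, idle, or already owned by `worker`
    ++e.usage;
    return Assign::kOk;
  }

  // Called when a Worker commits a group that used `db`.
  void release(const std::string &db) {
    auto it = map_.find(db);
    assert(it != map_.end() && it->second.usage > 0);
    --it->second.usage;
  }

  int owner(const std::string &db) const {
    auto it = map_.find(db);
    return it == map_.end() ? -1 : it->second.worker;
  }

 private:
  std::map<std::string, PartitionEntry> map_;
};
```

An idle entry (usage 0) may move to a different Worker, which is what lets the assignment follow load without breaking per-database ordering.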
------------------------------------------------------------
revno: 3217 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 11:03:54 +0200
message:
  wl#5569 merging with a wl#5599 piece of code
------------------------------------------------------------
revno: 3216
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 10:47:39 +0200
message:
  wl#5569
  Converting the prototype-time db2w hash to be concurrent;
  the necessary introduction of the least occupied Worker notion. It is currently computed as the Worker having the least number of distributed partitions.
  Adding parallel support for Query_log_event; caution:
  1. the session/default db, not the actual db, is the key;
  2. it may not have been tested against all use cases (e.g. int vars).
  Fixing slave stop issues.
------------------------------------------------------------
revno: 3215
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-11-22 20:57:13 +0200
message:
  wl#5569 extending further the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context.
------------------------------------------------------------
revno: 3214
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-20 19:23:42 +0200
message:
  wl#5569 MTS
  Worker pool start, stop, kill and error-out implementation.
------------------------------------------------------------
revno: 3213
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-19 16:51:58 +0200
message:
  wl#5569 recovery interfaces for the wl#5599 implementation.
  The essence of this patch is to provide the GAQ object implementation and a valid life cycle.
  Before calling the store methods of wl#5599, the checkpoint handler is supposed to invoke rli->gaq->move_queue_head(&rli->workers).
  See a simulation of that near ev->update_pos() in the main sql thread loop.
  The checkpoint info is composed as an instance of Slave_job_group that resides as rli->gaq->lwm.
  Todo: uncomment
    + // delete ev; // after ev->update_pos()
  since the event is garbage once the real checkpoint has been done.
  Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time the Rotate/FD methods are executed.
    + // experimental checkpoint per each scheduling attempt
    + // logics of next_event()
    +
    + rli->gaq->move_queue_head(&rli->workers);
------------------------------------------------------------
revno: 3212
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:50:54 +0200
message:
  wl#5569 wl#5599
  Recovery related. Prototyping the Worker RLI instantiation, to be elaborated on.
------------------------------------------------------------
revno: 3211
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:00:52 +0200
message:
  wl#5569 MTS
  Extending the wl#5563 prototype gradually. This commit addresses:
  1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for a checkpoint that updates GAQ and the central RLI recovery table).
     With respect to rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli).
  others:
  - decreased processing time for rpl_parallel, serial.
------------------------------------------------------------
revno: 3210
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2010-11-14 11:55:32 +0000
message:
  Post-push fix for WL#5599.
------------------------------------------------------------
revno: 3209
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-12 17:58:12 +0000
message:
  Post-push fix for WL#5599.
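Revision 3216 defines the least occupied Worker as the one with the fewest distributed partitions. A minimal helper expressing just that rule is sketched below; the function name and the vector-of-counts input are assumptions made for illustration, not the server's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the "least occupied" Worker, taken here (as
// described in revision 3216) to be the Worker with the fewest partitions
// currently assigned. std::min_element returns the first minimum, so ties go
// to the lowest index. Returns -1 when there are no Workers.
int least_occupied(const std::vector<std::size_t> &partitions_per_worker) {
  auto it = std::min_element(partitions_per_worker.begin(),
                             partitions_per_worker.end());
  if (it == partitions_per_worker.end()) return -1;
  return static_cast<int>(it - partitions_per_worker.begin());
}
```

Counting partitions rather than queued events is cheap to maintain, but it ignores how heavy each partition's load actually is.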
------------------------------------------------------------
revno: 3208
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-11 11:53:01 +0000
message:
  WL#5599
  The patch changed the handler's functions, i.e. init_info, check_info, flush_info, remove_info and end_info, and the related private member functions, in both the file and the table handlers, to accept an index that identifies the information that will be read or written.
  This is necessary now because the handlers will be used by the workers to read and write information from file(s) and tables, and there may be several workers running at the same time; thus an index is used to identify the worker that is accessing the information.
  This change is also necessary for multi-master replication, as information from each master must be uniquely identified.
------------------------------------------------------------
revno: 3207
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-11-10 10:57:13 +0000
message:
  Refactoring to start work on WL#5599.
------------------------------------------------------------
revno: 3206
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:34:18 +0000
message:
  Removed mysql-test/collections/mysql-next-mr.crash-safe.* in the WL#5569.
------------------------------------------------------------
revno: 3205 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:04:14 +0000
message:
  merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569
  Conflicts:
  . sql/CMakeLists.txt
  . sql/Makefile.am
  . sql/sql_class.h
  . sql/rpl_slave.cc
------------------------------------------------------------
revno: 3204 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 11:39:37 +0000
message:
  merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3203
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 23:33:37 +0300
message:
  wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes
------------------------------------------------------------
revno: 3202
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 21:19:56 +0300
message:
  wl#5563 leaving out fine-grained garbage collection. That task is unnecessary to solve at prototyping time. The update-pos routine to be implemented is going to eliminate that piece of code.
------------------------------------------------------------
revno: 3201
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 20:38:35 +0300
message:
  wl#5563 Extending the test base by splitting the former rpl_parallel into two, so that it runs in serial exec mode as well.
------------------------------------------------------------
revno: 3200
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 11:49:00 +0300
message:
  wl#5563 improved the test; fixed a delete issue that used to cause a crash; added @@global.slave_local_timestamp to fill in the timestamp column with the slave clock value. Performance growth can be seen through the test.
  todo: merge with Alfranio's work on hashing and dynamic allocation of PFS objects.
------------------------------------------------------------
revno: 3199
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Wed 2010-09-15 14:51:49 +0300
message:
  wl#5563 tests for the wl.
  The number of workers and iterations can be tuned.
  todo: convert them into parameters to pass to the test through mtr
------------------------------------------------------------
revno: 3198
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 18:22:41 +0300
message:
  wl#5563 adding an ingenuous, not-yet-stress-attempting test that also fired an assert.
  Refined the Worker instance reference computation, because cleanup_context() is executed by the sql thread, i.e. the Coordinator, as well.
------------------------------------------------------------
revno: 3197
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 13:15:38 +0300
message:
  wl#5563 Rows-event parallelization is basically implemented, although tested shallowly.
  Write access by workers to the central rli struct may not be fully eliminated at this phase, e.g. where errors are concerned.
  todo: to prove rli gets out of Worker scope
  todo: to provide a stress test
------------------------------------------------------------
revno: 3196
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Sat 2010-09-11 17:00:08 +0300
message:
  wl#5563 adding support for Rows-events limited to one Worker.
  Deferred deletion did not check the emptiness of the list.
------------------------------------------------------------
revno: 3195
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:36:07 +0300
message:
  wl#5563 correcting comments to indicate fewer limitations
------------------------------------------------------------
revno: 3194
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:32:39 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  More progress on the WL, in that the STMT binlog format works while the conceptual limits are held. That is, no query/transaction is allowed to deal with more than one db.
  Addressed a complication in that the update-pos method, which is run by the Coordinator, belongs to the Log_event hierarchy, and therefore the event deletion, now done by a Worker, must be careful.
  Todo:
  1. (High priority) fix Row-format complications
  2. (High priority) Elaborate on the hash function to be a function of the db text name
  3. (Optional) Consider moving update_pos to the RLI class to get rid of the delete logic complication.
  How-to-use:
  The instructions can be found in the comments of the previous commit; see there for more details.
  In brief, though, the db names have to follow a pattern: `test[0-9]'. E.g. test0, test1, test2, test3 for the default four Worker threads.
  The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and with signs of scalable performance.
  How to test:
    connection master;
    set @@global.binlog_format=statement;
    connection slave;
    set @@global.slave_exec_mode=PARALLEL;
    set @@global.binlog_format=mixed;
    show processlist; => IO, SQL threads + 4 workers by default
    change master to ...
    connection master;
    # create databases with magic names "test[0-9]+", where the number will index
    # a worker.
    create database test0;
    create database test1;
    create database test2;
    create database test3;
    # create tables; only the MyISAM type is supported for now
    use test0; create table tm_1(a int, b int) engine=myisam;
    use ...
    # DML on tables:
    use test[0-3]; insert into tm_1 values (1,0);
    ...
    ...
    connection slave;
    # monitor CPU (visually this time: top, etc.)
    # check correctness, e.g.
    select count(*) from test[0-3].tm_1;
    connection master;
    select count(*) from test[0-3].tm_1;
  ******
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could not produce deterministic output. The list of binlogs contained one binlog more than expected.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active.
  The reason for the failure was that parsing of the relay log by the MTS recovery gap computation did not make sure to use the relay log's own Format_description events, which contain the checksumming info for all events in the log.
  Fixed by finding out the checksum algorithm of every relay log as the first step of the MTS recovery gap computation.
------------------------------------------------------------
revno: 3361 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  merge from trunk, which forced resolving a few semantic conflicts caused by changes to THD::enter_cond() in trunk.
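The prototype's how-to notes (revisions 3193 and 3194) route a database named test<N> to Worker N. A sketch of that mapping follows, with a fallback for other names that hashes the name text, along the lines of the TODO in revision 3194. The function name and the use of std::hash are assumptions, not what the prototype or the final server does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch of the prototype's partitioning rule: a database named
// "test<N>" is executed by Worker N modulo the pool size. Other names fall
// back to a hash of the name text; std::hash is an illustrative choice only.
std::size_t worker_for_db(const std::string &db, std::size_t n_workers) {
  assert(n_workers > 0);
  const std::string prefix = "test";
  if (db.size() > prefix.size() && db.compare(0, prefix.size(), prefix) == 0) {
    bool all_digits = true;
    for (std::size_t i = prefix.size(); i < db.size(); ++i)
      if (!std::isdigit(static_cast<unsigned char>(db[i]))) all_digits = false;
    // std::stoul throws std::out_of_range for absurdly long digit runs.
    if (all_digits) return std::stoul(db.substr(prefix.size())) % n_workers;
  }
  return std::hash<std::string>{}(db) % n_workers;
}
```

With the default four Workers, test0 to test3 land on Workers 0 to 3, and test5 wraps around to Worker 1.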
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed failure in test rpl_mts_check_concurrency when running in the mts collection.
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages on unsafe execution.
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Removed a warning suppression introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with several warning messages on unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in the default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug outputs were removed from the error log.
    However, this is a temporary solution, as it prevents enabling traces.
------------------------------------------------------------
revno: 3355 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported a stack trace on rpl_savepoint. The problem appears to be that, when computing slave_sql_running_state in show_master_info(), the sql thread's proc_info pointer could refer to a value on a stack frame that is already gone.
  Fixed by making proc_info point to a string literal.
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running mts crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch's source code.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to please rpl_*_mts_crash_safe.
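Revision 3357 introduces a variable so that repeated unsafe-execution warnings do not clutter the error log. One common way to get that effect is a once-only flag, sketched below under the assumption that logging only the first occurrence is acceptable; UnsafeWarningLogger and its in-memory log are hypothetical stand-ins for the server's error log, and the actual variable may behave differently.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: log the "unsafe execution" warning only the first
// time it is hit, so repeated unsafe statements do not flood the error log.
// The class, the flag, and the in-memory log are illustrative stand-ins.
class UnsafeWarningLogger {
 public:
  void warn_unsafe(const std::string &msg) {
    bool expected = false;
    // compare_exchange_strong flips the flag for exactly one caller, so only
    // that caller appends to the log.
    if (already_logged_.compare_exchange_strong(expected, true))
      log_.push_back(msg);
  }

  const std::vector<std::string> &log() const { return log_; }

 private:
  std::atomic<bool> already_logged_{false};
  std::vector<std::string> log_;
};
```

The atomic compare-and-swap keeps the "first one wins" decision correct even if several threads hit the warning at the same time.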
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  Fixing the build issue for embedded. Public visibility of Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP
  MTS stopped with an error in the middle of the test. The reason is that a group of events originating from the slave itself was processed only partly, modifying the group position. On the following restart, the wrong group boundary made the slave either error out or assert.
  Fixed by locating a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the slave-server group cannot be started.
  Notice, this is a trial patch, since I cannot reproduce the failure on the hosts available to me at all.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  Further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in that the Coordinator might fail to catch the not-running status of a Worker. That made the Coordinator skip only a part of a group of the slave server's own events, so the slave stopped somewhere other than a group boundary.
  Fixed by marking the errored-out Worker as failed before releasing its APH entries.
  TODO: notice there is a possibility of stopping somewhere other than a group boundary due to a graceful STOP SLAVE if one is run while self-originated events are being skipped. However, this issue belongs to STS and might be similar to BUG#12604951 and BUG#12728160.
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569
  Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code:
      DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -                   DBUG_SUICIDE(););
  +                   sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +                   flush_info(TRUE);
  +                   DBUG_SUICIDE();
  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
------------------------------------------------------------
revno: 3344 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  Adding a suppression for an expected warning to rpl_circular_for_4_hosts; decreasing a loop limit in rpl_parallel_switch_sequential in case of statement format.
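The ordering fix in revision 3347 (mark the errored-out Worker as failed before releasing its APH entries) matters because the Coordinator reacts to an entry becoming free. The minimal sketch below models that ordering with a mutex-protected set of held partitions and an atomic failure flag; Worker, AssignedPartitions and both functions are hypothetical names, not the server's API.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical sketch of the ordering fix: an errored-out Worker publishes
// its failed status *before* releasing its partition (APH) entries. Any
// Coordinator that observes a partition becoming free then also observes
// the failure and stops instead of scheduling the rest of the group.
struct Worker {
  std::atomic<bool> failed{false};
};

struct AssignedPartitions {
  std::mutex mu;
  std::set<std::string> held;  // partitions currently held by the Worker
};

void worker_error_exit(Worker &w, AssignedPartitions &aph) {
  w.failed.store(true, std::memory_order_release);  // 1. mark as failed
  std::lock_guard<std::mutex> g(aph.mu);            // 2. then release
  aph.held.clear();
}

// Coordinator side: may schedule on `db` only if the partition is free and
// the Worker that last held it has not failed.
bool coordinator_may_schedule(const Worker &w, AssignedPartitions &aph,
                              const std::string &db) {
  std::lock_guard<std::mutex> g(aph.mu);
  if (aph.held.count(db)) return false;              // still busy: wait
  return !w.failed.load(std::memory_order_acquire);  // free: check status
}
```

If the two steps in worker_error_exit() were swapped, a Coordinator could see the partition free while the flag was still false, which is the window the revision closes.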
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test due to a rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, when there were two groups of events to execute, the first for a Worker and the second for the Coordinator, the Coordinator waited for completion of the first group but did not verify that the synchronization succeeded. So, if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far).
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running mts recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for the WL#5569.
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the applying phase of MTS recovery so that events meant for the Coordinator are always applied; fixes a few tests so that they pass on PB; makes the GAQ size equal to the checkpoint_group value.
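Revision 3342 adds the missing check that the Coordinator's wait for Worker groups actually ended in success. The sketch below shows the pattern with a condition variable: the wait reports whether all pending groups completed without error, and the Coordinator refuses to apply its own group otherwise. PendingGroups and coordinator_apply_own_group() are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: before applying a group itself, the Coordinator waits
// until the Workers have finished their pending groups and then checks
// whether that wait ended in success. Proceeding after a failed Worker group
// could apply the next group on top of an inconsistent state.
class PendingGroups {
 public:
  void add() {
    std::lock_guard<std::mutex> g(mu_);
    ++pending_;
  }

  // Called by a Worker when it finishes a group (ok == false on error).
  void finish(bool ok) {
    std::lock_guard<std::mutex> g(mu_);
    --pending_;
    if (!ok) error_ = true;
    cv_.notify_all();
  }

  // Waits until nothing is pending or some group has failed; returns true
  // only if every pending group completed successfully.
  bool wait_all() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return pending_ == 0 || error_; });
    return !error_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int pending_ = 0;
  bool error_ = false;
};

// Coordinator step for a group it must execute itself; returns false when
// it has to stop instead of applying the group.
bool coordinator_apply_own_group(PendingGroups &pg) {
  if (!pg.wait_all()) return false;  // the check that was missing
  // ... apply the Coordinator's own group here ...
  return true;
}
```

Stopping the wait as soon as any group fails also keeps the Coordinator from blocking on groups that may never complete once a Worker has errored out.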
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
------------------------------------------------------------
revno: 3336 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  Refining the wait for db-hash entry release at event distribution. A graceful STOP is not accepted at this point, so the Coordinator stays in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.
  MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly because an earlier stop of the Single-Threaded Slave happened not at a group boundary but in the middle of a group.
  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE TABLE group is surrounded with its own BEGIN/COMMIT braces.
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
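Revision 3335 describes a wait for a db-hash entry release during which a graceful STOP is not honoured. A sketch of such a wait follows, assuming that only a hard kill may interrupt it; PartitionSlot and its members are hypothetical, and the real Coordinator loop checks more conditions than this.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch of the Coordinator waiting for a partition (db-hash)
// entry to be released before it can hand the entry to another Worker.
// A graceful STOP request does not end the wait, since stopping here could
// leave execution in the middle of a group; only a kill does (assumption).
struct PartitionSlot {
  std::mutex mu;
  std::condition_variable cv;
  int usage = 0;                // in-flight groups using the partition
  bool stop_requested = false;  // graceful STOP SLAVE: deliberately ignored
  bool killed = false;          // hard kill: aborts the wait

  // Worker side: a group using the partition has been committed.
  void release() {
    std::lock_guard<std::mutex> g(mu);
    if (usage > 0) --usage;
    cv.notify_all();
  }

  void request_stop() {
    std::lock_guard<std::mutex> g(mu);
    stop_requested = true;
    cv.notify_all();  // wakes the waiter, but the predicate ignores it
  }

  void kill() {
    std::lock_guard<std::mutex> g(mu);
    killed = true;
    cv.notify_all();
  }

  // Coordinator side: true once the entry is free, false if killed.
  bool wait_for_release() {
    std::unique_lock<std::mutex> lk(mu);
    cv.wait(lk, [this] { return usage == 0 || killed; });
    return !killed;
  }
};
```

The graceful stop is then handled after the wait, once the Coordinator is back at a group boundary.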
------------------------------------------------------------ revno: 3332 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:29:34 +0300 message: wl5569 MTS cleanup in one file. ------------------------------------------------------------ revno: 3331 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:16:02 +0300 message: wl5569 MTS bzr commit mail address changed; a minor cleanup making mts_is_worker() take a const argument; enabling a test to run in MTS. ------------------------------------------------------------ revno: 3330 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-02 08:58:56 +0100 message: Fixed the use of the performance schema in the replication code and a concurrency issue in the IO thread. In particular, the IO thread was calling flush_master_info without acquiring locks. ------------------------------------------------------------ revno: 3329 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 16:41:35 +0300 message: wl5569 MTS merging from the main repo. ------------------------------------------------------------ revno: 3328 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 15:48:25 +0300 message: wl#5569 MTS the final cleanup patch. There are a few glitches that were considered tolerable, at least while the whole WL's code is being reviewed. They include: - no support for old load-data events - no support for FK; to add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called.
------------------------------------------------------------ revno: 3327 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 13:16:52 +0300 message: wl#5569 MTS The patch cleans up a host of code. ------------------------------------------------------------ revno: 3326 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-28 11:30:18 +0300 message: wl#5569 MTS replacing views with regular tables for consistency verification in rpl_parallel_innodb. A minor cleanup in rpl_parallel is also done. ------------------------------------------------------------ revno: 3325 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 20:31:45 +0300 message: wl#5569 MTS Cleanup and addressing a sporadic rpl_temp_table_mix_row failure in the post-execution mtr.check_testcase(). The check failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While that is correct for a homogeneous rows-event-only load, a mixed load can fail. Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in the future. ------------------------------------------------------------ revno: 3324 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 13:12:52 +0100 message: Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case. ------------------------------------------------------------ revno: 3323 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-26 13:02:59 +0100 message: Fixed test case.
------------------------------------------------------------ revno: 3322 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-25 15:14:24 +0100 message: Introduced a test case for recovery with MTS and fixed bugs in recovery. ------------------------------------------------------------ revno: 3321 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 15:38:19 +0300 message: wl#5569 MTS This patch does a bit of cleanup, addresses one memory-allocation todo, and completes fixing the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items. ------------------------------------------------------------ revno: 3320 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 12:38:34 +0300 message: wl#5569 MTS this patch completes the previous one by fixing a result file and making the InnoDB-specific test verification use tables rather than views. ------------------------------------------------------------ revno: 3319 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 00:11:22 +0300 message: wl#5569 MTS this is an exploratory patch to find out whether the view-based verification method has a flaw of its own, unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment. ------------------------------------------------------------ revno: 3318 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-23 07:56:15 +0300 message: wl#5569 MTS fixing results of mysqld--help-win.
------------------------------------------------------------ revno: 3317 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:20:40 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3316 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:17:43 +0100 message: On some platforms, such as Windows, a thread's wait time is stored in 100ns units. However, when the difference between two values was computed, the result was not multiplied by 100. Besides, there was a casting problem when that result was assigned to a ulong. ------------------------------------------------------------ revno: 3315 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 18:54:23 +0100 message: Fixed how MTS copes with recovery. ------------------------------------------------------------ revno: 3314 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 19:10:54 +0300 message: wl#5569 MTS Fixing valgrind warnings. ------------------------------------------------------------ revno: 3313 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 18:15:43 +0300 message: wl#5569 MTS rpl_parallel_start_stop.test could fail sporadically with a timeout.
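The unit problem described in revno 3316 is easy to reproduce in isolation. A minimal sketch, assuming timer readings in 100 ns ticks as described there; the helper name `wait_time_ns` is hypothetical. It scales the tick difference to nanoseconds and keeps the result in a 64-bit type. One plausible form of the casting problem is that `unsigned long` is 32 bits on Windows, so a nanosecond count above about 4.29 seconds no longer fits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts the difference of two timer readings taken
// in 100 ns ticks (the unit mentioned in revno 3316) to nanoseconds.
constexpr std::uint64_t kNsPerTick = 100;

std::uint64_t wait_time_ns(std::uint64_t start_ticks, std::uint64_t end_ticks) {
  assert(end_ticks >= start_ticks);
  // Scale the tick difference; omitting this factor reports a value
  // 100 times too small.
  return (end_ticks - start_ticks) * kNsPerTick;
}
```

Keeping the value 64-bit until it reaches its final consumer avoids silent truncation on platforms with a 32-bit `unsigned long`.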
------------------------------------------------------------ revno: 3312 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:21:56 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3311 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:19:06 +0100 message: Fixed an error in computing the Lower-Water-Mark. If two or more jobs were removed from the Group of assigned jobs, and one of the jobs had a non-empty group relay log while the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null. ------------------------------------------------------------ revno: 3310 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 11:52:44 +0100 message: Fixed valgrind errors. Slave_job_group was silently being cast to LOG_POS_COORD when calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp) and consequently passed to mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems: . The first two members of Slave_job_group were not a char * and a my_offset. . Even if the first two members were a char * and a my_offset, such a cast could lead to alignment problems. To fix the problem, we avoid this cast.
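Revno 3310 removes a cast of `Slave_job_group` to `LOG_POS_COORD` inside a `qsort`-style comparator. A minimal sketch of the safe pattern, using simplified hypothetical stand-ins for both structs (their real layouts are not shown here): the comparator takes the actual element type and reads the coordinate members by name, so neither member order nor alignment matters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Simplified, hypothetical stand-ins for the types named in revno 3310.
struct LOG_POS_COORD {
  const char *file_name;
  unsigned long long pos;
};

// Deliberately does NOT start with the LOG_POS_COORD members, so a
// reinterpreting cast would read the wrong fields.
struct Slave_job_group {
  unsigned long worker_id;
  const char *group_master_log_name;
  unsigned long long group_master_log_pos;
};

// Orders coordinates by log file name, then by position.
int coord_cmp(const LOG_POS_COORD &a, const LOG_POS_COORD &b) {
  int c = std::strcmp(a.file_name, b.file_name);
  if (c != 0) return c;
  return (a.pos > b.pos) - (a.pos < b.pos);
}

// Builds a coordinate from named members instead of casting the struct.
LOG_POS_COORD coord_of(const Slave_job_group &g) {
  return LOG_POS_COORD{g.group_master_log_name, g.group_master_log_pos};
}

void sort_jobs(std::vector<Slave_job_group> &jobs) {
  std::sort(jobs.begin(), jobs.end(),
            [](const Slave_job_group &a, const Slave_job_group &b) {
              return coord_cmp(coord_of(a), coord_of(b)) < 0;
            });
}
```

The typed comparator lets the compiler check the element type, which the `(qsort_cmp)` cast in the original call bypassed.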
------------------------------------------------------------ revno: 3309 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 19:14:50 +0300 message: wl#5569 MTS fixing slave_transaction_retries_basic_64.result ------------------------------------------------------------ revno: 3308 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 16:11:25 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3307 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 12:33:36 +0300 message: wl#5569 MTS Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events. Fixing the sensitivity of two other tests to mtr's invocation environment, which includes the initial values of slave_parallel_workers and slave_transaction_retries. ------------------------------------------------------------ revno: 3306 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 09:04:19 +0100 message: Fixed some Windows failures. ------------------------------------------------------------ revno: 3305 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-18 19:58:21 +0100 message: Fixed some recovery issues. ------------------------------------------------------------ revno: 3304 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 21:01:58 +0300 message: wl#5569 MTS fixing tests and a segfault at the end of handle_slave_sql() that happened after worker initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------ revno: 3303 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 18:34:16 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3302 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 14:00:41 +0300 message: wl#5569 MTS fixing rpl_row_basic_3innodb similarly to the previous patch. ------------------------------------------------------------ revno: 3301 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 13:51:59 +0300 message: wl#5569 MTS fixing a few tests. 1. A policy is implemented to react with a warning when a failing worker leaves the overall slave state with gaps, and thereby inconsistent. 2. Two tests used to time out due to RESET MASTER/SLAVE, which was disabled in them. ------------------------------------------------------------ revno: 3300 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 02:24:59 +0100 message: Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renames worker-info-repository to slave-worker-info-repository in some test option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which were renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including 1. reducing the number of system variables and renaming them. Command-line options that matter mainly for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. A small encapsulation is done in ha_blackhole.cc. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by MTS. ------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk.
------------------------------------------------------------ revno: 3288 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing tests that failed due to a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event. An event that accesses more than the maximum number of db:s, or an event from an old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and nothing more is scheduled until the group is done. b. Notification of the Coordinator about an errored-out Worker is corrected. ------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with a non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode, and for the rpl suites with the row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes, for a segfault from the last merge from trunk, a compilation issue on embedded, etc. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options.
------------------------------------------------------------ revno: 3283 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop; an error case behaves like the killed case. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker-running status to killed (which breaks the read-exec loop of a Worker). ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup; fixes for issues found when running tests; some improvements, including in stopping Workers, so that the routine distinguishes between the killed and gracefully stopped cases and in the end STOP SLAVE guarantees a consistent state (some todo still remains). ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that have not yet been pushed to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The declaration of the underlying server variable is changed from ulong to int32.
While that might affect (shrink) the actual range, no range has been specified, and now that the number of bits is the same on all platforms, the range can be set to [0, max(int32)]. ****** wl#5569 MTS wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and resolves a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; fixing temp tables manipulation. Asserting against the unsupportable use case of a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog and refuse to run it in parallel. ****** wl#5569 MTS Implementation of handing the applier role to a Worker in all cases but those dealing with the Coordinator's state. Those include a Query event with over-max db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes, e.g. for a Query event, by switching to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic of handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the workers' starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it is about cleanup and simplification of the logic of event distribution to Workers and the consequent actions. Some effort went into supporting old-master BEGIN-less groups of events. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS.
Cleanup, fixing a few todo:s and a few potential issues found. ------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring its handling in the canonical sequential mode. The event's lifetime suggests behavior similar to that of objects associated with Table_map, in particular, its destruction occurring at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl. rpl_row_ignorable_event, in both sequential and parallel mode. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker : public Relay_log_info). The only exception is a Query referring to a temporary table. The temporary tables are maintained in the Coordinator's "central" rli; removing some dead code; performing a lot of cleanup. There are a few todo items, incl.: 1. to implement several todo:s scattered across MTS code and tests (e.g. to restore protected for a few members of RLI in rpl_rli.h); 2. to cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback); 3. to sort out the names of classes based on Rpl_info, possibly removing Rpl_info_worker. ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving the benchmarking test.
------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, incl. waiting and sleeping times, are now reported into the error log at slave stopping time. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS A benchmarking-related patch makes rpl_parallel runnable with an arbitrary number of workers, db:s, tables, etc. TODO: to restore the final consistency check, which is temporarily removed since I could not find a way to leave it surrounded with a --dis/en-able* stanza. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added.
Benchmarking results can be obtained by extracting the master-side generation and slave-side applying times, as in the following loop:

    workers=6; for n in `seq 1 3`; do echo; echo loop: $n; echo;
      my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
        rpl_parallel_benchmark --mysqld=--binlog-format=statement \
        && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1;
    done

------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments as requested by the reviewer. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: the earliest four rev:s, which cut off the DML stage of the parallel query project from the following ones devoted to DDL. Those four sketch the parallel applying of Queries containing a temporary table and implement the core of the design, namely the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of the over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch of the bundle addresses the last review mail notes.
------------------------------------------------------------ revno: 3269 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. ------------------------------------------------------------ revno: 3267 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter in case the signal goes missing and only a timeout would be reported ------------------------------------------------------------ revno: 3265 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases that mtr testing with MTS workers against the standard suites revealed.
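Revno 3266 protects the SQL thread's timed-wait loop against a missed signal. A minimal sketch of that idea in standard C++ (not the server's code): the waiter re-checks a shared flag before sleeping and after every timeout, so a notification sent before the wait began still ends the loop instead of being reported as a timeout. The `TimedWaiter` type and its `break_through` flag are hypothetical, borrowed loosely from the commit text.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical state shared between a waiting thread and its notifier.
struct TimedWaiter {
  std::mutex mutex;
  std::condition_variable cond;
  bool break_through = false;  // set by the notifier while holding mutex

  void signal() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      break_through = true;
    }
    cond.notify_one();
  }

  // Waits in slices of `slice`, at most `max_slices` times. Returns true if
  // the break-through condition was seen, false on a genuine timeout.
  bool wait(std::chrono::milliseconds slice, int max_slices) {
    std::unique_lock<std::mutex> lock(mutex);
    for (int i = 0; i < max_slices; ++i) {
      // wait_for with a predicate checks the flag before blocking and after
      // each wakeup, so a signal sent before wait() began is not lost.
      if (cond.wait_for(lock, slice, [this] { return break_through; }))
        return true;
    }
    return false;
  }
};
```

A bare `wait_for` without the flag would have no way to tell an early notification from silence, which is the failure mode the commit describes.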
------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert, concerning C being forced to delete events that are skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. ------------------------------------------------------------ revno: 3261 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. ------------------------------------------------------------ revno: 3259 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0. rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events so that parts of the group end up in different relay logs, carefully running ev->update_pos() for it, and destroying it.
------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the 2nd and later runs because Worker tables are not removed at RESET SLAVE. ------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine that computes the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding the checkpoint relay_log_name,pos as a necessary part of locating a relay log for recovery. Tested with rpl_parallel.
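Revnos 3253 and 3254 compute a bitmap of groups executed after the last checkpoint, which recovery uses to locate gaps. A minimal sketch, assuming (this is not stated in the log) that bit i marks the i-th group after the checkpoint as already applied by some Worker; the helper `recovery_gaps` is hypothetical. In this model the gaps are the unset bits below the highest set bit, since groups past that point were simply never started.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recovery helper. executed[i] == true means the i-th group
// after the checkpoint was already applied by some Worker.
// Returns the indices of the gaps: unset bits below the highest set bit.
// Unset bits after the highest set bit are not gaps in this model; those
// groups were never started and are simply read again in order.
std::vector<std::size_t> recovery_gaps(const std::vector<bool> &executed) {
  std::size_t last = executed.size();
  while (last > 0 && !executed[last - 1]) --last;  // trim trailing zeros
  std::vector<std::size_t> gaps;
  for (std::size_t i = 0; i < last; ++i)
    if (!executed[i]) gaps.push_back(i);
  return gaps;
}
```

Trimming the trailing unset bits first keeps the scan to the region where out-of-order completion can actually leave holes.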
------------------------------------------------------------ revno: 3251 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing with introducing a volatile byte to indicate whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing --mts-exp-slave-run-query-in-parallel=1 case when Query-log-event can be run in parallel incl DML and DDL. The feature is `exp'erimental still can be tried while there are no temp tables involved neither a db different than the session's default is modified by the query. Tested: Changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, incl valgrind issues, long exec time and asserting in a test. ------------------------------------------------------------ revno: 3248 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from wl#5569 repo to local branch rpl_sequential opt files are added to avoid mtr give up to process a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to the sequential execution cases of 1. Query-log-event 2. 
Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by Worker. ------------------------------------------------------------ revno: 3246 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to arrange the option/system var names properly: mts_-prefixing, and _exp prefixing for experimental features needed for benchmarking (mts_exp_slave_local_timestamp) or supported only in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing GAQ size. It might be too tight, e.g. in case of the max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p error ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
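The GAQ notes (the busy/released byte of revno 3251, the move_queue_head() fix of 3247 and the GAQ size fix of 3245) describe a bounded queue whose head advances only over groups that Workers have released. The single-threaded C++ sketch below models that idea under stated assumptions. `GroupAssignedQueue` and `JobGroup` are hypothetical names, the `done` flag stands in for the busy/released byte, and all Coordinator/Worker synchronization is omitted. The capacity is a constructor argument; the log requires it to be large enough to hold items while every WQ is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical GAQ slot: 'done' stands in for the busy/released byte that a
// Worker flips once it has finished the group.
struct JobGroup {
  std::uint64_t seqno;
  bool done;
};

class GroupAssignedQueue {
 public:
  explicit GroupAssignedQueue(std::size_t capacity)
      : items_(capacity), head_(0), size_(0) {}

  bool full() const { return size_ == items_.size(); }
  bool empty() const { return size_ == 0; }

  // Coordinator: never enqueue into a full queue. Returns the slot of the new
  // group, or SIZE_MAX so that the caller knows it has to wait.
  std::size_t en_queue(std::uint64_t seqno) {
    if (full()) return SIZE_MAX;
    const std::size_t slot = (head_ + size_) % items_.size();
    items_[slot] = JobGroup{seqno, false};
    ++size_;
    return slot;
  }

  // Worker: release the group it was assigned.
  void mark_done(std::size_t slot) { items_[slot].done = true; }

  // Checkpoint: pop consecutive released groups from the head. Returns how
  // many were popped; *lwm receives the seqno of the last one popped (the new
  // low-water mark) and is left untouched when nothing was popped.
  std::size_t move_queue_head(std::uint64_t* lwm) {
    std::size_t popped = 0;
    while (!empty() && items_[head_].done) {
      *lwm = items_[head_].seqno;
      head_ = (head_ + 1) % items_.size();
      --size_;
      ++popped;
    }
    return popped;
  }

 private:
  std::vector<JobGroup> items_;
  std::size_t head_;
  std::size_t size_;
};
```

A Worker that finishes out of order (a later slot before the head) does not move the head. The low-water mark advances only once the oldest outstanding group is released, which is what would let a checkpoint record a consistent position.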
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery related issue of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ed event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning to apply to cases of non-zero parallel workers combined with a feature that parallelization can't work with. ------------------------------------------------------------ revno: 3240 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limit its size so that it can hold items while all WQ:s are full b) move_queue_head() contained a flaw that made it falsely report no progress c) never enqueue into GAQ while it's full ------------------------------------------------------------ revno: 3238 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS
Integration with wl#5599 recovery for MTS and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized with one partition at Worker startup, which should not happen. ------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc .
sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support on repo. rpl_parallel hits an assert that Alfranio is fixing ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing related fixes incl master_pos_wait() support and thereafter replacing sleeps with the now functioning sync_slave_with_master; Fixing the limited Q-log-event parallelization. After the fix, a mixture of rows- and Q- transactions can run concurrently. Q-transactions are treated sequentially by default. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting the integration of the two wl:s. The main achievement is that events execution status is consistently recorded into the Worker and the central RL recovery tables. That was tested manually in a rather aggressive env where IO was made to reconnect randomly and the load from Master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass; to address: the multi-stmt Query-log-event transaction case (see todo in sources); to destroy executed events by Workers (deferred until ev->update_pos started working); (Alfranio) to deploy the mts_checkpoint_routine() call inside the successful event read branch of next_event(). Otherwise no call happens when Coord is constantly busy with read/distribute.
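The TODO in revno 3230 asks for mts_checkpoint_routine() to be called from the successful event-read branch of next_event(), so that a Coordinator busy reading and distributing still checkpoints. A minimal C++ sketch of such a period check follows. `CheckpointTimer` is a hypothetical name, and treating the period (compare mts_checkpoint_period) as milliseconds is an assumption. The current time is passed in explicitly so that the logic can be exercised deterministically.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper polled by the Coordinator after every successful event
// read; it signals when a checkpoint is due.
class CheckpointTimer {
 public:
  using Clock = std::chrono::steady_clock;

  CheckpointTimer(std::chrono::milliseconds period, Clock::time_point now)
      : period_(period), last_(now) {}

  // True once 'period' has elapsed since the last checkpoint; the timer then
  // restarts from 'now'.
  bool due(Clock::time_point now) {
    if (now - last_ < period_) return false;
    last_ = now;
    return true;
  }

 private:
  std::chrono::milliseconds period_;
  Clock::time_point last_;
};
```

With a zero period, `due()` returns true on every call. That is consistent with the log's remark that a zero checkpoint period is only useful for debugging.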
------------------------------------------------------------ revno: 3229 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 19:14:50 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3228 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 15:45:02 +0000 message: Added mutex to the checkpoint_routine. ------------------------------------------------------------ revno: 3227 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 16:56:11 +0000 message: Implemented periodic checkpoint if parallel slave is enabled. ------------------------------------------------------------ revno: 3226 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 10:15:45 +0000 message: Fixed commit_positions() and removed unnecessary checkpoint thread. ------------------------------------------------------------ revno: 3225 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 20:13:12 +0200 message: manual merge to wl#5569 tree ------------------------------------------------------------ revno: 3224 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 19:46:46 +0200 message: wl#5569 MTS User interface related: set @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is withdrawn. Fixes rli->mts_pending_jobs_size statistics, which might otherwise cause an assert-crash; fixes a silly c&p mistake in the relay-log name change notification.
Made a little clean-up, including relocation of the initialization of worker-related stuff into start_slave_workers(). Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things. ------------------------------------------------------------ revno: 3223 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-01 19:08:21 +0200 message: wl#5569 MTS Changes related to limit conditions such as WQ len and total WQ:s size. Also a new test file is added. ------------------------------------------------------------ revno: 3222 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:39:40 +0200 message: merging from wl#5569 repo containing wl#5599 integration ------------------------------------------------------------ revno: 3221 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:02:15 +0200 message: wl#5569 MTS Fixing group_relay_log_name change propagation from C to W; Garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated w/o checking the usage counter. Adding C-W synchronization due to: - overall WQ:s data max - hitting the limit of a WQ length. Adding Flow Control infrastructure with - level of the hungry Worker, which forces Coordinator to distribute eagerly; symmetrically, a Worker whose load is more than (100 % - hungry level) is considered fed-up. - nap time for C in case all WQ:s lengths are above the level. - a weight param to the base nap as a function of the number of fed-up W:s. TODO: UNTIL to force sequential exec; to fix the ROWS_QUERY_LOG_EVENT corner case; to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done).
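Revno 3221 outlines the flow-control rules. A Worker below the hungry level makes the Coordinator distribute eagerly, and a Worker whose load exceeds 100% minus the hungry level counts as fed-up. The Coordinator naps when every WQ is above the level, with the nap weighted by the number of fed-up Workers. The C++ sketch below encodes one reading of those rules. The `FlowControl` structure, its fields, and the linear weighting `base_nap * (1 + weight * fed_up)` are assumptions made for illustration, not the server's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-control knobs (illustrative names, not server options).
// Assumes hungry_level_pct <= 100 and a non-zero maximum WQ length.
struct FlowControl {
  unsigned hungry_level_pct;  // WQ fill below this: the Worker is hungry
  unsigned base_nap_us;       // Coordinator's base nap
  unsigned nap_weight;        // extra base naps per fed-up Worker
};

// How full a Worker Queue is, as a percentage of its maximum length.
inline unsigned wq_fill_pct(std::size_t len, std::size_t len_max) {
  return static_cast<unsigned>(len * 100 / len_max);
}

// Nap time for the Coordinator before it distributes the next group: zero
// while any Worker is hungry; otherwise the base nap, weighted by the number
// of fed-up Workers (fill above 100% - hungry level).
unsigned coordinator_nap_us(const FlowControl& fc,
                            const std::vector<std::size_t>& wq_lens,
                            std::size_t wq_len_max) {
  std::size_t fed_up = 0;
  for (std::size_t len : wq_lens) {
    const unsigned fill = wq_fill_pct(len, wq_len_max);
    if (fill < fc.hungry_level_pct) return 0;  // someone can take work now
    if (fill > 100 - fc.hungry_level_pct) ++fed_up;
  }
  return fc.base_nap_us * (1 + fc.nap_weight * static_cast<unsigned>(fed_up));
}
```

With a 10% hungry level and a maximum WQ length of 10, queue lengths `{5, 9, 10}` contain no hungry Worker and one fed-up Worker, so the Coordinator naps for twice the base nap.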
------------------------------------------------------------ revno: 3220 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-27 17:36:50 +0200 message: wl#5569 Providing relay-log name for wl#5599. The protocol of action on the C and W sides is described in rpl_rli_pdb.h. Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT. (todo: the native sequential mode for the event needs some revision, in particular `delete ev' shall *always* happen in rli->cleanup_context, not in two places as currently). ------------------------------------------------------------ revno: 3219 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-26 23:08:30 +0200 message: wl#5569 MTS Partitioning conflict detection and handling is implemented. A new option to run Query in parallel, though incompatibly with the Rows- case in that the default db, not the actual db:s, is used as the partition key. User interface gained the global var and the cmd line opt: slave_run_query_in_parallel (Welcome to the set! :-) ------------------------------------------------------------ revno: 3218 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-andrei timestamp: Fri 2010-11-26 16:15:37 +0000 message: There was a mismatch between the number of fields read and written, and as a consequence the read was failing for the Slave_worker.
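Revno 3219 adds partitioning conflict detection. The sketch below shows one plausible model of the idea. It assumes a map from database name (the partition key) to the owning Worker plus a usage counter of in-flight groups. A partition can be handed to a different Worker only after its usage drops to zero; otherwise the Coordinator has to wait. `PartitionMap`, `PartitionEntry` and `AssignResult` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical partition-map entry: the Worker that owns a database and the
// number of in-flight groups still using it.
struct PartitionEntry {
  int worker;
  int usage;
};

enum class AssignResult { kAssigned, kConflict };

class PartitionMap {
 public:
  // Route 'db' to 'worker' for the current group. A partition still used by
  // another Worker is a conflict: the Coordinator must wait until that Worker
  // releases it before the group can go to 'worker'.
  AssignResult assign(const std::string& db, int worker) {
    auto it = map_.find(db);
    if (it == map_.end() || it->second.usage == 0) {
      map_[db] = PartitionEntry{worker, 1};
      return AssignResult::kAssigned;
    }
    if (it->second.worker != worker) return AssignResult::kConflict;
    ++it->second.usage;
    return AssignResult::kAssigned;
  }

  // Called when a group that used 'db' has been committed by its Worker.
  void release(const std::string& db) {
    auto it = map_.find(db);
    if (it != map_.end() && it->second.usage > 0) --it->second.usage;
  }

  // Current owner of 'db', or -1 if it has never been assigned.
  int owner(const std::string& db) const {
    auto it = map_.find(db);
    return it == map_.end() ? -1 : it->second.worker;
  }

 private:
  std::map<std::string, PartitionEntry> map_;
};
```

Keeping ownership and usage in one entry lets the Coordinator detect a conflict with a single lookup. Entries whose usage is zero are also natural candidates for the garbage collection mentioned in revno 3221.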
------------------------------------------------------------ revno: 3217 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 11:03:54 +0200 message: wl#5569 merging with a wl#5599 piece of code ------------------------------------------------------------ revno: 3216 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 10:47:39 +0200 message: wl#5569 Converting the prototype-time db2w hash to be concurrent; Necessary introduction of the least occupied Worker notion. It's currently computed as the Worker having the least number of distributed partitions. Adding parallel support for Query_log_event; caution: 1. the session/default, not the actual, db is used as the key 2. may not have been tested against all use cases (e.g. int vars) Fixing slave stop issues. ------------------------------------------------------------ revno: 3215 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-11-22 20:57:13 +0200 message: wl#5569 extending further the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context. ------------------------------------------------------------ revno: 3214 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-20 19:23:42 +0200 message: wl#5569 MTS Worker pool start, stop, kills, error out implementation. ------------------------------------------------------------ revno: 3213 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-19 16:51:58 +0200 message: wl#5569 recovery interfaces for wl#5599 implementation. The essence of this patch is to provide the GAQ object implementation and a valid life cycle. The checkpoint handler, before calling the store methods of wl#5599, is supposed to invoke rli->gaq->move_queue_head(&rli->workers).
See a simulation of that near ev->update_pos() of the main sql thread loop. The checkpoint info is composed as an instance of Slave_job_group residing as rli->gaq->lwm. Todo: uncomment + // delete ev; // after ev->update_pos(); the event is garbage once the real checkpoint has been done. Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods. + // experimental checkpoint per each scheduling attempt + // logics of next_event() + + rli->gaq->move_queue_head(&rli->workers); ------------------------------------------------------------ revno: 3212 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:50:54 +0200 message: wl#5569 wl#5599 Recovery related. Prototyping the worker RLI instantiation, to be elaborated on. ------------------------------------------------------------ revno: 3211 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:00:52 +0200 message: wl#5569 MTS Extending the wl#5563 prototype gradually. This commit addresses: 1. recovery interface (a new Worker rli plus rli->gaq and pseudo-code for the checkpoint to update GAQ and the central RLI recovery table). Wrt rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli). others: - decreased processing time for rpl_parallel, serial. ------------------------------------------------------------ revno: 3210 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-11-14 11:55:32 +0000 message: Post-push fix for WL#5599. ------------------------------------------------------------ revno: 3209 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-12 17:58:12 +0000 message: Post-push fix for WL#5599.
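Revno 3216 defines the least occupied Worker as the one with the fewest distributed partitions. A direct C++ rendering of that rule over a hypothetical per-Worker vector of partition counts could look like the following. Breaking ties in favor of the lowest Worker index is an assumption, and `assign_new_partition()` is an illustrative helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// 'partitions' holds, per Worker, how many partitions (databases) are
// currently distributed to it. Returns the index of the least occupied
// Worker; std::min_element picks the first minimum, so ties go to the lowest
// index.
std::size_t least_occupied_worker(const std::vector<std::size_t>& partitions) {
  assert(!partitions.empty());
  return static_cast<std::size_t>(std::distance(
      partitions.begin(),
      std::min_element(partitions.begin(), partitions.end())));
}

// Hand a new partition to the least occupied Worker and account for it.
std::size_t assign_new_partition(std::vector<std::size_t>& partitions) {
  const std::size_t w = least_occupied_worker(partitions);
  ++partitions[w];
  return w;
}
```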
------------------------------------------------------------ revno: 3208 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-11 11:53:01 +0000 message: WL#5599 The patch changed the handler's functions, i.e. init_info, check_info, flush_info, remove_info and end_info and the related private member functions, in both file and table handlers, to accept an index that identifies the information that will be read or written. This is now necessary because the workers will use the handlers to read and write information from file(s) and tables, and since several workers may run at the same time, an index identifies the worker that is accessing the information. This change is also necessary for multi-master replication, as information from each master must be uniquely identified. ------------------------------------------------------------ revno: 3207 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-11-10 10:57:13 +0000 message: Refactoring to start work on WL#5599. ------------------------------------------------------------ revno: 3206 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:34:18 +0000 message: Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569. ------------------------------------------------------------ revno: 3205 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:04:14 +0000 message: merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569 Conflicts: . sql/CMakeLists.txt . sql/Makefile.am . sql/sql_class.h .
sql/rpl_slave.cc ------------------------------------------------------------ revno: 3204 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 11:39:37 +0000 message: merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3203 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 23:33:37 +0300 message: wl#5563 simplifying memory handling for the Coor-Workers transport to avoid sporadic crashes ------------------------------------------------------------ revno: 3202 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 21:19:56 +0300 message: wl#5563 leaving out a fine-grained garbage collection. That task is unnecessary to solve at prototyping time. The update-pos routine yet to be implemented is going to eliminate that piece of code ------------------------------------------------------------ revno: 3201 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 20:38:35 +0300 message: wl#5563 Extending the test base by splitting the former rpl_parallel into two so that it runs in serial exec mode as well. ------------------------------------------------------------ revno: 3200 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 11:49:00 +0300 message: wl#5563 improved test; fixed a delete issue that used to crash; added @@global.slave_local_timestamp to fill in the timestamp col with the slave clock value. Performance growth can be seen through the test. todo: merge with Alfranio's work on hashing and dyn alloc of PFS obj:s. ------------------------------------------------------------ revno: 3199 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Wed 2010-09-15 14:51:49 +0300 message: wl#5563 tests for the wl.
Number of workers and iterations can be tuned. todo: convert them into param:s passed to the test through mtr ------------------------------------------------------------ revno: 3198 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 18:22:41 +0300 message: wl#5563 adding an ingenious no-stress-attempting-yet test that also fired an assert. Refined the Worker instance ref computing because cleanup_context() is executed by the sql-thread (the coordinator) as well ------------------------------------------------------------ revno: 3197 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 13:15:38 +0300 message: wl#5563 Rows-event parallelization is basically implemented, although tested shallowly. Write access by workers to the central rli struct may not be fully eliminated at this phase. E.g. that relates to errors. todo: to prove rli gets out of Worker scope todo: to provide a stress test ------------------------------------------------------------ revno: 3196 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Sat 2010-09-11 17:00:08 +0300 message: wl#5563 adding support for Rows-events limited to one Worker. Deferred deletion did not check emptiness of the list ------------------------------------------------------------ revno: 3195 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:36:07 +0300 message: wl#5563 correcting comments to indicate fewer limitations ------------------------------------------------------------ revno: 3194 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:32:39 +0300 message: WL#5563 Prototype for Slave parallelized by db name More progress on the WL in that the STMT binlog-format works while the conceptual limits are held. That is, no query/transaction is allowed to deal with more than one db.
Addressed a complication in that the update-pos method, run by Coordinator, belongs to the Log_event hierarchy, so the event deletion, now done by Worker, must be careful. Todo: 1. (High prior) fix Row-format complications 2. (High prior) Elaborate on the hash function to be a function of the db text name 3. (Optional) Consider moving update_pos to the RLI class to get rid of the delete logic complication. How-to-use: The instructions can be found in comments of the previous commit; see there for more details. In brief though, the db names have to follow a pattern: `test[0-9]'. E.g. test0, test1, test2, test3 for the default four Worker threads. The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE. ------------------------------------------------------------ revno: 3193 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Thu 2010-09-09 21:43:16 +0300 message: WL#5563 Prototype for Slave parallelized by db name This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and with signs of scalable performance. How to test: connection master; set @@global.binlog_format=statement; connection slave; set @@global.slave_exec_mode=PARALLEL; set @@global.binlog_format=mixed; show processlist; => IO, SQL threads + 4 workers by default change master to ... connection master; create database test0; create database test1; create database test2; create database test3; # create databases with magic names "test[0-9]+", where the number will index # a worker. create database test0; create database test1; create database test2; create database test3; # create tables. they are only of MyISAM type for now use test0; create table tm_1(a int, b int) engine=myisam; use ... # DML on tables: use test[0-3]; insert into tm_1 values (1,0); ... ...
connection slave; # monitor CPU (visually this time: top etc) # check correctness e.g. select count(*) from test[0-3].tm_1; connection master; select count(*) from test[0-3].tm_1; ****** wl#5569 MTS merging a combined bundle cset to mysql-trunk.
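The prototype commits (revno 3194 and 3193) route work by database name: a db called `test<N>` is executed by Worker N. One of the listed todos is to derive the Worker from a hash of the db text name instead. The C++ sketch below illustrates both approaches. The function names are hypothetical, wrapping N modulo the Worker count is an assumption, and `std::hash` is only a stand-in for whatever hash function the server would use.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Prototype-style routing: "test<N>" goes to Worker N % n_workers; any other
// name is rejected with -1.
int prototype_worker_for_db(const std::string& db, int n_workers) {
  const std::string prefix = "test";
  if (n_workers <= 0 || db.size() <= prefix.size() ||
      db.compare(0, prefix.size(), prefix) != 0)
    return -1;
  int n = 0;
  for (std::size_t i = prefix.size(); i < db.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(db[i]);
    if (!std::isdigit(c)) return -1;
    // Reduce as we go so that long digit strings cannot overflow 'n'.
    n = (n * 10 + (c - '0')) % n_workers;
  }
  return n;
}

// The todo'd refinement: route any database name via a hash of its text.
int hashed_worker_for_db(const std::string& db, int n_workers) {
  assert(n_workers > 0);
  return static_cast<int>(std::hash<std::string>{}(db) %
                          static_cast<std::size_t>(n_workers));
}
```

Because `std::hash` output is implementation-specific, the hashed mapping is stable within one process but should not be persisted or relied on across builds.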
Andrei Elkin authoredHere is the total cset combining all revisions done since Sep 2010. Comments from the original commits are pasted in reverse chronological order. ------------------------------------------------------------ revno: 3364 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 17:09:22 +0300 message: wl#5569 MTS Refining rpl_rotate_logs that could not produce deterministic output. The list of binlogs contained one binlog more than expected. @ mysql-test/suite/rpl/r/rpl_rotate_logs.result results updated. @ mysql-test/suite/rpl/t/rpl_rotate_logs.test Refining a method of verification of the binlog rotation due to its max size: we check if the first log has been rotated by comparing its name before and after feeding load to the master. Notice, that as the former so the new current proof methods are not perfect as that part of the test really needs to demostrate every binlog file is less than @@max_binlog_size. ------------------------------------------------------------ revno: 3363 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 14:56:01 +0300 message: updating result files that were left incorrect by the last merge. ------------------------------------------------------------ revno: 3362 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 14:44:59 +0300 message: wl#5569 MTS Failure in recovery when binlog-checksum is active. The reason of the failure was in that parsing of relay log by MTS recovery gaps computing did not make sure to use the relay-log own FormatDescriptor events that contain checksumming info for all events in the log. Fixed with taking care to find out the checksum algorithm for every relay log as the first step of MTS recovery gaps computing. @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum-master.opt forcing master to checksum. 
@ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum-slave.opt forcing slave to *not* checksum. @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe_checksum.test same as rpl_mixed_mts_rec_crash_safe but to run in master with checksum and slave without own checksum. The test verifies that checksum does not affect recovery. Lack of own checksumming on slave allows to test more scenarios. @ sql/rpl_slave.cc Search for the checksum algorithm FD is added. Notice that the first three events to read is enough to find out the master side checksum alg. ------------------------------------------------------------ revno: 3361 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-08-17 11:21:23 +0300 message: merge from trunk forced to resolve few semantical conflicts caused by changes in THD::enter_cond() of the trunk. ------------------------------------------------------------ revno: 3360 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-27 08:56:14 +0100 message: Fixed failure in test rpl_mts_check_concurrency when running in the mts collection. ------------------------------------------------------------ revno: 3359 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-26 19:46:41 +0100 message: Added a test case that checks if MTS allows to concurrently access the replication tables, and as such, concurrently commit transactions that update different databases. ------------------------------------------------------------ revno: 3358 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 20:08:43 +0100 message: Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages on unsafe execution. 
------------------------------------------------------------ revno: 3357 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 19:02:14 +0100 message: This patch contains the following fixes: . Removed suppressed warning introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption). . Introduced variable to avoid clutering the error log with several warning messages on unsafe execution. ------------------------------------------------------------ revno: 3356 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 11:01:12 +0100 message: This patch has the following changes: . Specific directories were created for the MTS runs in the default.push. . Warning message was suppressed in the rpl_corruption.test. . Annoying debug outputs were removed from the error log. However, this is a temporary solution as it forbids to enable traces. ------------------------------------------------------------ revno: 3355 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-20 11:56:40 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3354 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-19 22:26:30 +0300 message: wl#5569 MTS valgrind reported a stack on rpl_savepoint. The problem appears to be in that at computing slave_sql_running_state in show_mater_info() the sql thread proc_info pointer could refer to a value in a stack that has already gone. Fixed with making proc_info to point to a string literal. 
------------------------------------------------------------ revno: 3353 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-19 17:46:43 +0100 message: Suppressed warning messages that could potentially cause problems while running mts crash safe test cases. ------------------------------------------------------------ revno: 3352 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-18 21:46:45 +0300 message: wl#5569 MTS Cosmetic changes are done to address readability and clearness of source code of the MTS patch. @ sql/binlog.cc Comments improved. @ sql/log_event.cc Warning text is improved. @ sql/log_event.h More comments are added. @ sql/rpl_rli.h More comments are added. @ sql/rpl_slave.cc Error constant was changed. @ sql/share/errmsg-utf8.txt Error constant is changed. ------------------------------------------------------------ revno: 3351 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-18 14:52:44 +0300 message: wl#5569 MTS Inadvertently introduced hunk two rev:s back is reverted to please rpl_*_mts_crash_safe. ------------------------------------------------------------ revno: 3350 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-17 00:51:45 +0300 message: wl#5569 MTS fixing build issue for embedded. Public visibility for Rows_log_event::do_apply_event() is restored. ------------------------------------------------------------ revno: 3349 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-16 20:08:31 +0300 message: wl#5569 MTS The patch contains improvements after code review. Changes are mostly consmetic. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results updated. @ sql/binlog.cc correcting comments. @ sql/field.cc renaming. 
@ sql/log_event.cc renaming and separating out a block of code in Log_event::get_slave_worker() into a new method of the Slave_job_group class; some cleanup. @ sql/log_event.h Extending and improving comments; renaming to follow the is_, get_, set_ pattern; restoring the private access to do_apply_event() in Rows_log_event. @ sql/mysqld.cc removing an extra declaration. @ sql/rpl_info_factory.cc Minor comments are added. @ sql/rpl_rli.cc renaming to add a _cnt suffix to all entities that have a counter meaning in MTS; improving comments. @ sql/rpl_rli.h Renaming, and improving comments for the new members of Relay_log_info. @ sql/rpl_rli_pdb.cc renaming. @ sql/rpl_rli_pdb.h Improving comment readability by adding legends that define MTS-specific abbreviations. @ sql/rpl_slave.cc Renaming; minor cleanup in sql_slave_killed(); adding comments on the Seconds_behind_master update policy with MTS. @ sql/share/errmsg-utf8.txt Improving the text of a few errors. ------------------------------------------------------------ revno: 3348 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-16 02:11:11 +0300 message: bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP MTS stopped with an error in the middle of the test. The reason is that a group of events originating from the slave itself was only partly processed, which modified the group position. On the following restart, the wrong group boundary made the slave either error out or assert. Fixed by locating a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the group of the slave server's own events can't be started. Notice, this is a trial patch since I can't reproduce the failure at all on the hosts available to me. @ sql/rpl_rli_pdb.cc Changing the running status of the Worker before it releases its assigned entries.
That ensures that the Coordinator waiting in wait_for_workers_to_finish() exits the function with a negative result and therefore stops without attempting to apply the event for which it started the synchronization. A couple of diagnostics are added to the error log. They may be removed shortly, but for now they might help provide details if the failure does not disappear after this push. ------------------------------------------------------------ revno: 3347 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-07-14 12:40:06 +0300 message: WL#5569 MTS further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in which the Coordinator might miss the not-running status of a Worker. That made the Coordinator skip only a part of a group of the slave server's own events, so the slave stopped not at a group boundary. Fixed by marking the errored-out Worker as failed before its APH entries are released. TODO: notice that a graceful STOP SLAVE run while self-originated events are being skipped may also stop the slave not at a group boundary. However, this issue belongs to STS and might be similar to BUG#12604951 and BUG#12728160. @ mysql-test/suite/rpl/r/rpl_circular_for_4_hosts.result results are updated. @ mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.test the test is updated with new suppression text. @ sql/log_event.cc Adding clarifying text to an error message when parallel execution fails. @ sql/rpl_rli_pdb.cc Marking the errored-out Worker as failed before its APH entries are released. That ensures the Coordinator always finds the non-running status whenever it has to know it (wait_for_workers_to_finish()). @ sql/share/errmsg-utf8.txt Adding a format specifier for the clarifying text.
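Revisions 3347 and 3348 hinge on ordering: a failing Worker must publish its failed status no later than it releases its assigned entries, so that a Coordinator woken by the release also sees the failure. A minimal C++ sketch of that idea follows, assuming a single mutex and condition variable guard all shared state; `Coordinator_model`, `Worker_model`, `Worker_status` and `worker_fail_and_release()` are hypothetical names, and only wait_for_workers_to_finish() echoes a name used in the log. It is not the server's implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical Worker running status (illustrative, not the server's enum).
enum class Worker_status { RUNNING, ERROR_LEAVING };

struct Worker_model {
  Worker_status status = Worker_status::RUNNING;
};

// Hypothetical shared state: the number of assigned partition entries
// still held by Workers, plus the Workers themselves, all guarded by mtx.
struct Coordinator_model {
  std::mutex mtx;
  std::condition_variable released;
  int entries_in_use = 0;
  std::vector<Worker_model> workers;

  // Worker side, on an apply error. The failed status is set before the
  // entries are released, and both happen in one critical section, so a
  // waiter can never observe the release without the failure.
  void worker_fail_and_release(std::size_t id, int n_entries) {
    std::lock_guard<std::mutex> lk(mtx);
    workers[id].status = Worker_status::ERROR_LEAVING;
    entries_in_use -= n_entries;
    released.notify_all();
  }

  // Coordinator side: wait until every entry is released, then return -1
  // if any Worker failed and 0 otherwise. The caller must check the result
  // before applying the event that required the synchronization.
  int wait_for_workers_to_finish() {
    std::unique_lock<std::mutex> lk(mtx);
    released.wait(lk, [this] { return entries_in_use == 0; });
    for (const Worker_model &w : workers)
      if (w.status != Worker_status::RUNNING) return -1;
    return 0;
  }
};
```

In a real setting the two member functions run on different threads; the predicate form of `wait()` lets the Coordinator sleep until a Worker's release wakes it, and checking the return value is what keeps it from applying the next event after a Worker failure.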
------------------------------------------------------------ revno: 3346 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-07-14 08:03:55 +0100 message: Post-push fixes for WL#5569 Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code: DBUG_EXECUTE_IF("crash_after_commit_and_update_pos", - DBUG_SUICIDE();); + sql_print_information("Crashing crash_after_commit_and_update_pos."); + flush_info(TRUE); + DBUG_SUICIDE(); Besides, we improved some comments. ------------------------------------------------------------ revno: 3345 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-13 16:23:57 +0100 message: WL#5569 @ mysql-test/extra/rpl_tests/rpl_mts_crash_safe.inc Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/include/not_slave_worker_info_table.inc Removed this feature, as the --slave-worker-info-repository option was removed too. @ mysql-test/suite/rpl/t/rpl_mixed_mts_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_mixed_mts_rec_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_row_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_row_mts_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository.
@ mysql-test/suite/rpl/t/rpl_row_mts_rec_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_stm_mixed_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_stm_mts_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/rpl/t/rpl_stm_mts_rec_crash_safe-slave.opt Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ mysql-test/suite/sys_vars/t/slave_worker_info_repository_basic.test Removed this test case, as the --slave-worker-info-repository option was removed too. @ sql/binlog.cc Improved code as requested by reviewers. @ sql/lock.cc Removed a mistake that got into sql/lock.cc after merging with trunk. @ sql/log_event.cc Introduced the parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options. @ sql/rpl_info.h Updated doxygen comments and removed a change to avoid conflicts when merging with trunk. @ sql/rpl_info_factory.h Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. @ sql/rpl_rli.cc Introduced the parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options. @ sql/rpl_rli_pdb.cc Improved the code and introduced the parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options. @ sql/rpl_rli_pdb.h Introduced the parameter force in the commit_positions function to determine whether a flush must be executed regardless of sync options. @ sql/rpl_slave.cc Removed duplicated code. @ sql/sql_parse.cc Reintroduced a flag removed by mistake when merging with trunk.
See also sql/lock.cc. @ sql/sys_vars.cc Removed the --slave-worker-info-repository option, as Worker repositories are defined according to --relay-log-info-repository. ------------------------------------------------------------ revno: 3344 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-13 00:10:43 +0300 message: wl#5569 MTS merge trunk -> wl5569-tree ------------------------------------------------------------ revno: 3343 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-12 23:36:17 +0300 message: wl#5569 MTS adding a suppression for an expected warning to rpl_circular_for_4_hosts; decreasing a loop limit in rpl_parallel_switch_sequential in the case of statement format. ------------------------------------------------------------ revno: 3342 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-12 14:46:23 +0300 message: WL#5569 MTS Fixing code and test due to an rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382. The reason for the mismatch was that, when there were two groups of events to execute, the first for a Worker and the second for the Coordinator, the Coordinator waited for the first group to complete but did not verify that the synchronization succeeded. So, if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far). @ mysql-test/suite/rpl/r/rpl_circular_for_4_hosts.result results are updated. @ mysql-test/suite/rpl/t/rpl_circular_for_4_hosts.test Test is updated to include an MTS-specific part. While all former conditions hold, the new section makes sure server B has two groups of events to send, which was neither previously guaranteed nor necessary.
Further, when the first of the two fails with a Duplicate entry error, the Coordinator senses that failure when about to apply the second and gives up on it. The first error remains visible in show-slave-status. @ sql/log_event.cc Checking the wait_for_workers_to_finish() return code in case the Coordinator executes a sequential-mode event. Comments are added in a few other places where that check is unnecessary. @ sql/rpl_rli_pdb.cc A Worker marks itself as failed to apply, and that fact is also reported to the Coordinator through wait_for_workers_to_finish(). The Coordinator shall check the error code in the branch that applies a sequential event. @ sql/rpl_rli_pdb.h Adding a new state that a Worker sets on itself to indicate its failure to apply. @ sql/rpl_slave.cc Refining an assert as a consequence of the new state and its actual setting by the Worker. ------------------------------------------------------------ revno: 3341 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-10 22:40:01 +0100 message: Avoiding busy waiting when running MTS recovery tests. ------------------------------------------------------------ revno: 3340 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-09 23:11:58 +0100 message: Removed --slave-checkpoint-period from MTS test cases. ------------------------------------------------------------ revno: 3339 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-09 23:08:07 +0100 message: Improved test cases for WL#5569.
------------------------------------------------------------ revno: 3338 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 22:40:52 +0300 message: wl#5569 MTS The patch refines the logic of the applying phase of MTS recovery to always apply events that are for the Coordinator; fixes a few tests to make them pass on PB; makes the GAQ size equal to the checkpoint_group value. @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential.test attempting to decrease the execution time, which currently might be too long for some PB hosts. @ mysql-test/suite/rpl/t/rpl_row_crash_safe-slave.opt Making the test run in parallel mode with Workers using the table as their info storage. @ mysql-test/suite/sys_vars/r/slave_checkpoint_period_basic.result results updated. @ mysql-test/suite/sys_vars/t/slave_checkpoint_period_basic.test masking out the actual value of slave_checkpoint_period. @ sql/log_event.cc Never skip events that are for the Coordinator as indicated by mts_execution_mode(). @ sql/rpl_rli.h Improving comments. @ sql/rpl_slave.cc Simplifying the while condition of the GAQ-progress loop and deploying an assert ensuring that the checkpoint_group parameter and the GAQ state are combined correctly. ------------------------------------------------------------ revno: 3337 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 07:54:34 +0100 message: Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
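Revision 3338 sizes the GAQ by the checkpoint_group value, so at most that many groups can be outstanding between two checkpoints. As a rough illustration of that bound, here is a minimal fixed-capacity ring buffer in C++. `Ring_queue`, `en_queue()` and `de_queue()` are hypothetical names; this is a sketch under that assumption, not the server's circular_buffer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer standing in for the GAQ.
// The capacity would be taken from a checkpoint_group-like setting.
template <typename T>
class Ring_queue {
 public:
  explicit Ring_queue(std::size_t capacity) : buf_(capacity) {}

  bool full() const { return len_ == buf_.size(); }
  bool empty() const { return len_ == 0; }
  std::size_t size() const { return len_; }

  // Appends at the tail. Returns false when the queue is full; the
  // producer (Coordinator) would then have to wait for a checkpoint
  // to free slots at the head.
  bool en_queue(const T &item) {
    if (full()) return false;
    buf_[(head_ + len_) % buf_.size()] = item;
    ++len_;
    return true;
  }

  // Removes the oldest item from the head, as a checkpoint would do for
  // leading groups that have completed.
  bool de_queue(T *out) {
    if (empty()) return false;
    *out = buf_[head_];
    head_ = (head_ + 1) % buf_.size();
    --len_;
    return true;
  }

 private:
  std::vector<T> buf_;
  std::size_t head_ = 0;
  std::size_t len_ = 0;
};
```

The full() check runs before any modulo, so a zero-capacity queue simply rejects every insertion instead of dividing by zero.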
------------------------------------------------------------ revno: 3336 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 07:44:35 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3335 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-06 12:46:05 +0300 message: wl#5569 MTS refining the wait for a db-hash entry release at event distribution. A graceful STOP is not accepted at this point, so the Coordinator stays in the loop. ------------------------------------------------------------ revno: 3334 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-05 20:43:04 +0300 message: bug#12719875 possible MTS recovery issue. MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly because an earlier stop of the Single-Threaded Slave happened not at a group boundary but in the middle of a group. Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE-TABLE group is surrounded with its own BEGIN/COMMIT braces. @ mysql-test/suite/rpl/r/rpl_parallel_switch_sequential.result a new results file is added. @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential-slave.opt transaction retry is not yet supported by MTS. @ mysql-test/suite/rpl/t/rpl_parallel_switch_sequential.test Regression test for bug#12719875 is added. Notice, the engine of the created tables is InnoDB also because, with MyISAM, stop-slave can actually happen in the middle of a group of MyISAM table events, so the following restart fails with a duplicate key error. CREATE-SELECT is not tested because of another bug, as commented. @ sql/log_event.cc changing the error report style to be actually effective: rli->report() does not cause is_error() of rli->info_thd to return true.
The my_error() message eventually reaches the show-slave-status SQL error at the end of the slave SQL thread. @ sql/rpl_slave.cc fixing a possible hang that can happen due to an errored-out Worker when the GAQ is full and that Worker was the first to update it; refining asserts; shifting the stop_workers() routine to a point where the slave SQL thread has not yet reset its errors, which satisfies a refined assert in slave_stop_workers(rli). ------------------------------------------------------------ revno: 3333 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-04 18:14:09 +0300 message: wl#5569 MTS Adding a rule to run PB with all suites in MTS with binlog-format ROW. @ .bzr-mysql/default.conf restoring commits@. @ mysql-test/collections/default.push adding a rule to run all suites in MTS with binlog-format ROW. ------------------------------------------------------------ revno: 3332 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:29:34 +0300 message: wl5569 MTS cleanup in one file. @ sql/rpl_rli.cc removing traces of a mutex that served in prototyping support for temporary tables. ------------------------------------------------------------ revno: 3331 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:16:02 +0300 message: wl5569 MTS bzr commit mail address changed; a minor cleanup to make mts_is_worker() take a const argument; releasing a test to run in MTS. ------------------------------------------------------------ revno: 3330 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-02 08:58:56 +0100 message: Fixed the use of the performance schema in the replication code and a concurrency issue in the IO Thread. In particular, the IO Thread was calling flush_master_info without grabbing locks.
------------------------------------------------------------ revno: 3329 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 16:41:35 +0300 message: wl5569 MTS merging from the main repo. ------------------------------------------------------------ revno: 3328 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 15:48:25 +0300 message: wl#5569 MTS the final cleanup patch. There are a few glitches that were considered tolerable, at least while the whole WL code is being reviewed. They include: - no support for old load-data events - no support for FK; to add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called. @ sql/log_event.h cleanup. @ sql/rpl_reporting.cc introducing a new public method so that it can be called from code executed by Slave_worker. @ sql/rpl_reporting.h the earlier do_report is renamed, and a new do_report() is introduced so that child classes can redefine reporting in their own way. The child class is supposed to call child->report() and to get child::do_report()'s designed behaviour. @ sql/rpl_rli_pdb.cc addressing an OOM issue at deletion of curr_group_exec_parts. @ sql/rpl_rli_pdb.h deploying the do_`method' pattern. @ sql/rpl_slave.cc cleanup. ------------------------------------------------------------ revno: 3327 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 13:16:52 +0300 message: wl#5569 MTS The patch cleans up a host of code. @ sql/log_event.cc cleanup, comments improved, the decision logic in Log_event::apply_event on the MTS execution mode is simplified. Moving flush_info() of Rotate_log_event::do_update_pos() into inc_group_relay_log_pos().
@ sql/log_event.h cleanup and merging the logic of the former mts_async_exec_by_coordinator() with mts_sequential_exec(), which is now called from a new mts_execution_mode(). Reducing the visibility of MTS members of the Log_event hierarchy to match the needs. @ sql/rpl_rli.cc cleanup, renames and moving flush_info() inside inc_group_relay_log_pos(). @ sql/rpl_rli.h Cleanup and comments improved. @ sql/rpl_rli_pdb.cc Cleanup; renames; comments; a new Slave_worker::init_worker() is defined to be called for each Worker when the Worker pool starts. Its initialization instructions are migrated from slave_start_single_worker(). @ sql/rpl_rli_pdb.h Cleanup and comments improved. @ sql/rpl_slave.cc cleanup; moving the collection of initializations for a Worker in slave_start_single_worker() into a new Worker::init_worker(). @ sql/sql_class.h cleanup. ------------------------------------------------------------ revno: 3326 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-28 11:30:18 +0300 message: wl#5569 MTS replacing views with regular tables for consistency verification in rpl_parallel_innodb. Also a minor cleanup in rpl_parallel is done. ------------------------------------------------------------ revno: 3325 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 20:31:45 +0300 message: wl#5569 MTS Cleanup and addressing a sporadic rpl_temp_table_mix_row failure in post-execution mtr.check_testcase(). The check failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While it is correct for a homogeneous rows-event-only load, a mixture can fail. Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in the future.
@ mysql-test/suite/rpl/t/rpl_temp_table_mix_row.test Adding slave synchronization. @ sql/log_event.cc cleanup to move circular_buffer-related definitions into rpl_rli_pdb, which is specialized in objects dealing with Workers, their assignment etc.; improving comments; also, the former separate flag indicating that a T-event requires post-scheduling synchronization with the Worker is turned into a bit of the existing Log_event::flags, which also avoids the ugliness of #if/#endif:s. @ sql/log_event.h the former separate flag indicating that a T-event requires post-scheduling synchronization with the Worker is turned into a bit of the existing Log_event::flags; @ sql/rpl_rli.cc cleanup: renaming. @ sql/rpl_rli.h cleanup: renaming, more comments. The former mts_wqs_overrun is converted into two: the statistics parameter mts_wq_overrun_cnt and the internal control parameter mts_wq_excess. @ sql/rpl_rli_pdb.cc Included rpl_slave.h, which holds two necessary declarations; Cleanup: accepting circular_buffer-related definitions migrated from log_event, improved comments, renaming, removing dead code. @ sql/rpl_rli_pdb.h Cleanup: renaming and more comments are added. @ sql/rpl_slave.cc Augmenting the print-out of statistics at the end of an MTS session; cleanup: renaming. @ sql/rpl_slave.h Introducing two constants to define the range of the worker_id domain and a magic value for an undefined worker. @ sql/sys_vars.cc replacing a literal int value with a symbolic constant. ------------------------------------------------------------ revno: 3324 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 13:12:52 +0100 message: Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case.
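Revision 3328 splits reporting into a public report() and an overridable do_report(), so a child class calls report() and gets its own do_report() behaviour. That reads as the non-virtual interface idiom; a small C++ sketch under that reading follows. `Reporter` and `Worker_reporter` are hypothetical stand-ins, not the classes in sql/rpl_reporting.h, and the message format is illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: report() is the public, non-virtual entry
// point; do_report() is the protected hook that subclasses redefine.
class Reporter {
 public:
  virtual ~Reporter() = default;

  // Common bookkeeping lives here, then the virtual hook is dispatched.
  void report(int level, const std::string &msg) {
    ++reports_;
    do_report(level, msg);
  }

  int reports() const { return reports_; }
  const std::string &last() const { return last_; }

 protected:
  // Default formatting of a report.
  virtual void do_report(int level, const std::string &msg) {
    last_ = "[" + std::to_string(level) + "] " + msg;
  }

  std::string last_;

 private:
  int reports_ = 0;
};

// Hypothetical Worker-side reporter that prefixes messages with its id,
// then defers to the base formatting.
class Worker_reporter : public Reporter {
 public:
  explicit Worker_reporter(unsigned long id) : id_(id) {}

 protected:
  void do_report(int level, const std::string &msg) override {
    Reporter::do_report(level, "Worker " + std::to_string(id_) + ": " + msg);
  }

 private:
  unsigned long id_;
};
```

Callers only ever see report(), so shared bookkeeping cannot be bypassed, while each subclass controls just the part that differs.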
------------------------------------------------------------ revno: 3323 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-26 13:02:59 +0100 message: Fixed test case. ------------------------------------------------------------ revno: 3322 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-25 15:14:24 +0100 message: Introduced a test case for recovery with MTS and fixed bugs in recovery. ------------------------------------------------------------ revno: 3321 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 15:38:19 +0300 message: wl#5569 MTS This patch does a bit of cleanup, addresses one memory-allocation TODO and completes fixing a valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items. ------------------------------------------------------------ revno: 3320 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 12:38:34 +0300 message: wl#5569 MTS this patch completes the previous one: it fixes a result file and bases the InnoDB-specific test verification on tables, not views. ------------------------------------------------------------ revno: 3319 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 00:11:22 +0300 message: wl#5569 MTS this is an exploratory patch to find out whether the verification method that was based on views has its own flaw unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment. ------------------------------------------------------------ revno: 3318 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-23 07:56:15 +0300 message: wl#5569 MTS fixing results of mysqld--help-win.
------------------------------------------------------------ revno: 3317 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:20:40 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3316 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:17:43 +0100 message: On some platforms, such as Windows, a thread's wait time is stored in 100 ns units. However, when computing the difference between two values, the result was not multiplied by 100. Besides, there was a casting problem when that result was assigned to a ulong. ------------------------------------------------------------ revno: 3315 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 18:54:23 +0100 message: Fixed how MTS copes with recovery. ------------------------------------------------------------ revno: 3314 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 19:10:54 +0300 message: wl#5569 MTS Fixing valgrind warnings. @ sql/log_event.cc w->running_status is verified to find out the actual running status of a Worker. THD can be unavailable, which is what a valgrind report was about. @ sql/rpl_rli_pdb.cc commenting out an assert that valgrind does not like. @ sql/rpl_rli_pdb.h a new method is added to be invoked at MTS shutdown. @ sql/rpl_slave.cc Invoking GAQ cleanup at the end of the MTS session. ------------------------------------------------------------ revno: 3313 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 18:15:43 +0300 message: wl#5569 MTS rpl_parallel_start_stop.test could fail sporadically with a timeout.
@ mysql-test/include/wait_for_slave_param.inc Correcting comments and the handling of the caller-supplied $slave_timeout to make sure the unit of 1 second really holds. Introduced symbolic default_timeout and sleep_freq(uency) to produce the time to sleep between two polls. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Since the default time to wait is less than InnoDB's lock wait time, the time to wait for the error is set explicitly. ------------------------------------------------------------ revno: 3312 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:21:56 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3311 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:19:06 +0100 message: Fixed an error when computing the Lower-Water-Mark. If two or more jobs were removed from the Group of assigned jobs, and one of the jobs had a non-empty group relay log but the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null. ------------------------------------------------------------ revno: 3310 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 11:52:44 +0100 message: Fixed valgrind errors. Slave_job_group was silently being cast to LOG_POS_COORD while calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp) and consequently mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems: . The first two members of Slave_job_group were not a char * and a my_offset. . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems. To fix the problem, we avoid this casting.
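Revision 3310 removes a cast that let a qsort-style comparator reinterpret Slave_job_group records as LOG_POS_COORD. The type-safe alternative is a comparator that takes the real element type and reads its coordinate members explicitly. The C++ sketch below uses std::sort with hypothetical types `Job_group` and `Log_coord` and a hypothetical `coord_cmp()`; it does not reproduce sort_dynamic() or mts_event_coord_cmp().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical coordinate type: a (log file name, position) pair.
struct Log_coord {
  std::string file_name;
  unsigned long long pos;
};

// Hypothetical job-group record whose first member is NOT a coordinate,
// so reinterpreting it as a Log_coord would read the wrong bytes.
struct Job_group {
  unsigned long worker_id;
  Log_coord checkpoint;
};

// Three-way comparison of coordinates: by file name, then by position.
// Assumes file names sort in creation order (fixed-width numeric suffix).
int coord_cmp(const Log_coord &a, const Log_coord &b) {
  int c = a.file_name.compare(b.file_name);
  if (c != 0) return c < 0 ? -1 : 1;
  if (a.pos == b.pos) return 0;
  return a.pos < b.pos ? -1 : 1;
}

// Sorts job groups through a comparator that takes the real element
// type and passes its coordinate member explicitly: no cast involved.
void sort_by_coord(std::vector<Job_group> &jobs) {
  std::sort(jobs.begin(), jobs.end(),
            [](const Job_group &x, const Job_group &y) {
              return coord_cmp(x.checkpoint, y.checkpoint) < 0;
            });
}
```

Because the comparator's parameter types match the element type, the compiler rejects any mismatch that the `(qsort_cmp)` cast used to hide, and member layout or alignment no longer matters.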
------------------------------------------------------------ revno: 3309 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 19:14:50 +0300 message: wl#5569 MTS fixing slave_transaction_retries_basic_64.result ------------------------------------------------------------ revno: 3308 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 16:11:25 +0300 message: wl#5569 MTS fixing tests. @ mysql-test/extra/rpl_tests/rpl_extra_col_master.test MTS-suppression is necessary because the test is supposed to stop the slave due to an error. @ mysql-test/extra/rpl_tests/rpl_relayrotate.test Decreasing the load to prove that a warning was caused by a slow environment, in which waiting for the SQL thread to accept the killed status ended with the 1-minute timeout. @ mysql-test/suite/rpl/r/rpl_relayrotate.result results updated. @ mysql-test/suite/rpl/t/rpl_stm_000001.test A macro is expanded in order to isolate which of two activities a timeout failure belongs to. @ mysql-test/suite/sys_vars/r/slave_transaction_retries_basic_64.result Fixing results of the 64-bit version of the test that was edited in the previous push. ------------------------------------------------------------ revno: 3307 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 12:33:36 +0300 message: wl#5569 MTS Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events. Fixing the sensitivity of two other tests to mtr's invocation environment, which includes initial values of slave_parallel_workers and slave_transaction_retries. @ mysql-test/suite/sys_vars/inc/slave_transaction_retries_basic.inc made the test insensitive to the value of slave_transaction_retries in the mtr env.
@ mysql-test/suite/sys_vars/r/slave_parallel_workers_basic.result made the test insensitive to the value of slave_parallel_workers in the mtr env. @ mysql-test/suite/sys_vars/r/slave_transaction_retries_basic_32.result made the test insensitive to the value of slave_transaction_retries in the mtr env. @ mysql-test/suite/sys_vars/t/slave_parallel_workers_basic.test made the test insensitive to the value of slave_parallel_workers in the mtr env. @ sql/log_event.cc get_slave_worker() passes the need_temps argument as FALSE in the case of rows-events. Correcting the actual value of `mts_in_group' of mts_async_exec_by_coordinator(). ------------------------------------------------------------ revno: 3306 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 09:04:19 +0100 message: Fixed some Windows failures. ------------------------------------------------------------ revno: 3305 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-18 19:58:21 +0100 message: Fixed some recovery issues. ------------------------------------------------------------ revno: 3304 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 21:01:58 +0300 message: wl#5569 MTS fixing tests and a segfault at the end of handle_slave_sql() that happened after Worker initialization failed (e.g. rpl_row_log on Windows). @ mysql-test/extra/rpl_tests/rpl_loaddata.test MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_loaddata.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_stm_loaddata_concurrent.result MTS-suppression is added. @ mysql-test/suite/sys_vars/t/disabled.def a constant nuisance is disabled in the feature tree. TODO: do not merge it when pushing to the main tree. @ sql/rpl_slave.cc Moved Workers initialization after that of the Coordinator, so that a failure in the former routine is handled with a proper Coordinator state.
This fix eliminates the segfault at the end of handle_slave_sql() for a few tests but does not address the reason for the Worker initialization failure, as in rpl_row_log on Windows: 110616 7:37:57 [Note] Info file G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0 cannot be accessed (errno 13). Most likely this is a new slave or you are changing the repository type. 110616 7:37:57 [ERROR] G:/pb2/test/sb_1-3486364-1308189142.46/mysql-5.6.3-m5-win-x86_64-test/sql/Debug/mysqld.exe: File 'G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0' not found (Errcode: 13) 110616 7:37:57 [ERROR] Failed to create a new info file (file 'G:\pb2\test\sb_1-3486364-1308189142.46\mysql-5.6.3-m5-win-x86_64-test\mysql-test\var-rpl-ps_row\4\mysqld.2\data\relay-log.info.0', errno 13) 110616 7:37:57 [ERROR] Error reading slave worker configuration 110616 7:37:57 [ERROR] Failed during slave worker thread create 110616 7:37:57 [ERROR] Slave SQL: Failed during slave workers initialization, Error_code: 1593 ------------------------------------------------------------ revno: 3303 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 18:34:16 +0300 message: wl#5569 MTS fixing tests. @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test making aux file names unique to please mtr and PB. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test making aux file names unique to please mtr and PB. @ mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_mixed_binlog_max_cache_size.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result making aux file names unique to please mtr and PB. @ mysql-test/suite/rpl/r/rpl_parallel_innodb.result making aux file names unique to please mtr and PB.
@ mysql-test/suite/rpl/r/rpl_stm_binlog_max_cache_size.result MTS-suppression is added. @ mysql-test/suite/rpl/r/rpl_typeconv.result MTS-suppression is added. @ mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test MTS-suppression is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt cleanup. @ mysql-test/suite/rpl/t/rpl_typeconv.test MTS-suppression is added. @ mysql-test/suite/sys_vars/r/slave_parallel_workers_basic.result results updated. @ sql/sql_class.h Cleanup to remove early debug-related options. @ sql/sys_vars.cc Fixing the maximum of slave_parallel_workers at 1024. Cleanup to remove early debug-related options. ------------------------------------------------------------ revno: 3302 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 14:00:41 +0300 message: wl#5569 MTS fixing rpl_row_basic_3innodb similarly to the previous patch. @ mysql-test/suite/rpl/r/rpl_row_basic_3innodb.result a suppression is added. ------------------------------------------------------------ revno: 3301 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 13:51:59 +0300 message: wl#5569 MTS fixing a few tests. 1. A policy is implemented to react with a warning when a failing Worker leaves the overall slave state with gaps, and thus inconsistent. 2. Two tests that used to time out because reset master/slave was disabled in them. @ mysql-test/extra/rpl_tests/rpl_binlog_max_cache_size.test a suppression is added. @ mysql-test/extra/rpl_tests/rpl_row_basic.test a suppression is added. @ mysql-test/suite/rpl/r/rpl_known_bugs_detection.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_basic_2myisam.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_binlog_max_cache_size.result a suppression is added. @ mysql-test/suite/rpl/r/rpl_row_event_max_size.result a suppression is added.
@ mysql-test/suite/rpl/r/rpl_row_idempotency.result a suppression is added. @ mysql-test/suite/rpl/t/rpl_known_bugs_detection.test a suppression is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt removing unnecessary options causing the test to fail. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test removing an erroneous assignment. The former disabling of reset was intended for benchmarking w/o binlog on the slave to please master-slave.inc. @ mysql-test/suite/rpl/t/rpl_parallel_innodb-slave.opt removing unnecessary options causing the test to fail. @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test removing an erroneous assignment. The former disabling of reset was intended for benchmarking w/o binlog on the slave to please master-slave.inc. @ mysql-test/suite/rpl/t/rpl_row_event_max_size.test a suppression is added. @ mysql-test/suite/rpl/t/rpl_row_idempotency.test a suppression is added. @ sql/rpl_slave.cc Downgrading an error to a warning when the Coordinator fails due to a Worker error. Improving messages. Merging two if:s to have just one report(). @ sql/share/errmsg-utf8.txt Improved the text of an error; Added a new error code. ------------------------------------------------------------ revno: 3300 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 02:24:59 +0100 message: Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renaming worker-info-repository to slave-worker-info-repository in some tests' option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have been renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including 1. decreasing the number of system variables and renaming them. Command-line options that mattered only for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. Small encapsulation in ha_blackhole.cc is done. @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test cleanup. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test cleanup. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test cleanup. @ mysql-test/r/mysqld--help-notwin.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_conf_limits.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_seconds_behind_master.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result cleanup. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result cleanup. @ mysql-test/suite/rpl/t/rpl_parallel.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test cleanup.
@ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_ddl.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_seconds_behind_master.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test cleanup. @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test cleanup. @ mysql-test/suite/sys_vars/r/all_vars.result cleanup. @ mysql-test/suite/sys_vars/r/slave_checkpoint_group_basic.result cleanup. @ mysql-test/suite/sys_vars/r/slave_checkpoint_period_basic.result cleanup. @ mysql-test/suite/sys_vars/r/slave_worker_info_repository_basic.result cleanup. @ mysql-test/suite/sys_vars/t/slave_checkpoint_group_basic.test cleanup. @ mysql-test/suite/sys_vars/t/slave_checkpoint_period_basic.test cleanup. @ sql/log_event.cc removing the experimental (for benchmarking) mts_slave_local_timestamp option. @ sql/mysqld.cc a few debugging-time options are replaced with constants. Interface variables are not needed anymore. @ sql/mysqld.h a few debugging-time options are replaced with constants. Interface variables are not needed anymore. @ sql/rpl_rli_pdb.cc a few debugging-time options are replaced with constants. @ sql/rpl_slave.cc a few debugging-time options are replaced with constants. @ sql/sys_vars.cc a few debugging-time options are replaced with constants; the remaining MTS-related ones are renamed to be prefixed with `slave_'. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by MTS.
------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_seconds_behind_master.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results updated. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test moving mtr.add_supp to eliminate the possibility of a warning in the slave's error log; adding graceful termination lines to the test. @ mysql-test/suite/rpl/t/rpl_parallel_seconds_behind_master.test moving mtr.add_supp to eliminate the possibility of a warning in the slave's error log. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Suppressions are added for errors that are expected by the test logic; adding graceful termination lines to the test. @ sql/log_event.cc fixing the last argument to report(), which should be a C string; fixing gathering of db:s on the master side. Because a query can be preceded in the binlog by an engineered BEGIN (the current pattern of logging from the trunk), resetting in Query::write() can no longer be done. However, another reset point exists at the end of the top-level query, and that suffices. @ sql/rpl_rli.h is_mts_in_group() is added to mimic STS' is_in_group(), though the semantics are different. @ sql/rpl_slave.cc further cleanup in sql_slave_killed() as requested by reviewers.
------------------------------------------------------------ revno: 3288 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing failing tests due to a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event instance. An event that accesses more than the maximum number of db:s, or an event from an old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and nothing more is scheduled until the group is done. b. Notification to the Coordinator about an errored-out Worker is corrected. @ sql/log_event.cc isolation applies to a group of events rather than to an instance. The logic of isolation while the group is still being executed by a Worker is refined through use of `bool curr_group_isolated', which lasts for the group scheduling time and is set and reset in Log_event::get_slave_worker_id(). An assert is added to monitor correct migration of tmp tables. . get_slave_worker() is called with `need_temp_tables' set to TRUE. @ sql/log_event.h Renaming to indicate that isolation applies to a group of events. Adding more candidate events to the mts_do_isolate_group() assert. @ sql/rpl_rli.h Isolation mode related declaration. @ sql/rpl_rli_pdb.cc Refining notification logic. The Coordinator needs both its THD::KILLED and signalling to slave_worker_hash_cond. @ sql/rpl_slave.cc Isolation mode related initialization.
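The group-isolation rule of revno 3287 can be illustrated with a toy, single-threaded scheduler. This is a sketch under stated assumptions, not the server's Log_event::get_slave_worker_id() or curr_group_isolated logic: GroupScheduler, Event, group_needs_isolation() and the limit kMaxDbs = 16 are hypothetical. It only shows that one over-max or old-master event isolates its whole group, that such a group waits for all earlier groups, and that nothing new is scheduled until it finishes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of group isolation; not server code.
// kMaxDbs stands in for the "over-max db:s" limit (value assumed).
constexpr std::size_t kMaxDbs = 16;

struct Event {
  std::vector<std::string> dbs;  // databases the event accesses
  bool from_old_master = false;  // read from an old-master binlog
};

// Isolation is decided per group: one qualifying event isolates the whole group.
bool group_needs_isolation(const std::vector<Event>& group) {
  for (const Event& ev : group)
    if (ev.dbs.size() > kMaxDbs || ev.from_old_master) return true;
  return false;
}

class GroupScheduler {
 public:
  // May the Coordinator hand this group to a Worker now?
  bool can_schedule(const std::vector<Event>& group) const {
    if (isolated_running_) return false;  // nothing else while isolated group runs
    if (group_needs_isolation(group)) return in_flight_ == 0;  // wait for prior groups
    return true;
  }
  void schedule(const std::vector<Event>& group) {
    assert(can_schedule(group));
    if (group_needs_isolation(group)) isolated_running_ = true;
    ++in_flight_;
  }
  // Called when a Worker finishes a group.
  void group_done(bool was_isolated) {
    assert(in_flight_ > 0);
    --in_flight_;
    if (was_isolated) isolated_running_ = false;
  }

 private:
  std::size_t in_flight_ = 0;
  bool isolated_running_ = false;
};
```

Keeping the decision per group rather than per event means a late qualifying event still blocks concurrent scheduling for the rest of its group, which is the flaw the commit describes fixing.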
------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode and for rpl suites with row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes for a segfault from the last merge from trunk, etc., and a compilation issue on embedded. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options. ------------------------------------------------------------ revno: 3283 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop; an error case behaves like the killed case. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker-running status to killed (which breaks the read-exec loop of a Worker). @ sql/log_event.cc Changing the style of computing the mts-in-group bool arg of mts_async_exec_by_coordinator().
@ sql/rpl_rli.cc Changing the style of computing the mts-in-group arg of an if in stmt_done(). @ sql/rpl_rli.h Adding more states to the Coordinator's MTS-group view. @ sql/rpl_rli_pdb.cc Relocating notification of a Worker's failure by the Worker into the error branch of a function releasing common resources (entries of the APH hash). The failed Worker tries to awaken the Coordinator, which may be waiting for the signal. The latter's behaviour, in its turn, is refined to not enter the waiting phase when it has already been killed. @ sql/rpl_slave.cc sql_slave_killed() is made of two flavors of the error branches. A STOPped MTS Coordinator does not give up too early and waits until its MTS-group state allows that. Notification with kill to the Coordinator from the errored-out or killed Worker is moved into a function releasing common resources (entries of the APH hash). This case designates a hard stop. In case of the soft (STOP SLAVE) MTS stop, the Coordinator is made to wait for full completion of the Workers' assignments before marking their running status for stopping. ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, incl. in stopping Workers, so that the routine distinguishes between killed and gracefully stopped cases and in the end STOP SLAVE guarantees a consistent state (some todo remains still). @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test decreasing execution time. @ mysql-test/suite/rpl/t/rpl_begin_commit_rollback.test Marking the test as limited to Single-Thread-Slave. @ mysql-test/suite/rpl/t/rpl_deadlock_innodb.test Marking the test as limited to Single-Thread-Slave. @ mysql-test/suite/rpl/t/rpl_slave_skip.test Marking the test as limited to Single-Thread-Slave.
@ sql/log_event.cc addressing a few review comments; asserting do_update_pos() can't be run by Workers; cleaning up and separating Slave_worker *Log_event::get_slave_worker_id() and its caller's interest in rli->last_assigned_worker; Deploying MTS group status marking in Log_event::apply_event(); Making the Worker's exec loop break to obey a new Worker's running status too; Deploying mts_checkpoint_routine() in Rotate_log_event::do_update_pos() (a similar action for the FD event's handler); Fixing relay-log update notification in Log_event::get_slave_worker_id(); @ sql/log_event.h renaming and re-typing of func:s as suggested by a reviewer; leaving a todo item for the final cleanup; correcting the logic of mts_async_exec_by_coordinator(); @ sql/rpl_rli.cc Initialization of a new MTS group status property: mts_group_status(MTS_NOT_IN_GROUP); asserting Relay_log_info::stmt_done() can't be run by Workers; deploying mts_checkpoint_routine(), as in Rotate_log_event::do_update_pos(), this time in Relay_log_info::stmt_done() to cover the FD-event case, and consulting mts_group_status in order to decide which branch to follow; @ sql/rpl_rli.h Augmenting Relay_log_info with mts_group_status to contain the MTS group status; @ sql/rpl_rli_pdb.cc Slave_worker::commit_positions() is fixed to carry the relay-log info update further to the following checkpoint routine action; Slave_worker *get_slave_worker() was cleaned, interfaces improved, a few asserts corrected; Slave_worker::slave_worker_ends_group() was cleaned a bit and now frees extra memory of the CGEP dynarray. wait_for_workers_to_finish() is made to set the Coordinator's state as not in an MTS group after synchronization with all workers; @ sql/rpl_rli_pdb.h Slave_jobs_queue is augmented with a running_status member.
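The soft and hard stop described in revnos 3282 and 3281 can be modeled in a few lines. This is a minimal sketch, assuming a single-threaded simulation: only the idea of marking a Worker KILLED comes from these commits, while RunningStatus, Worker::step(), soft_stop(), hard_stop() and gaps() are hypothetical, and the Coordinator's waiting on a Worker is simulated by draining its queue in place.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Illustrative model only. The KILLED status echoes the commit text
// (w->running_status= Slave_worker::KILLED); everything else is hypothetical.
enum class RunningStatus { RUNNING, STOP, KILLED };

struct Worker {
  std::deque<int> jobs;  // assigned but not yet applied groups
  std::size_t applied = 0;
  RunningStatus status = RunningStatus::RUNNING;

  // One pass of the Worker's read-exec loop; false means the loop breaks.
  bool step() {
    if (status == RunningStatus::KILLED) return false;  // abandon queued jobs
    if (jobs.empty()) return status == RunningStatus::RUNNING;
    jobs.pop_front();
    ++applied;
    return true;
  }
};

// Soft stop (STOP SLAVE): wait until every assignment is applied,
// then mark the Worker for stopping, so no gaps remain.
void soft_stop(std::vector<Worker>& workers) {
  for (Worker& w : workers) {
    while (!w.jobs.empty() && w.step()) {
      // stands in for the Coordinator waiting on the Worker
    }
    w.status = RunningStatus::STOP;
  }
}

// Hard stop (KILL, or a Worker erroring out): mark every Worker KILLED
// at once; jobs still queued become gaps.
void hard_stop(std::vector<Worker>& workers) {
  for (Worker& w : workers) w.status = RunningStatus::KILLED;
}

// Number of assigned-but-unapplied jobs left behind.
std::size_t gaps(const std::vector<Worker>& workers) {
  std::size_t n = 0;
  for (const Worker& w : workers) n += w.jobs.size();
  return n;
}
```

With the same queues, soft_stop() leaves gaps() at zero while hard_stop() leaves every queued job behind, which is the trade-off between STOP SLAVE and KILL stated in revno 3282.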
@ sql/rpl_slave.cc apply_event_and_update_pos(): corrects asserts, synchronizes with *all* Workers at the end of an event dynamically marked as End of group (mts_is_event_isolated() -> TRUE); exec_relay_log_event(): corrects the case of a NULL event read out; slave_stop_workers(): simplifying the logic of stopping Workers, to mark them with w->running_status= Slave_worker::KILLED instead of killing the workers' thd. . slave_stop_workers() finalizes the reset of the Coordinator's state with rli->mts_group_status= Relay_log_info::MTS_NOT_IN_GROUP to make sure a next restart will proceed with the reset value. ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that are not pushed yet to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The former declaration of the underlying server variable is changed from ulong to int32. While that might affect (shrink) the actual range, no range has been specified, and now that the number of bits is the same on all platforms the range can be set to [0, max(int32)] ****** wl#5569 MTS Wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and realization of a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; Fixing temp tables manipulation. Asserting an impossible-to-support use case of a group of events not wrapped with BEGIN/COMMIT. Todo: recognize old master binlog to refuse to run in parallel.
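The motivation for turning slave_open_temp_tables into a fixed-width int32 with atomic updates can be illustrated as follows. This sketch swaps in C++11 std::atomic<int32_t> for the server's atomic primitives of that time, and modify_open_temp_tables_counter() and open_concurrently() are hypothetical helpers, not the real modify_slave_open_temp_tables().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>
#include <thread>
#include <vector>

// Sketch only: std::atomic<int32_t> is a modern stand-in for the server's
// own atomic primitives; the helper names below are hypothetical.
std::atomic<int32_t> slave_open_temp_tables{0};

// int32 has the same width on every platform, hence a fixed documented range.
static_assert(std::numeric_limits<int32_t>::max() == 2147483647,
              "range is [0, max(int32)] everywhere");

// Called by the Coordinator or a Worker when a temp table is opened (+1)
// or closed (-1); fetch_add makes the read-modify-write indivisible.
void modify_open_temp_tables_counter(int32_t delta) {
  int32_t after = slave_open_temp_tables.fetch_add(delta) + delta;
  assert(after >= 0);  // the counter must never go negative
  (void)after;         // silence unused-variable warnings under NDEBUG
}

// Several "Workers" open temp tables concurrently; with a plain,
// non-atomic int32 some of these increments could be lost.
int32_t open_concurrently(int workers, int tables_each) {
  std::vector<std::thread> pool;
  for (int w = 0; w < workers; ++w)
    pool.emplace_back([tables_each] {
      for (int i = 0; i < tables_each; ++i) modify_open_temp_tables_counter(+1);
    });
  for (std::thread& t : pool) t.join();
  return slave_open_temp_tables.load();
}
```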
****** wl#5569 MTS Implementation of handing the applier role to Workers for all cases except those dealing with the Coordinator's state. That includes Query events with over-max db:s and Load-data related events. The current patch also makes old master binlog be handled by MTS, though sometimes, e.g. for a Query event, it switches to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic of handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the workers starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it's about cleanup and simplification of the logic of event distribution to Workers and the consequent actions. Some effort was made to support Old Master Begin-less groups of events. @ mysql-test/extra/rpl_tests/rpl_parallel_load_innodb.test Elaborated version of the rpl_parallel_load generator, still narrowed down to test performance with Innodb. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result results got updated. ****** results updated. @ mysql-test/suite/rpl/t/disabled.def Disabling a few tests that trigger the assert installed in log_event.cc by this commit. . ****** Restoring three tests as this patch makes them runnable. @ mysql-test/suite/rpl/t/rpl_deadlock_innodb.test the test can't run in MTS because of transaction retry. @ mysql-test/suite/rpl/t/rpl_dual_pos_advance.test the test can't run in MTS because the Until option of START SLAVE is not yet supported by MTS.
@ mysql-test/suite/rpl/t/rpl_parallel_ddl-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_innodb-master.opt new test opt file is added. @ mysql-test/suite/rpl/t/rpl_parallel_innodb-slave.opt new test opt file is added. ****** rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_innodb.test Elaborated version of rpl_parallel narrowed down to test performance with Innodb. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_temp_query-slave.opt rpl_parallel tests need --slave-transaction-retries=0 @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test Adding logic to watch Slave_open_temp_tables in the face of its concurrent updating. @ sql/event_parse_data.cc Pleasing some tests. @ sql/field.cc Restoring asserts that were there before the changes to sql_base.cc. . ****** Old master binlog events can't be run in parallel for a few reasons. Therefore that particular branch of code is irrelevant for MTS. @ sql/handler.cc Restoring an assert that was there before the changes to sql_base.cc. @ sql/log_event.cc cleanup, incl. restoration of the trunk version of some pieces of code. Passing future_event_relay_log_pos to the Worker to strike out a todo in rpl_rli.cc. . ****** Asserting the not-implemented support of groups of events not braced with BEGIN/COMMIT(Xid). Such groups are possible in stored routine logging and when an old server binlog file is adopted by an MTS-aware slave. . ****** Making a group of events without B/C braces be handled by a Worker. Such a group can come from an old master or from the current master binlogging some SP queries. Also over-max db:s events are made to be handled by a Worker. The Coordinator only handles asynchronously events dealing with the Relay-log state, and synchronously events dealing with checkpoint changes (master-group coordinates). Also a few types of events from OM are left to the Coordinator to execute. .
. ****** Cleanup and simplification of the logic of handling some events sequentially by the Coordinator. An event is marked as parallel or sequential through C's rli, which affects commit to the info table by C as well as the event's destruction. . ****** Cleanup and logic simplification in Log_event::get_slave_worker_id() and Log_event::apply_event(). The essence is: a. to return back to apply_event_and_update_pos() an event associated either with the single-threaded sql-thread rli, or with one of Coord or Worker. b. while the beginning of a group and the corresponding actions are left to Log_event::get_slave_worker_id(), other actions, including passing the event to a Worker and the final closure of the current group, are moved into apply_event_and_update_pos(). . Correcting Query_log_event::at-,de-tach_temp_tables() to expect the magic ""-empty string name db partition through which the applier thread receives temp tables. @ sql/log_event.h Leaving in mts_sequential_exec() only events that either can deal with the Coordinator state or are from an old master. Making Query_log_event::mts_get_dbs return a list with a magic ""-empty string partition name in case of an over-max db:s query. The empty magic is converted into a record in APH to indicate a lock on all records of the hash. ****** More members are added to Log_event a. to associate the event with an applier. b. to provide marking of a B-less group of events (old master, select sf()). @ sql/mysqld.cc Turning slave_open_temp_tables from ulong to int32 and adding an atomic locks declaration for the counter updating. @ sql/mysqld.h Externalizing slave_open_temp_tables_lock; @ sql/rpl_rli.cc Initializing/destroying the slave_open_temp_tables lock at the same time as Workers. ****** passing future_event_relay_log_pos is done via an assignment to a Worker's member in slave_worker_exec_job(). @ sql/rpl_rli.h restoring the original version of get_table_data() though no real changes. .
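How the magic ""-empty partition name could turn an over-max db:s query into a lock on the whole Assigned Partition Hash is sketched below. The limit of 16 databases, mts_partitions(), is_whole_hash() and ToyAph are assumptions for illustration; the real APH records and the mts_get_dbs() signature differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch; only the magic ""-partition idea comes from the commit.
constexpr std::size_t kOverMaxDbs = 16;  // assumed limit
const std::string kWholeHash = "";       // magic partition name

// Stand-in for Query_log_event::mts_get_dbs(): too many dbs collapse into
// the single magic partition meaning "every partition".
std::vector<std::string> mts_partitions(const std::vector<std::string>& dbs) {
  if (dbs.size() > kOverMaxDbs) return {kWholeHash};
  return dbs;
}

bool is_whole_hash(const std::vector<std::string>& parts) {
  return parts.size() == 1 && parts[0] == kWholeHash;
}

// Toy Assigned Partition Hash: partition name -> in-flight usage count.
class ToyAph {
 public:
  bool try_acquire(const std::vector<std::string>& parts) {
    if (whole_locked_) return false;
    if (is_whole_hash(parts)) {
      for (const auto& kv : usage_)
        if (kv.second > 0) return false;  // some partition is still busy
      whole_locked_ = true;               // lock all records at once
      return true;
    }
    for (const auto& p : parts) ++usage_[p];
    return true;
  }
  void release(const std::vector<std::string>& parts) {
    if (is_whole_hash(parts)) {
      whole_locked_ = false;
      return;
    }
    for (const auto& p : parts) {
      assert(usage_[p] > 0);
      --usage_[p];
    }
  }

 private:
  std::map<std::string, int> usage_;
  bool whole_locked_ = false;
};
```

Because database names are never empty, "" cannot collide with a real partition, which is what makes it usable as a sentinel.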
****** simplified (curr_group_is_parallel + curr_group_split) into curr_*event*_is_parallel. ****** Removing rli members that aren't necessary any longer. @ sql/rpl_rli_pdb.cc cleanup. ****** Removing a redundant my_hash_update; cleanup; Fixing a temp tables related issue of leaving wait_for_worker without all entries of APH having given out their temp tables. ****** Changes due to redefining the object responsible for holding assigned partitions in a few methods, incl. Slave_worker::slave_worker_ends_group(). Some cleanup in get_slave_worker(). ****** cleanup, a new assert, and init of a debug-related member. @ sql/rpl_rli_pdb.h Redefining the object responsible for holding assigned partitions. Now it's a Dyn-array holding *pointers* to records of the Assigned Partition Hash. That simplifies a few routines for Workers, e.g. the search for the records (entries of APH) by a Worker at commit time. . ****** Adding GAQ memory-allocation failure notification. ****** Memorizing the last deleted event for debugging purposes. @ sql/rpl_slave.cc Adding an info message to the error log; improving comments. ****** Restoring the original sequential-mode version of the assert in sql_slave_killed. A Worker is not supposed to run this function. Testing of the skipping logic is left to the rpl suite run in parallel mode. Cleanup. Marking recovery related todo items explicitly. Setting up guards to guarantee sequential mode at the requested points of the code. . ****** Streamlining Workers state identification with a boolean running_status; worker start and stop are controlled by means of this designator. . ****** simplified (curr_group_is_parallel + curr_group_split) into curr_event_is_parallel; a GAQ memory-allocation failure branch is added to the workers starting routine.
****** Cleanup, and moving the append_item_to_jobs() invocation into apply_event_and_update_pos(), as well as other actions mentioned in the log_event.cc comments; changing the signature of apply_event_and_update_pos() to return NULL in place of a referenced pointer in case the event is handed over to a Worker; checking of the pointer value is done in places dealing with update-pos and the event's destruction. @ sql/sql_base.cc Replacing the slave opened temp tables counter incr/decr with a function performing atomic locking in case a Worker runs it. ****** removing an unnecessary return value in the incr_slave_open_temp_tables def. ****** Func is renamed. Removing all traces of the previous idea to return a value out of modify_slave_open_temp_tables. @ sql/sql_parse.cc cleanup. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS. Cleanup, fixing a few todo:s and a few potential issues found. @ mysql-test/suite/rpl/t/rpl_parallel.test commenting the failure in /include/rpl_end.inc (todo: explore and fix). @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test The only Rows_query_log_event test case is no longer valid because the event is parallelizable now. The test is removed. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test Adding comments about a possible issue of somewhat loose behaviour of sync_slave_with_master in parallel mode. TODO: investigate and fix. @ sql/binlog.cc Renaming only. @ sql/events.cc Renaming only. @ sql/log_event.cc Fixing found issues, cleanup and temp tables support. . The assigned partition, as represented by an entry, is passed to the assigned Worker via Log_event::get_slave_worker().
The method attaches the entry to the Query event, whose do_exec_event() calls new attach and detach methods that grab the temp tables list of each involved db and return possibly updated lists back to APH at the end of Query event applying. @ sql/log_event.h Mostly renaming. @ sql/rpl_rli.cc relocating the mts_get_coordinator_thd() definition. @ sql/rpl_rli.h re-defining mts_is_worker() through SYSTEM_THREAD_SLAVE_WORKER. @ sql/rpl_rli_pdb.cc Changes mostly due to temp table support. The Coordinator disassociates the temporary tables of a db-partition being scheduled from its thd and attaches the list to the APH entry. Subsequently the Worker finds the list and adopts it, returning a possibly updated version back to the entry at the end of the query. The list resides most of the time either in the APH's passive (usage == 0) entry or in the Worker's thd->temporary_tables. It can be relocated back to the Coordinator's repository via wait_for_workers_to_finish(), which is called in case an event requires sequential execution. . A few auxiliary functions are defined for migrating and merging temp tables. @ sql/rpl_rli_pdb.h Adding a TABLE* pointer to the list of temp tables in the entry of the Assigned Partition Hash. The entry pointer carries temp tables from C to W and back. Changes in a few function signatures motivated by temp table support. Adding auxiliary funcs to help with temp tables manipulations. @ sql/rpl_slave.cc renaming, cleanup and improving Worker identification. @ sql/rpl_slave.h cleanup. @ sql/rpl_utility.h cleanup. @ sql/sql_base.cc removing a hack to access temp tables in MTS. @ sql/sql_class.cc Renaming only. @ sql/sql_class.h Renaming only. @ sql/sql_rename.cc Renaming only. @ sql/sql_table.cc Renaming only. @ sql/sql_view.cc Renaming only.
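The temp-table hand-off of revno 3279 (the Coordinator detaches, the Worker adopts and returns, the Coordinator reclaims) can be modeled with plain containers. Thd, ApEntry, TempTable and the four functions below are hypothetical stand-ins; the server moves TABLE* lists between THD::temporary_tables and APH entries rather than copying names, and this sketch assumes the Worker holds no other temp-table list while applying.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of temp-table migration through an APH entry; all types are
// hypothetical stand-ins for THD, TABLE and the APH entry.
struct TempTable {
  std::string db, name;
};
using TempList = std::vector<TempTable>;

struct Thd {
  TempList temporary_tables;
};

struct ApEntry {
  int usage = 0;         // 0 means the entry is passive
  TempList temp_tables;  // list parked here while no Worker holds it
};

// Coordinator: detach the tables of `db` from its own thd and park them
// in the partition's entry before the group goes to a Worker.
void coordinator_detach(Thd& coord, ApEntry& entry, const std::string& db) {
  TempList keep;
  for (const TempTable& t : coord.temporary_tables)
    (t.db == db ? entry.temp_tables : keep).push_back(t);
  coord.temporary_tables.swap(keep);
  ++entry.usage;
}

// Worker: adopt the parked list before applying the Query event.
void worker_attach(Thd& worker, ApEntry& entry) {
  worker.temporary_tables = std::move(entry.temp_tables);
  entry.temp_tables.clear();
}

// Worker: return the possibly updated list at the end of the query.
void worker_detach(Thd& worker, ApEntry& entry) {
  entry.temp_tables = std::move(worker.temporary_tables);
  worker.temporary_tables.clear();
  --entry.usage;
}

// Coordinator: pull every parked list back, e.g. once all Workers are
// idle before an event that must run sequentially.
void coordinator_reclaim(Thd& coord, std::map<std::string, ApEntry>& aph) {
  for (auto& kv : aph) {
    assert(kv.second.usage == 0);
    for (const TempTable& t : kv.second.temp_tables)
      coord.temporary_tables.push_back(t);
    kv.second.temp_tables.clear();
  }
}
```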
------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring of its handling in the canonical sequential mode. The event's life cycle suggests behavior similar to objects associated with Table_map; in particular, its destruction is to occur at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl. rpl_row_ignorable_event, in both sequential and parallel mode. @ sql/log_event.cc cleanup of MTS code; relocating handle_rows_query_log_event() logic into a. do_apply_event() and b. rli->cleanup_context(). @ sql/log_event.h cleanup of MTS code; @ sql/rpl_rli.cc Deploying ROWS_QUERY_LOG_EVENT destruction in context_cleanup(). @ sql/rpl_rli.h cleanup of MTS code; @ sql/rpl_slave.cc cleanup of MTS code; @ sql/sql_binlog.cc Simplifying ROWS_QUERY_LOG_EVENT handling in the case of the BINLOG pseudo-query. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch Workers execute events in their private context (class Slave_worker :public Relay_log_info). The only exception is a Query referring to a temporary table. The temp:s are maintained in the Coordinator's "central" rli; removing some dead code; performing a lot of cleanup. There are a few todo items, incl.: 1. To implement several todo:s scattered across MTS' code and tests (e.g. to restore protected for a few members of RLI of rpl_rli.h); 2. to cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback) 3.
to sort out names of classes based on Rpl_info, possibly removing Rpl_info_worker. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test The test, like most of the rpl_parallel* bunch, can't yet stand `include/rpl_end.inc'. @ sql/log_event.cc Defining the default Log_event::do_apply_event_worker(), which simply executes the canonical do_apply_event() but supplies a Slave_worker instance reference; that is critical in order to execute the Worker's own rli->methods(), e.g. `report'. Xid_log_event::do_apply_event_worker() runs the Worker version of the Xid commit; simplifying parallel applying of Rows events by removing or elaborating a host of the early prototype code, incl. rli->get_tables_to_lock() and related logic; @ sql/log_event.h Adding virtual int do_apply_event_worker() to Log_event and specializing it for the Xid class; @ sql/rpl_reporting.cc Splitting report() into two methods so that the functional part, which takes a va_list arg, can be called from the Slave_worker class. @ sql/rpl_reporting.h A new va_list version of the report method is declared. @ sql/rpl_rli.cc removing the early-prototype support for Rows-event parallel execution. The new applying scheme is almost equivalent to the standard sequential algorithm thanks to the Slave_worker :public Relay_log_info inheritance. @ sql/rpl_rli.h Removing unnecessary interfaces; TODO: restore `protected' for a few members. @ sql/rpl_rli_pdb.cc Some cleanup, and defining Slave_worker::report() to eventually call the Coordinator's rli->report(), exploiting the fact that the latter was designed for concurrent use. @ sql/rpl_rli_pdb.h Changing the base class of Slave_worker to make it behave as Relay_log_info when needed; removing some dead code; adding report() methods to be run in do_apply_event().
@ sql/rpl_slave.cc Removed the UNTIL todo, as UNTIL is actually not supported (a warning is issued); removed a todo for cleanup of an errored-out statement-format transaction because w->cleanup_context() indeed implements it; cleanup and transition from w->w_rli (of Relay_log_info) to w (of Slave_worker); adding a forgotten unlock_mutex; simplifying definitions of a few func:s (mts_is_worker() etc.); ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving the benchmarking test. ------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and Coordinator, incl. waiting and sleeping times, are now reported into the error log at slave stopping time. @ sql/log_event.cc statistics added. @ sql/rpl_rli.h statistics added. @ sql/rpl_slave.cc printing out MTS statistics into the error log when stopping the slave. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS Benchmarking-related patch that generalizes rpl_parallel to run with an arbitrary number of workers, db:s, tables, etc. TODO: restore the final consistency check, which is temporarily taken out since I could not find a way to leave it surrounded with a --dis/en-able* stanza.
@ mysql-test/extra/rpl_tests/rpl_parallel_load.test making the load generator indifferent to all parameters, incl. the number of db:s. Had to comment out the final consistency check since I could not find a way to keep the verified table(s) line out of the results. @ mysql-test/suite/rpl/r/rpl_parallel.result results got updated. @ mysql-test/suite/rpl/r/rpl_sequential.result results got updated. @ mysql-test/suite/rpl/t/rpl_parallel-slave.opt the test caller has to supply --mysqld=--mts-slave-parallel-workers=[:num:]. With :num: == 0 the test is equivalent to rpl_sequential. @ mysql-test/suite/rpl/t/rpl_parallel.test removed traces of the number of workers, which can vary in the [0 - ..] range. The test caller has to supply --mysqld=--mts-slave-parallel-workers=[:num:]. With :num: == 0 the test is equivalent to rpl_sequential. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added. Benchmarking results can be obtained by extracting the master-side generating and slave-side applying times, as in the following loop: workers=6; for n in `seq 1 3`; do echo; echo loop: $n; echo; my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \ rpl_parallel_benchmark --mysqld=--binlog-format=statement \ && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1; done @ mysql-test/extra/rpl_tests/rpl_parallel_benchmark_load.test the load generator for the benchmarking test file is added. @ mysql-test/suite/rpl/r/rpl_parallel_benchmark.result a new results file is added. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark-slave.opt slave does to log into binary log. The number of workers is supposed to be set via --mysqld at mtr invocation. @ mysql-test/suite/rpl/t/rpl_parallel_benchmark.test a test file for benchmarking is added.
------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments as requested by the reviewer. @ sql/log_event.cc only comments cleanup. @ sql/rpl_slave.cc only comments cleanup. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: the earliest four rev:s, which cut off the DML stage of the parallel query project from the following ones devoted to DDL. The four lay out a skeleton of parallel applying of Queries containing a temporary table and implement the core of the design, that is, the DML queries. Queries can contain arbitrary features including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of the over-max db:s in the updated-db:s status var was turned into another new constant value. The very last patch in the bundle addresses the last review mail notes. @ mysql-test/r/mysqld--help-notwin.result results get updated. ****** results get updated. @ mysql-test/suite/rpl/r/rpl_packet.result results updated. @ mysql-test/suite/rpl/r/rpl_parallel_ddl.result the new test results are added. ****** results get updated. @ mysql-test/suite/rpl/r/rpl_parallel_multi_db.result a new result file is added. ****** results get updated. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results get updated. @ mysql-test/suite/rpl/r/rpl_parallel_temp_query.result a new results file. ****** a new result file is added. @ mysql-test/suite/rpl/t/rpl_packet.test making some hashing fixes in order for the test to pass.
todo: refine the logic of max_allowed_packet for master & slave. @ mysql-test/suite/rpl/t/rpl_parallel_ddl.test DDL specifics for parallelization tests are added. ****** added an over-the-max updated db:s case through RENAME tables. ****** added the remaining DDL set members to the test. @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test Marked a todo. @ mysql-test/suite/rpl/t/rpl_parallel_multi_db.test multi-db DML query test is added. todo: add triggers, sf(), SP. ****** adding stored routines testing. ****** increased the number of db:s. Notice that this forces a change of the default --thread-stack size; added an over-the-max updated db:s case through multi-updates. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test removed the explicit log pos from the results. @ mysql-test/suite/rpl/t/rpl_parallel_temp_query.test testing queries with temporary tables. @ mysql-test/suite/sys_vars/r/all_vars.result results get updated. @ sql/binlog.cc gathering the db:s to be updated in the DML case. An over-MAX_DBS_IN_QUERY_MTS-sized list won't be shipped to the slave. ****** correcting memory allocation to be in thd's memroot. ****** separating out multiple-db gathering into a THD method to be invoked both for DML and for a few cases of DDL. ****** Moved the comparisons against MAX_DBS_IN_QUERY_MTS inside the method that adds to the db list; refined the logic of gathering db:s in decide_(): *all* db:s are picked up whenever there is at least one table to update. ****** Comments are added; other changes are due to MAX_DBS_IN_QUERY_MTS + 1 ceasing to be the over-max mark. @ sql/events.cc gathering updated dbs for create/drop events. @ sql/field.cc relaxing an assert (todo: add to it a more specific claim that field->table is temp). ****** adding comments to asserts. ****** improved comments. @ sql/handler.cc relaxing an assert (todo: add to it a more specific claim that the table is temp). ****** adding comments to asserts. @ sql/log_event.cc master and slave (Coord's and Worker's) handling of updated db:s.
The Coordinator's distribution is changed to involve a loop per db, and similarly for the Worker when applying. ****** adding comments and correcting the clearing of binlog_updated_db_names so that BEGIN and COMMIT in particular do not get the updated list. ****** removed an extraneous assert. ****** cleaned some parts of the code; improved comments; refined an assert; switched the Coordinator to use a specific new mem-root; other changes are due to MAX_DBS_IN_QUERY_MTS + 1 ceasing to be the over-max mark. @ sql/log_event.h Hardcoding the max number of updated db:s. Static allocation for updated db:s in the Query log event is motivated by the fact that the event is shared by both C and W, so the standard malloc/free can't be a reasonable choice. Added a new status and changed dependent info. Added a new method to return the *list* of updated db:s, which in all but the Query case is just a wrapper over get_db(). ****** adding comments, and interfaces to helper functions. ****** updated some comments. ****** added OVER_MAX_DBS_IN_QUERY_MTS to serve as the over-max db:s mark instead of the former MAX_DBS_IN_QUERY_MTS + 1; mts_get_dbs() receives a mem-root arg supplied by the Worker or the Coord. @ sql/mysqld.cc removed opt_mts_master_updated_dbs_max. @ sql/mysqld.h removed opt_mts_master_updated_dbs_max. @ sql/rpl_rli.cc a new temp table mutex init and destroy, and a set of helper functions providing access to C's and W's thd:s from arbitrary places in the server code, are added. ****** fixing an error. ****** relocating helper functions to rpl_slave.cc. @ sql/rpl_rli.h a new temp table mutex is added to the RLI class. ****** improving comments. ****** A memroot for the Coordinator is placed into rli. @ sql/rpl_slave.cc SLAVE_THD_WORKER turned out to be redundant. The Worker's thd->system_thread is set to the same constant as the Coordinator thread's. ****** Added a work-around/cleanup needed by the standard temp table closing algorithm. ****** comments explaining why close_temp_tables() is not to be run by Workers.
Accepting relocated functions. ****** init alloc and the final destruction of the Coord's rli->mts_coor_mem_root mem-root. @ sql/rpl_slave.h declarations of auxiliary func:s defined in rpl_slave.cc are moved from log_event.h. @ sql/share/errmsg-utf8.txt Added a new error/warning on master specific to Query parallel replication. @ sql/sp.cc covering db gathering for create/drop SP. @ sql/sql_base.cc replacing refs to thd->temporary with appropriate ones corresponding to the Coord's thd->t_t:s. Also surrounding critical sections of code dealing with opening, finding, closing or changing the temporary_tables list with a specific mutex lock/unlock. ****** Correcting and simplifying the logic of the temp table parallel support. In particular, close_temporary_tables() does not need to know about the caller's thd. ****** simplified the temp-table-support-related addons. The double ref to thd->temporary_table is needed only in one place. @ sql/sql_class.cc new member initializations for master-side gathering of updated db:s. ****** Correcting the logic of merging the updated db:s of a child into the parent's top-level. ****** removed dead comments. @ sql/sql_class.h master-side gathering of the updated db:s list and accessor members. ****** adding a necessary cleanup method. ****** adding two base methods of db gathering: one for queries that can update only one db, and the other for multiple db:s. ****** added more comments, removed dead code. @ sql/sql_db.cc create/drop database case of db gathering. @ sql/sql_rename.cc rename table(s) case of db gathering. @ sql/sql_table.cc create, drop, alter cases of db gathering. ****** Moved the comparisons against MAX_DBS_IN_QUERY_MTS inside the method that adds to the db list. @ sql/sql_trigger.cc create/drop trigger case of db gathering. @ sql/sql_view.cc support for CREATE/DROP views is added. @ sql/sys_vars.cc Added a system variable (todo/fixme: may turn out to be unnecessary though). ****** removed the earlier added variable.
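A minimal C++ sketch of the updated-db:s bookkeeping in revno 3270, assuming a set-based collector. The numeric values given to MAX_DBS_IN_QUERY_MTS and OVER_MAX_DBS_IN_QUERY_MTS are placeholders chosen for illustration, and UpdatedDbs and schedule_per_db() are hypothetical; for brevity the slave side reads the collector directly instead of decoding a Query event.

```cpp
#include <cassert>
#include <set>
#include <string>

// Placeholder values for illustration only; the real constants are defined
// in sql/log_event.h.
constexpr unsigned MAX_DBS_IN_QUERY_MTS = 16;
constexpr unsigned OVER_MAX_DBS_IN_QUERY_MTS = 254;  // distinct over-max mark

// Hypothetical master-side collector of the db:s a query updates.
class UpdatedDbs {
 public:
  // The max check lives inside the adding method, so every caller
  // (DML, CREATE/DROP, RENAME, ...) gets the same treatment.
  void add(const std::string& db) {
    assert(!db.empty());
    if (over_max_) return;
    dbs_.insert(db);  // std::set collapses duplicates
    if (dbs_.size() > MAX_DBS_IN_QUERY_MTS) {
      over_max_ = true;
      dbs_.clear();  // an over-max list is not shipped to the slave
    }
  }

  // Value for the status var: the number of db:s, or the over-max mark.
  unsigned status_value() const {
    return over_max_ ? OVER_MAX_DBS_IN_QUERY_MTS
                     : static_cast<unsigned>(dbs_.size());
  }

  const std::set<std::string>& dbs() const { return dbs_; }

 private:
  std::set<std::string> dbs_;
  bool over_max_ = false;
};

// Slave side: the Coordinator assigns the event once per db; an over-max
// event is left for sequential execution (returns false).
template <class AssignFn>
bool schedule_per_db(const UpdatedDbs& u, AssignFn assign) {
  if (u.status_value() == OVER_MAX_DBS_IN_QUERY_MTS) return false;
  for (const std::string& db : u.dbs()) assign(db);
  return true;
}
```

A dedicated mark, rather than MAX_DBS_IN_QUERY_MTS + 1, keeps the "too many" case distinguishable from any real count even if the max is later changed.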
------------------------------------------------------------ revno: 3269 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. @ sql/rpl_rli.h adding RLI::opt_slave_parallel_workers to cache the server's namesake global var. @ sql/rpl_slave.cc moving the rli->recovery_parallel_workers reset down to the exit point of the starting routine. ------------------------------------------------------------ revno: 3267 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter for the case when the signal is missed and only a timeout would be reported ------------------------------------------------------------ revno: 3265 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases that mtr testing with MTS workers against standard suites revealed. @ sql/log_event.cc removing the COMMIT Query event from the set of those containing partition info.
@ sql/log_event.h ROLLBACK TO can be inside a replicated trans. ------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert for the case that can force C to delete events skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. @ sql/rpl_rli_pdb.cc Indeed, the Coordinator can be awakened with the abort_slave flag UP without being killed. ------------------------------------------------------------ revno: 3261 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. @ sql/rpl_slave.cc fixing an issue where a Rotate event could appear in between events of a group. That case should not force any rli->flush_info() but rather a normal increment of the relay log coordinates. ------------------------------------------------------------ revno: 3259 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0.
rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events, forcing parts of the group into different relay logs, carefully running ev->update_pos() for it and destroying it. @ sql/log_event.cc Identifying and marking an event that can split a group of events, forcing parts of the group into different relay logs. @ sql/log_event.h FD and Rotate can both be the group splitter, but only if they are "artificial". @ sql/rpl_rli.h a marker flag to be set when a group splitter such as FD is spotted. @ sql/rpl_slave.cc identifying, marking, carefully running ev->update_pos() and destroying an event that can split a group of events. ------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the 2nd and later runs of the test because Worker tables are not removed at RESET SLAVE. @ sql/log_event.cc adding a special mts-recovery branch to the event scheduling routine located in Log_event::apply_event(). todo: think about rli->flush_info() at the end of gap-filling. @ sql/rpl_rli.cc the counter of groups to be recovered and a running index on the recovery bitmap are init-ed; also renaming. In the recovery phase the Coordinator can now execute rows-events. @ sql/rpl_rli.h the counter of groups to be recovered and a running index on the recovery bitmap are added. @ sql/rpl_slave.cc engaging the counter of groups to be recovered in mts_recovery_groups(), at the end of which the recovery bitmap is ready and rli->mts_recovery_group_cnt holds the number of bits of interest in it. The no-actual-recovery case is followed by rli->recovery_parallel_workers= rli->slave_parallel_workers at Workers startup time.
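A simplified C++ sketch of gathering a group recovery bitmap, assuming every Worker's executed-groups bitmap has already been aligned to one common checkpoint (the real mts_recovery_groups() also has to relate checkpoint_seqno values and relay-log positions, which is omitted here). WorkerCheckpoint, RecoveryPlan, compute_recovery() and gaps() are hypothetical names; group_cnt only loosely plays the role of rli->mts_recovery_group_cnt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-Worker checkpoint info: bit i set means the i-th group
// after the common checkpoint was committed by that Worker.
struct WorkerCheckpoint {
  std::vector<bool> executed;
};

struct RecoveryPlan {
  std::vector<bool> done;     // union of all Workers' executed bits
  std::size_t group_cnt = 0;  // groups to walk: up to the last executed one
};

RecoveryPlan compute_recovery(const std::vector<WorkerCheckpoint>& workers) {
  RecoveryPlan p;
  for (const WorkerCheckpoint& w : workers) {
    if (w.executed.size() > p.done.size()) p.done.resize(w.executed.size(), false);
    for (std::size_t i = 0; i < w.executed.size(); ++i)
      if (w.executed[i]) p.done[i] = true;
  }
  // Past the highest executed group nothing needs gap-filling: normal
  // execution simply resumes from there.
  for (std::size_t i = p.done.size(); i > 0; --i) {
    if (p.done[i - 1]) {
      p.group_cnt = i;
      break;
    }
  }
  p.done.resize(p.group_cnt);
  assert(p.done.size() == p.group_cnt);
  return p;
}

// Groups inside the recovery window that no Worker committed: the gaps the
// recovery run has to execute before parallel applying restarts.
std::vector<std::size_t> gaps(const RecoveryPlan& p) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < p.group_cnt; ++i)
    if (!p.done[i]) out.push_back(i);
  return out;
}
```

For example, with one Worker having committed groups 0 and 3 and another group 1, the window is 4 groups long and group 2 is the only gap.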
------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info def is updated in the system db. ------------------------------------------------------------ revno: 3255 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. @ sql/log_event.h Introducing a typedef for a frequently used struct. @ sql/rpl_rli_pdb.cc checkpoint_seqno is added to the Worker table to hold the index of the last committed group in the bitmap; init, read, write and propagation of its value are addressed. @ sql/rpl_rli_pdb.h the Worker class gets checkpoint_seqno members. @ sql/rpl_slave.cc mts_recovery_groups() is refined to follow a simpler design scheme. The checkpoint info that a Worker must have at recovery consists of the seqno, the bitmap and the master binlog coordinates. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine that computes the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding checkpoint relay_log_name,pos as a necessary part for locating a relay log for recovery. Tested with rpl_parallel.
@ mysql-test/extra/rpl_tests/rpl_parallel_load.test restoring $iter to 1k as it used to be. @ sql/log_event.cc adding checkpoint relay_log_name,pos @ sql/rpl_rli_pdb.cc adding checkpoint relay_log_name,pos @ sql/rpl_rli_pdb.h adding checkpoint relay_log_name,pos ------------------------------------------------------------ revno: 3251 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing by introducing a volatile byte to indicate whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing the --mts-exp-slave-run-query-in-parallel=1 case, in which a Query-log-event, incl. DML and DDL, can be run in parallel. The feature is still `exp'erimental but can be tried as long as no temp tables are involved and no db other than the session's default is modified by the query. Tested: the changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement @ sql/log_event.cc making a single-query group such as DDL get distributed to Workers. ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, incl. valgrind issues, long exec time and an assert firing in a test. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test Lessening the load to make slow Windows machines on PB2 happy. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test adding an assert and a print-out in case it fires. @ sql/rpl_rli_pdb.cc the \0 terminating char was not allocated. @ sql/rpl_slave.cc a missed initialization is added.
------------------------------------------------------------ revno: 3248 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from wl#5569 repo to local branch. rpl_sequential opt files are added to keep mtr from giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to sequential execution for the cases of 1. Query-log-event; 2. Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by the Worker. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test Edited to compare all involved tables, add an explicit multi-statement transaction, and allow verification even in the case of statement-format events. @ mysql-test/suite/rpl/r/rpl_parallel_fallback.result a new results file is added. @ mysql-test/suite/rpl/t/rpl_parallel.test is changed to run with all formats because it now verifies the transparent fallback to sequential execution for Query-log-event and related events. @ mysql-test/suite/rpl/t/rpl_parallel_fallback.test A new test file is added to contain cases of transparent fallback to sequential execution. The Rows_query_log_event case is placed there. Notice that the Query-log-event fallback is largely tested by rpl_parallel. @ sql/log_event.cc Refining the event distribution logic in order to support fallback to sequential execution. The following definitions are framed out: . curr_group_is_parallel - indicates a Worker is engaged in all operations, incl. destruction, for all events of the group.
The value lasts until a next group is decided to be "pure" sequential, so that C will execute it, update rli synchronously and destroy the events. . curr_group_seen_begin - indicates whether the current group started with a B-event (BEGIN query). The value lasts until the T-event is distributed. . Deploying a w/a for Rows_query_log_event that involves a nap to protect against the case of multiple Rows_query_log events in one group. Notice the specific (also w/a) rule of destroying the event. @ sql/log_event.h Rows_query_log_event fallback-to-sequential support is added. @ sql/rpl_rli.cc Rows_query_log_event fallback-to-sequential support is added. @ sql/rpl_rli.h curr_group_isolated is defined to mark a parallel group that is executed in isolation from any other Workers ahead of and behind it. The Coordinator is supposed to provide such an environment; the new member is a facility to control it. @ sql/rpl_rli_pdb.cc Fixing the usage of circular_buffer_queue::gt() to deploy an assert suggested by the heading comments. Refining the logic of finding a gap in GAQ. Adding a 2nd arg to wait_for_workers...() to cover the 2nd use case of C waiting for Workers. The two are: wait for all, and wait for all but the one currently being scheduled. @ sql/rpl_slave.cc Refining the logic of C's commit to the main rli due to a pure sequential event (e.g. FD, Rotate), and similarly the logic of freeing. Deploying a w/a for Rows_query_log_event.
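A simplified C++ sketch of how the two flags could drive event routing on the Coordinator. The rule that the first data event after BEGIN fixes the group's nature, the Ev and Route enums and the `decided` helper field are assumptions made for illustration; this is not the distribution logic of sql/log_event.cc.

```cpp
#include <cassert>

// Hypothetical event classes, mirroring the B/T-event terminology.
enum class Ev { Begin, Rows, Query, Commit, FormatDesc, Rotate };
enum class Route { Buffer, Worker, Coordinator };

struct CoordGroupState {
  bool curr_group_seen_begin = false;   // group was opened by a B-event
  bool curr_group_is_parallel = false;  // a Worker owns the group's events
  bool decided = false;                 // hypothetical: group nature fixed
};

// parallel_ok says whether this event could be applied by a Worker.
Route route(CoordGroupState& s, Ev e, bool parallel_ok) {
  switch (e) {
    case Ev::FormatDesc:
    case Ev::Rotate:
      // Pure sequential events: C executes them, updates rli and frees them.
      if (!s.curr_group_seen_begin) s.curr_group_is_parallel = false;
      return Route::Coordinator;
    case Ev::Begin:
      s.curr_group_seen_begin = true;
      s.decided = false;
      return Route::Buffer;  // held until the group's nature is known
    case Ev::Rows:
    case Ev::Query:
      if (!s.decided) {
        s.curr_group_is_parallel = parallel_ok;
        s.decided = s.curr_group_seen_begin;  // a B-less Query is its own group
      }
      return s.curr_group_is_parallel ? Route::Worker : Route::Coordinator;
    case Ev::Commit: {
      Route r = s.curr_group_is_parallel ? Route::Worker : Route::Coordinator;
      s.curr_group_seen_begin = false;  // lasts until the T-event is distributed
      s.decided = false;
      return r;
    }
  }
  return Route::Coordinator;  // not reached; keeps compilers quiet
}
```

In this sketch curr_group_is_parallel deliberately survives the T-event and is only reset when a later group turns out to be sequential, matching the "lasts until" wording above.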
------------------------------------------------------------ revno: 3246 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to arrange the option/system var names properly: mts_-prefixing, and _exp-prefixing for experimental features needed for benchmarking (mts_exp_slave_local_timestamp) or supported in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing the GAQ size. It might be too tight, e.g. in the case of a max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. @ sql/rpl_slave.cc Fixing the GAQ size. It might be too tight, e.g. in the case of a max WQ length of 1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing the recovery-related issue of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ed event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning that applies to cases of non-zero parallel workers combined with a feature that parallelization can't work with. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result results are updated. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Extending the test to cover UNTIL, SKIP, and the escalation of a temporary error to a regular one. ------------------------------------------------------------ revno: 3240 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limiting its size so that it can hold items while all WQ:s are full; b) move_queue_head() contained a flaw that made it falsely make no progress; c) never letting an item be enqueued in GAQ while it's full. @ sql/log_event.cc Fixing the impossible gaq_idx == -1: GAQ may not be full at this point. The total counter of executed groups starts from 1, that is, nothing is done yet when it is 0. @ sql/rpl_rli_pdb.cc move_queue_head() contained a flaw that falsely broke the progress loop.
Fixed by comparing the current index with Worker::last_group_done_index instead of this->last_done. The latter was changed to be of purely statistical character and to contain the total seqno, which is guaranteed to grow monotonically thanks to its ulonglong size. @ sql/rpl_rli_pdb.h changes due to last_done being turned into a statistics holder. @ sql/rpl_slave.cc Improving GAQ by limiting its size so that it can hold items while all WQ:s are full. Waiting for an item to be released at checkpoint() when GAQ is full. @ sql/sys_vars.cc opt_mts_coordinator_basic_nap is set to a non-zero default value of 5 msecs. ------------------------------------------------------------ revno: 3238 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS Integration with wl#5599 recovery for MTS and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized to have one partition at Worker startup, which it should not. @ sql/log_event.cc Propagating CP-related info from C to W. @ sql/rpl_rli.cc Added a part of the CP info propagation from C to W. @ sql/rpl_rli.h New members in RLI due to CP info propagation from C to W. @ sql/rpl_rli_pdb.cc The Worker stores the new CP to mention it in flush_info() along with (todo) a bitmap of the executed groups within the checkpoint interval. @ sql/rpl_rli_pdb.h New members in a transport and the Worker class due to CP info. @ sql/rpl_slave.cc missed cleanup of errored-out rows-events; work-around for w->curr_group_exec_parts->dynamic_ids being initialized to have one partition at Worker startup, which it should not.
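A minimal C++ sketch of the GAQ behaviour these fixes aim at: a fixed-capacity circular queue that refuses to enqueue while full and whose head advances only over a contiguous run of completed groups. The Gaq class and its methods are hypothetical (move_queue_head() only borrows the name). Instead of comparing with Worker::last_group_done_index, it keeps a per-slot done flag in the spirit of the busy/released byte of revno 3251, and a 64-bit total_done counter stands in for the statistics-only last_done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Global Assigned Queue of group slots.
class Gaq {
 public:
  explicit Gaq(std::size_t capacity) : done_(capacity, false) {
    assert(capacity > 0);
  }

  bool full() const { return len_ == done_.size(); }
  bool empty() const { return len_ == 0; }

  // Returns the slot index for a new group, or -1 when the queue is full:
  // the caller must wait (e.g. for a checkpoint) instead of enqueueing.
  long en_queue() {
    if (full()) return -1;
    std::size_t idx = (head_ + len_) % done_.size();
    done_[idx] = false;  // busy until its Worker reports completion
    ++len_;
    return static_cast<long>(idx);
  }

  // Called when the Worker owning slot idx has committed its group.
  void mark_done(std::size_t idx) { done_[idx] = true; }

  // Releases the contiguous run of completed groups at the head; a finished
  // group behind an unfinished one stays queued. Returns how many moved.
  std::size_t move_queue_head() {
    std::size_t moved = 0;
    while (!empty() && done_[head_]) {
      head_ = (head_ + 1) % done_.size();
      --len_;
      ++moved;
      ++total_done_;
    }
    return moved;
  }

  // Monotonically growing count of released groups (statistics only).
  std::uint64_t total_done() const { return total_done_; }

 private:
  std::vector<bool> done_;
  std::size_t head_ = 0;
  std::size_t len_ = 0;
  std::uint64_t total_done_ = 0;
};
```

Judging progress per slot means an out-of-order completion (group 1 before group 0) cannot make the head skip an unfinished group or stall after it finishes.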
------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc . sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support on repo. 
rpl_parallel hits an assert that Alfranio is fixing. ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing-related fixes, including master_pos_wait() support and, thereafter, replacing sleeps with the now-functioning sync_slave_with_master; Fixing the limited Q-log-event parallelization. After the fix, a mixture of Rows- and Q- transactions can run concurrently. Q-transactions are treated sequentially by default. @ mysql-test/suite/rpl/r/rpl_parallel.result results updated. @ mysql-test/suite/rpl/r/rpl_sequential.result results updated. @ mysql-test/suite/rpl/t/disabled.def a nuisance test gets disabled. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test sleeps go away. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test sleeps go away. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test sleeps go away. @ sql/log_event.cc Fulfilling long-pending todo:s wrt update_pos and delete ev: update_pos() is redundant, being superseded by a special commit of the Worker; Addressing the {B, Q, T} non-parallel case. The issue was due to the inability to support Q-log-event parallelization as quickly as Rows- parallelization. @ sql/rpl_rli_pdb.cc circular_buffer_queue::de_tail(), a very specific method, is motivated by the limited support for Q-log-event parallelization. It may become unnecessary once Q has become parallel. @ sql/rpl_slave.cc Implementing CP in the successful read branch. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting the integration of the two WL:s. The main achievement is that the event execution status is consistently recorded into the Worker and the central RL recovery tables.
That was tested manually in a rather aggressive environment where the IO thread was made to reconnect randomly and the load from the Master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass; to address: the multi-statement Query-log-event transaction case (see todo in sources); to have Workers destroy their executed events (deferred until ev->update_pos started working); (Alfranio) to deploy the mts_checkpoint_routine() call inside the successful event-read branch of next_event(). Otherwise no call happens when the Coordinator is constantly busy with read/distribute. ------------------------------------------------------------ revno: 3229 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 19:14:50 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3228 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 15:45:02 +0000 message: Added a mutex to the checkpoint_routine. ------------------------------------------------------------ revno: 3227 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 16:56:11 +0000 message: Implemented a periodic checkpoint when the parallel slave is enabled. ------------------------------------------------------------ revno: 3226 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 10:15:45 +0000 message: Fixed commit_positions() and removed the unnecessary checkpoint thread.
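Revnos 3227 and 3228 add a mutex-protected periodic checkpoint. The revno 3230 TODO asks for that checkpoint to be called from the successful event-read branch of next_event(), so that a constantly busy Coordinator still checkpoints. The sketch below shows a minimal trigger of that kind. It assumes a millisecond period standing in for mts_checkpoint_period, whose real unit and default are not stated here. Checkpoint_timer and due() are hypothetical names, not mts_checkpoint_routine().

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical periodic-checkpoint trigger (not the server's
// mts_checkpoint_routine()). The mutex makes due() safe to call from
// more than one thread, in the spirit of the "added a mutex" note.
class Checkpoint_timer {
 public:
  using Clock = std::chrono::steady_clock;

  Checkpoint_timer(std::chrono::milliseconds period, Clock::time_point start)
      : period_(period), last_(start) {
    // A zero period would checkpoint on every call; treated as invalid here.
    assert(period.count() >= 1);
  }

  // Returns true, and restarts the period, if at least `period_` has
  // elapsed since the last checkpoint; otherwise returns false.
  bool due(Clock::time_point now) {
    std::lock_guard<std::mutex> guard(lock_);
    if (now - last_ < period_) return false;
    last_ = now;
    ++checkpoints_;
    return true;
  }

  unsigned long checkpoints() const {
    std::lock_guard<std::mutex> guard(lock_);
    return checkpoints_;
  }

 private:
  mutable std::mutex lock_;
  const std::chrono::milliseconds period_;
  Clock::time_point last_;
  unsigned long checkpoints_ = 0;
};
```

The caller passes the current time in as an argument instead of having due() read the clock itself. This keeps the check deterministic, and it stays cheap enough to run after every successfully read event.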
------------------------------------------------------------ revno: 3225 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 20:13:12 +0200 message: manual merge to wl#5569 tree ------------------------------------------------------------ revno: 3224 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 19:46:46 +0200 message: wl#5569 MTS User-interface related: setting @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode value SLAVE_EXEC_MODE_PARALLEL is withdrawn. Fixes the rli->mts_pending_jobs_size statistics, which might otherwise cause an assert crash. Fixes a silly c&p mistake in the relay-log name change notification. Made a little clean-up, including relocating the initialization of Worker-related stuff into start_slave_workers(). Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things. @ sql/log_event.cc A few asserts are motivated by a silly c&p mistake in the relay-log name change notification. Fixing the rli->mts_pending_jobs_size statistics, which might otherwise cause an assert crash. @ sql/rpl_rli.cc relocating Worker-related stuff from init in the RLI constructor to start_slave_workers(); Introduced a public Relay_log_info::reset_notified_relay_log_change() to be called when C discovers the relay log name change (next_event). @ sql/rpl_rli.h A change to the server's @@global.slave_parallel_workers has effect only while the slave is stopped. At start, the variable's value is copied to rli::slave_parallel_workers, and that copy is used for the duration of the slave session.
Refining is_parallel_exec() to be based on the rli's value; @ sql/rpl_rli_pdb.cc Fixing a c&p bug for the relay-log name; @ sql/rpl_slave.cc removing the earlier introduced extra rli->slave_worker_is_error; relocating Worker-related stuff from init in the RLI constructor to start_slave_workers(); @ sql/sql_class.h removing the explicit slave exec parallel mode. @ sql/sys_vars.cc changing the default for slave_parallel_workers from 4 to 0. A non-zero value means that many Worker threads are launched. Conversely, zero selects the sequential slave execution mode. Fixing the name of the server var: mts_partition_hash_soft_max. ------------------------------------------------------------ revno: 3223 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-01 19:08:21 +0200 message: wl#5569 MTS Changes related to limit conditions such as the WQ length and the total size of all WQ:s. A new test file is also added. @ mysql-test/suite/rpl/r/rpl_parallel_conf_limits.result new results file. @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result results updated. @ mysql-test/suite/rpl/t/rpl_parallel_conf_limits.test Testing two parameters that limit RAM usage by Workers. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test Converting an assert into a wait for that condition. Todo: improve the test to let it run with slave_run_query_in_parallel. @ sql/log_event.cc changes related to limit conditions (WQ length, total WQ sizes); fixing a compilation warning. @ sql/mysqld.cc renaming. @ sql/mysqld.h renaming. @ sql/rpl_rli.cc renaming. @ sql/rpl_rli.h s/slave_max_pending_jobs/opt_mts_slave_worker_queue_len_max/; the new name is meant to indicate the purpose of the entity more clearly. @ sql/rpl_slave.cc renaming. @ sql/sys_vars.cc renaming.
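Revno 3224 sets out a simple rule. The global value is copied into the RLI at START SLAVE, and a non-zero copy selects parallel mode. The sketch below shows that rule in isolation. global_slave_parallel_workers and Rli_sketch are hypothetical stand-ins for the server variable and Relay_log_info, so this is an illustration under those assumptions rather than the server code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for @@global.slave_parallel_workers (default 0).
std::atomic<unsigned long> global_slave_parallel_workers{0};

// Minimal sketch of the snapshot-at-start rule; not the real Relay_log_info.
struct Rli_sketch {
  unsigned long slave_parallel_workers = 0;  // copied at START SLAVE
  bool running = false;

  void start_slave() {
    assert(!running);
    // The session works with this copy; later changes to the global
    // only take effect at the next start.
    slave_parallel_workers = global_slave_parallel_workers.load();
    running = true;
  }

  void stop_slave() { running = false; }

  // Zero selects sequential execution; non-zero is the number of Workers.
  bool is_parallel_exec() const { return slave_parallel_workers > 0; }
};
```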
------------------------------------------------------------ revno: 3222 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:39:40 +0200 message: merging from the wl#5569 repo containing the wl#5599 integration ------------------------------------------------------------ revno: 3221 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:02:15 +0200 message: wl#5569 MTS Fixing group_relay_log_name change propagation from C to W; Garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated without checking the usage counter. Adding C-W synchronization due to: - the overall WQ:s data max; - hitting the limit of a WQ length. Adding Flow Control infrastructure with: - a hungry level for a Worker, below which the Coordinator distributes to it eagerly; symmetrically, a Worker whose load is above 100% minus the hungry level is considered fed-up; - a nap time for C in case all WQ lengths are above the level; - a weight param applied to the base nap as a function of the number of fed-up W:s. TODO: UNTIL to force sequential exec; to fix the ROWS_QUERY_LOG_EVENT corner case; to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done). @ sql/log_event.cc changes due to FC and the WQ:s data-size and WQ-length synchronizations; @ sql/mysqld.cc placeholders for a few MTS user interface variables are added. @ sql/mysqld.h MTS user interface variables are exported. @ sql/rpl_rli.cc Correcting a cast that would otherwise prevent the relay log change from being seen by the Worker. @ sql/rpl_rli.h A set of user options is reflected by new members of the central RLI. A user var propagates its value into RLI at slave startup and cannot affect the running slave until the slave is stopped. @ sql/rpl_rli_pdb.cc Garbage collection in the Partition-to-Worker hash. @ sql/rpl_rli_pdb.h Extending Slave_jobs_queue::waited_overfill
and Slave_worker::wq_overrun_set. . Overfill is seen as a property of the queue, whereas wq_overrun_set is about C-W flow control. @ sql/rpl_slave.cc Initialization of the MTS user options in the central RLI is added. Fixing a cast; Todo about ROWS_QUERY_LOG_EVENT; Comments on UNTIL forcing sequential exec; @ sql/sys_vars.cc A set of MTS-related user options is added. ------------------------------------------------------------ revno: 3220 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-27 17:36:50 +0200 message: wl#5569 Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h. Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT (todo: the native sequential mode for the event needs some revision; in particular, `delete ev' shall *always* happen in rli->cleanup_context, not in two places as it does currently). @ sql/log_event.cc Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT; Deploying the C role in handling the relay-log name change; @ sql/rpl_rli_pdb.cc Providing the relay-log name for wl#5599. Freeing the memory allocated for the relay-log name at the end of group execution by the Worker. @ sql/rpl_rli_pdb.h The protocol of actions on the C and W sides is described here. Removing current_binlog; Adding a pointer member group_relay_log_name to st_slave_job_group. ------------------------------------------------------------ revno: 3219 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-26 23:08:30 +0200 message: wl#5569 MTS Partitioning conflict detection and handling is implemented. A new option runs Query in parallel, though incompatibly with the Rows- case in that the default db, not the actual db:s, is used as the partition key. The user interface gained the global var and the command-line option slave_run_query_in_parallel (Welcome to the set!
:-) @ mysql-test/suite/rpl/r/rpl_parallel_conflicts.result new test result file is added. @ mysql-test/suite/rpl/t/rpl_parallel_conflicts.test A basic initial test for partitioning conflict detection and handling is added. @ sql/log_event.cc Refining the parallel vs. sequential decider to cover optional support for Query parallelization. @ sql/log_event.h Refining only_serial_exec() by providing hints through two new args. @ sql/mysqld.cc related to the new limited Query parallelization support. @ sql/mysqld.h related to the new limited Query parallelization support. @ sql/rpl_rli.h changes are due to the new limited Query parallelization support. @ sql/rpl_rli_pdb.cc Conflict detection, waiting and partition release are implemented. ------------------------------------------------------------ revno: 3218 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-andrei timestamp: Fri 2010-11-26 16:15:37 +0000 message: There was a mismatch between the number of fields read and written and, as a consequence, the read was failing for the Slave_worker. ------------------------------------------------------------ revno: 3217 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 11:03:54 +0200 message: wl#5569 merging with a piece of wl#5599 code ------------------------------------------------------------ revno: 3216 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 10:47:39 +0200 message: wl#5569 Converting the prototype-time db2w hash to be concurrent; Necessary introduction of the least occupied Worker notion. It is currently computed as the Worker having the least number of distributed partitions. Adding parallel support for Query_log_event; caution: 1. the session/default db, not the actual db, is used as the key; 2. it may not have been tested against all use cases (e.g. int vars). Fixing slave stop issues.
@ sql/log_event.cc Adding parallel support for Query_log_event, which forces changes in both the Coord and Worker scopes; a query can have both the B and g parallel properties. @ sql/rpl_rli.h Changes necessary for the concurrent hash. Although the least occupied Worker is defined atm as the one having the least number of partitions, that may be too coarse, so a method based on distributed jobs may be deployed later. @ sql/rpl_rli_pdb.cc The least occupied Worker is defined atm as the one having the least number of partitions (this may be too coarse, so a method based on distributed jobs may be deployed later). @ sql/rpl_rli_pdb.h Changes necessary for the concurrent hash and the parallelizable query-log-event. @ sql/rpl_slave.cc rli->least_occupied_workers is prepared to be used in the least-occupied calculation as a finer option. Improving Workers stop. ------------------------------------------------------------ revno: 3215 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-11-22 20:57:13 +0200 message: wl#5569 extending further interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context. @ sql/log_event.cc extracting the stored future_event_relay_log_pos to copy it to the worker rli. @ sql/rpl_slave.cc Storing future_event_relay_log_pos into an event member. ------------------------------------------------------------ revno: 3214 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-20 19:23:42 +0200 message: wl#5569 MTS Worker pool start, stop, kill and error-out implementation. @ mysql-test/extra/rpl_tests/rpl_parallel_load.test increasing the load param to get more reliable benchmarking data out of the test. @ mysql-test/suite/rpl/r/rpl_parallel_start_stop.result new test results. @ mysql-test/suite/rpl/t/rpl_parallel_start_stop.test Worker pool start, stop, kill and error testing.
@ sql/log_event.cc removing a false and unnecessary extension arg to exit_cond(); Refining the start-stop algorithm to be based on the Worker's private info, not the common info. In particular, handshakes are organized through a magic value of the Worker private queue length that is set by the initiator. @ sql/rpl_slave.cc Starting a worker thread by passing its Slave_worker * pointer. Simplifying and refining start-stop. @ sql/sql_class.h removing a false and unnecessary extension arg to exit_cond(); @ sql/sys_vars.cc Accounting for a magic value outside the valid range for pending_jobs. ------------------------------------------------------------ revno: 3213 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-19 16:51:58 +0200 message: wl#5569 recovery interfaces for the wl#5599 implementation. The essence of this patch is to provide the GAQ object implementation and a valid life cycle. Before calling the store methods of wl#5599, the checkpoint handler is supposed to invoke rli->gaq->move_queue_head(&rli->workers). See a simulation of that near ev->update_pos() in the main sql thread loop. The checkpoint info is composed as an instance of Slave_job_group that resides as rli->gaq->lwm. Todo: uncomment + // delete ev; // after ev->update_pos() event is garbage once the real checkpoint has been done. Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods. + // experimental checkpoint per each scheduling attempt + // logics of next_event() + + rli->gaq->move_queue_head(&rli->workers); @ sql/log_event.cc Log_event::get_slave_worker_id() is shaped closer to the final version, with the elements necessary for the rli->gaq life cycle. @ sql/log_event.h Log_event::mts_group_cnt is added as part of the GAQ index propagation path from C to W. @ sql/rpl_rli.h Further extension to RLI necessary for the distribution hash function (APH).
@ sql/rpl_rli_pdb.cc Implementing circular_buffer_queue::*queue and a few other methods, incl. ulong Slave_committed_queue::move_queue_head(), the main concern for checkpoint. @ sql/rpl_rli_pdb.h Extending classes with a few new member definitions necessary for the GAQ interface / checkpoint / recovery. @ sql/rpl_slave.cc Simulation of the lwm-checkpoint and changes due to the rpl_rli_pdb class extensions. ------------------------------------------------------------ revno: 3212 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:50:54 +0200 message: wl#5569 wl#5599 Recovery related. Prototyping the worker RLI instantiation, to be elaborated on. ------------------------------------------------------------ revno: 3211 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:00:52 +0200 message: wl#5569 MTS Extending the wl#5563 prototype gradually. This commit addresses: 1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for the checkpoint to update GAQ and the central RLI recovery table). Wrt rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli). Others: - decreased processing time for rpl_parallel and its serial counterpart. @ sql/log_event.cc Enhancing Log_event::get_slave_worker_id() to classify events by a set of parallelization properties; the presence of a property in an event forces some actions on both the C and W sides. en_queue etc. are prepared to turn into circular_buffer_queue methods. Pseudo-coded numerous todo:s wrt the low-level-design implementation. Deployed changes due to the Worker private rli. Annotated the Deferred Array for B,p,r property events. . delete ev is moved from C to W, which is fault-prone, but it could not be kept any longer as part of de_queue(), which moves into the cir_buf_queue class. @ sql/log_event.h removed `soiled', which was used to make delete ev run safely.
Added Log_event methods identifying the parallelization properties, incl. - contains_partition_info() to identify events containing info to be processed by the partition hash func; - starts_group()/ends_group(); - also updated the list of only_serial(). @ sql/rpl_rli.cc Only the Coordinator can destroy the Workers dynarray; Relay_log_info::get_current_worker() turned out to be more complicated, see comments; Reminder to migrate rli->future... into ev->future_event_relay_log_pos, which would let the Worker find the value out of the event's context; Prototyped // w->flush_info() in stmt_done; @ sql/rpl_rli.h The worker RLI has `this_worker' pointing to the actual worker instance. @ sql/rpl_rli_pdb.cc Annotated with fine details of the APH etc. implementation. @ sql/rpl_rli_pdb.h Transformed the earlier queue struct into a family of classes. Recovery interface: last_group_done_index of Slave_worker is to be filled in by W with an index into the GAQ queue, for C to poll at checkpoint. Added CGEP to the W context (similar to CGAP of C). @ sql/rpl_slave.cc Simplified the Worker pool. Deployed worker rli initialization. Recovery: rli->gaq is instantiated by C at Worker pool activation. Recovery: pseudo-code for checkpoint in next_event(). @ sql/sys_vars.cc edited help lines for slave_max_pending_jobs. ------------------------------------------------------------ revno: 3210 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-11-14 11:55:32 +0000 message: Post-push fix for WL#5599. ------------------------------------------------------------ revno: 3209 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-12 17:58:12 +0000 message: Post-push fix for WL#5599.
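Revnos 3216, 3219 and 3221 describe a Partition-to-Worker hash keyed by db name with the following properties:

- A new partition goes to the least occupied Worker, meaning the one owning the fewest partitions.
- Each partition carries a usage counter.
- Garbage collection starts once the hash holds more records than a soft maximum (compare mts_partition_hash_soft_max).

The sketch below is a minimal single-threaded illustration under those assumptions. Partition_map and its methods are hypothetical names. The sketch leaves out the concurrency and the conflict detection and waiting described in revno 3219.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-partition record: owning Worker plus a usage counter.
struct Partition_entry {
  std::size_t worker;  // Worker the partition is currently assigned to
  unsigned usage;      // in-flight groups that reference the partition
};

class Partition_map {
 public:
  explicit Partition_map(std::size_t n_workers)
      : partitions_per_worker_(n_workers, 0) {
    assert(n_workers > 0);
  }

  // Least occupied = the Worker owning the fewest partitions (lowest index on ties).
  std::size_t least_occupied() const {
    std::size_t best = 0;
    for (std::size_t w = 1; w < partitions_per_worker_.size(); ++w)
      if (partitions_per_worker_[w] < partitions_per_worker_[best]) best = w;
    return best;
  }

  // Returns the Worker for `db`, creating the partition record if needed.
  std::size_t assign(const std::string &db) {
    auto it = map_.find(db);
    if (it != map_.end()) {
      ++it->second.usage;
      return it->second.worker;
    }
    std::size_t w = least_occupied();
    map_.emplace(db, Partition_entry{w, 1});
    ++partitions_per_worker_[w];
    return w;
  }

  // Called when a group that used `db` has been applied.
  void release(const std::string &db) {
    auto it = map_.find(db);
    assert(it != map_.end() && it->second.usage > 0);
    --it->second.usage;
  }

  // Once the map holds more than `soft_max` records, drop the idle ones.
  std::size_t collect_garbage(std::size_t soft_max) {
    if (map_.size() <= soft_max) return 0;
    std::size_t removed = 0;
    for (auto it = map_.begin(); it != map_.end();) {
      if (it->second.usage == 0) {
        --partitions_per_worker_[it->second.worker];
        it = map_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t size() const { return map_.size(); }

 private:
  std::unordered_map<std::string, Partition_entry> map_;
  std::vector<std::size_t> partitions_per_worker_;
};
```

Ties go to the lowest Worker index here. The commit notes already flag the partition count as possibly too coarse a load measure, and suggest a job-based measure as a later refinement.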
------------------------------------------------------------ revno: 3208 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-11 11:53:01 +0000 message: WL#5599 The patch changed the handler's functions, i.e. init_info, check_info, flush_info, remove_info and end_info and the related private member functions, in both file and table handlers, to accept an index that identifies the information that will be read or written. This is necessary now because the handlers will be used by the workers to read and write information from file(s) and tables, and there may be several workers running at the same time; thus an index is used to identify the worker that is accessing the information. This change is also necessary for multi-master replication, as information from each master must be uniquely identified. @ sql/binlog.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/log_event.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_constants.h Introduced an array, and a variable that states the array's size, which are used as parameters to init_info, check_info, flush_info, remove_info and end_info. . This is OK for now as we assume a single master and use the slave's id to identify entries in a system table, if there is any. However, this code needs to be changed when we start handling multi-master replication. @ sql/rpl_info.cc Introduced an array, and a variable that states the array's size, which are used as parameters to init_info, check_info, flush_info, remove_info and end_info. . This is OK for now as we assume a single master and use the slave's id to identify entries in a system table, if there is any. However, this code needs to be changed when we start handling multi-master replication.
@ sql/rpl_info.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_factory.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Removed static references to field indexes used as a primary key. @ sql/rpl_info_factory.h Removed static references to field indexes used as a primary key. @ sql/rpl_info_file.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_file.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_handler.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_table.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Changed the calls to find_info_for_server_id. @ sql/rpl_info_table.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_info_table_access.cc Changed the function find_info_server_id in order to put the cursor at a row identified by uidx, which is an array of fields that compose a primary key. . The name of the function was also changed to reflect the new behavior. @ sql/rpl_info_table_access.h Changed the function find_info_server_id in order to put the cursor at a row identified by uidx, which is an array of fields that compose a primary key. . The name of the function was also changed to reflect the new behavior. @ sql/rpl_mi.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Moved the call to handler->flush_info from write_info to flush_info in order to avoid passing uidx and idx as parameters.
@ sql/rpl_mi.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_rli.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_rli.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_rli_pdb.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. . Moved the call to handler->flush_info from write_info to flush_info in order to avoid passing uidx and idx as parameters. @ sql/rpl_rli_pdb.h Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. @ sql/rpl_slave.cc Changed the calls to init_info, check_info, flush_info, remove_info and end_info and the related private member functions. ------------------------------------------------------------ revno: 3207 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-11-10 10:57:13 +0000 message: Refactoring to start work on WL#5599. ------------------------------------------------------------ revno: 3206 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:34:18 +0000 message: Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569. ------------------------------------------------------------ revno: 3205 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:04:14 +0000 message: merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569 Conflicts: . sql/CMakeLists.txt . sql/Makefile.am . sql/sql_class.h .
sql/rpl_slave.cc ------------------------------------------------------------ revno: 3204 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 11:39:37 +0000 message: merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3203 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 23:33:37 +0300 message: wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes ------------------------------------------------------------ revno: 3202 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 21:19:56 +0300 message: wl#5563 leaving out fine-grained garbage collection. That task is unnecessary to solve at prototyping time. The update-pos routine to be implemented is going to eliminate that piece of code ------------------------------------------------------------ revno: 3201 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 20:38:35 +0300 message: wl#5563 Extending the test base by splitting the former rpl_parallel into two so that it runs in serial exec mode as well. @ sql/log_event.cc Conditioning out a few debug-purpose print:s. ------------------------------------------------------------ revno: 3200 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 11:49:00 +0300 message: wl#5563 improved test; fixed a delete issue that used to cause a crash; added @@global.slave_local_timestamp to fill in a timestamp column with the slave clock value. Performance growth can be seen through the test. todo: merge with Alfranio's work on hashing and dynamic allocation of PFS obj:s.
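Revno 3221 lists the Coordinator flow-control rules, and the sketch below follows them:

- A Worker whose queue fill is below the hungry level keeps the Coordinator distributing.
- A Worker filled above 100% minus that level counts as fed-up.
- When no Worker is hungry, the Coordinator naps for a base period weighted by the number of fed-up Workers.

The notes do not give the exact weighting, so the formula base * (1 + weight * fed_up) is an assumption, and all names are hypothetical. Milliseconds are used for the nap only to match the 5 msec opt_mts_coordinator_basic_nap default mentioned in revno 3239.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-control knobs; names and the nap formula are assumptions.
struct Flow_control_params {
  std::size_t wq_capacity;  // max jobs a Worker queue may hold
  unsigned hungry_pct;      // fill below this percentage: Worker is "hungry"
  unsigned base_nap_msec;   // Coordinator's base nap
  unsigned nap_weight;      // extra base naps per fed-up Worker
};

// Fill level of one Worker queue, in percent.
unsigned fill_pct(std::size_t len, std::size_t capacity) {
  assert(capacity > 0);
  return static_cast<unsigned>(len * 100 / capacity);
}

// Returns 0 when some Worker is hungry (keep distributing eagerly);
// otherwise a nap that grows with the number of fed-up Workers, i.e.
// those filled above (100 - hungry_pct) percent.
unsigned coordinator_nap_msec(const std::vector<std::size_t> &wq_lengths,
                              const Flow_control_params &p) {
  std::size_t fed_up = 0;
  for (std::size_t len : wq_lengths) {
    unsigned pct = fill_pct(len, p.wq_capacity);
    if (pct < p.hungry_pct) return 0;
    if (pct > 100 - p.hungry_pct) ++fed_up;
  }
  return p.base_nap_msec * static_cast<unsigned>(1 + p.nap_weight * fed_up);
}
```

Consider a queue capacity of 10 and a hungry level of 20%. Queue lengths {9, 5} then give one fed-up Worker, because 90% is above 80%. With a weight of 1, a base nap of 5 msec becomes 10 msec.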
------------------------------------------------------------ revno: 3199 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Wed 2010-09-15 14:51:49 +0300 message: wl#5563 tests for the WL. The number of workers and iterations can be tuned. todo: convert them into param:s passed to the test through mtr ------------------------------------------------------------ revno: 3198 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 18:22:41 +0300 message: wl#5563 adding a naive test that does not attempt any stress yet and that also fired an assert. Refined the Worker instance ref computation because cleanup_context() is executed by the sql thread (the coordinator) as well ------------------------------------------------------------ revno: 3197 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 13:15:38 +0300 message: wl#5563 Rows-event parallelization is basically implemented, although tested shallowly. Write access by workers to the central rli struct may not be eliminated fully at this phase, e.g. where it relates to errors. todo: to prove rli gets out of Worker scope todo: to provide a stress test @ sql/log_event.cc changing from direct to API-based access to the RBR-applying context. @ sql/rpl_rli.cc implementation of the RBR-applying context API. @ sql/rpl_rli.h copying the RBR-applying part of the context info from rli to the Worker class; RLI gets accessor methods to the RBR-applying context that choose the right object from either the central (RLI) or the Worker repository. ------------------------------------------------------------ revno: 3196 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Sat 2010-09-11 17:00:08 +0300 message: wl#5563 adding Rows-event support limited to one Worker.
Deferred deletion did not check the emptiness of the list ------------------------------------------------------------ revno: 3195 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:36:07 +0300 message: wl#5563 correcting comments to indicate fewer limitations ------------------------------------------------------------ revno: 3194 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:32:39 +0300 message: WL#5563 Prototype for Slave parallelized by db name More progress on the WL in that the STMT binlog format works while the conceptual limits hold. That is, no query/transaction is allowed to deal with more than one db. Addressed a complication: the update pos method that is run by the Coordinator belongs to the Log_event hierarchy, and therefore event deletion, now done by the Worker, must be careful. Todo: 1. (High priority) fix Row-format complications 2. (High priority) Elaborate on the hash function to be a function of the db text name 3. (Optional) Consider moving update_pos to the RLI class to get rid of the delete-logic complication. How-to-use: The instructions can be found in the comments of the previous commit; see there for more details. In brief, though, the db names have to follow the pattern `test[0-9]', e.g. test0, test1, test2, test3 for the default four Worker threads. The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE. @ sql/log_event.cc The hashing function is refined to circumvent the lack of db info in internal events associated with a query event; An executed event can't simply be deleted by the Worker since the SQL thread needs it to update positions. Hence a piece of code is added to defer the delete until the SQL thread has marked the event as `soiled'. @ sql/log_event.h A new member to allow deletion of an executed event. @ sql/rpl_rli.h A new member in the Worker class for deferred delete of the executed event.
    A new member in the RLI class to memorize the most recently assigned Worker.
  @ sql/rpl_slave.cc
    Setting marks on the event by the SQL Coordinator thread.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and shows signs of scalable performance.
  How to test:
  connection master;
  set @@global.binlog_format=statement;
  connection slave;
  set @@global.slave_exec_mode=PARALLEL;
  set @@global.binlog_format=mixed;
  show processlist; => IO, SQL threads + 4 workers by default
  change master to ...
  connection master;
  # create databases with magic names "test[0-9]+", where the number will index
  # a worker.
  create database test0; create database test1; create database test2; create database test3;
  # create tables; they are only of MyISAM type for now
  use test0; create table tm_1(a int, b int) engine=myisam;
  use ...
  # DML on tables:
  use test[0-3]; insert into tm_1 values (1,0);
  ...
  ...
  connection slave;
  # monitor CPU (visually this time: top etc)
  # check correctness, e.g.
  select count(*) from test[0-3].tm_1;
  connection master;
  select count(*) from test[0-3].tm_1;
  @ sql/log_event.cc
    mts coordinator (C) and worker (W) code is added.
    C - fills in a job assignment and queues it to the private queue of a W selected via get_slave_worker_id() (todo: to elaborate on);
    W - spins in the wait-extract-exec loop of slave_worker_exec_job();
    Log_event::apply_event() is redefined so that it continues serving as the usual serial applier and also distributes events among Workers.
  @ sql/log_event.h
    Log_event gets get_slave_worker_id(), which assigns a worker based on the hashing function implemented in the method, and only_serial_exec() to filter out non-parallelizable events; do_apply_event() is made public;
  @ sql/mysqld.cc
    PSI interfaces required adding keys for all mutexes and conds that MTS introduces to the sources. For the prototype implementation only, the declarations contain an explicit max index in the arrays (16), to be elaborated on in following patches.
    `slave_parallel_workers' is added as the placeholder for the value of a new global sysvar.
  @ sql/mysqld.h
    extern declarations to access `slave_parallel_workers' and the PSI keys in other parts of the code.
  @ sql/rpl_rli.cc
    Instantiation, initial allocation and destruction of RLI members added by MTS;
  @ sql/rpl_rli.h
    Definitions for the Worker, its communication with the Coordinator and statistics gathering;
    is_parallel_exec() is a compromise because of a not-yet-reported bug (see comments).
  @ sql/rpl_slave.cc
    Added the worker pool initialization and termination.
    Added the thread handler for the Worker.
  @ sql/rpl_utility.h
    macros that can be used in the near future are added.
  @ sql/sql_class.h
    More values for the slave_exec_mode set (which is mistakenly defined as sys_var_enum);
    refining exit_cond() to allow the mutex not to be released. The default behaviour when one argument is supplied is unchanged.
  @ sql/sys_vars.cc
    A new global sysvar for the number of workers. It is supposed to be updatable at run time.
    todo: (bug report) notice static enum Slave_exec_mode - it should be Sys_var_*set*.
******
wl#5569 MTS
fixing an explicit error code in rpl_parallel_start_stop that changed due to the merge with trunk.
******
WL#5596 MTS
Here is the total cset combining all revisions done since Sep 2010. Comments from the original commits are pasted in reverse chronological order.
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could produce non-deterministic output: the list of binlogs contained one more binlog than expected.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active.
  The reason for the failure was that parsing of the relay log during the MTS recovery gap computation did not make sure to use the relay log's own FormatDescriptor events, which contain the checksumming info for all events in the log.
  Fixed by determining the checksum algorithm of every relay log as the first step of the MTS recovery gap computation.
------------------------------------------------------------
revno: 3361 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  merge from trunk, which required resolving a few semantic conflicts caused by changes to THD::enter_cond() in trunk.
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed a failure in test rpl_mts_check_concurrency when run in the mts collection.
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed modes to avoid cluttering the error log with messages about unsafe execution.
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Moved a warning suppression that had been introduced in the wrong test case (i.e. rpl_corruption) to the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with repeated warning messages about unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug output was removed from the error log. However, this is a temporary solution, as it prevents enabling traces.
------------------------------------------------------------
revno: 3355 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported an error (with a stack trace) on rpl_savepoint.
  The problem appears to be that, when slave_sql_running_state was computed in show_master_info(), the SQL thread's proc_info pointer could refer to a value on a stack frame that was already gone.
  Fixed by making proc_info point to a string literal.
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running MTS crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch's source code.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to make rpl_*_mts_crash_safe pass.
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  fixing a build issue for embedded.
  Public visibility of Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP
  MTS stopped with an error in the middle of the test.
  The reason is that a group of events originating from the slave itself was only partly processed, yet it modified the group position. On the following restart, the wrong group boundary made the slave either error out or assert.
  Fixed by locating a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the slave server's group can't be started.
  Notice: this is a trial patch, since I can't reproduce the failure on any of the hosts available to me.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  Further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in which the Coordinator might fail to catch the not-running status of a Worker. That made the Coordinator skip only part of a group of the slave server's own events, so the slave stopped at a point that is not a group boundary.
  Fixed by marking the errored-out Worker as failed before its APH entries are released.
  TODO: notice that a stop at a non-boundary point may still be possible due to a graceful STOP SLAVE, if one is run while self-originated events are being skipped.
  However, this issue belongs to STS and might be similar to BUG#12604951 and BUG#12728160.
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569
  Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code:
     DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -                  DBUG_SUICIDE(););
  +                  sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +                  flush_info(TRUE);
  +                  DBUG_SUICIDE();
  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
------------------------------------------------------------
revno: 3344 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS
  merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  adding a suppression for an expected warning to rpl_circular_for_4_hosts; decreasing a loop limit in rpl_parallel_switch_sequential in the case of statement format.
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test because of an rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, with two groups of events to execute, the first for a Worker and the second for the Coordinator, the Coordinator waited for the first group to complete but did not verify that the synchronization succeeded. So if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far).
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running MTS recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for WL#5569.
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the applying phase of MTS recovery so that events meant for the Coordinator are always applied; fixes a few tests to make them pass on PB; makes the GAQ size equal to the checkpoint_group value.
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30.
------------------------------------------------------------
revno: 3336 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  refining the wait for a db-hash entry release at event distribution. A graceful STOP is not accepted at this point, so the Coordinator stays in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.
  MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly because an earlier stop of the Single-Threaded Slave happened not at a group boundary but in the middle of a group.
  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE TABLE group is enclosed in its own BEGIN/COMMIT pair.
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
------------------------------------------------------------
revno: 3332
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:29:34 +0300
message:
  wl5569 MTS
  cleanup in one file.
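Revno 3335 (and revno 3347, which mentions APH entries) refers to a db-to-Worker hash whose entries the Coordinator may have to wait on before distributing an event. The sketch below is a simplified, hypothetical model of that idea, not the server's data structure: it assumes each entry records an owning Worker and a usage count of unfinished groups, that a group goes to the Worker already owning any of its busy databases, and that databases busy under different Workers force the Coordinator to wait. `DbPartitionMap`, `try_assign()` and `release()` are invented names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a db-to-Worker partition hash (illustration only).
class DbPartitionMap {
 public:
  // Tries to assign a group touching `dbs`. Returns the chosen Worker id, or
  // -1 when the busy databases belong to different Workers; in that case the
  // Coordinator has to wait until some entries are released.
  int try_assign(const std::vector<std::string> &dbs, int candidate) {
    int owner = -1;
    for (const auto &db : dbs) {
      auto it = map_.find(db);
      if (it == map_.end() || it->second.usage == 0) continue;  // free entry
      if (owner == -1)
        owner = it->second.worker;
      else if (owner != it->second.worker)
        return -1;  // conflict between two Workers: must wait
    }
    // No busy db: any Worker will do; free entries may migrate to it.
    const int w = (owner == -1) ? candidate : owner;
    for (const auto &db : dbs) {
      Entry &e = map_[db];  // creates {0, 0} for a db seen for the first time
      e.worker = w;
      ++e.usage;
    }
    return w;
  }

  // Called when a Worker has finished a group that touched `dbs`.
  void release(const std::vector<std::string> &dbs) {
    for (const auto &db : dbs) {
      auto it = map_.find(db);
      assert(it != map_.end() && it->second.usage > 0);
      if (it != map_.end() && it->second.usage > 0) --it->second.usage;
    }
  }

 private:
  struct Entry {
    int worker = 0;  // Worker that owns the db while usage > 0
    int usage = 0;   // unfinished groups referencing the db
  };
  std::unordered_map<std::string, Entry> map_;
};
```

Returning -1 instead of blocking keeps the sketch single-threaded. In the server, the Coordinator would instead wait for a Worker to release the entry, which is the point at which revno 3335 notes that a graceful STOP is not accepted.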
------------------------------------------------------------
revno: 3331
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:16:02 +0300
message:
  wl5569 MTS
  bzr commit mail address changed; a minor cleanup to make mts_is_worker() take a const argument; enabling a test to run in MTS.
------------------------------------------------------------
revno: 3330
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-02 08:58:56 +0100
message:
  Fixed the use of the performance schema in the replication code and a concurrency issue in the IO thread. In particular, the IO thread was calling flush_master_info without acquiring locks.
------------------------------------------------------------
revno: 3329 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 16:41:35 +0300
message:
  wl5569 MTS
  merging from the main repo.
------------------------------------------------------------
revno: 3328
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 15:48:25 +0300
message:
  wl#5569 MTS
  the final cleanup patch.
  There are a few glitches that were considered tolerable, at least while the complete WL code is being reviewed. These include:
  - no support for old load-data events
  - no support for FK
  To add to the list, there are a few places in the patch that suggest adding error branches each time flush_info() is called.
------------------------------------------------------------
revno: 3327
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 13:16:52 +0300
message:
  wl#5569 MTS
  The patch cleans up a host of code.
------------------------------------------------------------
revno: 3326
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-28 11:30:18 +0300
message:
  wl#5569 MTS
  replacing views with regular tables for consistency verification in rpl_parallel_innodb. A minor cleanup in rpl_parallel is also done.
------------------------------------------------------------
revno: 3325
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 20:31:45 +0300
message:
  wl#5569 MTS
  Cleanup, and addressing a sporadic rpl_temp_table_mix_row failure in the post-execution mtr.check_testcase().
  The failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While that is correct for a homogeneous rows-event-only load, a mixture can fail.
  Fixed by removing the optimization, so map_db_to_worker() always relocates, which is somewhat suboptimal and should be improved in the future.
------------------------------------------------------------
revno: 3324
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 13:12:52 +0100
message:
  Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case.
------------------------------------------------------------
revno: 3323
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-26 13:02:59 +0100
message:
  Fixed a test case.
------------------------------------------------------------
revno: 3322
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-25 15:14:24 +0100
message:
  Introduced a test case for recovery with MTS and fixed bugs in recovery.
------------------------------------------------------------
revno: 3321
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 15:38:19 +0300
message:
  wl#5569 MTS
  This patch does a bit of cleanup, addresses one memory-allocation todo and completes the fix for the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items.
------------------------------------------------------------
revno: 3320
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 12:38:34 +0300
message:
  wl#5569 MTS
  this patch completes the previous one: it fixes a result file and bases the InnoDB-specific test verification on tables rather than views.
------------------------------------------------------------
revno: 3319
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 00:11:22 +0300
message:
  wl#5569 MTS
  this is an exploratory patch to find out whether the verification method that was based on views has its own flaw, unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment.
------------------------------------------------------------
revno: 3318
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-23 07:56:15 +0300
message:
  wl#5569 MTS
  fixing results of mysqld--help-win.
------------------------------------------------------------
revno: 3317 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:20:40 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3316
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:17:43 +0100
message:
  On some platforms, such as Windows, a thread's wait time is stored in 100 ns units. However, when the difference between two values was computed, the result was not multiplied by 100. Besides, there was a casting problem when that result was assigned to a ulong.
------------------------------------------------------------
revno: 3315
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 18:54:23 +0100
message:
  Fixed how MTS copes with recovery.
------------------------------------------------------------
revno: 3314 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 19:10:54 +0300
message:
  wl#5569 MTS
  Fixing valgrind warnings.
------------------------------------------------------------
revno: 3313
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 18:15:43 +0300
message:
  wl#5569 MTS
  rpl_parallel_start_stop.test could fail sporadically with a timeout.
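Revno 3316 describes two separate mistakes: a difference of 100 ns ticks that was not scaled by 100, and a result narrowed into a ulong, which is 32 bits wide on Windows (an LLP64 platform). The following is a minimal sketch of the corrected arithmetic, assuming the wait times arrive as 64-bit tick counts. `tick_diff_to_ns()` is a hypothetical helper, not a function from the server.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts the difference between two timestamps
// measured in 100 ns ticks into nanoseconds. Both the subtraction and the
// scaling stay in 64-bit unsigned arithmetic, so the result is not truncated
// as it would be when stored in a 32-bit ulong.
inline std::uint64_t tick_diff_to_ns(std::uint64_t start_ticks,
                                     std::uint64_t end_ticks) {
  assert(end_ticks >= start_ticks);  // a wait cannot end before it starts
  const std::uint64_t ticks = end_ticks - start_ticks;
  return ticks * 100;  // one tick is 100 ns
}
```

For example, a wait of 50,000,000 ticks (5 s) yields 5,000,000,000 ns, which already exceeds the range of a 32-bit unsigned integer.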
------------------------------------------------------------
revno: 3312 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:21:56 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3311
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:19:06 +0100
message:
  Fixed an error when computing the Lower-Water-Mark.
  If two or more jobs were removed from the Group of assigned jobs, and one of them had a non-empty group relay log while the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null.
------------------------------------------------------------
revno: 3310
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 11:52:44 +0100
message:
  Fixed valgrind errors.
  Slave_job_group was silently being cast to LOG_POS_COORD while calling
    sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp)
  and consequently mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *).
  This had two problems:
  . The first two members of Slave_job_group were not a char * and a my_offset.
  . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems.
  To fix the problem, we avoid this casting.
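The Lower-Water-Mark fix in revno 3311 can be illustrated with a simplified sketch. It assumes a hypothetical `JobGroup` record, a `LowWaterMark` pair and an `advance_lwm()` helper. None of these is the server's Slave_job_group or GAQ code. It also assumes that an empty relay log name in a job means "same relay log as before". The rule shown is the one the commit describes: while the completed prefix of the queue is removed, an empty name in a later job must not overwrite the name taken from an earlier one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical, simplified stand-in for an entry of the Group Assigned Queue.
struct JobGroup {
  bool done;                          // the Worker has finished this group
  std::string group_relay_log_name;   // empty if the group recorded no relay log name
  std::uint64_t group_relay_log_pos;  // relay log position at the end of the group
};

// Hypothetical Lower-Water-Mark: everything up to this point has been applied.
struct LowWaterMark {
  std::string relay_log_name;
  std::uint64_t relay_log_pos = 0;
};

// Removes the completed prefix of the queue and advances the LWM over it.
// The relay log name is taken from the last job that actually carries one,
// so an empty name in a later job cannot wipe out a name seen earlier.
std::size_t advance_lwm(std::deque<JobGroup> &gaq, LowWaterMark &lwm) {
  std::size_t removed = 0;
  while (!gaq.empty() && gaq.front().done) {
    const JobGroup &job = gaq.front();
    if (!job.group_relay_log_name.empty())
      lwm.relay_log_name = job.group_relay_log_name;
    lwm.relay_log_pos = job.group_relay_log_pos;
    gaq.pop_front();  // `job` is not used after this point
    ++removed;
  }
  return removed;
}
```

With two completed jobs, the first naming `relay.000002` and the second carrying an empty name, the LWM keeps `relay.000002` and takes the second job's position. The third, unfinished job stays in the queue.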
------------------------------------------------------------
revno: 3309
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 19:14:50 +0300
message:
  wl#5569 MTS
  fixing slave_transaction_retries_basic_64.result
------------------------------------------------------------
revno: 3308
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 16:11:25 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3307
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 12:33:36 +0300
message:
  wl#5569 MTS
  Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events.
  Fixing the sensitivity of two other tests to mtr's invocation environment, which includes the initial values of slave_parallel_workers and slave_transaction_retries.
------------------------------------------------------------
revno: 3306
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 09:04:19 +0100
message:
  Fixed some Windows failures.
------------------------------------------------------------
revno: 3305
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-18 19:58:21 +0100
message:
  Fixed some recovery issues.
------------------------------------------------------------
revno: 3304
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 21:01:58 +0300
message:
  wl#5569 MTS
  fixing tests and a segfault at the end of handle_slave_sql() that happened after worker initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------
revno: 3303
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 18:34:16 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3302
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 14:00:41 +0300
message:
  wl#5569 MTS
  fixing rpl_row_basic_3innodb similarly to the previous patch.
------------------------------------------------------------
revno: 3301
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 13:51:59 +0300
message:
  wl#5569 MTS
  fixing a few tests.
  1. A policy is implemented to react with a warning when a failing Worker leaves the overall slave state with gaps, and thereby inconsistent.
  2. reset master/slave was disabled in two tests that used to time out because of it.
------------------------------------------------------------
revno: 3300
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 02:24:59 +0100
message:
  Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------
revno: 3299
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-16 19:46:22 +0300
message:
  wl#5569 MTS
  fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS.
------------------------------------------------------------
revno: 3298
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-16 11:29:53 +0300
message:
  wl#5569 MTS
  adding new tests to sys_vars.
------------------------------------------------------------
revno: 3297
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:41:32 +0100
message:
  WL#5569
  Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only.
------------------------------------------------------------
revno: 3296
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:40:41 +0100
message:
  WL#5569
  Renames worker-info-repository to slave-worker-info-repository in some tests' option files.
------------------------------------------------------------
revno: 3295
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 01:32:37 +0100
message:
  WL#5569
  More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have recently been renamed.
------------------------------------------------------------
revno: 3294
committer: Luis Soares <luis.soares@oracle.com>
branch nick: mysql-trunk-wl5569
timestamp: Thu 2011-06-16 00:27:20 +0100
message:
  WL#5569
  Fixing the rpl_parallel result file.
------------------------------------------------------------
revno: 3293
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:41:33 +0300
message:
  wl#5569 MTS
  correcting --slave-parallel-workers in a few more files
------------------------------------------------------------
revno: 3292
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:31:46 +0300
message:
  wl#5569 MTS
  correcting --slave-parallel-workers in collections/default.push
------------------------------------------------------------
revno: 3291
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 20:12:11 +0300
message:
  wl#5569 MTS
  Cleanup, including:
  1. decreasing the number of system variables and renaming them. Command-line options important for debugging are replaced with reasonable constant values, and only the necessary ones are retained.
  2. A small encapsulation in ha_blackhole.cc is done.
------------------------------------------------------------
revno: 3290
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-15 15:59:23 +0100
message:
  Fixed replication valgrind failures caused by MTS.
------------------------------------------------------------
revno: 3289
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 21:23:13 +0300
message:
  wl#5569 MTS
  wl#5754 Query event parallel execution
  Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk.
------------------------------------------------------------
revno: 3288 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 13:35:20 +0300
message:
  merge from trunk
------------------------------------------------------------
revno: 3287
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-14 12:27:38 +0300
message:
  wl#5569 MTS
  Fixing tests that failed due to:
  a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event instance. An event that accesses more than the maximum number of dbs, or an event from an old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and no more groups are scheduled until it is done.
  b. notification to the Coordinator about an errored-out Worker is corrected.
------------------------------------------------------------
revno: 3286
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-12 22:33:32 +0300
message:
  wl#5569 MTS
  making default.push run the rpl suite with a non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode, and for the rpl suite in the row+ps and stmt formats.
------------------------------------------------------------
revno: 3285 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-12 22:05:05 +0300
message:
  wl#5569 MTS
  manual merge with a few fixes: a segfault caused by the last merge from trunk etc., and a compilation issue on embedded.
------------------------------------------------------------
revno: 3284
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-09 18:35:59 +0100
message:
  Post-fixes for the merge. Fixed compilation on Windows and removed unused options.
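The scheduling rule that revno 3287 describes for isolated groups can be modelled in a single thread. The class below is a hypothetical illustration, not the Coordinator's code. It only tracks how many groups are in flight and whether an isolated group is running. It assumes that "isolated" is decided per group before scheduling, which matches the commit's statement that isolation applies to a group of events rather than to a single event.

```cpp
#include <cassert>

// Hypothetical scheduling gate (illustration only) modelling the rule from
// revno 3287: an isolated group starts only when no earlier group is still
// in flight, and nothing else is scheduled while it runs.
class IsolationGate {
 public:
  // May a group of the given kind be handed to a Worker right now?
  bool can_schedule(bool isolated) const {
    if (isolated_running_) return false;   // an isolated group blocks everything
    if (isolated) return in_flight_ == 0;  // wait until all prior groups are done
    return true;                           // ordinary groups run in parallel
  }

  void on_scheduled(bool isolated) {
    assert(can_schedule(isolated));
    ++in_flight_;
    if (isolated) isolated_running_ = true;
  }

  void on_completed(bool isolated) {
    assert(in_flight_ > 0);
    --in_flight_;
    if (isolated) isolated_running_ = false;
  }

 private:
  int in_flight_ = 0;              // groups scheduled but not yet completed
  bool isolated_running_ = false;  // an isolated group is currently executing
};
```

In a real Coordinator, a `false` from `can_schedule()` would translate into waiting for Worker completions rather than into a return value.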
------------------------------------------------------------ revno: 3283 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop; an error case behaves like the killed one. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker-running status to killed (which breaks the read-exec loop of a Worker). ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, including in stopping Workers: the routine now distinguishes between the killed and the gracefully stopped cases so that, in the end, STOP SLAVE will guarantee a consistent state (some todo items still remain). ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that have not been pushed to the wl5569 repo yet. Comments for each cset follow. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The declaration of the underlying server variable is changed from ulong to int32.
While that might shrink the actual range, no range had been specified, and now that the number of bits is the same on all platforms the range can be set to [0, max(int32)]. ****** wl#5569 MTS Wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and implementation of a few todo items. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; fixing temp tables manipulation. Asserting on a use case that cannot be supported: a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog and refuse to run it in parallel. ****** wl#5569 MTS Implementation of handing the applier role to a Worker in all cases except those dealing with the Coordinator's state. Those include a Query event with over-max db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes (e.g. for a Query event) switching to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to an applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic of handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the Worker starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it is about cleanup and simplification of the logic of event distribution to Workers and the consequent actions. Some effort was spent on supporting BEGIN-less groups of events from an old master. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS.
Cleanup, fixing a few todo items and a few potential issues found. ------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring of its handling in the canonical sequential mode. The event's life cycle suggests behavior similar to objects associated with Table_map; in particular, its destruction occurs at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, including rpl_row_ignorable_event, in both sequential and parallel mode. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker :public Relay_log_info). The only exception is a Query referring to a temporary table. The temp:s are maintained in the Coordinator's "central" rli. Also removing some dead code and performing a lot of cleanup. There are a few todo items, including: 1. To implement several todo:s scattered across MTS code and tests (e.g. to restore protected for a few members of RLI in rpl_rli.h); 2. To cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback); 3. To sort out the names of classes based on Rpl_info, possibly removing Rpl_info_worker. ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving the benchmarking test.
------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, including waiting and sleeping times, are now reported into the error log at slave stop time. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS A benchmarking-related patch that generalizes rpl_parallel to run with an arbitrary number of workers, db:s, tables, etc. TODO: restore the final consistency check, which is temporarily left out because I could not find a way to keep it surrounded with a --dis/en-able* stanza. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added.
Benchmarking results can be obtained by extracting the master-side generating time and the slave-side applying time, as in the following loop:

  workers=6
  for n in `seq 1 3`; do
    echo; echo loop: $n; echo
    my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
      rpl_parallel_benchmark --mysqld=--binlog-format=statement \
      && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1
  done

------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments, as requested by a reviewer. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: the earliest four revisions, which separate the DML stage of the parallel query project from the following ones devoted to DDL. Those four sketch parallel applying of Queries containing a temporary table and implement the core of the design, that is, the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch in the bundle addresses the notes from the last review mail.
------------------------------------------------------------ revno: 3269 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. ------------------------------------------------------------ revno: 3267 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter for the case where the signal is missed and only a timeout would be reported ------------------------------------------------------------ revno: 3265 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases that mtr testing with MTS workers against standard suites revealed.
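Revno 3266 guards the SQL thread's timed wait against a missed signal. One common way to express such a guard, sketched here with the C++ standard library, is a condition-variable wait with a predicate over a flag, so that a signal sent before the wait starts is still seen instead of being reported as a timeout. `Waiter`, `signaled` and `timed_wait()` are hypothetical names; the actual MTS code may implement the break-through differently.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical waiter: it checks a flag (the "break-through" parameter)
// rather than relying solely on being woken, so a signal that fires before
// the wait begins is not mistaken for a timeout.
struct Waiter {
  std::mutex mtx;
  std::condition_variable cond;
  bool signaled = false;  // set by the signaller while holding mtx

  void signal() {
    {
      std::lock_guard<std::mutex> lock(mtx);
      signaled = true;
    }
    cond.notify_all();
  }

  // Returns true if the event happened, false on a genuine timeout.
  bool timed_wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mtx);
    bool ok = cond.wait_for(lock, timeout, [this] { return signaled; });
    if (ok) signaled = false;  // consume the signal
    return ok;
  }
};
```

Because `wait_for` evaluates the predicate before blocking, an early `signal()` makes `timed_wait()` return true at once; without the flag, the waiter would sleep for the full timeout.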
------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert, which can force C to delete events that are skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. ------------------------------------------------------------ revno: 3261 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. ------------------------------------------------------------ revno: 3259 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0. rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events, forcing parts of the group into different relay logs, then carefully running ev->update_pos() for it and destroying it.
------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the second and subsequent runs because Worker tables are not removed at RESET SLAVE. ------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine that computes the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding the checkpoint relay_log_name,pos as a necessary part for locating a relay log for recovery. Tested with rpl_parallel.
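Revnos 3254 and 3253 mention gathering a group recovery bitmap of executed events. The sketch below assumes, for illustration only, that each Worker keeps one bit per group counted from the last checkpoint; the per-Worker bitmaps are OR-ed together, and any unset bit before the last set one marks a gap that recovery would have to re-apply. `GroupBitmap`, `merge_worker_bitmaps()` and `find_gaps()` are hypothetical names, not the wl#5599 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: bit i set == group (checkpoint + i) has been executed.
using GroupBitmap = std::vector<bool>;

// OR the per-Worker bitmaps into one group recovery bitmap.
GroupBitmap merge_worker_bitmaps(const std::vector<GroupBitmap>& workers) {
  GroupBitmap merged;
  for (const GroupBitmap& w : workers) {
    if (w.size() > merged.size()) merged.resize(w.size(), false);
    for (std::size_t i = 0; i < w.size(); ++i)
      if (w[i]) merged[i] = true;
  }
  return merged;
}

// Gaps: groups after the checkpoint that were not executed although a
// later group was; these are the ones recovery would re-apply.
std::vector<std::size_t> find_gaps(const GroupBitmap& merged) {
  std::vector<std::size_t> gaps;
  std::size_t last = 0;
  bool any = false;
  for (std::size_t i = 0; i < merged.size(); ++i)
    if (merged[i]) { last = i; any = true; }
  if (!any) return gaps;  // nothing executed since the checkpoint
  for (std::size_t i = 0; i < last; ++i)
    if (!merged[i]) gaps.push_back(i);
  return gaps;
}
```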
------------------------------------------------------------ revno: 3251 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing by introducing a volatile byte that indicates whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing the --mts-exp-slave-run-query-in-parallel=1 case, in which a Query-log-event can be run in parallel, including DML and DDL. The feature is still `exp'erimental; it can be tried as long as no temp tables are involved and the query modifies no db other than the session's default. Tested: the changes pass mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, including valgrind issues, long execution time and an assert in a test. ------------------------------------------------------------ revno: 3248 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from the wl#5569 repo to the local branch. rpl_sequential opt files are added to keep mtr from giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support for falling back to sequential execution in the cases of 1. Query-log-event 2.
Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by a Worker. ------------------------------------------------------------ revno: 3246 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to name the options/system vars properly: mts_ prefixing, and _exp prefixing for experimental features that are needed for benchmarking (mts_exp_slave_local_timestamp) or have limited support (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing the GAQ size. It might be too tight, e.g. with a max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent. ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p mistake ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
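Several entries (revnos 3239, 3245, 3247 and 3251) refer to the GAQ and its move_queue_head(). Below is a minimal C++ sketch, assuming the GAQ is a fixed-capacity ring of group slots with a per-slot done flag: the Coordinator refuses to enqueue into a full queue, Workers mark slots done in any order, and the head advances only over the leading run of done slots. `GroupQueue` and its methods are hypothetical; `move_head()` merely plays the role of move_queue_head() and does not reproduce its signature. The Coordinator/Worker synchronization (the volatile busy/released byte of revno 3251) is omitted, so this version is single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring buffer of scheduled groups.
class GroupQueue {
 public:
  explicit GroupQueue(std::size_t capacity) : done_(capacity, false) {
    assert(capacity > 0);
  }

  bool full() const { return size_ == done_.size(); }
  std::size_t size() const { return size_; }

  // Returns the slot index, or -1 when full (never enqueue into a full queue).
  long enqueue() {
    if (full()) return -1;
    std::size_t slot = (head_ + size_) % done_.size();
    done_[slot] = false;
    ++size_;
    return static_cast<long>(slot);
  }

  // Called by a Worker once its group has been committed.
  void mark_done(std::size_t slot) {
    assert(slot < done_.size());
    done_[slot] = true;
  }

  // Pops completed groups from the head and stops at the first busy one.
  // Returns how many groups were released.
  std::size_t move_head() {
    std::size_t released = 0;
    while (size_ > 0 && done_[head_]) {
      head_ = (head_ + 1) % done_.size();
      --size_;
      ++released;
    }
    return released;
  }

 private:
  std::vector<bool> done_;
  std::size_t head_ = 0;
  std::size_t size_ = 0;
};
```

Stopping at the first busy slot is what makes the head a safe checkpoint: every group before it has been committed, even when Workers finish out of order.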
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery-related hit of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ped event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning that applies when parallel workers is non-zero and a feature that parallelization cannot work with is used. ------------------------------------------------------------ revno: 3240 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limit its size so that it can hold items while all WQ:s are full; b) move_queue_head() contained a flaw that made it falsely make no progress; c) never enqueue into GAQ while it is full ------------------------------------------------------------ revno: 3238 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS
Integration with wl#5599 recovery for MTS and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized with one partition at Worker startup, which should not happen. ------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc .
sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support from the repo. rpl_parallel hits an assert that Alfranio is fixing ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing-related fixes, including master_pos_wait() support and, thereafter, replacing sleeps with the now-functioning sync_slave_with_master; fixing the limited Q-log-event parallelization. After the fix, a mixture of rows- and Q-transactions can run concurrently. Q-transactions are treated sequentially by default. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting the integration of the two wl:s. The main achievement is that event execution status is consistently recorded into the Worker and the central RL recovery tables. That was tested manually in a rather aggressive environment where the IO thread reconnected randomly and the load from the master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass; to address: the multi-stmt Query-log-event transaction case (see todo in sources); Workers to destruct their executed events (deferred until ev->update_pos started working); (Alfranio) to deploy the mts_checkpoint_routine() call inside the successful event-read branch of next_event(). Otherwise no call happens while Coord is constantly busy with read/distribute.
------------------------------------------------------------ revno: 3229 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 19:14:50 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3228 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 15:45:02 +0000 message: Added a mutex to the checkpoint_routine. ------------------------------------------------------------ revno: 3227 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 16:56:11 +0000 message: Implemented periodic checkpoints when the parallel slave is enabled. ------------------------------------------------------------ revno: 3226 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 10:15:45 +0000 message: Fixed commit_positions() and removed the unnecessary checkpoint thread. ------------------------------------------------------------ revno: 3225 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 20:13:12 +0200 message: manual merge to wl#5569 tree ------------------------------------------------------------ revno: 3224 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 19:46:46 +0200 message: wl#5569 MTS User interface related: set @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value de facto selects the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is withdrawn. Fixes rli->mts_pending_jobs_size statistics, which could otherwise cause an assert crash. Fixes a silly c&p mistake in the relay-log name change notification.
Made a little cleanup, including relocating the initialization of worker-related stuff into start_slave_workers(). Many changes in tests, due to SLAVE_EXEC_MODE_PARALLEL among other things. ------------------------------------------------------------ revno: 3223 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-01 19:08:21 +0200 message: wl#5569 MTS Changes related to limit conditions such as the WQ length and the total size of WQ:s. A new test file is also added. ------------------------------------------------------------ revno: 3222 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:39:40 +0200 message: merging from the wl#5569 repo, which contains wl#5599 integration ------------------------------------------------------------ revno: 3221 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:02:15 +0200 message: wl#5569 MTS Fixing group_relay_log_name change propagation from C to W; garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated without checking the usage counter. Adding C-W synchronization due to: - the overall max of WQ:s data - hitting the limit of a WQ length. Adding Flow Control infrastructure with: - a hungry level for a Worker, which forces the Coordinator to distribute to it eagerly; symmetrically, a Worker whose load is above 100 % minus the hungry level is considered fed-up; - nap time for C when all WQ:s lengths are above the level; - a weight parameter for the base nap as a function of the number of fed-up W:s. TODO: UNTIL to force sequential exec; to fix a ROWS_QUERY_LOG_EVENT corner case; to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done).
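Revno 3221's garbage-collected Partition-to-Worker hash, together with the `least occupied Worker' notion of revno 3216 (the Worker with the fewest distributed partitions), can be sketched as follows. Assumptions: the partition key is the database name, the GC threshold is the number of records tolerated before usage counters are checked, and GC drops every entry whose usage counter is zero. `PartitionMap`, `acquire()` and `release()` are hypothetical names, and locking is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical partition-to-Worker map keyed by database name.
class PartitionMap {
 public:
  PartitionMap(std::size_t workers, std::size_t gc_threshold)
      : partitions_per_worker_(workers, 0), gc_threshold_(gc_threshold) {
    assert(workers > 0);
  }

  // Assigns (or reuses) the Worker for a db and bumps its usage counter.
  std::size_t acquire(const std::string& db) {
    auto it = map_.find(db);
    if (it == map_.end()) {
      if (map_.size() >= gc_threshold_) collect_garbage();
      std::size_t w = least_occupied();
      ++partitions_per_worker_[w];
      it = map_.emplace(db, Entry{w, 0}).first;
    }
    ++it->second.usage;
    return it->second.worker;
  }

  // Called when a group that used db has been applied.
  void release(const std::string& db) {
    auto it = map_.find(db);
    assert(it != map_.end() && it->second.usage > 0);
    --it->second.usage;
  }

  std::size_t size() const { return map_.size(); }

 private:
  struct Entry { std::size_t worker; std::size_t usage; };

  // "Least occupied" = fewest partitions currently mapped to the Worker.
  std::size_t least_occupied() const {
    std::size_t best = 0;
    for (std::size_t w = 1; w < partitions_per_worker_.size(); ++w)
      if (partitions_per_worker_[w] < partitions_per_worker_[best]) best = w;
    return best;
  }

  // Drops entries that no in-flight group is using.
  void collect_garbage() {
    for (auto it = map_.begin(); it != map_.end();) {
      if (it->second.usage == 0) {
        --partitions_per_worker_[it->second.worker];
        it = map_.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::map<std::string, Entry> map_;
  std::vector<std::size_t> partitions_per_worker_;
  std::size_t gc_threshold_;
};
```

Deferring GC until the threshold is reached avoids scanning usage counters on every assignment; the trade-off is that idle entries keep their Worker until the next collection.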
------------------------------------------------------------ revno: 3220 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-27 17:36:50 +0200 message: wl#5569 Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h. Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT. (todo: the native sequential mode for the event needs some revision; in particular, `delete ev' shall *always* happen in rli->cleanup_context, not in two places as currently). ------------------------------------------------------------ revno: 3219 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-26 23:08:30 +0200 message: wl#5569 MTS Partitioning conflict detection and handling is implemented. A new option to run Query in parallel, though incompatibly with the Rows- case in that the default db, not the actual db:s, is used as the partition key. The user interface gained the global var and the command-line opt: slave_run_query_in_parallel (Welcome to the set! :-) ------------------------------------------------------------ revno: 3218 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-andrei timestamp: Fri 2010-11-26 16:15:37 +0000 message: There was a mismatch between the number of fields read and written, and consequently the read was failing for the Slave_worker.
------------------------------------------------------------ revno: 3217 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 11:03:54 +0200 message: wl#5569 merging with a wl#5599 piece of code ------------------------------------------------------------ revno: 3216 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 10:47:39 +0200 message: wl#5569 Converting the prototype-time db2w hash to be concurrent; a necessary introduction of the least occupied Worker notion. It is currently computed as the Worker having the least number of distributed partitions. Adding parallel support for Query_log_event; caution: 1. the session/default db, not the actual db, is used as the key; 2. it may not have been tested against all use cases (e.g. int vars). Fixing slave stop issues. ------------------------------------------------------------ revno: 3215 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-11-22 20:57:13 +0200 message: wl#5569 further extending the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker exec context. ------------------------------------------------------------ revno: 3214 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-20 19:23:42 +0200 message: wl#5569 MTS Worker pool start, stop, kill and error-out implementation. ------------------------------------------------------------ revno: 3213 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-19 16:51:58 +0200 message: wl#5569 recovery interfaces for the wl#5599 implementation. The essence of this patch is to provide the GAQ object implementation and a valid life cycle. The checkpoint handler, prior to calling the store methods of wl#5599, is supposed to invoke rli->gaq->move_queue_head(&rli->workers).
See a simulation of that near ev->update_pos() in the main sql thread loop. The checkpoint info is composed as an instance of Slave_job_group that resides as rli->gap->lwm. Todo: uncomment "+ // delete ev; // after ev->update_pos() event is garbage" once the real checkpoint has been done. Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods.

+ // experimental checkpoint per each scheduling attempt
+ // logics of next_event()
+
+ rli->gaq->move_queue_head(&rli->workers);

------------------------------------------------------------ revno: 3212 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:50:54 +0200 message: wl#5569 wl#5599 Recovery related. Prototyping the worker RLI instantiation, to be elaborated on. ------------------------------------------------------------ revno: 3211 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:00:52 +0200 message: wl#5569 MTS Extending the wl#5563 prototype gradually. This commit addresses: 1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for the checkpoint to update GAQ and the central RLI recovery table). Wrt rli, C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli). Others: - decreased processing time for rpl_parallel and serial. ------------------------------------------------------------ revno: 3210 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-11-14 11:55:32 +0000 message: Post-push fix for WL#5599. ------------------------------------------------------------ revno: 3209 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-12 17:58:12 +0000 message: Post-push fix for WL#5599.
------------------------------------------------------------
revno: 3208
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-11 11:53:01 +0000
message:
  WL#5599
  The patch changed the handler's functions, namely init_info, check_info, flush_info,
  remove_info and end_info, and the related private member functions, in both file and
  table handlers, to accept an index that identifies the information that will be read
  or written.
  This is necessary now because the handlers will be used by the workers to read and
  write information from file(s) and tables. Since several workers may run at the same
  time, an index is used to identify the worker that is accessing the information.
  This change is also necessary for multi-master replication, as information from each
  master must be uniquely identified.
------------------------------------------------------------
revno: 3207
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-11-10 10:57:13 +0000
message:
  Refactoring to start work on WL#5599.
------------------------------------------------------------
revno: 3206
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:34:18 +0000
message:
  Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569.
------------------------------------------------------------
revno: 3205 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:04:14 +0000
message:
  merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569
  Conflicts:
  . sql/CMakeLists.txt
  . sql/Makefile.am
  . sql/sql_class.h
  . sql/rpl_slave.cc
------------------------------------------------------------
revno: 3204 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 11:39:37 +0000
message:
  merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3203
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 23:33:37 +0300
message:
  wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes
------------------------------------------------------------
revno: 3202
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 21:19:56 +0300
message:
  wl#5563 leaving out fine-grained garbage collection. That task is unnecessary to solve
  at prototyping time. The update-pos routine, yet to be implemented, is going to eliminate
  that piece of code.
------------------------------------------------------------
revno: 3201
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 20:38:35 +0300
message:
  wl#5563 Extending the test base by splitting the former rpl_parallel into two, so that it
  runs in serial exec mode as well.
------------------------------------------------------------
revno: 3200
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 11:49:00 +0300
message:
  wl#5563 improved test; fixed a delete issue that used to crash;
  added @@global.slave_local_timestamp to fill in the timestamp column with the slave clock value.
  Performance growth can be seen through the test.
  todo: merge with Alfranio's work on hashing and dynamic allocation of PFS objects.
------------------------------------------------------------
revno: 3199
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Wed 2010-09-15 14:51:49 +0300
message:
  wl#5563 tests for the wl.
  The number of workers and iterations can be tuned.
  todo: convert them into parameters passed to the test through mtr.
------------------------------------------------------------
revno: 3198
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 18:22:41 +0300
message:
  wl#5563 adding an ingenuous, not-yet-stress-attempting test that also fired an assert.
  Refined the computation of the Worker instance reference, because cleanup_context() is
  executed by the SQL thread (the Coordinator) as well.
------------------------------------------------------------
revno: 3197
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 13:15:38 +0300
message:
  wl#5563 Rows-event parallelization is basically implemented, although only shallowly tested.
  Write access by workers to the central rli struct may not be fully eliminated at this phase;
  e.g. that relates to errors.
  todo: prove rli gets out of Worker scope
  todo: provide a stress test
------------------------------------------------------------
revno: 3196
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Sat 2010-09-11 17:00:08 +0300
message:
  wl#5563 adding Rows-event support limited to one Worker.
  Deferred deletion did not check emptiness of the list.
------------------------------------------------------------
revno: 3195
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:36:07 +0300
message:
  wl#5563 correcting comments to indicate fewer limitations
------------------------------------------------------------
revno: 3194
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:32:39 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  More progress on the WL: the STMT binlog format works while the conceptual limits hold,
  that is, no query/transaction is allowed to deal with more than one db.
  Addressed a complication: the update-pos method, which is run by the Coordinator, belongs
  to the Log_event hierarchy, so event deletion, now done by the Worker, must be careful.
  Todo:
  1. (High priority) fix Row-format complications
  2. (High priority) elaborate on the hash function to make it a function of the db text name
  3. (Optional) consider moving update_pos to the RLI class to get rid of the delete-logic complication.
  How-to-use:
  The instructions can be found in the comments of the previous commit; see there for more details.
  In brief, though, the db names have to follow the pattern `test[0-9]', e.g. test0, test1, test2,
  test3 for the default four Worker threads.
  The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the worker pool operates
  correctly and with signs of scalable performance.
  How to test:

  connection master;
  set @@global.binlog_format=statement;

  connection slave;
  set @@global.slave_exec_mode=PARALLEL;
  set @@global.binlog_format=mixed;
  show processlist; # => IO, SQL threads + 4 workers by default
  change master to ...

  connection master;
  # create databases with magic names "test[0-9]+", where the number will index
  # a worker.
  create database test0;
  create database test1;
  create database test2;
  create database test3;
  # create tables. they are only of MyISAM type for now
  use test0;
  create table tm_1(a int, b int) engine=myisam;
  use ...
  # DML on tables:
  use test[0-3];
  insert into tm_1 values (1,0);
  ...
  ...
  connection slave;
  # monitor CPU (visually this time: top etc)
  # check correctness, e.g.
  select count(*) from test[0-3].tm_1;
  connection master;
  select count(*) from test[0-3].tm_1;

******
------------------------------------------------------------
revno: 3364
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 17:09:22 +0300
message:
  wl#5569 MTS
  Refining rpl_rotate_logs, which could not produce deterministic output.
  The list of binlogs contained one binlog more than expected.
------------------------------------------------------------
revno: 3363
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:56:01 +0300
message:
  updating result files that were left incorrect by the last merge.
------------------------------------------------------------
revno: 3362
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-08-18 14:44:59 +0300
message:
  wl#5569 MTS
  Failure in recovery when binlog-checksum is active.
  The failure was caused by the relay-log parsing done by MTS recovery gap computing, which did
  not make sure to use the relay log's own FormatDescriptor events that contain the checksumming
  info for all events in the log.
  Fixed by finding out the checksum algorithm of every relay log as the first step of MTS
  recovery gap computing.
------------------------------------------------------------
revno: 3361 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-08-17 11:21:23 +0300
message:
  merge from trunk; it was necessary to resolve a few semantic conflicts caused by changes
  to THD::enter_cond() in the trunk.
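The how-to in revnos 3193 and 3194 routes work to Workers by database name, with names following the pattern `test[0-9]', and 3194 lists a todo to make the hash a function of the db text name. A minimal sketch of such routing, assuming a hypothetical helper db_to_worker() that is not server code; std::hash is only an illustrative stand-in for whatever hash the final design adopted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a database name to a Worker index.
// "test<N>" goes to Worker N % n_workers, as in the prototype how-to;
// any other name falls back to a hash of the name text (the todo's direction).
std::size_t db_to_worker(const std::string &db, std::size_t n_workers) {
  const std::string prefix = "test";
  if (db.size() > prefix.size() && db.compare(0, prefix.size(), prefix) == 0) {
    std::size_t n = 0;
    bool all_digits = true;
    for (std::size_t i = prefix.size(); i < db.size(); ++i) {
      if (!std::isdigit(static_cast<unsigned char>(db[i]))) {
        all_digits = false;
        break;
      }
      n = n * 10 + static_cast<std::size_t>(db[i] - '0');
    }
    if (all_digits) return n % n_workers;
  }
  return std::hash<std::string>()(db) % n_workers;
}
```

Whatever the hash, the property that matters for the prototype's conceptual limit is that the same db name always maps to the same Worker, so statements on one db are never applied out of order.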
------------------------------------------------------------
revno: 3360
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-27 08:56:14 +0100
message:
  Fixed a failure of test rpl_mts_check_concurrency when running in the mts collection.
------------------------------------------------------------
revno: 3359
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-26 19:46:41 +0100
message:
  Added a test case that checks whether MTS allows concurrent access to the replication tables
  and, as such, concurrent commits of transactions that update different databases.
------------------------------------------------------------
revno: 3358
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 20:08:43 +0100
message:
  Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering
  the error log with messages on unsafe execution.
------------------------------------------------------------
revno: 3357
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 19:02:14 +0100
message:
  This patch contains the following fixes:
  . Removed a warning suppression introduced in the wrong test case (i.e. rpl_corruption)
    and put it in the correct one (i.e. rpl_row_corruption).
  . Introduced a variable to avoid cluttering the error log with several warning messages
    on unsafe execution.
------------------------------------------------------------
revno: 3356
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-22 11:01:12 +0100
message:
  This patch has the following changes:
  . Specific directories were created for the MTS runs in the default.push.
  . A warning message was suppressed in rpl_corruption.test.
  . Annoying debug output was removed from the error log.
    However, this is a temporary solution, as it prevents enabling traces.
------------------------------------------------------------
revno: 3355 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-20 11:56:40 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3354
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 22:26:30 +0300
message:
  wl#5569 MTS
  valgrind reported a stack trace on rpl_savepoint.
  The problem appears to be that, when computing slave_sql_running_state in show_master_info(),
  the SQL thread's proc_info pointer could refer to a value on a stack frame that no longer existed.
  Fixed by making proc_info point to a string literal.
------------------------------------------------------------
revno: 3353
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-19 17:46:43 +0100
message:
  Suppressed warning messages that could potentially cause problems while running MTS
  crash-safe test cases.
------------------------------------------------------------
revno: 3352
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 21:46:45 +0300
message:
  wl#5569 MTS
  Cosmetic changes to improve the readability and clarity of the MTS patch source code.
------------------------------------------------------------
revno: 3351
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-18 14:52:44 +0300
message:
  wl#5569 MTS
  A hunk inadvertently introduced two revisions back is reverted to please rpl_*_mts_crash_safe.
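Revno 3354 fixes proc_info referring to a stack frame that had already gone by making it point to a string literal. A minimal C++ illustration of why that works, assuming a hypothetical SessionModel in place of THD and an invented state string; the unsafe variant is kept only as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the session object: proc_info is a raw pointer
// that another thread (e.g. one serving a status command) may read at any time.
struct SessionModel {
  const char *proc_info;
};

// Unsafe pattern, shown as a comment only: the buffer lives in this stack
// frame, so proc_info dangles as soon as the function returns.
//
//   void set_state_from_stack(SessionModel *s) {
//     char buf[64];
//     std::snprintf(buf, sizeof(buf), "state %d", 1);
//     s->proc_info = buf;
//   }

// Safe pattern: a string literal has static storage duration, so the
// pointer stays valid for the lifetime of the program.
void set_state(SessionModel *s, const char *literal) { s->proc_info = literal; }

// Hypothetical state name, for illustration only.
void enter_wait_state(SessionModel *s) {
  set_state(s, "Waiting for the next event group");
}

// Reader side: copies the current state into a caller-owned buffer
// (out_len must be at least 1).
void read_state(const SessionModel *s, char *out, std::size_t out_len) {
  std::strncpy(out, s->proc_info ? s->proc_info : "", out_len - 1);
  out[out_len - 1] = '\0';
}
```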
------------------------------------------------------------
revno: 3350
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-17 00:51:45 +0300
message:
  wl#5569 MTS
  fixing the build issue for embedded. Public visibility of Rows_log_event::do_apply_event() is restored.
------------------------------------------------------------
revno: 3349
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 20:08:31 +0300
message:
  wl#5569 MTS
  The patch contains improvements after code review. Changes are mostly cosmetic.
------------------------------------------------------------
revno: 3348
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-16 02:11:11 +0300
message:
  bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP

  MTS stopped with an error in the middle of the test.
  The reason is that a group of events originating from the slave itself was only partly
  processed, which modified the group position. In the following restart the wrong group
  boundary made the slave either error out or assert.

  Fixed by locating a possible race condition that allowed the Coordinator to ignore the
  actual failed status of a Worker. So in the case of the test, the slave server group
  can't be started.
  Notice, this is a trial patch since I can't reproduce the failure on any of the hosts
  available to me.
------------------------------------------------------------
revno: 3347
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 12:40:06 +0300
message:
  WL#5569 MTS
  Further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed
  a race condition: the Coordinator might fail to catch the not-running status of a Worker.
  That made the Coordinator skip only part of a group of the slave server's own events, so the
  slave stopped not at a group boundary.
  Fixed by marking the errored-out Worker as failed before releasing its APH entries.
  TODO: notice it may still be possible to stop not at a boundary due to a graceful STOP SLAVE,
  if one is run while self-originated events are being skipped. However, this issue belongs to
  STS and might be similar to BUG@12604951 and BUG@12728160.
------------------------------------------------------------
revno: 3346
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-07-14 08:03:55 +0100
message:
  Post-push fixes for WL#5569

  Injecting faults while updating a MyISAM table requires flushing the changes before
  committing suicide. So we have introduced the following code:

    DBUG_EXECUTE_IF("crash_after_commit_and_update_pos",
  -                 DBUG_SUICIDE(););
  +                 sql_print_information("Crashing crash_after_commit_and_update_pos.");
  +                 flush_info(TRUE);
  +                 DBUG_SUICIDE();

  Besides, we improved some comments.
------------------------------------------------------------
revno: 3345
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 16:23:57 +0100
message:
  WL#5569
------------------------------------------------------------
revno: 3344 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-13 00:10:43 +0300
message:
  wl#5569 MTS merge trunk -> wl5569-tree
------------------------------------------------------------
revno: 3343
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 23:36:17 +0300
message:
  wl#5569 MTS
  adding a suppression for an expected warning to rpl_circular_for_4_hosts;
  decreasing a loop limit in rpl_parallel_switch_sequential in case of statement format.
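Revno 3347 fixes a race by marking the errored-out Worker as failed before releasing its APH entries. A sketch of why the order matters, assuming a hypothetical Worker/AssignedHash model guarded by one std::mutex (not the server's data structures): because the failed flag is written before the mutex-protected release, any Coordinator that later observes the entry as released under the same mutex also observes the flag.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical Worker: only the failure flag matters for this sketch.
struct Worker {
  std::atomic<bool> failed{false};
};

// Hypothetical assigned-partition hash: db names currently held by a Worker.
struct AssignedHash {
  std::mutex mtx;
  std::set<std::string> held;
};

// Worker error path in the order used by the fix: publish the failed status
// first, then release the entry. The unlock of 'mtx' synchronizes with the
// Coordinator's later lock, so the flag store is visible to it.
void worker_error_out(Worker &w, AssignedHash &aph, const std::string &db) {
  w.failed.store(true);
  std::lock_guard<std::mutex> guard(aph.mtx);
  aph.held.erase(db);
}

// Coordinator side: true only when 'db' is free AND its last holder did not fail.
bool coordinator_can_schedule(const Worker &w, AssignedHash &aph,
                              const std::string &db) {
  std::lock_guard<std::mutex> guard(aph.mtx);
  if (aph.held.count(db) != 0) return false;  // still held: keep waiting
  return !w.failed.load();                    // released, but by a failed Worker?
}
```

With the reverse order (release first, then set the flag), the Coordinator could see the entry free while the flag still reads false, which matches the missed not-running status described in the commit.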
------------------------------------------------------------
revno: 3342
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-12 14:46:23 +0300
message:
  WL#5569 MTS
  Fixing code and test due to an rpl.rpl_circular_for_4_hosts mismatch failure, like
  http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382.
  The reason for the mismatch was that, with two groups of events to execute, the first for a
  Worker and the second for the Coordinator, the Coordinator waited for the first group's
  completion but did not verify that the synchronization succeeded. So if applying the first
  group failed, processing of the second could find an inconsistent state and end up with a
  segfault (even though only the mismatch has been seen so far).
------------------------------------------------------------
revno: 3341
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-10 22:40:01 +0100
message:
  Avoiding busy waiting when running MTS recovery tests.
------------------------------------------------------------
revno: 3340
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:11:58 +0100
message:
  Removed --slave-checkpoint-period from MTS test cases.
------------------------------------------------------------
revno: 3339
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-09 23:08:07 +0100
message:
  Improved test cases for WL#5569.
------------------------------------------------------------
revno: 3338
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 22:40:52 +0300
message:
  wl#5569 MTS
  The patch refines the logic of the MTS-recovery applying phase to always apply events that
  are for the Coordinator; fixes a few tests to make them pass on PB; makes the GAQ size equal
  to the checkpoint_group value.
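Revno 3342 makes the Coordinator verify that the Worker's preceding group actually succeeded, not merely that it finished, before processing its own group. A minimal sketch with std::condition_variable, assuming a hypothetical GroupCompletion class and coordinator_apply_own_group() function; the counter stands in for applying the Coordinator's event group.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical completion tracker for one group executed by a Worker.
class GroupCompletion {
 public:
  // Called by the Worker when it finishes the group (ok == false on error).
  void finish(bool ok) {
    std::lock_guard<std::mutex> guard(mtx_);
    done_ = true;
    ok_ = ok;
    cv_.notify_all();
  }

  // Called by the Coordinator: blocks until the group is finished and
  // reports whether it succeeded, rather than only waiting for completion.
  bool wait_for_success() {
    std::unique_lock<std::mutex> lock(mtx_);
    cv_.wait(lock, [this] { return done_; });
    return ok_;
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  bool done_ = false;
  bool ok_ = false;
};

// Coordinator step: apply its own group only if the preceding Worker group
// succeeded; otherwise take the error path instead of running on an
// inconsistent state.
bool coordinator_apply_own_group(GroupCompletion &prev, int &applied) {
  if (!prev.wait_for_success()) return false;
  ++applied;  // stand-in for applying the Coordinator's event group
  return true;
}
```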
------------------------------------------------------------
revno: 3337
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:54:34 +0100
message:
  Reduced the timeout period to run the checkpoint routine by setting slave-checkpoint-period to 30.
------------------------------------------------------------
revno: 3336 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-08 07:44:35 +0100
message:
  merge mysql-trunk --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3335
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-07-06 12:46:05 +0300
message:
  wl#5569 MTS
  refining the wait for db-hash entry release at event distribution.
  A graceful STOP is not accepted at this point, so the Coordinator stays in the loop.
------------------------------------------------------------
revno: 3334
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-07-05 20:43:04 +0300
message:
  bug#12719875 possible MTS recovery issue.

  MTS stopped with an error after failing to apply an event.
  It turned out that the event was scheduled incorrectly because an earlier stop by the
  Single-Threaded Slave happened not at a group boundary but in the middle of a group.

  Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE TABLE group is
  surrounded by its own BEGIN/COMMIT braces.
------------------------------------------------------------
revno: 3333
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-07-04 18:14:09 +0300
message:
  wl#5569 MTS
  Adding a rule to run PB with all suites in MTS with binlog-format ROW.
------------------------------------------------------------
revno: 3332
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:29:34 +0300
message:
  wl5569 MTS cleanup in one file.
------------------------------------------------------------
revno: 3331
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-07-03 23:16:02 +0300
message:
  wl5569 MTS
  bzr commit mail address changed;
  a minor cleanup to make mts_is_worker() take a const argument;
  releasing a test to run in MTS.
------------------------------------------------------------
revno: 3330
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-07-02 08:58:56 +0100
message:
  Fixed the use of the performance schema in the replication code and a concurrency
  issue in the IO thread. In particular, the IO thread was calling flush_master_info
  without grabbing locks.
------------------------------------------------------------
revno: 3329 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 16:41:35 +0300
message:
  wl5569 MTS merging from the main repo.
------------------------------------------------------------
revno: 3328
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 15:48:25 +0300
message:
  wl#5569 MTS
  the final cleanup patch.
  There are a few glitches that were considered tolerable, at least while the whole WL's
  code is being reviewed. These include:
  - no support for old load-data events
  - no support for FK
  To add to the list, there are a few places in the patch that suggest deploying error
  branches each time flush_info() is called.
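Revno 3330 notes that the IO thread called flush_master_info without grabbing locks. A minimal sketch of the intended discipline, assuming a hypothetical MasterInfoModel whose updater and flusher share one mutex; the server's actual lock names and flush path are not modelled here.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for the IO thread's master-info state: a
// (log name, position) pair that must be persisted consistently.
class MasterInfoModel {
 public:
  void update_position(const std::string &log, unsigned long long pos) {
    std::lock_guard<std::mutex> guard(data_lock_);
    log_name_ = log;
    pos_ = pos;
  }

  // The flush takes the same lock as the updater, so it can never
  // persist a half-updated (log name, position) pair.
  std::string flush() {
    std::lock_guard<std::mutex> guard(data_lock_);
    return log_name_ + ":" + std::to_string(pos_);
  }

 private:
  std::mutex data_lock_;
  std::string log_name_;
  unsigned long long pos_ = 0;
};
```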
------------------------------------------------------------
revno: 3327
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-07-01 13:16:52 +0300
message:
  wl#5569 MTS
  The patch cleans up a host of code.
------------------------------------------------------------
revno: 3326
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-28 11:30:18 +0300
message:
  wl#5569 MTS
  replacing views with regular tables for consistency verification in rpl_parallel_innodb.
  Also a minor cleanup in rpl_parallel is done.
------------------------------------------------------------
revno: 3325
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 20:31:45 +0300
message:
  wl#5569 MTS
  Cleanup, and addressing a sporadic rpl_temp_table_mix_row failure in the post-execution
  mtr.check_testcase().
  The check failure was caused by a faulty optimization that avoided migrating temporary
  tables from the Coordinator to Workers in the case of rows-event assignment. While it is
  correct with a homogeneous rows-event-only load, a mixture can fail.
  Fixed by removing the optimization, so map_db_to_worker() always relocates, which is
  somewhat suboptimal and should be improved in the future.
------------------------------------------------------------
revno: 3324
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-27 13:12:52 +0100
message:
  Ensured that updates to the worker_info_repository are transactional and fixed the
  slave_checkpoint_group_basic test case.
------------------------------------------------------------
revno: 3323
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-26 13:02:59 +0100
message:
  Fixed test case.
------------------------------------------------------------
revno: 3322
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-25 15:14:24 +0100
message:
  Introduced a test case for recovery with MTS and fixed bugs in recovery.
------------------------------------------------------------
revno: 3321
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 15:38:19 +0300
message:
  wl#5569 MTS
  This patch does a bit of cleanup, addresses one memory-allocation todo and completes fixing
  the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items.
------------------------------------------------------------
revno: 3320
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 12:38:34 +0300
message:
  wl#5569 MTS
  this patch completes the previous one: it fixes a result file and makes the InnoDB-specific
  test verification based on tables, not views.
------------------------------------------------------------
revno: 3319
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-24 00:11:22 +0300
message:
  wl#5569 MTS
  this is an exploratory patch to sort out whether the verification method that was based on
  views has its own flaw, unrelated to MTS.
  The patch calls the verification macro on the tables, which required some adjustment.
------------------------------------------------------------
revno: 3318
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2011-06-23 07:56:15 +0300
message:
  wl#5569 MTS
  fixing results of mysqld--help-win.
------------------------------------------------------------
revno: 3317 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:20:40 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3316
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 19:17:43 +0100
message:
  On some platforms, such as Windows, a thread's wait time is stored in 100ns units.
  However, when computing the difference between two values, the result was not multiplied
  by 100. Besides, there was a casting problem when that result was assigned to an ulong.
------------------------------------------------------------
revno: 3315
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2011-06-22 18:54:23 +0100
message:
  Fixed how MTS copes with recovery.
------------------------------------------------------------
revno: 3314 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 19:10:54 +0300
message:
  wl#5569 MTS
  Fixing valgrind warnings.
------------------------------------------------------------
revno: 3313
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2011-06-21 18:15:43 +0300
message:
  wl#5569 MTS
  rpl_parallel_start_stop.test could fail sporadically with a timeout.
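Revno 3316 concerns scaling a difference of 100ns-unit readings and then storing it in an ulong. A sketch of the arithmetic, assuming a hypothetical elapsed_ns() helper: the raw difference is multiplied by the tick length and kept in a 64-bit unsigned type throughout. On Windows, unsigned long is 32 bits wide, so narrowing a nanosecond count to it would truncate anything above about 4.29 seconds; this illustrates one way such a cast can go wrong, not necessarily the exact defect that was fixed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed time in nanoseconds from two raw timer
// readings. ns_per_tick is 100 for a timer counting in 100 ns units and
// 1 for a nanosecond timer. All arithmetic stays in 64-bit unsigned integers.
std::uint64_t elapsed_ns(std::uint64_t start_ticks, std::uint64_t end_ticks,
                         std::uint64_t ns_per_tick) {
  assert(end_ticks >= start_ticks);  // readings are expected to be monotonic
  return (end_ticks - start_ticks) * ns_per_tick;
}
```

For example, 50,000,000 ticks of 100 ns is 5,000,000,000 ns (5 s), a value that no longer fits in 32 bits.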
------------------------------------------------------------
revno: 3312 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:21:56 +0100
message:
  merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3311
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 23:19:06 +0100
message:
  Fixed an error when computing the Lower-Water-Mark.

  If two or more jobs were removed from the Group of assigned jobs, and one of the jobs had a
  non-empty group relay log but the last one had an empty group relay log, the Lower-Water-Mark
  was not correctly updated, because the algorithm assumed that the group relay log was null.
------------------------------------------------------------
revno: 3310
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2011-06-20 11:52:44 +0100
message:
  Fixed valgrind errors.

  Slave_job_group was silently being cast to LOG_POS_COORD while calling
  sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp) and, as a consequence,
  mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems:

  . The first two members of Slave_job_group were not a char * and a my_offset.

  . Even if the first two members were a char * and a my_offset, such casting could lead
    to alignment problems.

  To fix the problem, we avoid this casting.
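Revno 3310 removes a cast that made sort_dynamic() treat Slave_job_group elements as LOG_POS_COORD through mts_event_coord_cmp. A sketch of the cast-free alternative with std::qsort, assuming a hypothetical JobRecord layout in which the coordinates are deliberately not the first members; the comparator converts the void pointers back to the real element type instead of reinterpreting one struct as another. Comparing names with strcmp assumes relay log names share a prefix and use zero-padded numeric suffixes, as in relay.000001.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical job record: the coordinates are NOT the first members, so
// reading it through a pointer to a {name, pos} struct would hit the wrong bytes.
struct JobRecord {
  unsigned long worker_id;
  const char *group_relay_log_name;
  unsigned long long group_relay_log_pos;
};

// Comparator with exactly the signature qsort expects; it casts back to the
// actual element type, then orders by (log name, position).
int job_coord_cmp(const void *a, const void *b) {
  const JobRecord *x = static_cast<const JobRecord *>(a);
  const JobRecord *y = static_cast<const JobRecord *>(b);
  int by_name = std::strcmp(x->group_relay_log_name, y->group_relay_log_name);
  if (by_name != 0) return by_name;
  if (x->group_relay_log_pos < y->group_relay_log_pos) return -1;
  return x->group_relay_log_pos > y->group_relay_log_pos ? 1 : 0;
}

// Sorts jobs by their relay log coordinates.
void sort_jobs(JobRecord *jobs, std::size_t n) {
  std::qsort(jobs, n, sizeof(JobRecord), job_coord_cmp);
}
```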
------------------------------------------------------------
revno: 3309
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 19:14:50 +0300
message:
  wl#5569 MTS
  fixing slave_transaction_retries_basic_64.result
------------------------------------------------------------
revno: 3308
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 16:11:25 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3307
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 12:33:36 +0300
message:
  wl#5569 MTS
  Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling
  of a Rotate event that does not split the current group and therefore has to be executed
  after all previously scheduled events.
  Fixing the sensitivity of two other tests to mtr's invocation environment, which includes
  the initial values of slave_parallel_workers and slave_transaction_retries.
------------------------------------------------------------
revno: 3306
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2011-06-19 09:04:19 +0100
message:
  Fixed some Windows failures.
------------------------------------------------------------
revno: 3305
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2011-06-18 19:58:21 +0100
message:
  Fixed some recovery issues.
------------------------------------------------------------
revno: 3304
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 21:01:58 +0300
message:
  wl#5569 MTS
  fixing tests and a segfault at the end of handle_slave_sql() that happened after worker
  initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------
revno: 3303
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 18:34:16 +0300
message:
  wl#5569 MTS
  fixing tests.
------------------------------------------------------------
revno: 3302
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 14:00:41 +0300
message:
  wl#5569 MTS
  fixing rpl_row_basic_3innodb similarly to the previous patch.
------------------------------------------------------------
revno: 3301
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 13:51:59 +0300
message:
  wl#5569 MTS
  fixing a few tests.
  1. A policy is implemented to react with a warning when a failing worker leaves the total
     slave state with gaps, and thereby inconsistent.
  2. Two tests used to time out due to reset master/slave, which was disabled in them.
------------------------------------------------------------
revno: 3300
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2011-06-17 02:24:59 +0100
message:
  Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renames worker-info-repository to slave-worker-info-repository in some tests' option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have been renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including 1. decreasing the number of system variables and renaming them. Command-line options that were important for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. A small encapsulation in ha_blackhole.cc is done. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by the MTS. ------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk.
------------------------------------------------------------ revno: 3288 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing failing tests due to a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to an individual event. An event that accesses more than the maximum number of databases, or an event from an old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and no more groups are scheduled until it is done. b. Notification to the Coordinator about an errored-out Worker is corrected. ------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with a non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode and for the rpl suites with row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes, incl. for a segfault caused by the last merge from the trunk, and a compilation issue on embedded. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options.
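The isolation rule of revno 3287 can be modeled as a small scheduling predicate. This is a hypothetical sketch: Group, Scheduler and kMaxDbsTracked are illustrative names, and the threshold of 16 databases is an assumption rather than a value stated in this log. An isolated group is admitted only when nothing else is in flight, and while it runs nothing else is admitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the Coordinator's decision for `isolated parallel'
// groups; the names are illustrative, not taken from the MTS sources.
struct Group {
  std::vector<std::string> dbs;  // databases the group accesses
  bool from_old_master = false;  // group read from an old-master binlog
};

// Assumed over-max threshold on the number of tracked databases.
constexpr std::size_t kMaxDbsTracked = 16;

// One marking event is enough to make the whole group isolated.
bool needs_isolation(const Group& g) {
  return g.from_old_master || g.dbs.size() > kMaxDbsTracked;
}

class Scheduler {
 public:
  // True if the Coordinator may hand `g` to a Worker right now.
  bool can_schedule(const Group& g) const {
    if (isolated_running_) return false;             // nothing runs beside it
    if (needs_isolation(g)) return in_flight_ == 0;  // wait for all prior groups
    return true;
  }
  void start(const Group& g) {
    ++in_flight_;
    if (needs_isolation(g)) isolated_running_ = true;
  }
  void finish(const Group& g) {
    --in_flight_;
    if (needs_isolation(g)) isolated_running_ = false;
  }

 private:
  std::size_t in_flight_ = 0;      // groups handed out but not yet done
  bool isolated_running_ = false;  // an isolated group is executing
};
```

The design trades parallelism for safety only on the marked groups; ordinary groups keep flowing to Workers as before.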
------------------------------------------------------------ revno: 3283 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop; an error case behaves like the killed one. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker running status to killed (which breaks the read-exec loop of a Worker). ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, incl. in stopping Workers: the routine now distinguishes between the killed and gracefully stopped cases, so that in the end STOP SLAVE will guarantee a consistent state (some todo items still remain). ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that are not pushed yet to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The former declaration of the underlying server variable is changed from ulong to int32.
While that might affect (shrink) the actual range, no range has been specified, and now that the number of bits is the same on all platforms the range can be set to [0, max(int32)] ****** wl#5569 MTS Wl#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and the realization of a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; fixing temp tables manipulation. Asserting a use case that is impossible to support: a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog in order to refuse to run in parallel. ****** wl#5569 MTS Implementation of giving out the applier role to a Worker for all cases but those dealing with the Coordinator's state. Those include a Query event with over-max db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes, e.g. for a Query event, it switches to the sequential mode. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic for handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the workers starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it's about cleanup and simplification of the logic of event distribution to Workers and the consequent actions. Some effort was put into supporting Old Master Begin-less groups of events. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS.
Cleanup, fixing a few todo:s and a few potential issues found. ------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring of its handling in the canonical sequential mode. The event's lifetime suggests behavior similar to the objects associated with Table_map; in particular, its destruction should occur at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl. rpl_row_ignorable_event, in both sequential and parallel mode. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker :public Relay_log_info). The only exception is a Query referring to a temporary table. The temp:s are maintained in the Coordinator's "central" rli; removing some dead code; performing a lot of cleanup. There are a few todo items, incl.: 1. To implement several todo:s scattered across MTS' code and tests (e.g. to restore protected for a few members of RLI of rpl_rli.h); 2. to cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback) 3. To sort out names of classes based on Rpl_info, possibly removing Rpl_info_worker ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving benchmarking test.
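Revno 3280 moves the slave_open_temp_tables counter to a fixed-width 32-bit type because Workers and the Coordinator update it concurrently. A minimal sketch, assuming std::atomic<std::int32_t> as a stand-in for whatever atomic primitive the server itself uses; the helper functions and simulate() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Sketch only: a status counter updated concurrently by the Coordinator and
// Workers. A fixed-width 32-bit atomic has the same range on every platform,
// which is what allows a [0, max(int32)] range to be stated. std::atomic is
// a stand-in for the server's own atomic helpers.
std::atomic<std::int32_t> slave_open_temp_tables{0};

void on_temp_table_opened() { slave_open_temp_tables.fetch_add(1); }
void on_temp_table_closed() { slave_open_temp_tables.fetch_sub(1); }

// Hypothetical driver: `workers` threads each open and close `churn` temp
// tables, then leave `kept` of them open; returns the resulting counter.
std::int32_t simulate(int workers, int churn, int kept) {
  std::vector<std::thread> pool;
  for (int w = 0; w < workers; ++w) {
    pool.emplace_back([churn, kept] {
      for (int i = 0; i < churn; ++i) {
        on_temp_table_opened();
        on_temp_table_closed();
      }
      for (int i = 0; i < kept; ++i) on_temp_table_opened();
    });
  }
  for (std::thread& t : pool) t.join();
  return slave_open_temp_tables.load();
}
```

Because every update is a single atomic read-modify-write, the final value does not depend on how the threads interleave; with a plain int32_t the same program would contain a data race and could lose updates.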
------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, incl. waiting and sleeping times, are now reported into the error log at slave stopping time. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS Benchmarking related patch makes rpl_parallel uniform so that it can be run with an arbitrary number of workers, db:s, tables, etc. TODO: to restore the final consistency check, which is taken out temporarily since I could not find a way to leave it surrounded with a --dis/en-able* stanza. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added.
Benchmarking results can be obtained by extracting the master-side generating and the slave-side applying times, as in the following loop:
workers=6
for n in `seq 1 3`; do
  echo; echo loop: $n; echo
  my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
    rpl_parallel_benchmark --mysqld=--binlog-format=statement \
    && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1
done
------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments as requested by the reviewer. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: The earliest four revisions, which cut off the DML stage of the parallel query project from the following ones devoted to DDL. Those four implement skeleton parallel applying of Queries containing a temporary table, and the core of the design, which is the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch in the bundle addresses the last review mail notes.
------------------------------------------------------------ revno: 3269 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. ------------------------------------------------------------ revno: 3267 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter in case the signal is missed and just a timeout would be reported ------------------------------------------------------------ revno: 3265 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases that mtr testing with MTS workers against standard suites reveals.
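One way to read the revno 3266 fix, together with the revno 3280 race in which C could wait endlessly for a Worker that had exited on an error: the timed wait must re-check its condition after every timeout and must also end on a break-through condition. A hypothetical sketch; CoordinatorWait and its members are illustrative names, not the MTS sources.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical Coordinator-side wait for Workers to drain their queues.
// Each wait_for() wakes up on a notification or after `tick`, and the loop
// re-checks the state, so a lost notification costs at most one tick.
// `worker_error` is the break-through condition: a Worker that left because
// of an error can no longer drain anything, so waiting must end.
struct CoordinatorWait {
  std::mutex lock;
  std::condition_variable cond;
  int pending_jobs = 0;
  bool worker_error = false;

  // Returns true once all jobs are done, false if a Worker errored out.
  bool wait_for_workers(std::chrono::milliseconds tick) {
    std::unique_lock<std::mutex> guard(lock);
    while (pending_jobs > 0 && !worker_error) cond.wait_for(guard, tick);
    return !worker_error;
  }

  // Called by a Worker after it applies a job.
  void job_done() {
    {
      std::lock_guard<std::mutex> guard(lock);
      --pending_jobs;
    }
    cond.notify_one();
  }

  // Called by a Worker that stops because of an applying error.
  void report_error() {
    {
      std::lock_guard<std::mutex> guard(lock);
      worker_error = true;
    }
    cond.notify_one();
  }
};
```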
------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert, which can be hit when C deletes events that are skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. ------------------------------------------------------------ revno: 3261 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. ------------------------------------------------------------ revno: 3259 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0. rpl000010 is a representative. Fixed by identifying, marking, carefully running ev->update_pos() for, and destroying an event that can split a group of events, forcing parts of the group to be in different relay logs.
------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the 2nd and subsequent runs of the test because no removal of Worker tables happens at RESET SLAVE. ------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine that computes the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding the checkpoint relay_log_name,pos as a necessary part for locating a relay log for recovery. Tested with rpl_parallel.
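Revnos 3254 and 3253 compute a bitmap of executed groups for recovery. The sketch below assumes, for illustration only, that each Worker persisted a bitmap whose bit i means "group checkpoint+i was committed by this Worker"; the union of all Workers' bitmaps then exposes the gaps, i.e. unexecuted groups below the highest executed one. The names and layout are hypothetical, not the wl#5599 storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-Worker bitmap: bit i set = group (checkpoint + i) done.
using Bitmap = std::vector<bool>;

// Union of all Workers' bitmaps, truncated or padded to `size` groups.
Bitmap merge_worker_bitmaps(const std::vector<Bitmap>& workers, std::size_t size) {
  Bitmap done(size, false);
  for (const Bitmap& w : workers)
    for (std::size_t i = 0; i < w.size() && i < size; ++i)
      if (w[i]) done[i] = true;
  return done;
}

// Offsets (relative to the checkpoint) of groups that must be re-executed:
// every unset bit below the highest set bit.
std::vector<std::size_t> find_gaps(const Bitmap& done) {
  std::vector<std::size_t> gaps;
  std::size_t last = done.size();  // sentinel: nothing executed
  for (std::size_t i = done.size(); i-- > 0;)
    if (done[i]) { last = i; break; }
  if (last == done.size()) return gaps;
  for (std::size_t i = 0; i < last; ++i)
    if (!done[i]) gaps.push_back(i);
  return gaps;
}
```

Groups above the highest set bit are not gaps: they were simply never applied and are picked up by normal execution after recovery.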
------------------------------------------------------------ revno: 3251 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing by introducing a volatile byte to indicate whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing the --mts-exp-slave-run-query-in-parallel=1 case, when Query-log-event can be run in parallel, incl. DML and DDL. The feature is still `exp'erimental; it can be tried as long as no temp tables are involved and no db other than the session's default is modified by the query. Tested: Changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, incl. valgrind issues, long exec time and asserting in a test. ------------------------------------------------------------ revno: 3248 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from wl#5569 repo to local branch; rpl_sequential opt files are added to keep mtr from giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to sequential execution for the cases of 1. Query-log-event 2.
Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by a Worker. ------------------------------------------------------------ revno: 3246 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to arrange the option/system var names properly: mts_ prefixing, and _exp prefixing for experimental features needed for benchmarking (mts_exp_slave_local_timestamp) or supported in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing the GAQ size. It might be too tight, e.g. in the case of a max WQ length of 1; tested by running rpl_parallel supplying --mts-slave-worker-queue-len-max=1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
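Revnos 3251 and 3245 (and 3239, further down) all touch the GAQ. A hypothetical ring-buffer sketch: each slot carries a busy/released flag, standing in for the volatile byte of revno 3251; en_queue() refuses to add to a full queue; move_queue_head() releases the contiguous prefix of finished groups, which is what a checkpoint may safely record. The class, its method signatures and the sizing rule workers * wq_len_max + 1 are assumptions, not the real rpl_rli_pdb.h interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical GAQ: groups in binlog order, each slot flagged done/busy.
class Gaq {
 public:
  explicit Gaq(std::size_t capacity) : slots_(capacity) {}

  bool full() const { return len_ == slots_.size(); }

  // Appends a newly assigned group and returns its slot index,
  // or -1 when the queue is full and the Coordinator must wait.
  long en_queue() {
    if (full()) return -1;
    std::size_t idx = (head_ + len_) % slots_.size();
    slots_[idx] = false;  // busy: a Worker is still applying it
    ++len_;
    return static_cast<long>(idx);
  }

  // A Worker finished the group stored in slot `idx`.
  void mark_done(std::size_t idx) { slots_[idx] = true; }

  // Advances the head over the done prefix; returns how many were released.
  // Stops at the first busy slot, so a later done group cannot skip ahead.
  std::size_t move_queue_head() {
    std::size_t released = 0;
    while (len_ > 0 && slots_[head_]) {
      head_ = (head_ + 1) % slots_.size();
      --len_;
      ++released;
    }
    return released;
  }

 private:
  std::vector<bool> slots_;  // true = released/done, false = busy
  std::size_t head_ = 0;     // oldest group still tracked
  std::size_t len_ = 0;      // number of tracked groups
};

// One plausible sizing rule: room for every Worker queue to be full,
// plus the group the Coordinator is still assembling.
std::size_t gaq_capacity(std::size_t workers, std::size_t wq_len_max) {
  return workers * wq_len_max + 1;
}
```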
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery-related issue of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ped event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which should normally not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning to apply to cases of non-zero parallel workers combined with a feature that parallelization can't work with. ------------------------------------------------------------ revno: 3240 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) limit its size so that it can hold items while all WQ:s are full b) move_queue_head() contained a flaw that made it falsely report no progress c) never allow enqueuing into GAQ while it's full ------------------------------------------------------------ revno: 3238 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS
Integration with wl#5599 recovery for MTS and fixing two asserts. One is due to a missed cleanup of errored-out rows events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized to have one partition at Worker startup, which it should not. ------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc .
sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support from the repo. rpl_parallel hits an assert that Alfranio is fixing ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing-related fixes, incl. master_pos_wait() support and thereafter replacing sleeps with the now functioning sync_slave_with_master; Fixing the limited Q-log-event parallelization. After the fix, a mixture of rows- and Q- transactions can run concurrently. Q-transactions will be treated sequentially by default. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting the integration of the two wl:s. The main achievement is that the events execution status is consistently recorded into the Worker and the central RL recovery tables. That was tested manually in a rather aggressive env where the IO thread was made to reconnect randomly and the load from the Master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass. To address: the multi-stmt Query-log-event transaction case (see todo in sources); destruction by Workers of their executed events (was deferred until ev->update_pos started working); (Alfranio) deploying the mts_checkpoint_routine() call inside the successful event read branch of next_event(). Otherwise no call happens when Coord is constantly busy with read/distribute.
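The last TODO of revno 3230 asks for mts_checkpoint_routine() to be called from the successful-read branch of next_event(), so that a constantly busy Coordinator still checkpoints; revnos 3227 and 3228 below add a periodic checkpoint and a mutex around it. A minimal sketch of such a trigger, assuming a fixed period and a monotonic clock; Checkpointer and maybe_checkpoint() are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical periodic checkpoint trigger. The Coordinator would call
// maybe_checkpoint() after every successfully read event; the mutex
// serializes it with any other caller of the checkpoint routine.
class Checkpointer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit Checkpointer(std::chrono::milliseconds period) : period_(period) {}

  // Takes a checkpoint if at least `period_` has passed since the last one.
  // Returns true when a checkpoint was taken.
  bool maybe_checkpoint(Clock::time_point now) {
    std::lock_guard<std::mutex> guard(lock_);
    if (now - last_ < period_) return false;
    last_ = now;
    ++taken_;  // the real routine would move the GAQ head and store positions
    return true;
  }

  int taken() {
    std::lock_guard<std::mutex> guard(lock_);
    return taken_;
  }

 private:
  std::mutex lock_;
  std::chrono::milliseconds period_;
  Clock::time_point last_{};  // time of the last checkpoint (starts at epoch)
  int taken_ = 0;
};
```

Checking the clock on the read path costs one comparison per event, which is why it can sit in the hot loop instead of needing a separate checkpoint thread.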
------------------------------------------------------------ revno: 3229 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 19:14:50 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3228 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-04 15:45:02 +0000 message: Added mutex to the checkpoint_routine. ------------------------------------------------------------ revno: 3227 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 16:56:11 +0000 message: Implemented periodic checkpoint if parallel slave is enabled. ------------------------------------------------------------ revno: 3226 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-03 10:15:45 +0000 message: Fixed commit_positions() and removed unnecessary checkpoint thread. ------------------------------------------------------------ revno: 3225 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 20:13:12 +0200 message: manual merge to wl#5569 tree ------------------------------------------------------------ revno: 3224 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-02 19:46:46 +0200 message: wl#5569 MTS User interface related: set @@global.slave_parallel_workers= `non-zero` followed by `START SLAVE` starts the slave with that many Worker threads. That is, a non-zero value is de facto the slave parallel execution mode. The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is withdrawn. Fixes the rli->mts_pending_jobs_size statistics, which might otherwise cause an assert-crash, and a silly c&p mistake in the relay-log name change notification.
Made a little clean-up, including relocation of the initialization of workers-related stuff into start_slave_workers(). Many changes in tests due to SLAVE_EXEC_MODE_PARALLEL and other reasons. ------------------------------------------------------------ revno: 3223 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-01 19:08:21 +0200 message: wl#5569 MTS Changes related to limit conditions such as the WQ length and the total size of all WQ:s. Also a new test file is added. ------------------------------------------------------------ revno: 3222 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:39:40 +0200 message: merging from the wl#5569 repo containing wl#5599 integration ------------------------------------------------------------ revno: 3221 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-30 16:02:15 +0200 message: wl#5569 MTS Fixing group_relay_log_name change propagation from C to W; Garbage collection in the Partition-to-Worker hash is added, with a parameter for how many records in the hash are tolerated without checking the usage counter. Adding C-W synchronization due to: - the overall WQ:s data max - hitting the limit of a WQ length Adding Flow Control infrastructure with - a level of the hungry Worker that forces the Coordinator to distribute eagerly; symmetrically, a Worker whose load is more than 100 % minus the hungry level is considered fed-up. - a nap time for C in case all WQ:s lengths are above the level. - a weight param to the base nap as a function of the number of fed-up W:s. TODO: UNTIL to force sequential exec; To fix the ROWS_QUERY_LOG_EVENT corner case; to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged (ev->update_pos() is done).
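A hypothetical reading of the revno 3221 flow-control parameters in code. It assumes the hungry level is a percentage of the maximum WQ length, that "fed-up" means filled above 100 % minus that level, and that the weight scales the base nap linearly with the number of fed-up Workers; the log does not state the exact formula, and FlowControl and nap_time() are illustrative names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical flow-control policy for the Coordinator (C).
struct FlowControl {
  std::size_t hungry_pct;              // a WQ filled below this % is hungry
  std::chrono::microseconds base_nap;  // nap when no Worker is hungry
  std::size_t weight;                  // extra base naps per fed-up Worker

  // wq_len: current length of each Worker queue; wq_len_max must be > 0.
  std::chrono::microseconds nap_time(const std::vector<std::size_t>& wq_len,
                                     std::size_t wq_len_max) const {
    std::size_t fed_up = 0;
    for (std::size_t len : wq_len) {
      const std::size_t pct = len * 100 / wq_len_max;
      if (pct < hungry_pct) return std::chrono::microseconds(0);  // feed it now
      if (pct > 100 - hungry_pct) ++fed_up;  // symmetric "fed-up" level
    }
    const auto factor =
        static_cast<std::chrono::microseconds::rep>(1 + weight * fed_up);
    return base_nap * factor;
  }
};
```

With hungry_pct = 10, a Worker below 10 % full makes C keep distributing, and each Worker above 90 % full lengthens C's nap.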
------------------------------------------------------------ revno: 3220 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-27 17:36:50 +0200 message: wl#5569 Providing the relay-log name for wl#5599. The protocol of actions on the C and W sides is described in rpl_rli_pdb.h. Erroring out in case of parallel execution and ROWS_QUERY_LOG_EVENT. (todo: the native sequential mode for the event needs some revision; in particular, `delete ev' shall *always* happen in rli->cleanup_context, not in two places as it does currently.) ------------------------------------------------------------ revno: 3219 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-26 23:08:30 +0200 message: wl#5569 MTS Partitioning conflict detection and handling are implemented. A new option allows running Query events in parallel, though incompatibly with the Rows-event case in that the default db, not the actual dbs, is used as the partition key. The user interface gained the global variable and the command-line option slave_run_query_in_parallel (Welcome to the set! :-) ------------------------------------------------------------ revno: 3218 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-andrei timestamp: Fri 2010-11-26 16:15:37 +0000 message: There was a mismatch between the number of fields read and written; as a consequence, the read was failing for the Slave_worker.
------------------------------------------------------------ revno: 3217 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 11:03:54 +0200 message: wl#5569 merging with a piece of wl#5599 code ------------------------------------------------------------ revno: 3216 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-25 10:47:39 +0200 message: wl#5569 Converting the prototype-time db2w hash to be concurrent. Necessary introduction of the notion of the least occupied Worker; it is currently computed as the Worker with the least number of distributed partitions. Adding parallel support for Query_log_event; caution: 1. the session/default db, not the actual db, is used as the key; 2. it may not have been tested against all use cases (e.g. int vars). Fixing slave stop issues. ------------------------------------------------------------ revno: 3215 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-11-22 20:57:13 +0200 message: wl#5569 extending further the interfaces to wl#5599 by propagating future_event_relay_log_pos to the Worker execution context. ------------------------------------------------------------ revno: 3214 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-11-20 19:23:42 +0200 message: wl#5569 MTS Implementation of Worker pool start, stop, kill and error-out. ------------------------------------------------------------ revno: 3213 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-19 16:51:58 +0200 message: wl#5569 recovery interfaces for the wl#5599 implementation. The essence of this patch is to provide the GAQ object implementation and a valid life cycle for it. Before calling the store methods of wl#5599, the checkpoint handler is supposed to invoke rli->gaq->move_queue_head(&rli->workers).
See a simulation of that near ev->update_pos() in the main SQL thread loop. The checkpoint info is composed as an instance of Slave_job_group residing as rli->gap->lwm. Todo: uncomment `// delete ev; // after ev->update_pos() event is garbage` once the real checkpoint has been done. Todo: the real implementation needs to take care of filling Slave_job_group::update_current_binlog both initially and at the time of executing Rotate/FD methods. + // experimental checkpoint per each scheduling attempt + // logics of next_event() + + rli->gaq->move_queue_head(&rli->workers); ------------------------------------------------------------ revno: 3212 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:50:54 +0200 message: wl#5569 wl#5599 Recovery related. Prototyping the Worker RLI instantiation, to be elaborated on. ------------------------------------------------------------ revno: 3211 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-18 16:00:52 +0200 message: wl#5569 MTS Extending the wl#5563 prototype gradually. This commit addresses: 1. the recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for the checkpoint to update the GAQ and the central RLI recovery table). With respect to rli, both C and W execute do_apply_event(c_rli), where c_rli is the central instance. C executes update_pos(c_rli), but W executes update_pos(w_rli). Others: - decreased processing time for rpl_parallel and serial. ------------------------------------------------------------ revno: 3210 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-11-14 11:55:32 +0000 message: Post-push fix for WL#5599. ------------------------------------------------------------ revno: 3209 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-11-12 17:58:12 +0000 message: Post-push fix for WL#5599.
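Revisions 3213 and 3211 describe the GAQ life cycle. One plausible reading: groups are queued in relay-log order, Workers mark them done, and a checkpoint pops the finished prefix of the queue, keeping the last popped group as the low-water mark. The sketch below illustrates that reading in a single thread. GroupAssignedQueue, JobGroup and their members are hypothetical, and a version shared between the Coordinator and Workers would need synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical descriptor of one scheduled event group.
struct JobGroup {
  std::string relay_log_name;
  unsigned long long relay_log_pos;
  bool done;  // set once the Worker has executed the group
};

// Hypothetical Group Assigned Queue: groups are appended in relay-log order.
class GroupAssignedQueue {
 public:
  void push(const JobGroup &g) { q_.push_back(g); }

  // Marks the group at index i (counted from the current head) as done.
  void mark_done(std::size_t i) {
    assert(i < q_.size());
    q_[i].done = true;
  }

  // Pops completed groups from the head and stops at the first group still
  // in progress. The last popped group becomes the low-water mark, i.e. the
  // position a checkpoint could safely record. Returns how many were popped.
  std::size_t move_queue_head() {
    std::size_t removed = 0;
    while (!q_.empty() && q_.front().done) {
      lwm_ = q_.front();
      q_.pop_front();
      ++removed;
    }
    return removed;
  }

  const JobGroup &lwm() const { return lwm_; }
  std::size_t size() const { return q_.size(); }

 private:
  std::deque<JobGroup> q_;
  JobGroup lwm_{"", 0, true};
};
```

A group finished out of order (index 1 before index 0) does not advance the low-water mark until every earlier group is also done.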
------------------------------------------------------------ revno: 3208 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-11-11 11:53:01 +0000 message: WL#5599 The patch changed the handler functions (init_info, check_info, flush_info, remove_info and end_info) and the related private member functions in both the file and table handlers to accept an index. The index identifies the information that will be read or written. This is needed now because the Workers will use the handlers to read and write information from file(s) and tables, and several Workers may run at the same time, so the index identifies the Worker that is accessing the information. This change is also necessary for multi-master replication, since information from each master must be uniquely identified. ------------------------------------------------------------ revno: 3207 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-11-10 10:57:13 +0000 message: Refactoring to start work on WL#5599. ------------------------------------------------------------ revno: 3206 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:34:18 +0000 message: Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569. ------------------------------------------------------------ revno: 3205 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 13:04:14 +0000 message: merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569 Conflicts: . sql/CMakeLists.txt . sql/Makefile.am . sql/sql_class.h .
sql/rpl_slave.cc ------------------------------------------------------------ revno: 3204 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-11-09 11:39:37 +0000 message: merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3203 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 23:33:37 +0300 message: wl#5563 simplifying memory handling for the Coordinator-Workers transport to avoid sporadic crashes ------------------------------------------------------------ revno: 3202 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 21:19:56 +0300 message: wl#5563 leaving out fine-grained garbage collection. That task does not need to be solved at prototyping time. The update-pos routine still to be implemented is going to eliminate that piece of code. ------------------------------------------------------------ revno: 3201 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 20:38:35 +0300 message: wl#5563 Extending the test base by splitting the former rpl_parallel into two, so that it runs in serial execution mode as well. ------------------------------------------------------------ revno: 3200 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-17 11:49:00 +0300 message: wl#5563 improved the test; fixed a delete issue that used to cause a crash; added @@global.slave_local_timestamp to fill the timestamp column with the slave clock value. Performance growth can be seen through the test. todo: merge with Alfranio's work on hashing and dynamic allocation of PFS objects. ------------------------------------------------------------ revno: 3199 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Wed 2010-09-15 14:51:49 +0300 message: wl#5563 tests for the wl.
The number of Workers and iterations can be tuned. todo: turn them into parameters passed to the test through mtr ------------------------------------------------------------ revno: 3198 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 18:22:41 +0300 message: wl#5563 adding an ingenious, not-yet-stress-attempting test that also fired an assert. Refined the computation of the Worker instance reference, because cleanup_context() is also executed by the SQL thread, i.e. the Coordinator. ------------------------------------------------------------ revno: 3197 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Mon 2010-09-13 13:15:38 +0300 message: wl#5563 Rows-event parallelization is basically implemented, although only shallowly tested. Write access by Workers to the central rli struct may not be fully eliminated at this phase, e.g. where errors are concerned. todo: prove that rli gets out of Worker scope; todo: provide a stress test ------------------------------------------------------------ revno: 3196 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Sat 2010-09-11 17:00:08 +0300 message: wl#5563 adding Rows-event support limited to one Worker. Deferred deletion did not check the emptiness of the list. ------------------------------------------------------------ revno: 3195 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:36:07 +0300 message: wl#5563 correcting comments to indicate fewer limitations ------------------------------------------------------------ revno: 3194 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Fri 2010-09-10 20:32:39 +0300 message: WL#5563 Prototype for Slave parallelized by db name. More progress on the WL: the STMT binlog format works while the conceptual limits hold. That is, no query/transaction is allowed to deal with more than one db.
Addressed a complication: the update-pos method, which is run by the Coordinator, belongs to the Log_event hierarchy, so event deletion, which is now done by a Worker, must be handled carefully. Todo: 1. (High priority) fix Row-format complications; 2. (High priority) elaborate the hash function into a function of the db text name; 3. (Optional) consider moving update_pos to the RLI class to get rid of the delete-logic complication. How to use: the instructions can be found in the comments of the previous commit; see there for more details. In brief, though, the db names have to follow the pattern `test[0-9]', e.g. test0, test1, test2, test3 for the default four Worker threads. The slave side has to set @@global.slave_exec_mode=PARALLEL; before START SLAVE. ------------------------------------------------------------ revno: 3193 committer: Andrei Elkin <aelkin@mysql.com> branch nick: wl5563-paraslave_part_db timestamp: Thu 2010-09-09 21:43:16 +0300 message: WL#5563 Prototype for Slave parallelized by db name. This is an intermediate commit that indicates some progress. Namely, the worker pool operates correctly and with signs of scalable performance. How to test:
connection master;
set @@global.binlog_format=statement;
connection slave;
set @@global.slave_exec_mode=PARALLEL;
set @@global.binlog_format=mixed;
show processlist; => IO, SQL threads + 4 workers by default
change master to ...
connection master;
# create databases with magic names "test[0-9]+", where the number will index
# a worker.
create database test0;
create database test1;
create database test2;
create database test3;
# create tables. they are only of MyISAM type for now
use test0;
create table tm_1(a int, b int) engine=myisam;
use ...
# DML on tables:
use test[0-3];
insert into tm_1 values (1,0);
...
...
connection slave;
# monitor CPU (visually this time: top etc)
# check correctness, e.g.
select count(*) from test[0-3].tm_1;
connection master;
select count(*) from test[0-3].tm_1;
****** ------------------------------------------------------------ revno: 3364 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 17:09:22 +0300 message: wl#5569 MTS Refining rpl_rotate_logs, which could not produce deterministic output. The list of binlogs contained one binlog more than expected. ------------------------------------------------------------ revno: 3363 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 14:56:01 +0300 message: updating result files that were left incorrect by the last merge. ------------------------------------------------------------ revno: 3362 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-08-18 14:44:59 +0300 message: wl#5569 MTS Failure in recovery when binlog-checksum is active. The failure occurred because, when MTS recovery computed the gaps, the parsing of the relay log did not make sure to use the relay log's own FormatDescriptor events, which carry the checksumming info for all events in the log. Fixed by determining the checksum algorithm of every relay log as the first step of the MTS recovery gaps computation. ------------------------------------------------------------ revno: 3361 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-08-17 11:21:23 +0300 message: the merge from trunk forced resolving a few semantic conflicts caused by changes to THD::enter_cond() in the trunk.
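The db-name convention of the wl#5563 prototype (revisions 3194 and 3193 above) says that a database named test<N> is executed by Worker N. That amounts to a simple routing rule. The following is a minimal sketch, assuming that a name outside the pattern, or an N beyond the Worker count, simply gets no Worker. worker_for_db() is a hypothetical helper, not the prototype's hash function.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical routing rule: "test<N>" goes to Worker N. Returns -1 when the
// name does not match the pattern or N is not a valid Worker index.
int worker_for_db(const std::string &db, int n_workers) {
  const std::string prefix = "test";
  if (db.size() <= prefix.size() || db.compare(0, prefix.size(), prefix) != 0)
    return -1;
  int n = 0;
  for (std::size_t i = prefix.size(); i < db.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(db[i]);
    if (!std::isdigit(c)) return -1;
    n = n * 10 + (c - '0');
    if (n >= n_workers) return -1;  // out of range; also bounds long digit runs
  }
  return n;
}
```

With the default four Workers, test0 through test3 map to Workers 0 through 3, which matches the databases created in the how-to-test steps.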
------------------------------------------------------------ revno: 3360 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-27 08:56:14 +0100 message: Fixed a failure in test rpl_mts_check_concurrency when running in the mts collection. ------------------------------------------------------------ revno: 3359 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-26 19:46:41 +0100 message: Added a test case that checks whether MTS allows concurrent access to the replication tables and, as such, concurrent commits of transactions that update different databases. ------------------------------------------------------------ revno: 3358 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 20:08:43 +0100 message: Configured rpl_parallel_switch_sequential to run in row and mixed mode to avoid cluttering the error log with messages on unsafe execution. ------------------------------------------------------------ revno: 3357 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 19:02:14 +0100 message: This patch contains the following fixes: . Removed a warning suppression that had been introduced in the wrong test case (i.e. rpl_corruption) and put it in the correct one (i.e. rpl_row_corruption). . Introduced a variable to avoid cluttering the error log with several warning messages on unsafe execution. ------------------------------------------------------------ revno: 3356 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-22 11:01:12 +0100 message: This patch has the following changes: . Specific directories were created for the MTS runs in the default.push. . A warning message was suppressed in rpl_corruption.test. . Annoying debug output was removed from the error log.
However, this is a temporary solution, as it prevents enabling traces. ------------------------------------------------------------ revno: 3355 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-20 11:56:40 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3354 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-19 22:26:30 +0300 message: wl#5569 MTS valgrind reported a stack trace on rpl_savepoint. The problem appears to be that, when slave_sql_running_state is computed in show_master_info(), the SQL thread's proc_info pointer could refer to a value on a stack that is already gone. Fixed by making proc_info point to a string literal. ------------------------------------------------------------ revno: 3353 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-19 17:46:43 +0100 message: Suppressed warning messages that could potentially cause problems while running MTS crash-safe test cases. ------------------------------------------------------------ revno: 3352 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-18 21:46:45 +0300 message: wl#5569 MTS Cosmetic changes to improve the readability and clarity of the MTS patch source code. ------------------------------------------------------------ revno: 3351 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-18 14:52:44 +0300 message: wl#5569 MTS A hunk inadvertently introduced two revisions back is reverted to satisfy rpl_*_mts_crash_safe.
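The fix in revision 3354 relies on a basic C++ rule: a string literal has static storage duration, but a buffer on a function's stack is gone as soon as the function returns. The minimal sketch below contrasts the two patterns. ThreadInfo, set_state() and the state strings are hypothetical and are not the server's THD API.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a thread descriptor whose status string is read
// later, possibly by another thread (e.g. while building SHOW output).
struct ThreadInfo {
  const char *proc_info = nullptr;
};

// Unsafe pattern, kept as a comment only:
//   void set_state_unsafe(ThreadInfo &t, int n) {
//     char buf[64];
//     snprintf(buf, sizeof(buf), "Waiting for %d workers", n);
//     t.proc_info = buf;  // dangles once this function returns
//   }

// Safe pattern: both literals live for the whole program run, so the stored
// pointer stays valid after set_state() returns.
void set_state(ThreadInfo &t, bool waiting) {
  t.proc_info = waiting ? "Waiting for an event from Coordinator"
                        : "Executing event";
}
```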
------------------------------------------------------------ revno: 3350 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-17 00:51:45 +0300 message: wl#5569 MTS fixing a build issue for embedded. Public visibility of Rows_log_event::do_apply_event() is restored. ------------------------------------------------------------ revno: 3349 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-16 20:08:31 +0300 message: wl#5569 MTS The patch contains improvements after code review. The changes are mostly cosmetic. ------------------------------------------------------------ revno: 3348 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-16 02:11:11 +0300 message: bug#12755663 MTS: RPL_CIRCULAR_FOR_4_HOSTS FAILS: CANT EXECUTE THE CURRENT EVENT GROUP MTS stopped with an error in the middle of the test. The reason is that a group of events originating from the slave itself was only partly processed, yet the group position was modified. On the following restart, the wrong group boundary made the slave either error out or assert. Fixed by locating a possible race condition that allowed the Coordinator to ignore the actual failed status of a Worker. So, in the case of the test, the slave server's group can't be started. Notice: this is a trial patch, since I can't reproduce the failure at all on the hosts available to me. ------------------------------------------------------------ revno: 3347 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-07-14 12:40:06 +0300 message: WL#5569 MTS Further extensive rpl_circular_for_4_hosts exercises with --repeat 10 --parallel=8 revealed a race condition in which the Coordinator might miss the not-running status of a Worker. That made the Coordinator skip only part of a group of the slave server's own events, so the slave stopped somewhere other than a group boundary.
Fixed by moving the marking of the errored-out Worker as failed to before the release of its APH entries. TODO: notice that a stop not at a group boundary may also happen due to a graceful STOP SLAVE, if one is run while self-originated events are being skipped. However, this issue belongs to STS and might be similar to BUG@12604951 and BUG@12728160. ------------------------------------------------------------ revno: 3346 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-07-14 08:03:55 +0100 message: Post-push fixes for WL#5569 Injecting faults while updating a MyISAM table requires flushing the changes before committing suicide. So we have introduced the following code: DBUG_EXECUTE_IF("crash_after_commit_and_update_pos", - DBUG_SUICIDE();); + sql_print_information("Crashing crash_after_commit_and_update_pos."); + flush_info(TRUE); + DBUG_SUICIDE(); Besides, we improved some comments. ------------------------------------------------------------ revno: 3345 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-13 16:23:57 +0100 message: WL#5569 ------------------------------------------------------------ revno: 3344 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-13 00:10:43 +0300 message: wl#5569 MTS merge trunk -> wl5569-tree ------------------------------------------------------------ revno: 3343 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-12 23:36:17 +0300 message: wl#5569 MTS adding a suppression for an expected warning to rpl_circular_for_4_hosts; decreasing a loop limit in rpl_parallel_switch_sequential for the statement format.
------------------------------------------------------------ revno: 3342 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-12 14:46:23 +0300 message: WL#5569 MTS Fixing code and test due to the rpl.rpl_circular_for_4_hosts mismatch failure, like http://pb2.norway.sun.com/?action=archive_download&archive_id=3608382. The mismatch happened when there were two groups of events to execute, the first for a Worker and the second for the Coordinator. The Coordinator waited for the first group to complete but did not verify that the synchronization succeeded. So if applying the first group failed, processing of the second could find an inconsistent state and end up with a segfault (even though only the mismatch has been seen so far). ------------------------------------------------------------ revno: 3341 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-10 22:40:01 +0100 message: Avoiding busy waiting when running MTS recovery tests. ------------------------------------------------------------ revno: 3340 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-09 23:11:58 +0100 message: Removed --slave-checkpoint-period from MTS test cases. ------------------------------------------------------------ revno: 3339 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-09 23:08:07 +0100 message: Improved test cases for WL#5569. ------------------------------------------------------------ revno: 3338 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 22:40:52 +0300 message: wl#5569 MTS The patch refines the logic of the MTS-recovery applying phase so that events meant for the Coordinator are always applied; fixes a few tests to make them pass on PB; makes the GAQ size equal to the checkpoint_group value.
------------------------------------------------------------ revno: 3337 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 07:54:34 +0100 message: Reduced the timeout period for running the checkpoint routine by setting slave-checkpoint-period to 30. ------------------------------------------------------------ revno: 3336 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-08 07:44:35 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3335 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-07-06 12:46:05 +0300 message: wl#5569 MTS refining the wait for db-hash entry release during event distribution. A graceful STOP is not accepted at this point, so the Coordinator keeps staying in the loop. ------------------------------------------------------------ revno: 3334 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-07-05 20:43:04 +0300 message: bug#12719875 possible MTS recovery issue. MTS stopped with an error after failing to apply an event. It turned out that the event was scheduled incorrectly because an earlier Single-Threaded Slave stop happened in the middle of a group rather than at the group boundary. Fixed by forcing CREATE..SELECT to be logged as two groups. The CREATE TABLE group is surrounded by its own BEGIN/COMMIT braces. ------------------------------------------------------------ revno: 3333 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-07-04 18:14:09 +0300 message: wl#5569 MTS Adding a rule to run PB with all suites in MTS with binlog-format ROW.
------------------------------------------------------------ revno: 3332 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:29:34 +0300 message: wl5569 MTS cleanup in one file. ------------------------------------------------------------ revno: 3331 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-07-03 23:16:02 +0300 message: wl5569 MTS bzr commit mail address changed; a minor cleanup to make mts_is_worker() take a const argument; releasing a test to run in MTS. ------------------------------------------------------------ revno: 3330 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-07-02 08:58:56 +0100 message: Fixed the use of the performance schema in the replication code and a concurrency issue in the IO Thread. In particular, the IO Thread was calling flush_master_info without grabbing locks. ------------------------------------------------------------ revno: 3329 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 16:41:35 +0300 message: wl5569 MTS merging from the main repo. ------------------------------------------------------------ revno: 3328 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 15:48:25 +0300 message: wl#5569 MTS the final cleanup patch. There are a few glitches that were considered tolerable, at least while the whole WL's code is being reviewed. These include: - no support for old load-data events; - no support for FKs. To add to the list, there are a few places in the patch that suggest deploying error branches each time flush_info() is called.
------------------------------------------------------------ revno: 3327 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-07-01 13:16:52 +0300 message: wl#5569 MTS The patch cleans up a host of code. ------------------------------------------------------------ revno: 3326 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-28 11:30:18 +0300 message: wl#5569 MTS replacing views with regular tables for consistency verification in rpl_parallel_innodb. A minor cleanup in rpl_parallel is also done. ------------------------------------------------------------ revno: 3325 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 20:31:45 +0300 message: wl#5569 MTS Cleanup and addressing the sporadic rpl_temp_table_mix_row failure in the post-execution mtr.check_testcase(). The check failure was caused by a faulty optimization that avoided migrating temporary tables from the Coordinator to Workers in the case of rows-event assignment. While that is correct for a homogeneous rows-event-only load, a mixture can fail. Fixed by removing the optimization, so map_db_to_worker() always relocates; this is somewhat suboptimal and should be improved in the future. ------------------------------------------------------------ revno: 3324 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-27 13:12:52 +0100 message: Ensured that updates to the worker_info_repository are transactional and fixed the slave_checkpoint_group_basic test case. ------------------------------------------------------------ revno: 3323 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-26 13:02:59 +0100 message: Fixed test case.
------------------------------------------------------------ revno: 3322 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-25 15:14:24 +0100 message: Introduced a test case for recovery with MTS and fixed bugs in recovery. ------------------------------------------------------------ revno: 3321 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 15:38:19 +0300 message: wl#5569 MTS This patch does a bit of cleanup, addresses one memory-allocation todo and completes the fix for the valgrind report (rpl_parallel_start_stop) caused by string allocation in Slave_job_group items. ------------------------------------------------------------ revno: 3320 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 12:38:34 +0300 message: wl#5569 MTS this patch completes the previous one: it fixes a result file and makes the InnoDB-specific test verification rely on tables, not views. ------------------------------------------------------------ revno: 3319 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-24 00:11:22 +0300 message: wl#5569 MTS this is an exploratory patch to find out whether the verification method based on views has a flaw of its own, unrelated to MTS. The patch calls the verification macro on the tables, which required some adjustment. ------------------------------------------------------------ revno: 3318 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-23 07:56:15 +0300 message: wl#5569 MTS fixing results of mysqld--help-win.
------------------------------------------------------------ revno: 3317 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:20:40 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3316 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 19:17:43 +0100 message: On some platforms, such as Windows, a thread's wait time is stored in 100ns units. However, when the difference between two values was computed, the result was not multiplied by 100. Besides, there was a casting problem when that result was assigned to a ulong. ------------------------------------------------------------ revno: 3315 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-22 18:54:23 +0100 message: Fixed how MTS copes with recovery. ------------------------------------------------------------ revno: 3314 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 19:10:54 +0300 message: wl#5569 MTS Fixing valgrind warnings. ------------------------------------------------------------ revno: 3313 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-21 18:15:43 +0300 message: wl#5569 MTS rpl_parallel_start_stop.test could sporadically fail with a timeout.
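Revision 3316 involves some unit arithmetic. Assume both timestamps are 64-bit counts of 100-nanosecond ticks. The tick difference has to be multiplied by 100 to give nanoseconds, and the result must not be narrowed to `unsigned long`, which is only 32 bits wide on Windows. elapsed_ns_from_100ns_ticks() is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: start and end are counts of 100 ns ticks.
// Multiplying the tick difference by 100 converts it to nanoseconds; the
// result stays 64-bit because a 32-bit unsigned long overflows after ~4.29 s.
std::uint64_t elapsed_ns_from_100ns_ticks(std::uint64_t start, std::uint64_t end) {
  assert(end >= start);
  return (end - start) * 100;
}
```

For example, a 5-second wait is 50,000,000 ticks, or 5,000,000,000 ns, which no longer fits in 32 bits.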
------------------------------------------------------------ revno: 3312 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:21:56 +0100 message: merge mysql-next-mr-wl5569 (local) --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3311 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 23:19:06 +0100 message: Fixed an error when computing the Lower-Water-Mark. If two or more jobs were removed from the Group of assigned jobs, and one of them had a non-empty group relay log but the last one had an empty group relay log, the Lower-Water-Mark was not correctly updated, because the algorithm assumed that the group relay log was null. ------------------------------------------------------------ revno: 3310 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-20 11:52:44 +0100 message: Fixed valgrind errors. Slave_job_group was silently being cast to LOG_POS_COORD while calling sort_dynamic(&above_lwm_jobs, (qsort_cmp) mts_event_coord_cmp) and, consequently, mts_event_coord_cmp(LOG_POS_COORD *, LOG_POS_COORD *). This had two problems: . The first two members of Slave_job_group were not a char * and a my_offset. . Even if the first two members were a char * and a my_offset, such casting could lead to alignment problems. To fix the problem, we avoid this casting.
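The comparator problem from revno 3310 can be sketched in C++ as follows. Job_group, coord_less() and sort_by_coord() are hypothetical, simplified stand-ins for Slave_job_group and the sorting done through sort_dynamic(); the point of the sketch is that the comparator reads the group's own members instead of reinterpreting the struct as a LOG_POS_COORD, so neither member order nor alignment can go wrong.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a Slave_job_group entry; the
// member names and layout are assumptions for illustration only.
struct Job_group {
  uint64_t total_seqno;               // order in which the group was assigned
  std::string group_master_log_name;  // master binlog file of the group
  uint64_t group_master_log_pos;      // end position within that file
};

// Orders groups by (binlog name, position), reading the members of
// Job_group directly rather than casting the struct to another type.
bool coord_less(const Job_group &a, const Job_group &b) {
  int cmp = a.group_master_log_name.compare(b.group_master_log_name);
  if (cmp != 0) return cmp < 0;
  return a.group_master_log_pos < b.group_master_log_pos;
}

// Sorts the jobs above the Lower-Water-Mark by their master coordinates.
void sort_by_coord(std::vector<Job_group> &jobs) {
  std::sort(jobs.begin(), jobs.end(), coord_less);
}
```

Comparing binlog names as plain strings works here because the numeric extensions are zero-padded to a fixed width, as in bin.000001 and bin.000002.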
------------------------------------------------------------ revno: 3309 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 19:14:50 +0300 message: wl#5569 MTS fixing slave_transaction_retries_basic_64.result ------------------------------------------------------------ revno: 3308 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 16:11:25 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3307 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 12:33:36 +0300 message: wl#5569 MTS Fixing rpl.rpl_mixed_binlog_max_cache_size, which revealed incorrect asynchronous handling of a Rotate event that does not split the current group and therefore has to be executed after all previously scheduled events. Fixing the sensitivity of two other tests to mtr's invocation environment, which includes the initial values of slave_parallel_workers and slave_transaction_retries. ------------------------------------------------------------ revno: 3306 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-19 09:04:19 +0100 message: Fixed some Windows failures. ------------------------------------------------------------ revno: 3305 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-06-18 19:58:21 +0100 message: Fixed some recovery issues. ------------------------------------------------------------ revno: 3304 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 21:01:58 +0300 message: wl#5569 MTS fixing tests and a segfault at the end of handle_slave_sql() that happened after worker initialization failed (e.g. rpl_row_log on Windows).
------------------------------------------------------------ revno: 3303 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 18:34:16 +0300 message: wl#5569 MTS fixing tests. ------------------------------------------------------------ revno: 3302 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 14:00:41 +0300 message: wl#5569 MTS fixing rpl_row_basic_3innodb similarly to the previous patch. ------------------------------------------------------------ revno: 3301 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 13:51:59 +0300 message: wl#5569 MTS fixing a few tests. 1. A policy is implemented to react with a warning when a failing worker leaves the total slave state with gaps, and thereby inconsistent. 2. In two tests that used to time out due to RESET MASTER/SLAVE, the reset was disabled. ------------------------------------------------------------ revno: 3300 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-06-17 02:24:59 +0100 message: Removed unnecessary test cases and augmented others in order to test recovery.
------------------------------------------------------------ revno: 3299 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 19:46:22 +0300 message: wl#5569 MTS fixing slave_parallel_workers_basic and rpl_stop_middle_group, which can't run in MTS ------------------------------------------------------------ revno: 3298 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-16 11:29:53 +0300 message: wl#5569 MTS adding new tests to sys_vars. ------------------------------------------------------------ revno: 3297 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:41:32 +0100 message: WL#5569 Adding a global suppression for the warning that may appear when stopping the slave SQL thread in the middle of a group. This should affect MTS mode only. ------------------------------------------------------------ revno: 3296 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:40:41 +0100 message: WL#5569 Renames worker-info-repository to slave-worker-info-repository in some tests' option files. ------------------------------------------------------------ revno: 3295 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 01:32:37 +0100 message: WL#5569 More test fixes. Removing the remaining 'mts' prefixes from MTS variables, which have been renamed recently. ------------------------------------------------------------ revno: 3294 committer: Luis Soares <luis.soares@oracle.com> branch nick: mysql-trunk-wl5569 timestamp: Thu 2011-06-16 00:27:20 +0100 message: WL#5569 Fixing rpl_parallel result file.
------------------------------------------------------------ revno: 3293 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:41:33 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in a few more files ------------------------------------------------------------ revno: 3292 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:31:46 +0300 message: wl#5569 MTS correcting --slave-parallel-workers in collections/default.push ------------------------------------------------------------ revno: 3291 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 20:12:11 +0300 message: wl#5569 MTS Cleanup, including 1. decreasing the number of system variables and renaming them. Command line options that were important for debugging are replaced with reasonable constant values, and only the necessary ones are retained. 2. A small encapsulation is done in ha_blackhole.cc. ------------------------------------------------------------ revno: 3290 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-06-15 15:59:23 +0100 message: Fixed replication valgrind failures caused by MTS. ------------------------------------------------------------ revno: 3289 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 21:23:13 +0300 message: wl#5569 MTS wl#5754 Query event parallel execution Fixing failing tests and a failure in gathering accessed databases that was caused by a recent merge from trunk.
------------------------------------------------------------ revno: 3288 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 13:35:20 +0300 message: merge from trunk ------------------------------------------------------------ revno: 3287 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-06-14 12:27:38 +0300 message: wl#5569 MTS Fixing failing tests due to a. a flaw in the `isolated parallel' mode implementation. Isolation applies to a group of events rather than to a single event. An event that contains over-max accessed db:s, or an event from an Old master, triggers marking of the group currently being scheduled. Such a group is executed only after all previously scheduled groups are done, and no more groups are scheduled until it is done. b. The notification to the Coordinator about an errored-out Worker is corrected. ------------------------------------------------------------ revno: 3286 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:33:32 +0300 message: wl#5569 MTS making default.push run the rpl suite with non-default --mts-slave-parallel-workers > 0 in all three formats/modes (row, stmt, mixed). The default is run for all suites in mixed mode and for the rpl suite with the row+ps and stmt formats. ------------------------------------------------------------ revno: 3285 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-12 22:05:05 +0300 message: wl#5569 MTS manual merge with a few fixes, for a segfault caused by the last merge from the trunk, a compilation issue on embedded, etc. ------------------------------------------------------------ revno: 3284 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 18:35:59 +0100 message: Post-fixes for merge. Fixed compilation on Windows and removed unused options.
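The `isolated parallel' rule from revno 3287 can be pictured as a small gate that the Coordinator consults before handing a group to a Worker. This is a minimal single-threaded C++ sketch under stated assumptions: Isolation_gate and its methods are hypothetical names, and a real Coordinator would block and wait instead of simply receiving a false answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scheduler state illustrating the isolation rule: an
// isolated group starts only when nothing else is in flight, and no
// other group starts while an isolated group is running.
class Isolation_gate {
 public:
  // Returns true if a group may be handed to a Worker now.
  bool can_schedule(bool group_is_isolated) const {
    if (isolated_running_) return false;            // nothing runs beside it
    if (group_is_isolated) return in_flight_ == 0;  // wait for prior groups
    return true;
  }

  // Records that a group was handed to a Worker.
  void on_scheduled(bool group_is_isolated) {
    assert(can_schedule(group_is_isolated));
    ++in_flight_;
    if (group_is_isolated) isolated_running_ = true;
  }

  // Records that a Worker finished a group.
  void on_done(bool group_is_isolated) {
    assert(in_flight_ > 0);
    --in_flight_;
    if (group_is_isolated) isolated_running_ = false;
  }

 private:
  std::size_t in_flight_ = 0;     // groups scheduled but not yet done
  bool isolated_running_ = false; // an isolated group is executing
};
```

Treating the whole group, rather than the single triggering event, as the unit of isolation is what keeps earlier and later groups from overlapping with it.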
------------------------------------------------------------ revno: 3283 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-06-09 16:27:47 +0100 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3282 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-06-06 13:51:19 +0300 message: wl#5569 MTS STOP SLAVE now stops consistently without gaps; KILL shall be used for an urgent stop; an error case behaves like the killed one. For instance, when a Worker errors out, it sends KILL to the Coordinator through THD::awake(), and the Coordinator kills the rest by setting a special Worker-running status to killed (which breaks the read-exec loop of a Worker). ------------------------------------------------------------ revno: 3281 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-06-05 20:01:51 +0300 message: wl#5569 MTS More cleanup, fixes for issues found when running tests, and some improvements, including in stopping Workers: the routine now distinguishes between the killed and the gracefully stopped cases, so that in the end STOP SLAVE will guarantee a consistent state (some todo still remains). ------------------------------------------------------------ revno: 3280 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-30 13:05:07 +0300 message: WL#5569 MTS WL#5754 Query event parallel applying ----------------------------------------------------------------- Aggregating 7 commits that are not pushed yet to the wl5569 repo. Find comments for each cset below. ------------------------------------------------------------------ The current patch addresses concurrent updating of the slave_open_temp_tables status counter. The former declaration of the underlying server variable is changed from ulong to int32.
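The difference between a graceful STOP SLAVE and a KILL described in revno 3282 can be sketched as a Worker read-exec loop that checks a running status on every pass. Run_status, Worker_sketch and step() are hypothetical names, and the sketch is single-threaded: it only shows the exit conditions, where a stop request lets the Worker drain its assigned groups so that no gap is left, while KILLED breaks the loop at once.

```cpp
#include <atomic>
#include <cassert>
#include <deque>

// Hypothetical Worker run states; names are illustrative assumptions.
enum class Run_status { RUNNING, STOP_REQUESTED, KILLED };

struct Worker_sketch {
  std::atomic<Run_status> status{Run_status::RUNNING};
  std::deque<int> assigned_groups;  // ids of groups assigned by the Coordinator
  std::deque<int> applied_groups;   // ids of groups applied so far

  // One pass of the read-exec loop. Returns false when the loop must exit.
  bool step() {
    if (status.load() == Run_status::KILLED)
      return false;  // urgent stop: remaining groups may be left as gaps
    if (assigned_groups.empty())
      // Graceful stop exits only once drained; a real Worker would block
      // here waiting for new jobs instead of returning to the caller.
      return status.load() == Run_status::RUNNING;
    applied_groups.push_back(assigned_groups.front());
    assigned_groups.pop_front();
    return true;
  }

  // Call only after a stop or kill was requested; with RUNNING status and
  // an empty queue this sketch would spin instead of blocking.
  void run() {
    while (step()) {
    }
  }
};
```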
While that might affect (shrink) the actual range, there has been no specified range, and now that the number of bits is the same on all platforms, the range can be set to [0, max(int32)]. ****** wl#5569 MTS WL#5754 Query event parallel applying wl#5599 MTS recovery The patch includes some cleanup, including one for temp tables support, and the realization of a few todo:s. ****** wl#5569 MTS wl#5754 Query event parallel applying More cleanup is done; fixing temp tables manipulation. Asserting the impossible-to-support use case of a group of events not wrapped with BEGIN/COMMIT. Todo: recognize an old master binlog and refuse to run it in parallel. ****** wl#5569 MTS Implementation of giving out the applier role to a Worker in all cases but the ones dealing with the Coordinator's state. Those include a Query event with over-max-db:s and Load-data related events. The current patch also makes an old master binlog be handled by MTS, though sometimes switching to the sequential mode, e.g. for a Query event. Fixing a race condition that made C wait endlessly if a Worker had exited due to its applying error. ****** wl#5569 MTS correcting an assert that used to fire, as warned in the previous commit. Parallel feature tests pass now. ****** wl#5569 MTS This patch contains cleanup and simplification of the logic of handling some events sequentially by the Coordinator, and adds a memory-allocation failure branch to the workers' starting routine. ****** wl#5569 MTS An intermediate patch to address a few issues raised by reviewers. To sum up, it's about cleanup and simplification of the logic of event distribution to Workers and of the consequent actions. Some effort was spent on supporting Old Master BEGIN-less groups of events. ------------------------------------------------------------ revno: 3279 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-05-24 17:29:35 +0300 message: WL#5569 MTS WL#5754 Query parallel applying Changing the implementation of temporary tables support in MTS.
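The counter change in revno 3280 (ulong to int32 for slave_open_temp_tables) can be sketched with a fixed-width atomic counter. This assumes the concurrent updates are done with atomic read-modify-write operations; std::atomic is used here only for illustration, and Open_temp_tables_counter is a hypothetical name, not the server's own primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a status counter updated by several Workers.
// int32_t has the same width on every platform, unlike ulong, so the valid
// range [0, max(int32)] is the same everywhere; std::atomic makes each
// increment or decrement indivisible when called from multiple threads.
class Open_temp_tables_counter {
 public:
  void on_create() { count_.fetch_add(1, std::memory_order_relaxed); }

  void on_drop() {
    int32_t prev = count_.fetch_sub(1, std::memory_order_relaxed);
    assert(prev > 0);  // the counter must never go below zero
    (void)prev;        // silence unused-variable warnings under NDEBUG
  }

  int32_t value() const { return count_.load(std::memory_order_relaxed); }

  static constexpr int32_t max_value() {
    return std::numeric_limits<int32_t>::max();
  }

 private:
  std::atomic<int32_t> count_{0};
};
```

Relaxed ordering is enough in this sketch because the counter is only reported as a statistic and does not guard other data.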
Cleanup, fixing a few todo:s and a few potential issues found. ------------------------------------------------------------ revno: 3278 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2011-05-19 12:36:28 +0300 message: wl#5569 MTS Support for ROWS_QUERY_LOG_EVENT is added. It required refactoring its handling in the canonical sequential mode. The event's life cycle suggests behavior similar to objects associated with Table_map, in particular its destruction at end-of-statement time. Tested against the existing ROWS_QUERY_LOG_EVENT feature tests, incl. rpl_row_ignorable_event, in both sequential and parallel modes. ------------------------------------------------------------ revno: 3277 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2011-05-16 22:43:58 +0300 message: wl#5569 MTS Simplifying Coordinator-Worker interfaces. In essence, after this patch a Worker executes events in its private context (class Slave_worker : public Relay_log_info). The only exception is a Query referring to a temporary table. The temp:s are maintained in the Coordinator's "central" rli; removing some dead code; performing a lot of cleanup. There are a few todo items, incl.: 1. To implement several todo:s scattered across MTS' code and tests (e.g. to restore protected for a few members of RLI in rpl_rli.h); 2. To cover Rows_query_log_event, which currently can cause hanging (e.g. rpl_parallel_fallback); 3. To sort out the names of classes based on Rpl_info, possibly removing Rpl_info_worker. ------------------------------------------------------------ revno: 3276 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2011-05-06 21:33:32 +0300 message: wl#5569 MTS improving the benchmarking test.
------------------------------------------------------------ revno: 3275 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-04-06 15:51:58 +0300 message: wl#5569 MTS Statistics for Workers and the Coordinator, incl. waiting and sleeping times, are now reported into the error log at slave stopping time. ------------------------------------------------------------ revno: 3274 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2011-04-05 19:26:37 +0300 message: wl#5569 MTS restoring the previous default of 4 workers that rpl_parallel works with. ------------------------------------------------------------ revno: 3273 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-04-03 13:07:30 +0300 message: wl#5569 MTS A benchmarking-related patch that generalizes rpl_parallel so it can be run with an arbitrary number of workers, db:s, tables, etc. TODO: to restore the final consistency check, which is temporarily taken out since I could not find a way to leave it surrounded with a --dis/en-able* stanza. ------------------------------------------------------------ revno: 3272 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2011-04-02 14:32:02 +0300 message: wl#5569 MTS a test file for benchmarking is added.
Benchmarking results can be obtained by extracting the master-side generating time and the slave-side applying time, as in the following loop:

    workers=6;
    for n in `seq 1 3`; do
      echo; echo loop: $n; echo;
      my_mtr.sh --mysqld=--mts-slave-parallel-workers=$workers \
        rpl_parallel_benchmark --mysqld=--binlog-format=statement \
        && cat /dev/shm/var/mysqld.2/data/test/delta.out >> p${workers}_stmt.out 2>&1;
    done

------------------------------------------------------------ revno: 3271 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-03-30 17:11:24 +0300 message: wl#5754 Query event parallel execution Small cleanup of comments as requested by the reviewer. ------------------------------------------------------------ revno: 3270 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2011-02-27 19:35:25 +0200 message: WL#5754 Query event parallel execution Bundling together the implementation of the whole DML+DDL Query parallel support. That includes: The earliest four rev:s, which cut off the DML stage of the parallel query project from the following ones devoted to DDL. The four rev:s build a skeleton for the parallel applying of Queries containing a temporary table and implement the core of the design, that is, the DML queries. Queries can contain arbitrary features, including temp tables. The DDL part also refined a few items related to the general low-level design. In particular, the mark of the over-max db:s in the updated-db:s status var is turned into another new constant value. The very last patch in the bundle addresses the last review mail notes.
------------------------------------------------------------ revno: 3269 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 01:01:02 +0200 message: merging from mysql-trunk ------------------------------------------------------------ revno: 3268 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2011-01-12 00:54:12 +0200 message: wl#5569 MTS fixing the worker threads start/stop. ------------------------------------------------------------ revno: 3267 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-27 18:54:41 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 ------------------------------------------------------------ revno: 3266 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-24 01:57:03 +0200 message: wl#5569 MTS the timed-wait loop of the SQL thread required a break-through parameter in case the signal is missed and only a timeout would be reported ------------------------------------------------------------ revno: 3265 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 19:03:42 +0200 message: merging from the repo wl5569 ------------------------------------------------------------ revno: 3264 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 17:49:19 +0200 message: wl#5569 MTS fixing corner cases that mtr testing with MTS workers against the standard suites revealed.
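One common way to make a timed wait robust against a missed signal, in the spirit of revno 3266, is to re-check a state flag on every wakeup instead of relying on the notification alone. In the C++ sketch below, Timed_waiter and its done flag are hypothetical stand-ins for the "break-through parameter"; std::condition_variable::wait_for() with a predicate returns the predicate's value, so a signal sent before the wait started is not reported as a timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical waiter: `done` is the break-through state that survives a
// notification nobody was waiting for, so it cannot be lost.
struct Timed_waiter {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  // Sets the state under the mutex, then wakes any waiter.
  void signal() {
    {
      std::lock_guard<std::mutex> lk(m);
      done = true;
    }
    cv.notify_all();
  }

  // Returns true if the condition holds, false only on a genuine timeout.
  // The predicate is checked before sleeping and after every wakeup.
  bool wait_for_done(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m);
    return cv.wait_for(lk, timeout, [this] { return done; });
  }
};
```

Checking the flag first also covers spurious wakeups, which a bare timed wait would misreport.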
------------------------------------------------------------ revno: 3263 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 16:00:28 +0200 message: wl#5569 MTS: refining another assert that can force C to delete events that are skipped with the slave skip counter ------------------------------------------------------------ revno: 3262 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 15:34:02 +0200 message: wl#5569 MTS Correcting an assert that is hit by a few tests. ------------------------------------------------------------ revno: 3261 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:27:15 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3260 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-23 13:25:31 +0200 message: wl#5569 MTS fixing failing tests. ------------------------------------------------------------ revno: 3259 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:34:26 +0200 message: wl#5569 MTS merging from the repo. ------------------------------------------------------------ revno: 3258 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 20:31:13 +0200 message: wl#5569 MTS fixing test failures when mtr runs with --mts_slave_parallel_workers != 0; rpl000010 is representative. Fixed by identifying and marking an event that can split a group of events, forcing parts of the group into different relay logs, and by carefully running ev->update_pos() for such an event and destroying it.
------------------------------------------------------------ revno: 3257 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-22 13:57:18 +0200 message: wl#5569 MTS and wl#5599 MTS recovery The general recovery implementation is finished by this patch. Tested against ./mtr rpl_parallel_conf_limits. Warning: ./mtr rpl_parallel_conf_limits rpl_parallel_conf_limits ... can fail at the 2nd and subsequent tests because Worker tables are not removed at RESET SLAVE. ------------------------------------------------------------ revno: 3256 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 22:12:30 +0200 message: wl#5569 MTS the slave_worker_info definition is updated in the system db. ------------------------------------------------------------ revno: 3255 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:34:58 +0200 message: merging with repo ------------------------------------------------------------ revno: 3254 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-21 21:31:29 +0200 message: wl#5569 MTS Recovery routine part I: gathering the group recovery bitmap. ------------------------------------------------------------ revno: 3253 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 22:18:33 +0000 message: WL#5599 Fixed the routine to compute the bitmap of executed events. ------------------------------------------------------------ revno: 3252 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 21:37:48 +0200 message: wl#5569 MTS adding the checkpoint relay_log_name,pos as a necessary part for locating a relay log for recovery. Tested with rpl_parallel.
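The recovery bitmap mentioned in revnos 3254 and 3253 can be illustrated with a minimal C++ sketch. It assumes each Worker keeps one bit per group, counted from the last checkpoint (relay_log_name, pos), and the functions merge_worker_bitmaps() and find_gaps() are hypothetical. OR-ing the Workers' bitmaps gives the set of executed groups; unset bits before the last executed group are the gaps that recovery still has to apply, while set bits must be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: bit i of a Worker's bitmap is set if that Worker executed
// the i-th group after the checkpoint. The union marks every executed group.
std::vector<bool> merge_worker_bitmaps(
    const std::vector<std::vector<bool>> &workers) {
  std::size_t n = 0;
  for (const auto &w : workers) n = std::max(n, w.size());
  std::vector<bool> executed(n, false);
  for (const auto &w : workers)
    for (std::size_t i = 0; i < w.size(); ++i)
      if (w[i]) executed[i] = true;
  return executed;
}

// Returns the indexes of groups that were not executed but precede the
// last executed one, i.e. the gaps recovery must fill.
std::vector<std::size_t> find_gaps(const std::vector<bool> &executed) {
  std::vector<std::size_t> gaps;
  std::size_t last = executed.size();  // sentinel: nothing executed
  for (std::size_t i = executed.size(); i-- > 0;) {
    if (executed[i]) {
      last = i;
      break;
    }
  }
  if (last == executed.size()) return gaps;
  for (std::size_t i = 0; i < last; ++i)
    if (!executed[i]) gaps.push_back(i);
  return gaps;
}
```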
------------------------------------------------------------ revno: 3251 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-20 17:58:58 +0200 message: wl#5569 MTS manual merging from the repo and correcting GAQ processing by introducing a volatile byte to indicate whether an item is busy or released. ------------------------------------------------------------ revno: 3250 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-18 21:00:23 +0200 message: wl#5569 MTS fixing the --mts-exp-slave-run-query-in-parallel=1 case, when a Query-log-event can be run in parallel, incl. DML and DDL. The feature is still `exp'erimental but can be tried as long as no temp tables are involved and no db other than the session's default one is modified by the query. Tested: the changes sustain mtr rpl_parallel --mysqld=--mts-exp-slave-run-query-in-parallel=1 --mysqld=--binlog-format=statement ------------------------------------------------------------ revno: 3249 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 14:46:15 +0200 message: wl#5569 MTS fixing PB2 failures, incl. valgrind issues, long exec time and an assert in a test. ------------------------------------------------------------ revno: 3248 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-17 00:00:47 +0200 message: merge from wl#5569 repo to local branch. rpl_sequential opt files are added to keep mtr from giving up on processing a bulk of unsafe warnings. ------------------------------------------------------------ revno: 3247 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-16 23:41:45 +0200 message: wl#5569 MTS Adding transparent support/fallback to sequential execution for the cases of 1. Query-log-event 2.
Rows_query_log_event info event. Both cases can be fully parallelized in a future project. Fixing an issue in move_queue_head() that surfaced as an assert in Slave_worker::slave_worker_group_ends(). Fixing the destruction of an event by a Worker. ------------------------------------------------------------ revno: 3246 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 16:46:20 +0200 message: merge from wl5569 repo ------------------------------------------------------------ revno: 3245 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-14 10:57:16 +0200 message: wl#5569 MTS a light cleanup to arrange the option/system var names properly: mts_ prefixing, and _exp prefixing for experimental features needed for benchmarking (mts_exp_slave_local_timestamp) or supported only in a limited way (mts_exp_slave_run_query_in_parallel for Query-log-event). Fixing the GAQ size. It might be too tight, e.g. in case of a max WQ length of 1; tested by running rpl_parallel with --mts-slave-worker-queue-len-max=1. ------------------------------------------------------------ revno: 3244 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 18:53:32 +0200 message: wl#5569 MTS fixing a valgrind stack caused by extra pfs-keys/cond_var. Those are removed with Alfranio's consent ------------------------------------------------------------ revno: 3243 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 17:57:01 +0200 message: wl#5569 MTS fixing a set of valgrind warnings caused by a c&p ------------------------------------------------------------ revno: 3242 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Mon 2010-12-13 16:52:50 +0200 message: wl#5569 MTS updating results for a few tests.
------------------------------------------------------------ revno: 3241 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sat 2010-12-11 21:00:47 +0200 message: wl#5569 MTS 1. Fixing a recovery-related issue of DBUG_ASSERT(rli->get_event_relay_log_pos() >= BIN_LOG_HEADER_SIZE); at slave start by moving mts_recovery_routine() in front of the assert. 2. Making a SKIP-ed event commit to the central RLI. That is correct since Workers are not executing anything at this time. 3. Fixing the default for mts_checkpoint_period, which normally should not be zero. Zero makes sense solely for debugging (so we may stress that through VALID_RANGE(1,...)). 4. Introduced a general mts-unsupported error/warning that applies to the cases of non-zero parallel workers combined with a feature that parallelization can't work with. ------------------------------------------------------------ revno: 3240 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 18:25:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3239 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Fri 2010-12-10 17:50:03 +0200 message: wl#5569 MTS Improving GAQ: a) size it so that it can hold items while all WQ:s are full; b) move_queue_head() contained a flaw that made it falsely make no progress; c) never enqueue into GAQ while it's full. ------------------------------------------------------------ revno: 3238 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:46:27 +0200 message: merge from wl5569 repo to a local branch ------------------------------------------------------------ revno: 3237 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Thu 2010-12-09 19:45:02 +0200 message: wl#5569 MTS
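The three GAQ rules listed in revno 3239 can be sketched as a fixed-capacity ring buffer with a busy/released flag per slot, which also mirrors the busy/released marker mentioned in revno 3251. Gaq_sketch and its methods are hypothetical, and the sizing rule (number of Workers times the maximum Worker-queue length, counting one group per queue item) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Global Assigned Queue sketch: a ring buffer of group slots,
// each flagged busy (assigned, not finished) or released (finished).
class Gaq_sketch {
 public:
  // a) assumed sizing rule: room for every group while all WQ:s are full.
  Gaq_sketch(std::size_t workers, std::size_t wq_len_max)
      : slots_(workers * wq_len_max, false) {
    assert(!slots_.empty());
  }

  // c) never enqueue into a full queue; returns the slot index or -1.
  long en_queue() {
    if (len_ == slots_.size()) return -1;
    std::size_t idx = (head_ + len_) % slots_.size();
    slots_[idx] = true;  // busy until its Worker reports completion
    ++len_;
    return static_cast<long>(idx);
  }

  // Called when the Worker executing the group in slot idx is done.
  void mark_done(std::size_t idx) { slots_[idx] = false; }

  // b) advance the head over every released group, stopping at the first
  // busy one; returns how many groups were removed.
  std::size_t move_queue_head() {
    std::size_t removed = 0;
    while (len_ > 0 && !slots_[head_]) {
      head_ = (head_ + 1) % slots_.size();
      --len_;
      ++removed;
    }
    return removed;
  }

  std::size_t size() const { return len_; }

 private:
  std::vector<bool> slots_;  // true = busy, false = released
  std::size_t head_ = 0;     // oldest group still tracked
  std::size_t len_ = 0;      // number of tracked groups
};
```

Stopping at the first busy slot keeps the head, and therefore the Lower-Water-Mark, from moving past a group that is still executing.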
Integration with wl#5599 recovery for MTS and fixing two asserts. One is due to a missed cleanup of errored-out rows-events; the other is a work-around for w->curr_group_exec_parts->dynamic_ids being initialized to have one partition at Worker startup, which it should not. ------------------------------------------------------------ revno: 3236 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 13:59:07 +0000 message: WL#5599 Fixed warning messages. ------------------------------------------------------------ revno: 3235 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 12:59:07 +0000 message: WL#5599 Fixed test cases. ------------------------------------------------------------ revno: 3234 committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 01:30:32 +0000 message: WL#5599 Fixed failures in test cases. ------------------------------------------------------------ revno: 3233 [merge] committer: Alfranio Correia <alfranio.correia@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Wed 2010-12-08 00:33:48 +0000 message: merge mysql-trunk --> mysql-next-mr-wl5569 Conflicts: . mysql-test/r/log_tables_upgrade.result . mysql-test/r/mysql_upgrade.result . mysql-test/r/mysql_upgrade_ssl.result . mysql-test/r/mysqlcheck.result . mysql-test/suite/perfschema/r/pfs_upgrade_lc0.result . mysql-test/suite/rpl/t/disabled.def . mysql-test/suite/sys_vars/r/all_vars.result . mysql-test/t/system_mysql_db_fix40123.test . mysql-test/t/system_mysql_db_fix50030.test . mysql-test/t/system_mysql_db_fix50117.test . sql/log_event.cc . sql/log_event.h . sql/rpl_mi.h . sql/rpl_slave.cc . 
sql/share/errmsg-utf8.txt ------------------------------------------------------------ revno: 3232 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 20:01:39 +0200 message: manual merge with a piece of recovery support on repo. rpl_parallel hits an assert that Alfranio is fixing ------------------------------------------------------------ revno: 3231 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Tue 2010-12-07 19:35:16 +0200 message: wl#5569 MTS Testing related fixes incl master_pos_wait() support and thereafter replacing sleeps with the functioning sync_slave_with_master; Fixing the limitted Q-log-event parallelization. After the fixing mixture of rows- and Q- transactions can run concurrently. Q-transaction will be treated sequentially by default. ------------------------------------------------------------ revno: 3230 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: mysql-next-mr-wl5569 timestamp: Sun 2010-12-05 22:04:17 +0200 message: wl#5569 WL#5599 MTS & recovery Refining and correcting two wl:s integration. The main achievement is events execution status is consistently recorded into the Worker and the central RL recovery tables. That was tested manually in rather agressive env where IO was used to reconnect randomly and load from Master contained Rotate events. TODO: to fix: rpl.rpl_parallel_conf_limits may not pass to address: Multi-stmt Query-log-event transaction case (see todo in sources). to destruct by Workers their executed events (was deferred until ev->update_pos started working). (Alfranio) to deploy mts_checkpoint_routine() call inside the successful event read branch of next_event(). Otherwise no calling happens when Coord is constanly busy with read/distribute. 
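The GAQ refinements of revno 3239 above (bounded size, no enqueueing while full, move_queue_head() always able to make progress) and the low-water mark (lwm) described in revno 3213 can be illustrated with a small ring buffer. This is a hypothetical C++ sketch, not the server's GAQ class: the names AssignedQueue, JobGroup, en_queue() return values, mark_done() and the single master_log_pos payload are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical job-group record: what a checkpoint would store, plus a flag
// the Worker sets once it has committed the group.
struct JobGroup {
  unsigned long long master_log_pos;  // illustrative checkpoint payload
  bool done;                          // set when the Worker commits the group
};

// Hypothetical bounded ring buffer of groups in Coordinator assignment order.
class AssignedQueue {
 public:
  explicit AssignedQueue(std::size_t size) : items_(size), head_(0), len_(0) {}

  bool full() const { return len_ == items_.size(); }
  bool empty() const { return len_ == 0; }
  std::size_t size() const { return len_; }

  // Refuses (returns -1) while the queue is full, so the Coordinator has to
  // wait instead of overwriting a group that is still in progress.
  long en_queue(const JobGroup &g) {
    if (full()) return -1;
    std::size_t idx = (head_ + len_) % items_.size();
    items_[idx] = g;
    ++len_;
    return static_cast<long>(idx);
  }

  void mark_done(std::size_t idx) { items_[idx].done = true; }

  // Removes the contiguous prefix of completed groups and remembers the last
  // removed one as the low-water mark. Stops at the first unfinished group and
  // returns how many groups were removed.
  std::size_t move_queue_head() {
    std::size_t moved = 0;
    while (!empty() && items_[head_].done) {
      lwm_ = items_[head_];
      head_ = (head_ + 1) % items_.size();
      --len_;
      ++moved;
    }
    return moved;
  }

  const JobGroup &lwm() const { return lwm_; }

 private:
  std::vector<JobGroup> items_;
  std::size_t head_;
  std::size_t len_;
  JobGroup lwm_{0, false};
};
```

A checkpoint in this model would call move_queue_head() and then persist lwm(); groups completed out of order stay in the queue until every earlier group is done, which is what keeps the recorded position safe to restart from.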
------------------------------------------------------------
revno: 3229
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 19:14:50 +0200
message:
  merging from the repo wl5569
------------------------------------------------------------
revno: 3228
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-12-04 15:45:02 +0000
message:
  Added a mutex to the checkpoint_routine.
------------------------------------------------------------
revno: 3227
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 16:56:11 +0000
message:
  Implemented periodic checkpoint if parallel slave is enabled.
------------------------------------------------------------
revno: 3226
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-12-03 10:15:45 +0000
message:
  Fixed commit_positions() and removed the unnecessary checkpoint thread.
------------------------------------------------------------
revno: 3225 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 20:13:12 +0200
message:
  manual merge to wl#5569 tree
------------------------------------------------------------
revno: 3224
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-12-02 19:46:46 +0200
message:
  wl#5569 MTS
  User interface related:
    set @@global.slave_parallel_workers= `non-zero`
  followed by `START SLAVE` starts the slave with that many Worker threads.
  That is, a non-zero value de facto selects the slave parallel execution mode.
  The earlier introduced enum enum_slave_exec_mode SLAVE_EXEC_MODE_PARALLEL is
  withdrawn.

  Fixes rli->mts_pending_jobs_size statistics, which might otherwise cause an
  assert-crash.
  Fixes a silly copy-and-paste mistake in the relay-log name change
  notification.
  Made a little clean-up, including relocating the initialization of
  Worker-related stuff into start_slave_workers().
  Many changes in tests due to SLAVE_EXEC_MODE_PARALLEL, among other reasons.
------------------------------------------------------------
revno: 3223
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-12-01 19:08:21 +0200
message:
  wl#5569 MTS
  Changes related to the limit conditions, such as WQ length and the total
  size of WQ:s.
  Also a new test file is added.
------------------------------------------------------------
revno: 3222 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:39:40 +0200
message:
  merging from the wl#5569 repo containing the wl#5599 integration
------------------------------------------------------------
revno: 3221
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-30 16:02:15 +0200
message:
  wl#5569 MTS
  Fixing group_relay_log_name change propagation from C to W;
  Garbage collection in the Partition-to-Worker hash is added, with a parameter
  for how many records in the hash are tolerated without checking the usage
  counter.

  Adding C-W synchronization due to:
  - the overall WQ:s data max
  - hitting the limit of a WQ length

  Adding Flow Control infrastructure with
  - a hungry level for a Worker, which forces the Coordinator to distribute
    eagerly; symmetrically, a Worker whose load is more than
    100 % - hungry level is considered fed-up.
  - a nap time for C in case all WQ:s lengths are above the level.
  - a weight param to the base nap as a function of the number of fed-up W:s.

  TODO:
  UNTIL to force sequential exec;
  to fix the ROWS_QUERY_LOG_EVENT corner case;
  to fix the commented-out // if (!ev) delete ev; after wl#5599 is merged
  (ev->update_pos() is done).
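The Flow Control rules of revno 3221 above (hungry level, the symmetric fed-up level at 100 % minus the hungry level, and a Coordinator nap weighted by the number of fed-up Workers) can be sketched as a single decision function. This is a hypothetical C++ illustration: the struct and function names, the microsecond unit, and the exact formula base_nap * (1 + weight * fed_up) are assumptions, since the log does not spell out how the weight is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-control parameters (names, units and formula are assumed).
struct FlowControl {
  unsigned hungry_pct;   // below this fill level a Worker is "hungry"; assumed <= 50
  unsigned base_nap_us;  // base Coordinator nap, in microseconds
  unsigned weight;       // extra base naps added per fed-up Worker
};

// Hypothetical view of one Worker queue (WQ).
struct WorkerQueue {
  std::size_t len;       // jobs currently queued
  std::size_t capacity;  // WQ length limit
};

// Fill level of one WQ in percent; a zero-capacity queue counts as full.
inline unsigned fill_pct(const WorkerQueue &q) {
  return q.capacity == 0 ? 100u
                         : static_cast<unsigned>(q.len * 100 / q.capacity);
}

// Returns 0 while any Worker is hungry (keep distributing eagerly); otherwise
// a nap that grows with the number of fed-up Workers, i.e. those whose load is
// above 100 % - hungry_pct.
inline unsigned coordinator_nap_us(const std::vector<WorkerQueue> &queues,
                                   const FlowControl &fc) {
  unsigned fed_up = 0;
  for (const WorkerQueue &q : queues) {
    unsigned pct = fill_pct(q);
    if (pct < fc.hungry_pct) return 0;        // a hungry Worker: no nap
    if (pct > 100 - fc.hungry_pct) ++fed_up;  // symmetric fed-up threshold
  }
  return fc.base_nap_us * (1 + fc.weight * fed_up);
}
```

Keeping the nap at zero whenever one Worker is starving favours throughput; the weighted nap only kicks in once every queue is reasonably loaded.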
------------------------------------------------------------
revno: 3220
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-27 17:36:50 +0200
message:
  wl#5569
  Providing the relay-log name for wl#5599. The protocol of actions on the C
  and W sides is described in rpl_rli_pdb.h.
  Erroring out in case of parallel exec and ROWS_QUERY_LOG_EVENT.
  (todo: the native sequential mode for the event needs some revision, in
  particular `delete ev' shall happen *always* in rli->cleanup_context, not in
  two places as it does currently).
------------------------------------------------------------
revno: 3219
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-26 23:08:30 +0200
message:
  wl#5569 MTS
  Partitioning conflict detection and handling is implemented.
  A new option to run Query in parallel, though incompatibly with the Rows-
  case in that the default db, not the actual db:s, is used as the partition
  key.
  The user interface gained the global var and the cmd line opt:
    slave_run_query_in_parallel
  (Welcome to the set! :-)
------------------------------------------------------------
revno: 3218
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-andrei
timestamp: Fri 2010-11-26 16:15:37 +0000
message:
  There was a mismatch between the number of fields read and written, and
  consequently the read was failing for the Slave_worker.
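Partition conflict detection (revno 3219 above), garbage collection with a tolerated record count (revno 3221) and the least occupied Worker (revno 3216, defined as the Worker with the fewest distributed partitions) fit together in one small map. The following is a hypothetical, single-threaded C++ sketch; the class name PartitionMap, the -1 conflict return value and the "preferred Worker" parameter are illustrative assumptions, not the server's interface (the real hash was made concurrent in revno 3216).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical partition (db name) -> Worker map with usage counters.
class PartitionMap {
 public:
  // Assumes n_workers >= 1.
  PartitionMap(std::size_t n_workers, std::size_t gc_threshold)
      : partitions_per_worker_(n_workers, 0), gc_threshold_(gc_threshold) {}

  // Returns the Worker that must run a group touching `db`, or -1 on a
  // conflict: the partition is busy on a Worker other than `preferred`
  // (the Worker already chosen for this group; -1 means none yet).
  int assign(const std::string &db, int preferred) {
    auto it = map_.find(db);
    if (it != map_.end() && it->second.usage > 0) {
      if (preferred >= 0 && preferred != it->second.worker) return -1;
      ++it->second.usage;
      return it->second.worker;
    }
    // Tolerate up to gc_threshold_ records before scanning usage counters.
    if (it == map_.end() && map_.size() >= gc_threshold_) {
      collect_garbage();
      it = map_.end();  // refresh the iterator after erasing entries
    }
    int w = preferred >= 0 ? preferred : least_occupied();
    if (it != map_.end()) {
      // Idle record (usage == 0): re-pin it to the chosen Worker.
      --partitions_per_worker_[static_cast<std::size_t>(it->second.worker)];
      it->second = Entry{w, 1};
    } else {
      map_.emplace(db, Entry{w, 1});
    }
    ++partitions_per_worker_[static_cast<std::size_t>(w)];
    return w;
  }

  // Called when a Worker has committed a group that used `db`.
  void release(const std::string &db) {
    auto it = map_.find(db);
    if (it != map_.end() && it->second.usage > 0) --it->second.usage;
  }

  std::size_t size() const { return map_.size(); }

 private:
  struct Entry {
    int worker;
    std::size_t usage;  // groups in progress that use this partition
  };

  // Fewest partitions currently mapped to it; the lowest index wins ties.
  int least_occupied() const {
    std::size_t best = 0;
    for (std::size_t i = 1; i < partitions_per_worker_.size(); ++i)
      if (partitions_per_worker_[i] < partitions_per_worker_[best]) best = i;
    return static_cast<int>(best);
  }

  // Drops every record that no in-progress group is using.
  void collect_garbage() {
    for (auto it = map_.begin(); it != map_.end();) {
      if (it->second.usage == 0) {
        --partitions_per_worker_[static_cast<std::size_t>(it->second.worker)];
        it = map_.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::unordered_map<std::string, Entry> map_;
  std::vector<std::size_t> partitions_per_worker_;
  std::size_t gc_threshold_;
};
```

On a -1 result the Coordinator in this model would wait for the busy Worker to release the partition and then retry, which is one plausible reading of "conflict handling".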
------------------------------------------------------------
revno: 3217 [merge]
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 11:03:54 +0200
message:
  wl#5569 merging with wl#5599 piece of code
------------------------------------------------------------
revno: 3216
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-25 10:47:39 +0200
message:
  wl#5569
  Converting the prototype-time db2w hash to be concurrent;
  Necessary introduction of the least occupied Worker notion. It's currently
  computed as the Worker having the least number of distributed partitions.
  Adding parallel support for Query_log_event; caution:
  1. the session/default db, not the actual db, is used as the key
  2. may not have been tested against all use cases (e.g. int vars)
  Fixing slave stop issues.
------------------------------------------------------------
revno: 3215
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Mon 2010-11-22 20:57:13 +0200
message:
  wl#5569 extending the interfaces to wl#5599 further by propagating
  future_event_relay_log_pos to the Worker exec context.
------------------------------------------------------------
revno: 3214
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sat 2010-11-20 19:23:42 +0200
message:
  wl#5569 MTS
  Worker pool start, stop, kill and error-out implementation.
------------------------------------------------------------
revno: 3213
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-19 16:51:58 +0200
message:
  wl#5569
  recovery interfaces for the wl#5599 implementation.
  The essence of this patch is to provide the GAQ object implementation and a
  valid life cycle.
  The checkpoint handler, prior to calling the store methods of wl#5599, is
  supposed to invoke
    rli->gaq->move_queue_head(&rli->workers).
  See a simulation of that near ev->update_pos() of the main sql thread loop.
  The checkpoint info is composed as an instance of Slave_job_group to reside
  as rli->gaq->lwm.

  Todo: uncomment
  +      // delete ev;  // after ev->update_pos() event is garbage
  once the real checkpoint has been done.

  Todo: the real implementation needs to take care of filling in
  Slave_job_group::update_current_binlog both initially and at the time of
  executing Rotate/FD methods.
  +    // experimental checkpoint per each scheduling attempt
  +    // logics of next_event()
  +
  +    rli->gaq->move_queue_head(&rli->workers);
------------------------------------------------------------
revno: 3212
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:50:54 +0200
message:
  wl#5569 wl#5599
  Recovery related. Prototyping the worker RLI instantiation, to be elaborated
  on.
------------------------------------------------------------
revno: 3211
committer: Andrei Elkin <andrei.elkin@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-18 16:00:52 +0200
message:
  wl#5569 MTS
  Extending the wl#5563 prototype gradually. This commit addresses:
  1. recovery interface (a new Worker rli plus rli->gaq, and pseudo-code for
     the checkpoint to update GAQ and the central RLI recovery table).
     With respect to rli, C and W execute do_apply_event(c_rli), where c_rli
     is the central instance. C executes update_pos(c_rli), but W executes
     update_pos(w_rli).
  others:
  - decreased processing time for rpl_parallel, serial.
------------------------------------------------------------
revno: 3210
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Sun 2010-11-14 11:55:32 +0000
message:
  Post-push fix for WL#5599.
------------------------------------------------------------
revno: 3209
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Fri 2010-11-12 17:58:12 +0000
message:
  Post-push fix for WL#5599.
------------------------------------------------------------
revno: 3208
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Thu 2010-11-11 11:53:01 +0000
message:
  WL#5599
  The patch changed the handler's functions, i.e. init_info, check_info,
  flush_info, remove_info and end_info, and the related private member
  functions, in both the file and table handlers, to accept an index that
  identifies the information that will be read or written.

  This is necessary now because the handlers will be used by the workers to
  read and write information from file(s) and table, and there may be several
  workers running at the same time; thus an index is used to identify the
  worker that is accessing the information.

  This change is also necessary for multi-master replication, as the
  information from each master must be uniquely identified.
------------------------------------------------------------
revno: 3207
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Wed 2010-11-10 10:57:13 +0000
message:
  Refactoring to start work on WL#5599.
------------------------------------------------------------
revno: 3206
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:34:18 +0000
message:
  Removed mysql-test/collections/mysql-next-mr.crash-safe.* in WL#5569.
------------------------------------------------------------
revno: 3205 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 13:04:14 +0000
message:
  merge mysql-next-mr.crash-safe --> mysql-next-mr-wl5569
  Conflicts:
  . sql/CMakeLists.txt
  . sql/Makefile.am
  . sql/sql_class.h
  . sql/rpl_slave.cc
------------------------------------------------------------
revno: 3204 [merge]
committer: Alfranio Correia <alfranio.correia@oracle.com>
branch nick: mysql-next-mr-wl5569
timestamp: Tue 2010-11-09 11:39:37 +0000
message:
  merge mysql-next-mr-wl5563-labs --> mysql-next-mr-wl5569
------------------------------------------------------------
revno: 3203
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 23:33:37 +0300
message:
  wl#5563 simplifying memory handling for the Coordinator-Workers transport to
  avoid sporadic crashes
------------------------------------------------------------
revno: 3202
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 21:19:56 +0300
message:
  wl#5563 leaving out fine-grained garbage collection. That task need not be
  solved at prototyping time. The update-pos routine, yet to be implemented,
  is going to eliminate that piece of code.
------------------------------------------------------------
revno: 3201
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 20:38:35 +0300
message:
  wl#5563 Extending the test base by splitting the former rpl_parallel into
  two, so that it runs in serial exec mode as well.
------------------------------------------------------------
revno: 3200
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-17 11:49:00 +0300
message:
  wl#5563 improved test; fixed a delete issue that used to cause a crash;
  added @@global.slave_local_timestamp to fill in the timestamp col with the
  slave clock value. Performance growth can be seen through the test.
  todo: merge with Alfranio's work on hashing and dyn alloc of PFS obj:s.
------------------------------------------------------------
revno: 3199
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Wed 2010-09-15 14:51:49 +0300
message:
  wl#5563 tests for the wl.
  The number of workers and iterations can be tuned.
  todo: convert them to params passed to the test through mtr
------------------------------------------------------------
revno: 3198
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 18:22:41 +0300
message:
  wl#5563 adding a simple, no-stress-attempting-yet test that also fired an
  assert.
  Refined the computing of the Worker instance reference, because
  cleanup_context() is executed by the sql thread (the coordinator) as well
------------------------------------------------------------
revno: 3197
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Mon 2010-09-13 13:15:38 +0300
message:
  wl#5563 Rows-event parallelization is basically implemented, although
  tested shallowly.
  Write access to the central rli struct by workers may not be fully
  eliminated at this phase. E.g. that relates to errors.
  todo: to prove rli gets out of the Worker scope
  todo: to provide a stress test
------------------------------------------------------------
revno: 3196
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Sat 2010-09-11 17:00:08 +0300
message:
  wl#5563 adding Rows-event support limited to one Worker.
  Deferred deletion did not check the emptiness of the list
------------------------------------------------------------
revno: 3195
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:36:07 +0300
message:
  wl#5563 correcting comments to indicate fewer limitations
------------------------------------------------------------
revno: 3194
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Fri 2010-09-10 20:32:39 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  More progress on the WL in that the STMT binlog-format works while the
  conceptual limits are held. That is, no query/transaction is allowed to deal
  with more than one db.
  Addressed a complication in that the update pos method, which is run by the
  Coordinator, belongs to the Log_event hierarchy, and therefore event
  deletion, now done by the Worker, must be careful.

  Todo:
  1. (High prio) fix Row-format complications
  2. (High prio) Elaborate on the hash function to be a function of the db
     text name
  3. (Optional) Consider moving update_pos to the RLI class to get rid of the
     delete logic complication.

  How-to-use:
  The instructions can be found in the comments of the previous commit; see
  there for more details. In brief though, the db names have to follow a
  pattern: `test[0-9]'. E.g. test0, test1, test2, test3 for the default four
  Worker threads.
  The slave side has to set @@global.slave_exec_mode=PARALLEL; before
  START SLAVE.
------------------------------------------------------------
revno: 3193
committer: Andrei Elkin <aelkin@mysql.com>
branch nick: wl5563-paraslave_part_db
timestamp: Thu 2010-09-09 21:43:16 +0300
message:
  WL#5563 Prototype for Slave parallelized by db name
  This is an intermediate commit that indicates some progress. Namely, the
  worker pool operates correctly and with signs of scalable performance.

  How to test:

  connection master;
  set @@global.binlog_format=statement;

  connection slave;
  set @@global.slave_exec_mode=PARALLEL;
  set @@global.binlog_format=mixed;
  show processlist; => IO, SQL threads + 4 workers by default
  change master to ...

  connection master;
  # create databases with magic names "test[0-9]+", where the number will index
  # a worker.
  create database test0;
  create database test1;
  create database test2;
  create database test3;

  # create tables. they are only of MyISAM type for now
  use test0;
  create table tm_1(a int, b int) engine=myisam;
  use ...

  # DML on tables:
  use test[0-3];
  insert into tm_1 values (1,0);
  ...
  ...
  connection slave;
  # monitor CPU (visually this time: top etc)
  # check correctness, e.g.
  select count(*) from test[0-3].tm_1;

  connection master;
  select count(*) from test[0-3].tm_1;

****** wl#5569 MTS merging a combined bundle cset to mysql-trunk.
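The prototype routing rule of revnos 3193/3194 (a database named `test<N>` is executed by Worker N, e.g. test0..test3 for the default four Workers) amounts to parsing the numeric suffix of the db name. A hypothetical C++ sketch; the function name worker_for_db() and returning -1 for names that do not match or for an out-of-range N are assumptions, as the log does not say how such names were handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical reconstruction of the prototype's routing: "test<N>" -> Worker N.
// Returns -1 if the name does not match the pattern or N is not a valid index.
int worker_for_db(const std::string &db, int n_workers) {
  const std::string prefix = "test";
  if (db.size() <= prefix.size() || db.compare(0, prefix.size(), prefix) != 0)
    return -1;
  long n = 0;
  for (std::size_t i = prefix.size(); i < db.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(db[i]);
    if (!std::isdigit(c)) return -1;  // suffix must be all digits
    n = n * 10 + (c - '0');
    if (n >= n_workers) return -1;    // out of range; also bounds the value
  }
  return static_cast<int>(n);
}
```

As the Todo in revno 3194 notes, a real hash function over the db text name was meant to replace this magic-name scheme.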