Andrei Elkin authored
BUG#13893310 checkpoint_group size wrong at recovery after cold restart

This is a combined patch for 3 issues.

Bug#13893363.
The new UNTIL condition is an important feature to have because --relay-log-recovery=1 and CHANGE MASTER can *not* run in the presence of gaps. A user who needs to switch from parallel to sequential execution mode after the slave SQL thread or Worker threads have errored out in parallel mode has to execute START SLAVE SQL_THREAD UNTIL SQL_AFTER_MTS_GAPS. UNTIL SQL_AFTER_MTS_GAPS thus lets the user find the exact after-gaps position automatically instead of having to work it out from the relay logs and info repositories by hand.

A separate issue is also fixed: Worker DEADLOCK/WAIT_FOR_LOCK errors were incorrectly demoted to warnings, even though the slave does not retry at Worker execution.

A todo is added to relocate the checking of SQL_AFTER_MTS_GAPS and other post-exec/schedule UNTIL options to the end of the read-execute loop (instead of right after the read phase, which can lead to unnecessary hanging when a condition is actually met).

Bug#13893310.
The issue with checkpoint_group at MTS recovery is that, after a server restart, the MTS recovery gaps collecting algorithm initialized the recovery bitmap with the default size of 512 rather than with a correct size, that is, one not less than that of the Worker group_executed of the last slave session. That is corrected: the maximum possible size is used in gaps collecting. The update step of opt_mts_checkpoint_group is set to 8 (bits).

Some refactoring is done in rpl_info* and rpl_rli_pdb. MTS recovery gaps collecting is deployed on an execution path common to START SLAVE and --skip-slave-start=0. A few small bugs found along the way are fixed, including the demotion of Worker DEADLOCK/WAIT_FOR_LOCK errors into warnings.

A valgrind issue associated with the bug is fixed by calling handler->end_info() in the error branch of RLI::init_info(); see the stack below.
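The bitmap sizing and gaps collecting described for Bug#13893310 can be illustrated with a minimal sketch. It is not the server code: the helper names (round_up_to_step, recovery_bitmap_bits, collect_gaps), the GapReport type, and the use of std::vector<bool> as a bitmap are all hypothetical, and taking the maximum over the Worker bitmap sizes is an assumption about what "not less than the Worker group_executed size" amounts to. Only the default of 512 and the 8-bit step come from the message above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Values taken from the commit message: the default bitmap size and the
// 8-bit update step of opt_mts_checkpoint_group.
constexpr std::size_t kDefaultGroupBits = 512;
constexpr std::size_t kStepBits = 8;

// Round a bit count up to the next multiple of the 8-bit step.
constexpr std::size_t round_up_to_step(std::size_t bits) {
  return (bits + kStepBits - 1) / kStepBits * kStepBits;
}

// Hypothetical helper: choose a recovery bitmap size that is never smaller
// than any Worker bitmap recorded in the previous slave session.
std::size_t recovery_bitmap_bits(const std::vector<std::size_t>& worker_bits) {
  std::size_t needed = kDefaultGroupBits;
  for (std::size_t bits : worker_bits) needed = std::max(needed, bits);
  return round_up_to_step(needed);
}

// Hypothetical result type for the gaps collecting step.
struct GapReport {
  std::vector<std::size_t> gaps;  // unexecuted groups below the last executed one
  std::size_t after_gaps = 0;     // first index past every executed group
};

// Merge the per-Worker "group executed" bitmaps and report the gaps.
GapReport collect_gaps(const std::vector<std::vector<bool>>& workers) {
  std::vector<std::size_t> sizes;
  for (const auto& w : workers) sizes.push_back(w.size());

  // Size the merged bitmap from the Workers, not from the default alone.
  std::vector<bool> executed(recovery_bitmap_bits(sizes), false);
  for (const auto& w : workers) {
    assert(w.size() <= executed.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
      if (w[i]) executed[i] = true;
    }
  }

  GapReport report;
  for (std::size_t i = 0; i < executed.size(); ++i) {
    if (executed[i]) report.after_gaps = i + 1;
  }
  for (std::size_t i = 0; i < report.after_gaps; ++i) {
    if (!executed[i]) report.gaps.push_back(i);
  }
  return report;
}
```

With two Workers that executed groups {0, 1, 4} and {3}, this sketch reports group 2 as the only gap and 5 as the after-gaps index, which is the kind of position UNTIL SQL_AFTER_MTS_GAPS is meant to spare the user from computing by hand.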
http://pb2.no.oracle.com/?template=mysql_show_test_failure&test_failure_id=4266844

rpl.rpl_parallel_change_master w5 [ fail ]  Found warnings/errors in server log file!
Test ended at 2012-04-02 05:18:16
line
==15911== 8,192 bytes in 1 blocks are definitely lost in loss record 130 of 179
==15911==    at 0x4C216FB: malloc (vg_replace_malloc.c:236)
==15911==    by 0xA29FE2: my_malloc (my_malloc.c:38)
==15911==    by 0xA09EFD: init_io_cache (mf_iocache.c:232)
==15911==    by 0x9FD1E3: Rpl_info_file::do_init_info(unsigned long const*, unsigned int) (rpl_info_file.cc:95)
==15911==    by 0x9EC97B: Rpl_info_handler::init_info(unsigned long const*, unsigned int) (rpl_info_handler.h:45)
==15911==    by 0x9F153E: Relay_log_info::init_info() (rpl_rli.cc:1676)
==15911==    by 0x9E688B: init_info(Master_info*, bool, int) (rpl_slave.cc:468)
==15911==    by 0x9E9AB1: init_slave() (rpl_slave.cc:316)
==15911==    by 0x5CFE18: mysqld_main(int, char**) (mysqld.cc:5083)
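The leak in the stack above has the usual shape of a resource acquired early in an init routine and not released when a later step fails. The following is a minimal sketch of that pattern and of the fix, not the actual patch: InfoHandler, rli_init_info, and the later_step_fails flag are hypothetical stand-ins, and the "true means error" return convention is assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a file-based info handler that owns a cache buffer.
class InfoHandler {
 public:
  // Allocates the cache buffer. Returns true on error (assumed convention).
  bool init_info(std::size_t cache_bytes) {
    assert(buffer_ == nullptr);  // must not be initialized twice
    buffer_ = std::malloc(cache_bytes);
    return buffer_ == nullptr;
  }

  // Releases the buffer. Safe to call more than once.
  void end_info() {
    std::free(buffer_);
    buffer_ = nullptr;
  }

  bool holds_buffer() const { return buffer_ != nullptr; }

 private:
  void* buffer_ = nullptr;
};

// Hypothetical analogue of RLI::init_info(). 'later_step_fails' simulates
// an error that occurs after the handler has already been initialized.
bool rli_init_info(InfoHandler& handler, bool later_step_fails) {
  if (handler.init_info(8192)) return true;  // allocation itself failed

  if (later_step_fails) {
    // Error branch: release what init_info() acquired before bailing out.
    // Omitting this call is what would leave the buffer "definitely lost".
    handler.end_info();
    return true;
  }
  return false;
}
```

On the success path the buffer stays owned by the handler and is released by a later end_info() call; only the error branch needs the extra cleanup.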