mysql-test/suite/rpl/t/rpl_row_crash_safe.test · ba12387e40a6cf2adb3746e9dc5271f1772ce7ab · Rasoul Jahanshahi / Mysql Server

Apr 11, 2012

Bug#13893363 - MTS IS MISSING THE ABILITY TO STOP A SLAVE AFTER PROCESSING GAPS · d6397485

Andrei Elkin authored Apr 11, 2012

BUG#13893310 checkpoint_group size wrong at recovery after cold restart

This is a combined patch for 3 issues.

Bug#13893363.

The new UNTIL condition is an important feature to have because
neither --relay-log-recovery=1 nor CHANGE MASTER can run in the
presence of gaps.
The user has to execute
  START SLAVE SQL_THREAD UNTIL SQL_AFTER_MTS_GAPS
to switch from the parallel to the sequential execution mode
after the slave SQL thread or Worker threads errored out in parallel mode.
Thus UNTIL SQL_AFTER_MTS_GAPS lets the user find the exact position
after the gaps automatically instead of having to work it out from
the relay logs and info repositories by hand.

Also, a separate issue of incorrect demotion of
DEADLOCK/WAIT_FOR_LOCK errors into warnings is fixed: the slave
does not retry at Worker execution, so these errors must stay errors.
There is also a todo to move the checking of SQL_AFTER_MTS_GAPS and
other post-exec/schedule UNTIL options to the end of the read-execute
loop (instead of having them right after the read phase, which can
lead to unnecessary hanging when a condition is actually met).


Bug#13893310.

The issue with checkpoint_group at MTS recovery is that after a
server restart the MTS recovery gaps-collecting algorithm initialized
the recovery bitmap with the default size of 512 rather than with a
size not less than the Worker group_executed of the last slave
session.

That is corrected. The maximum possible size is used in gaps
collecting.  The update step of opt_mts_checkpoint_group is set to
8 (bits).  Some refactoring is done in rpl_info* and rpl_rli_pdb;
MTS recovery gaps collecting is deployed on an execution path common
to START SLAVE and --skip-slave-start=0.
A few small bugs found along the way are fixed, including the
demotion of Worker DEADLOCK/WAIT_FOR_LOCK errors into warnings.
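
To illustrate why the bitmap size matters, here is a minimal standalone
sketch, not the server code. The names collect_gaps, GapScan, kDefaultBits
and kAssumedMaxBits are hypothetical, and the upper bound is an assumed
placeholder value. It shows that a bitmap sized with the default 512 bits
cannot record a group whose index exceeds that size, while a bitmap sized
to the maximum can, and that sizes are kept on a step of 8 bits.

```cpp
#include <cstddef>
#include <vector>

// 512 is the default named in the commit message.
constexpr std::size_t kDefaultBits = 512;
// ASSUMPTION: placeholder upper bound (a multiple of 8), not taken from the patch.
constexpr std::size_t kAssumedMaxBits = 524280;

// Round n up to the next multiple of 8 so the bitmap fills whole bytes.
constexpr std::size_t round_up_to_8(std::size_t n) { return (n + 7) / 8 * 8; }

struct GapScan {
  std::vector<std::size_t> gaps;  // unexecuted groups below the highest recorded one
  bool truncated;                 // true if some executed group did not fit the bitmap
};

// Mark every executed group in a bitmap of bitmap_bits bits, then report
// the holes below the highest group that could be recorded.
GapScan collect_gaps(const std::vector<std::size_t>& executed_groups,
                     std::size_t bitmap_bits) {
  std::vector<bool> bitmap(bitmap_bits, false);
  GapScan result{{}, false};
  std::size_t high = 0;
  bool any = false;
  for (std::size_t g : executed_groups) {
    if (g >= bitmap_bits) {  // does not fit: recovery information is lost
      result.truncated = true;
      continue;
    }
    bitmap[g] = true;
    if (!any || g > high) high = g;
    any = true;
  }
  if (any) {
    for (std::size_t g = 0; g < high; ++g) {
      if (!bitmap[g]) result.gaps.push_back(g);
    }
  }
  return result;
}
```

With executed groups {0, 1, 3, 600}, a 512-bit bitmap reports only group 2
as a gap and flags truncation, whereas the larger bitmap also sees groups
4 through 599 as gaps.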


A valgrind issue associated with the bug is fixed (see the stack
below) via deploying handler->end_info() in the error branch of
RLI::init_info().
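
The shape of that fix can be sketched as follows. This is a simplified
illustration under stated assumptions, not the actual patch: InfoHandler,
rli_init_info, later_step_ok and g_live_buffers are hypothetical names,
and the 8192-byte allocation only mirrors the block size in the valgrind
report.

```cpp
#include <cstddef>
#include <cstdlib>

// Counts buffers that were allocated and not yet released (for illustration).
static int g_live_buffers = 0;

// Hypothetical stand-in for a repository handler that owns an I/O buffer.
class InfoHandler {
 public:
  // Allocate the buffer; returns false if the allocation fails.
  bool init_info() {
    buffer_ = static_cast<char*>(std::malloc(8192));
    if (buffer_ == nullptr) return false;
    ++g_live_buffers;
    return true;
  }

  // Release the buffer; safe to call more than once.
  void end_info() {
    if (buffer_ != nullptr) {
      std::free(buffer_);
      buffer_ = nullptr;
      --g_live_buffers;
    }
  }

 private:
  char* buffer_ = nullptr;
};

// Returns true on success. later_step_ok simulates a step that runs after
// the handler was initialized and may fail.
bool rli_init_info(InfoHandler& handler, bool later_step_ok) {
  if (!handler.init_info()) return false;
  if (!later_step_ok) {
    handler.end_info();  // error branch: release what init_info() allocated
    return false;
  }
  return true;
}
```

Without the end_info() call in the error branch, the buffer would stay
allocated after a failed initialization, which is the kind of loss the
stack below reports.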


http://pb2.no.oracle.com/?template=mysql_show_test_failure&test_failure_id=4266844

rpl.rpl_parallel_change_master           w5 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2012-04-02 05:18:16
line
==15911== 8,192 bytes in 1 blocks are definitely lost in loss record 130 of 179
==15911==    at 0x4C216FB: malloc (vg_replace_malloc.c:236)
==15911==    by 0xA29FE2: my_malloc (my_malloc.c:38)
==15911==    by 0xA09EFD: init_io_cache (mf_iocache.c:232)
==15911==    by 0x9FD1E3: Rpl_info_file::do_init_info(unsigned long const*, unsigned int) (rpl_info_file.cc:95)
==15911==    by 0x9EC97B: Rpl_info_handler::init_info(unsigned long const*, unsigned int) (rpl_info_handler.h:45)
==15911==    by 0x9F153E: Relay_log_info::init_info() (rpl_rli.cc:1676)
==15911==    by 0x9E688B: init_info(Master_info*, bool, int) (rpl_slave.cc:468)
==15911==    by 0x9E9AB1: init_slave() (rpl_slave.cc:316)
==15911==    by 0x5CFE18: mysqld_main(int, char**) (mysqld.cc:5083)
