    Bug#54854 / Bug#11762277 CAN'T FIND GOOD POSITION FOR REPLICATION BREAK BETWEEN DDL STATEMENTS · 7ae7283b
    Frazer Clement authored
    Problem
    -------
    
    Replication channel cutover uses the last applied epoch on the slave to determine where
    to begin replication from on the new Master.
    
    The last applied epoch is obtained from the Slave's mysql.ndb_apply_status table.
    The new Master's mysql.ndb_binlog_index table is queried to find the first epoch *after* the
    Slave's last applied epoch, and the binlog file and offset of that epoch are used to start
    replication from the new Master.
    
    Issues :
     1) There may be *no* epoch after the last applied epoch.
        This can happen when log-empty-epochs=0, and even in normal operation whenever the Slave
        is up to date and no new epoch has yet been finalised on the Master.
     2) As epochs are not continuously numbered, there may be a gap between the last applied
        epoch and the next one, and it is not possible to determine what the next epoch number
        will be.  If the new Master is missing some epochs, the current cutover mechanism will
        silently skip over them and jump to the first available epoch.
     3) Where there is DDL between the last applied epoch and the next epoch, the cutover mechanism
        will skip the DDL.  If the DDL has already been applied on the Slave this is OK; if it has
        not, it is silently skipped.
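Issues 1) and 2) can be reproduced with a minimal sketch of the old lookup. It assumes a hypothetical, heavily simplified stand-in for mysql.ndb_binlog_index held in an in-memory SQLite database (only epoch, File and Position; real epochs are 64-bit values, small integers are used here for readability). The names `make_index` and `old_cutover` are illustrative only.

```python
import sqlite3

def make_index(rows):
    """Build a hypothetical, simplified ndb_binlog_index (epoch, File, Position)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ndb_binlog_index (epoch INTEGER, File TEXT, Position INTEGER)")
    conn.executemany("INSERT INTO ndb_binlog_index VALUES (?, ?, ?)", rows)
    return conn

def old_cutover(conn, last_applied_epoch):
    """Old mechanism: start of the first epoch *after* the last applied epoch, or None."""
    return conn.execute(
        "SELECT File, Position FROM ndb_binlog_index "
        "WHERE epoch > ? ORDER BY epoch ASC LIMIT 1",
        (last_applied_epoch,),
    ).fetchone()

# Issue 1: the Slave is up to date, so no later epoch exists yet.
up_to_date = make_index([(10, "master-bin.000001", 120), (20, "master-bin.000001", 480)])
print(old_cutover(up_to_date, 20))  # None

# Issue 2: the new Master never logged epoch 30, so the lookup silently jumps to 40.
missing_epoch = make_index([(20, "master-bin.000001", 480), (40, "master-bin.000001", 910)])
print(old_cutover(missing_epoch, 20))  # ('master-bin.000001', 910)
```

Nothing in the second result indicates that an epoch was skipped, which is exactly the silent failure described above.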
    
    Solution
    --------
    
    This series implements a more precise mechanism for performing replication channel cutover.
    This allows us to ensure that a replication channel cutover begins replication precisely
    after the end of the last committed epoch on the Slave.
    
    This involves :
      - Modifications to the MySQL Server Binlog code to record the next position in the Binlog
        after the COMMIT event at the end of an epoch transaction
      - Modifications to the mysql.ndb_binlog_index table schema to include next_file and next_position
        columns
      - Modifications to the Ndb Binlog injector to set the next_file and next_position columns in
        the mysql.ndb_binlog_index table.
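The relationship between Position and the new next_position column can be sketched with a hypothetical model of the binlog as a sequence of event sizes. This is an illustration under stated assumptions, not the server's implementation: epoch transactions are lists of event sizes whose last entry is the COMMIT, inter-epoch DDL is a single event logged outside any epoch transaction, and binlog rotation is ignored so next_file always equals File. `IndexRow` and `write_binlog` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class IndexRow:
    epoch: int
    file: str
    position: int       # offset of the epoch transaction's first event
    next_file: str
    next_position: int  # offset just after the epoch transaction's COMMIT event

def write_binlog(file_name, start_offset, items):
    """Return index rows for items of the form ("epoch", epoch, sizes) or ("ddl", stmt, size)."""
    rows = []
    offset = start_offset
    for item in items:
        if item[0] == "epoch":
            _, epoch, sizes = item
            position = offset
            offset += sum(sizes)  # offset now points just past the COMMIT
            rows.append(IndexRow(epoch, file_name, position, file_name, offset))
        else:
            _, _stmt, size = item
            offset += size  # DDL is logged between epochs, outside any epoch transaction
    return rows

rows = write_binlog("master-bin.000001", 4, [
    ("epoch", 10, [60, 200, 30]),
    ("ddl", "ALTER TABLE t1 ADD COLUMN c INT", 90),
    ("epoch", 20, [60, 150, 30]),
])
print(rows[0].next_position, rows[1].position)  # 294 384
```

The 90-byte gap between epoch 10's next_position and epoch 20's Position is the DDL event; only next_position identifies where it begins.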
    
    The existing replication channel cutover mechanism continues to work, with the same limitations
    as before.
    A new channel cutover mechanism is defined, making use of the new columns.
    
    Old channel cutover mechanism, given the last applied epoch from the Slave:
    
      SELECT File, Position FROM mysql.ndb_binlog_index WHERE epoch > <last_applied_epoch> ORDER BY epoch ASC LIMIT 1;
    
    New channel cutover mechanism :
      SELECT next_file, next_position FROM mysql.ndb_binlog_index WHERE epoch = <last_applied_epoch>;
    
    Note that: i) this statement uses the last applied epoch directly, so there is no dependency on a
    following epoch existing; ii) there is no risk of silently 'jumping' over an epoch gap during
    replication channel cutover; iii) any DDL after the last applied epoch will be (re)applied.
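A minimal sketch of the new lookup, again against a hypothetical in-memory SQLite stand-in for mysql.ndb_binlog_index, turning the result into a CHANGE MASTER TO statement. Two choices here are assumptions rather than part of this change: raising an error when the new Master has no row for the epoch (instead of guessing a later one), and stripping a directory prefix such as "./" from the stored file name. `make_index` and `new_cutover` are illustrative names.

```python
import sqlite3

def make_index(rows):
    """Build a hypothetical ndb_binlog_index including the new next_* columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ndb_binlog_index "
        "(epoch INTEGER, File TEXT, Position INTEGER, next_file TEXT, next_position INTEGER)"
    )
    conn.executemany("INSERT INTO ndb_binlog_index VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def new_cutover(conn, last_applied_epoch):
    """New mechanism: resume just after the last applied epoch's COMMIT."""
    row = conn.execute(
        "SELECT next_file, next_position FROM ndb_binlog_index WHERE epoch = ?",
        (last_applied_epoch,),
    ).fetchone()
    if row is None:
        # Assumed policy: the new Master never logged this epoch, so refuse to guess.
        raise LookupError(f"epoch {last_applied_epoch} not found on new Master")
    next_file, next_position = row
    log_file = next_file.rsplit("/", 1)[-1]  # drop any directory prefix, e.g. "./"
    return f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={next_position}"

conn = make_index([
    (10, "./master-bin.000001", 4, "./master-bin.000001", 294),
    (20, "./master-bin.000001", 384, "./master-bin.000001", 624),
])
# Epoch 20 is the newest epoch, yet a resume position is still available.
print(new_cutover(conn, 20))
```

Unlike the old query, a missing epoch surfaces as an error instead of a silent jump.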
    
    Reapplying inter-epoch DDL can result in errors on the Slave.  This is considered better than the
    old channel cutover mechanism, which can silently skip DDL.  A separate patch series
    implements 'DDL ignore existence errors' handling.
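Why DDL is skipped by one mechanism and replayed by the other can be sketched by listing which events a Slave would read from each starting offset. The binlog contents below are hypothetical: epoch 10 ends at offset 294, a DDL statement follows, and epoch 20 starts at 384. `BINLOG` and `replayed_from` are illustrative names.

```python
# Hypothetical binlog contents as (start offset, event description).
BINLOG = [
    (4, "BEGIN epoch 10"),
    (64, "rows epoch 10"),
    (264, "COMMIT epoch 10"),
    (294, "DDL ALTER TABLE t1 ADD COLUMN c INT"),
    (384, "BEGIN epoch 20"),
    (444, "rows epoch 20"),
    (594, "COMMIT epoch 20"),
]

def replayed_from(start):
    """Events a Slave would apply when replication starts at offset `start`."""
    return [desc for offset, desc in BINLOG if offset >= start]

# Old mechanism: Position of the first epoch after epoch 10, so the DDL is skipped.
print(replayed_from(384))
# New mechanism: next_position of epoch 10, so the DDL is (re)applied first.
print(replayed_from(294)[0])  # DDL ALTER TABLE t1 ADD COLUMN c INT
```

If the Slave had already applied that ALTER TABLE, replaying it from offset 294 is what produces the errors mentioned above.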
    
    This series includes a testcase which verifies the correctness of the next_position under 
    multithreaded Binlog inserts etc.
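The invariant such a test checks can be sketched in a simplified, hypothetical form: one thread writes epoch transactions while another writes unrelated events to the same in-memory log, and each epoch's [Position, next_position) range must then cover exactly its own BEGIN...COMMIT bytes without overlapping any other transaction. The `Binlog` class, its methods and the event encoding are assumptions for illustration; the 4-byte header mirrors the binlog magic so the first event starts at offset 4.

```python
import threading

class Binlog:
    """Hypothetical in-memory binlog shared by concurrent writers."""

    def __init__(self):
        self.data = bytearray(b"\xfebin")  # 4-byte header: first event starts at offset 4
        self.index = {}                    # epoch -> (position, next_position)
        self.lock = threading.Lock()

    def write_epoch(self, epoch, payload):
        txn = b"BEGIN;" + payload + f"COMMIT {epoch};".encode()
        with self.lock:  # the transaction and its index row are recorded as one unit
            position = len(self.data)
            self.data += txn
            self.index[epoch] = (position, len(self.data))

    def write_other(self, payload):
        with self.lock:
            self.data += payload

def check(binlog):
    for epoch, (pos, next_pos) in binlog.index.items():
        txn = bytes(binlog.data[pos:next_pos])
        assert txn.startswith(b"BEGIN;"), epoch
        assert txn.endswith(f"COMMIT {epoch};".encode()), epoch
    # No two epoch transactions may overlap.
    spans = sorted(binlog.index.values())
    for (_, end), (start, _) in zip(spans, spans[1:]):
        assert end <= start

log = Binlog()

def injector():
    for epoch in range(1, 201):
        log.write_epoch(epoch, b"rows" * (epoch % 7))

def other_writer():
    for i in range(200):
        log.write_other(f"STMT {i};".encode())

threads = [threading.Thread(target=injector), threading.Thread(target=other_writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
check(log)
print(len(log.index))  # 200
```

If next_position were read outside the lock, a concurrent write could land between the COMMIT and the recorded offset, and the endswith check would catch it.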