mysql-test/include/begin_include_file.inc · mysql-5.6.35 · Rasoul Jahanshahi / Mysql Server

Aug 18, 2014

BUG#18385953 - RPL_GTID_STRESS_FAILOVER FAILS WITH "ERROR IN SYNC_WITH_MASTER.INC" · ad5bbb38

Sven Sandberg authored Aug 18, 2014

This test failed sporadically in the cleanup code.
The cleanup code contained this:

  for each server:
    DROP DATABASE
    --source include/stop_slave.inc
    CHANGE MASTER TO MASTER_AUTO_POSITION = 0;
    --source include/start_slave.inc

The problem is that CHANGE MASTER drops the relay logs.  If the IO
thread was ahead of the SQL thread at this point, transactions that had
been received but not yet executed would be lost from the relay log.
Since the slave no longer uses the auto-position protocol when
start_slave.inc runs, replication would resume after the lost
transactions rather than retransmit them.

This caused two types of failures:
 1. The IO thread could be stopped in the middle of a transaction, when
    the SQL thread had processed only up to the last complete transaction.
    Here all transactions consist of two events: Gtid followed by Query.
    Thus, the last received event was a Gtid, and after
    start_slave.inc the slave would receive a Query. The slave SQL thread
    would then see a Query without Gtid. This looks like an anonymous
    transaction, which is not allowed when GTID_MODE = ON. So the SQL thread
    would stop with error 1782: "Error '@@SESSION.GTID_NEXT cannot be set to
    ANONYMOUS when @@GLOBAL.GTID_MODE = ON.' on query. Default database: 
    'db_3'. Query: 'DROP DATABASE db_3'".
 2. The IO thread could be stopped between transactions, when the SQL
    thread had processed only up to the second-last transaction.
    Then the entire last received transaction would be lost. Later, in
    rpl_sync.inc, the slave would never receive that GTID, so the sync
    would fail with a timeout.
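
The two failure types can be reproduced in a toy model. The Python
sketch below is not MySQL code: the binlog is a plain list, and the
names make_binlog, reset_and_resume and classify are made up for
illustration. It assumes, as described above, that the relay log is
dropped and that the IO thread resumes where it stopped.

```python
# Toy model, not MySQL code. Each transaction is a Gtid event followed by
# a Query event, as in the test described above.
def make_binlog(n_transactions):
    events = []
    for n in range(1, n_transactions + 1):
        events.append(("Gtid", n))
        events.append(("Query", n))
    return events


def reset_and_resume(binlog, received, executed):
    """Model CHANGE MASTER followed by a restart without auto-position.

    `received` counts events the IO thread copied to the relay log and
    `executed` counts events the SQL thread applied. Assumption: the relay
    log is dropped and the IO thread resumes at `received`.
    """
    lost = binlog[executed:received]   # received but never applied
    resumed = binlog[received:]        # what the slave sees after restart
    return lost, resumed


def classify(lost, resumed):
    if not lost:
        return "ok"
    if resumed and resumed[0][0] == "Query":
        # The first event after restart is a Query whose Gtid was lost.
        return "error 1782: Query without Gtid"
    return "sync timeout: transaction never applied"


binlog = make_binlog(3)
# Type 1: IO thread stopped mid-transaction, after the Gtid of transaction 3.
print(classify(*reset_and_resume(binlog, received=5, executed=4)))
# Type 2: IO thread stopped between transactions, SQL thread one behind.
print(classify(*reset_and_resume(binlog, received=4, executed=2)))
```

In this model the only safe case is received == executed, which is
what a sync before the cleanup guarantees.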

Fixed by dropping the databases and syncing all servers before stopping
all the slave threads and executing CHANGE MASTER.
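
The effect of the reordering can be illustrated with a similar toy
model in Python. Again this is only a sketch, not the test code: the
function name is hypothetical, and "syncing" is assumed to mean that
both slave threads have caught up with the whole master binlog.

```python
# Toy model, not MySQL code. Assumption: after a sync, both the IO thread
# and the SQL thread have processed every event in the master binlog.
def events_lost_by_cleanup(binlog_length, received, executed, sync_first):
    """Return how many relay-log events CHANGE MASTER would discard."""
    if sync_first:
        received = executed = binlog_length
    # Events received but not yet executed live only in the relay log.
    return received - executed


# Old order: CHANGE MASTER runs while the IO thread is one event ahead.
print(events_lost_by_cleanup(6, received=5, executed=4, sync_first=False))
# New order: sync all servers first, then stop the slaves and CHANGE MASTER.
print(events_lost_by_cleanup(6, received=5, executed=4, sync_first=True))
```

The first call reports one lost event and the second reports none.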
