Sven Sandberg authored
This test failed sporadically in the cleanup code. The cleanup code contained this:

  for each server:
    DROP DATABASE
    --source include/stop_slave.inc
    CHANGE MASTER TO MASTER_AUTO_POSITION = 0;
    --source include/start_slave.inc

The problem is that CHANGE MASTER drops the relay logs. If the IO thread was ahead of the SQL thread at this point, any transactions that had been received but not yet executed were in the relay log and would be lost. Because auto-positioning is disabled when start_slave.inc restarts the slave threads, replication resumes from the position after the lost transaction rather than retransmitting it.

This caused two types of failures:

 1. The IO thread could be stopped in the middle of a transaction, while the SQL thread had processed only up to the last complete transaction. In this test every transaction consists of two events: a Gtid event followed by a Query event. Thus the last received event was a Gtid, and after start_slave.inc the slave would receive the Query that belonged to it. The slave SQL thread would then see a Query without a Gtid. This looks like an anonymous transaction, which is not allowed when GTID_MODE = ON, so the SQL thread would stop with error 1782:
    "Error '@@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.' on query. Default database: 'db_3'. Query: 'DROP DATABASE db_3'".

 2. The IO thread could be stopped between transactions, while the SQL thread had processed only up to the second-to-last transaction. Then the entire last transaction would be lost. Later, in rpl_sync.inc, the slave would never receive that GTID at all, so the sync would fail with a timeout.

Fixed by dropping the databases and syncing all servers before stopping the slave threads and executing CHANGE MASTER.
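The two failure modes above can be reproduced with a small toy model. This is a hypothetical Python sketch, not MySQL code. It assumes that each transaction is exactly one Gtid event plus one Query event (as in this test) and that positions are plain event indices. It also models CHANGE MASTER TO MASTER_AUTO_POSITION = 0 as discarding the relay log while the IO thread resumes where it had stopped. The names `cleanup_race`, `apply_events`, `sql_pos` and `io_pos` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    kind: str  # "Gtid" or "Query"
    txn: int   # transaction number on the master


def master_binlog(n_txns: int) -> List[Event]:
    # Assumption: every transaction is a Gtid event followed by a Query event.
    events: List[Event] = []
    for txn in range(1, n_txns + 1):
        events.append(Event("Gtid", txn))
        events.append(Event("Query", txn))
    return events


def apply_events(events: List[Event]) -> Tuple[Set[int], Optional[str]]:
    # Simulated SQL thread with GTID_MODE = ON. Returns the applied
    # transactions and an error message (None if every event was applied).
    applied: Set[int] = set()
    pending_gtid: Optional[int] = None
    for ev in events:
        if ev.kind == "Gtid":
            pending_gtid = ev.txn
        elif pending_gtid is None:
            # A Query without a preceding Gtid looks like an anonymous
            # transaction, which GTID_MODE = ON rejects.
            return applied, "error 1782: GTID_NEXT cannot be set to ANONYMOUS"
        else:
            applied.add(pending_gtid)
            pending_gtid = None
    return applied, None


def cleanup_race(n_txns: int, sql_pos: int, io_pos: int) -> Tuple[Set[int], Optional[str]]:
    # sql_pos: number of events already executed by the SQL thread.
    # io_pos: number of events already received by the IO thread (io_pos >= sql_pos).
    binlog = master_binlog(n_txns)
    # Hypothetical model of the old cleanup: CHANGE MASTER purges the relay
    # log, so binlog[sql_pos:io_pos] is lost, and without auto-positioning
    # the IO thread resumes at io_pos instead of sql_pos.
    seen_by_sql_thread = binlog[:sql_pos] + binlog[io_pos:]
    applied, error = apply_events(seen_by_sql_thread)
    missing = set(range(1, n_txns + 1)) - applied
    return missing, error
```

With three transactions, `cleanup_race(3, sql_pos=4, io_pos=5)` models failure 1. The IO thread had received only the Gtid of transaction 3, so the restarted SQL thread sees a bare Query and reports the error. `cleanup_race(3, sql_pos=4, io_pos=6)` models failure 2. Transaction 3 silently disappears and a later sync would wait for it forever. Syncing before CHANGE MASTER corresponds to `sql_pos == io_pos == 6`, where nothing is lost.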