Joao Gramacho authored
OR MISS COMPLETE TRX

Problem:
=======

The SHOW SLAVE STATUS command contains the column RETRIEVED_GTID_SET. This column is supposed to contain the set of GTIDs that exist in the relay log. However, the field is updated when the slave receiver thread (I/O thread) receives a Gtid_log_event, which happens at the beginning of the transaction.

If the I/O thread gets disconnected in the middle of a transaction, RETRIEVED_GTID_SET can contain a GTID for a transaction that was only partially received into the relay log. This transaction will subsequently be rolled back, so it is wrong to report that the transaction is there.

Typical fail-over algorithms, such as the one in the mysqlfailover utility, use RETRIEVED_GTID_SET to determine which slave has received the most transactions and promote that slave to master. When RETRIEVED_GTID_SET can contain partially transmitted transactions, the fail-over utility can choose the wrong slave to promote, which can later lead to data corruption. It also means that even with semi-sync enabled, transactions that have been acknowledged by one slave can be lost.

Fix:
===

A transaction boundary parser was implemented. It reports the transaction boundaries of an event stream based on the event types and, for Query_log_event, on their queries. As the I/O thread queues events, it feeds them to the transaction boundary parser in Master_info. Slave I/O recovery also uses the parser to determine whether a given GTID can be added to the Retrieved_Gtid_Set.

When the parser is in the GTID state because a Gtid_log_event was queued, the event's GTID is not yet added to the retrieved set; instead, it is stored in an auxiliary GTID variable. After flushing an event into the relay log, the I/O thread checks whether the parser is no longer inside a transaction (meaning that the last event of the transaction has been flushed).
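To make the boundary rules concrete, here is a minimal C++ sketch of a transaction boundary parser, written under simplifying assumptions. `EventType`, `Event`, `ParserState` and `TrxBoundaryParserSketch` are hypothetical names, not the server's classes; queries are matched by exact string comparison; and pre-statement events (such as Intvar or User_var events), XA transactions and error handling are ignored. It only illustrates the idea that a GTID opens a transaction, a following non-BEGIN statement (for example DDL) closes it immediately, and a BEGIN group stays open until COMMIT, ROLLBACK or an Xid_log_event.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified classification of relay log events.
enum class EventType { Gtid, Query, Xid, Other };

struct Event {
  EventType type;
  std::string query;  // only meaningful for EventType::Query
};

// None: outside any transaction.
// Gtid: a Gtid_log_event was seen, transaction body not started yet.
// Dml:  inside a BEGIN ... COMMIT/Xid group.
enum class ParserState { None, Gtid, Dml };

class TrxBoundaryParserSketch {
 public:
  // Feed one event, in stream order, and update the boundary state.
  void feed_event(const Event& ev) {
    // Input-model precondition: only Query events carry query text.
    assert(ev.type == EventType::Query || ev.query.empty());
    switch (ev.type) {
      case EventType::Gtid:
        // In this sketch a GTID always opens a new transaction.
        state_ = ParserState::Gtid;
        break;
      case EventType::Query:
        if (ev.query == "BEGIN") {
          if (state_ == ParserState::Gtid) state_ = ParserState::Dml;
        } else if (ev.query == "COMMIT" || ev.query == "ROLLBACK") {
          state_ = ParserState::None;
        } else if (state_ == ParserState::Gtid) {
          // GTID followed by a non-BEGIN statement (e.g. DDL): a
          // single-statement transaction, complete after this event.
          state_ = ParserState::None;
        }
        // Other statements inside BEGIN ... COMMIT keep the Dml state.
        break;
      case EventType::Xid:
        // An Xid_log_event commits a transactional BEGIN group.
        state_ = ParserState::None;
        break;
      case EventType::Other:
        // Table maps, row events, etc. do not move boundaries here.
        break;
    }
  }

  bool is_inside_transaction() const { return state_ != ParserState::None; }
  ParserState state() const { return state_; }

 private:
  ParserState state_ = ParserState::None;
};
```

Feeding a GTID, BEGIN, a row event and an Xid event keeps `is_inside_transaction()` true until the Xid event arrives, which is the point at which the receiver may treat the transaction's GTID as retrieved.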
If the transaction parser is outside a transaction after such a flush, the I/O thread checks whether a GTID was stored at the start of the transaction and, if so, adds it to the retrieved set. This ensures that a GTID is only reported once the whole transaction has arrived and been flushed to the relay log.

Also, before this patch, after the I/O thread flushed a single received event into the relay log, the relay log could be rotated if the current relay log file size exceeded max_binlog_size/max_relay_log_size. After this patch, when GTIDs are enabled, rotation by size is only allowed if the transaction parser is not in the middle of a transaction.

Note: This patch removes the changes for BUG#17280176, which dealt with a similar problem in a different way.
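The receiver-side bookkeeping can be sketched in the same spirit. In this hypothetical C++ model (`QueuedEvent`, `ReceiverSketch` and the `outside_trx_after` flag are illustrative names, not the server's), the boundary parser's verdict after each event is passed in as a flag so the sketch stands alone; the relay log is reduced to a byte counter, GTIDs are plain strings and the retrieved set is a `std::set`. It shows the two rules described above: a GTID is published only after the last event of its transaction has been flushed, and, with GTIDs enabled, size-based rotation is deferred while a transaction is open. Dropping the pending GTID on disconnect is an assumption about how a partially received transaction is handled, not a detail stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// One received event plus the (hypothetical) boundary parser verdict
// obtained after feeding it.
struct QueuedEvent {
  std::optional<std::string> gtid;  // set only for a Gtid_log_event
  std::size_t size;                 // bytes written to the relay log
  bool outside_trx_after;           // parser is outside a trx after this event
};

class ReceiverSketch {
 public:
  ReceiverSketch(std::size_t max_relay_log_size, bool gtid_mode)
      : max_size_(max_relay_log_size), gtid_mode_(gtid_mode) {}

  // Called by the (modelled) I/O thread for every received event.
  void queue_event(const QueuedEvent& ev) {
    // 1. Keep the GTID aside instead of adding it to the retrieved set.
    if (ev.gtid) pending_gtid_ = ev.gtid;

    // 2. "Flush" the event into the current relay log file.
    current_file_bytes_ += ev.size;

    // 3. Publish the GTID only once the whole transaction is flushed.
    if (ev.outside_trx_after && pending_gtid_) {
      retrieved_gtid_set_.insert(*pending_gtid_);
      pending_gtid_.reset();
    }

    // 4. Rotate by size, but never mid-transaction when GTIDs are on.
    if (current_file_bytes_ > max_size_ &&
        (!gtid_mode_ || ev.outside_trx_after)) {
      ++rotations_;
      current_file_bytes_ = 0;
    }

    // Invariant: no GTID is left pending outside a transaction.
    assert(!(ev.outside_trx_after && pending_gtid_));
  }

  // Assumption: a partially received transaction's GTID is never published.
  void on_disconnect() { pending_gtid_.reset(); }

  const std::set<std::string>& retrieved_gtid_set() const {
    return retrieved_gtid_set_;
  }
  int rotations() const { return rotations_; }

 private:
  std::size_t max_size_;
  bool gtid_mode_;
  std::size_t current_file_bytes_ = 0;
  int rotations_ = 0;
  std::optional<std::string> pending_gtid_;
  std::set<std::string> retrieved_gtid_set_;
};
```

With the size limit modelled as 100 bytes, a 120-byte row event in the middle of a transaction does not trigger rotation; rotation happens only after the closing Xid event, at the same moment the transaction's GTID enters the retrieved set.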