Venkatesh Venugopal authored
Bug#30791583 - CRASH IN CERTIFIER::~CERTIFIER() ON STOP GROUP_REPLICATION COMMAND

Problem & Analysis
==================
Querying performance_schema.replication_group_member_stats can
sometimes crash the server, for the following reasons.

1. The Group Replication perfschema code has the pattern below.

     callbacks.set_last_conflict_free_transaction(
         callbacks.context,
         *pipeline_stats->get_transaction_last_conflict_free().c_str(),
         pipeline_stats->get_transaction_last_conflict_free().length());

   Here, the Pipeline_member_stats members
   m_transaction_last_conflict_free and
   m_transactions_committed_all_members are not protected by any
   lock. These members are therefore not thread-safe, and the
   behavior is undefined when the value is updated between the
   c_str() and length() calls.

2. The GR perfschema code is not fully thread-safe. As a result,

   2.1. applier_module may be deleted by a STOP GROUP_REPLICATION
        query while another thread is executing a P_S query. The
        P_S query then hits a segmentation fault when it accesses
        applier_module.

   2.2. The group membership can change while the P_S query is in
        progress. When this happens, in a debug build, the thread
        hits an assertion failure in
        table_replication_group_member_stats.cc

          DBUG_ASSERT(m_pos.m_index < get_row_count());

        while fetching the row by position.

Fix
===
1. Instead of returning the internal memory buffer, we now pass a
   local memory buffer to be filled with the value, and pass that
   buffer's contents and length to the callback.

2.1. To fix the issues with the STOP GROUP_REPLICATION query,
     - A new read-write lock has been added to protect access to the
       std::map<std::string, Pipeline_member_stats>
       Flow_control_module_info against any updates received from
       the process_notification_thread. This ensures that the
       iterator used for fetching the Pipeline_member_stats stays
       valid until the value has been copied.
     - The P_S query now takes the applier thread's run_lock for a
       short duration while fetching the local member stats, so that
       the applier module cannot be deleted by a concurrent
       STOP GROUP_REPLICATION query.

2.2. Since we cannot block concurrent actions that come from group
     communication, the assert has been converted into the error
     HA_ERR_RECORD_DELETED.

RB: 23924
Reviewed by: Nuno Carvalho <nuno.carvalho@oracle.com>
Reviewed by: Jaideep Karande <jaideep.karande@oracle.com>
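
The copy-under-lock approach described in the fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the server code: MemberStats, StatsRegistry, Captured, read_row, kRowOk and kRowDeleted are hypothetical names, std::shared_mutex (C++17) stands in for the server's read-write lock, and kRowDeleted stands in for HA_ERR_RECORD_DELETED. Only the callback shape (a const char & plus a length) mirrors the call quoted in the message.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for Pipeline_member_stats.
struct MemberStats {
  std::string last_conflict_free;
};

// Hypothetical stand-in for the map guarded by the new read-write lock.
class StatsRegistry {
 public:
  // Writer side, e.g. an update arriving from a notification thread.
  void update(const std::string &member_id, const MemberStats &stats) {
    std::unique_lock<std::shared_mutex> guard(m_lock);
    m_stats[member_id] = stats;
  }

  // Writer side, e.g. a member leaving the group.
  void remove(const std::string &member_id) {
    std::unique_lock<std::shared_mutex> guard(m_lock);
    m_stats.erase(member_id);
  }

  // Reader side: the iterator is only used while the shared lock is held,
  // and the value is copied into a caller-owned buffer before unlocking.
  bool copy_stats(const std::string &member_id, MemberStats &out) const {
    std::shared_lock<std::shared_mutex> guard(m_lock);
    auto it = m_stats.find(member_id);
    if (it == m_stats.end()) return false;
    out = it->second;
    return true;
  }

 private:
  mutable std::shared_mutex m_lock;
  std::map<std::string, MemberStats> m_stats;
};

// Hypothetical callback context; records what the callback received.
struct Captured {
  std::string value;
};

// Same shape as the callback in the message: first character plus length.
void set_last_conflict_free_transaction(Captured *context, const char &value,
                                        std::size_t length) {
  context->value.assign(&value, length);
}

constexpr int kRowOk = 0;
constexpr int kRowDeleted = 1;  // stands in for HA_ERR_RECORD_DELETED

// Reads one row. A member that vanished mid-query yields an error code
// instead of tripping an assertion.
int read_row(const StatsRegistry &registry, const std::string &member_id,
             Captured *context) {
  MemberStats local;  // local buffer: c_str() and length() see one value
  if (!registry.copy_stats(member_id, local)) return kRowDeleted;
  set_last_conflict_free_transaction(context,
                                     *local.last_conflict_free.c_str(),
                                     local.last_conflict_free.length());
  return kRowOk;
}
```

Because both c_str() and length() are called on the local copy, a concurrent update() can no longer change the string between the two calls, and a concurrent remove() surfaces as kRowDeleted rather than as a crash.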