Marc Alff authored
Before this fix, executing
  TRUNCATE TABLE performance_schema.accounts;
could crash the server under load.

The root cause is a race condition between:
- the TRUNCATE statement
- a thread disconnecting.

The reason for the race is as follows.

TRUNCATE on a performance schema table resets the statistics, and aggregates the statistics of the lower aggregation levels. For example, a TRUNCATE on accounts aggregates the statistics of all threads.

At the same time, a disconnecting thread also aggregates its statistics to the next parent, before destroying the thread instrumentation.

Concurrent execution of both code paths can cause crashes, because the pointer to the parent of an object can be reset to NULL while it is being used concurrently.

In the crash found, PFS_thread::m_account, the pointer to the parent account, is set to NULL when a thread disconnects, which causes the TRUNCATE code to crash while attempting to aggregate the thread statistics to the parent account.

In function aggregate_thread_waits(), for example, using the thread->m_account pointer is:
- safe when the code has ownership of the object, as when a thread disconnects
- unsafe when aggregating statistics of another thread, as when executing a TRUNCATE.

The solution is to never access thread->m_account directly, but to require the caller to sanitize the pointer when necessary.

With this fix, aggregate_thread_waits() takes the parent account as an explicit input parameter.

When executing a thread disconnect, the code owns the object, and the call is simply:
  aggregate_thread_waits(pfs, pfs->m_account, ...);

When executing a TRUNCATE, the code does not own the object, and the call is:
  PFS_account *safe_account= sanitize_account(pfs->m_account);
  aggregate_thread_waits(pfs, safe_account, ...);

Because the pointer is sanitized, it is guaranteed to:
- either be NULL, or be valid
- not change spuriously during the code execution.

The fix changes every aggregation where a parent can change spuriously; each code change follows the same mechanics.
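The explicit-parent calling convention can be sketched in simplified C++. Everything here is a hypothetical stand-in, not the server's code: the two structures, kWaitClasses, on_disconnect() and truncate_during_disconnect(). The sketch runs in a single thread and simulates the racing disconnect inline, so it only shows why aggregating into a snapshot of the parent pointer is unaffected by a later reset of m_account. It also assumes, without the commit message saying so, that thread statistics are cleared once moved and that a NULL parent simply skips the aggregation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for the server's PFS structures.
constexpr int kWaitClasses = 4;

struct PFS_account {
  std::array<std::uint64_t, kWaitClasses> waits{};  // aggregated wait counts
};

struct PFS_thread {
  std::array<std::uint64_t, kWaitClasses> waits{};  // per-thread wait counts
  PFS_account *m_account = nullptr;  // parent, reset to nullptr on disconnect
};

// The parent account is an explicit parameter. The function never reads
// thread->m_account itself, so a reset of that field after the caller
// took its snapshot cannot affect the aggregation.
void aggregate_thread_waits(PFS_thread *thread, PFS_account *safe_account) {
  if (safe_account == nullptr) {
    return;  // assumption: this sketch just skips aggregation without a parent
  }
  for (int i = 0; i < kWaitClasses; ++i) {
    safe_account->waits[i] += thread->waits[i];
    thread->waits[i] = 0;  // assumption: thread stats are cleared once moved
  }
}

// Disconnect path: the calling thread owns pfs, so m_account is stable here.
void on_disconnect(PFS_thread *pfs) {
  aggregate_thread_waits(pfs, pfs->m_account);
  pfs->m_account = nullptr;  // parent pointer reset, as described above
}

// TRUNCATE path, simulated in a single thread: the parent pointer is read
// once, then a "concurrent" disconnect (done inline) resets it.
std::uint64_t truncate_during_disconnect() {
  PFS_account account;
  PFS_thread thread;
  thread.m_account = &account;
  thread.waits = {1, 2, 3, 4};

  // The server wraps this read in sanitize_account(); omitted in this sketch.
  PFS_account *safe_account = thread.m_account;
  thread.m_account = nullptr;  // simulated disconnect racing with TRUNCATE

  aggregate_thread_waits(&thread, safe_account);  // uses the snapshot
  std::uint64_t total = 0;
  for (std::uint64_t w : account.waits) total += w;
  return total;  // sum of the thread's waits moved into the account
}
```

Had aggregate_thread_waits() read thread->m_account itself after the simulated reset, it would have received nullptr; with the parent passed in, the statistics still reach the account.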
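A sanitizer such as sanitize_account() can give the "NULL or valid" guarantee when the objects it checks live in a preallocated array that is never freed: a pointer is accepted only if it lies inside that array and on an element boundary, and anything else becomes NULL. The sketch below assumes exactly that layout; account_array, account_max and the one-field PFS_account are hypothetical. The "does not change spuriously" half of the guarantee comes from the caller storing the result in a local variable, as in the TRUNCATE call shown in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: accounts live in one preallocated array that is
// never freed, so any pointer to one of its elements refers to readable memory.
struct PFS_account {
  std::uint64_t waits = 0;
};

constexpr std::size_t account_max = 8;  // assumption: tiny fixed capacity
PFS_account account_array[account_max];

// Returns `unsafe` if it designates an element of account_array, else nullptr.
// Addresses are compared as integers to avoid relational comparisons between
// pointers into unrelated objects.
PFS_account *sanitize_account(PFS_account *unsafe) {
  const auto first = reinterpret_cast<std::uintptr_t>(account_array);
  const auto last =
      reinterpret_cast<std::uintptr_t>(account_array + account_max);  // one past end
  const auto p = reinterpret_cast<std::uintptr_t>(unsafe);
  if (p < first || p >= last) {
    return nullptr;  // outside the array, including nullptr itself
  }
  if ((p - first) % sizeof(PFS_account) != 0) {
    return nullptr;  // inside the array but not on an element boundary
  }
  return unsafe;
}
```

In this sketch, "valid" only means that the pointer designates a real slot of the array, which is enough to make aggregating into it memory-safe.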