    Bug#17084615 CRASH, MANY TRUNCATE PFS TABLE VERSUS MANY CONNECT,DISCONNECT · 34853249
    Marc Alff authored
    Before this fix, executing
      TRUNCATE TABLE performance_schema.accounts;
    could crash the server under load.
    
    The root cause is a race condition between:
    - the TRUNCATE statement
    - a thread disconnecting.
    
    The reason for the race is as follows:
    
    TRUNCATE on a performance schema table resets the statistics,
    and aggregates statistics for lower aggregation levels.
    For example, a truncate on accounts aggregates all threads stats.
    
    At the same time, a thread disconnecting also aggregates the thread statistics
    to the next parent, before destroying the thread instrumentation.
    
    Concurrent execution of both code paths can cause crashes,
    because pointers to the parent of an object can be reset to null
    while still in use by the other code path.
    
    In the crash found, PFS_thread::m_account, the pointer to the parent account,
    is set to null when disconnecting a thread, which causes the code in truncate
    to crash while attempting to aggregate the thread stats to the parent account.
    
    For example, in function aggregate_thread_waits(),
    using the thread->m_account pointer is:
    - safe when the code has ownership of the object,
      as when a thread disconnects
    - unsafe when aggregating statistics of another thread,
      as when executing a TRUNCATE.
    
    The solution is to never access thread->m_account directly,
    but require the caller to sanitize the pointer when necessary.
    
    With this fix, aggregate_thread_waits() now takes the parent account
    as an explicit input parameter.
    
    When executing a thread disconnect, the code owns the object,
    and the call is simply:
      aggregate_thread_waits(pfs, pfs->m_account, ...);
    
    When executing a TRUNCATE, the code does not own the object,
    and the call is:
      PFS_account *safe_account= sanitize_account(pfs->m_account);
      aggregate_thread_waits(pfs, safe_account, ...);
    
    Because the pointer is sanitized, it is guaranteed to:
    - either be NULL, or be valid.
    - not change spuriously during the code execution
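
The pattern described above can be illustrated with a small, single-threaded sketch. It is not the server code: every name prefixed with Demo or demo_ is hypothetical, the bounds check is only an assumption about what "sanitizing" a pointer may involve, and the disconnect is simulated by resetting the pointer by hand. It only shows that the aggregation works on a local, checked copy of the parent pointer instead of re-reading thread->m_account.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the performance schema structures.
struct DemoAccount {
  unsigned long m_wait_count = 0;
};

struct DemoThread {
  DemoAccount *m_account = nullptr;  // may be reset to null by the owning thread
  unsigned long m_wait_count = 0;
};

// Assumption: accounts live in one preallocated array.
static const std::size_t demo_account_max = 4;
static DemoAccount demo_account_array[demo_account_max];

// Returns the pointer if it designates an element of demo_account_array,
// and nullptr otherwise (null, out of range, or misaligned).
DemoAccount *demo_sanitize_account(DemoAccount *unsafe) {
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(unsafe);
  const std::uintptr_t begin =
      reinterpret_cast<std::uintptr_t>(&demo_account_array[0]);
  const std::uintptr_t end = begin + sizeof(demo_account_array);
  if (p < begin || p >= end) return nullptr;
  if ((p - begin) % sizeof(DemoAccount) != 0) return nullptr;
  return unsafe;
}

// The parent is passed explicitly; the function never reads thread->m_account.
// Simplification: with no parent account, the thread stats are only reset.
void demo_aggregate_thread_waits(DemoThread *thread, DemoAccount *safe_account) {
  if (safe_account != nullptr) {
    safe_account->m_wait_count += thread->m_wait_count;
  }
  thread->m_wait_count = 0;
}
```

In this sketch, a caller that does not own the thread reads m_account once, sanitizes it, and passes the local copy down. A later reset of thread->m_account by the owner cannot turn that local copy into a null dereference inside the aggregation.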
    
    The fix changes every aggregation where a parent can change spuriously;
    each code change follows the same mechanics.