Marc Alff authored
Bug#17766582 PERFORMANCE SCHEMA OVERHEAD IN PFS_LOCK

This fix is a general cleanup of code involving atomic operations in the
performance schema, to reduce overhead and improve code clarity.

Changes implemented:

1) Removed 'volatile' from the code.
   The C 'volatile' keyword has nothing to do with atomic operations and is
   confusing. This is a code cleanup.

2) Added the missing PFS_cacheline_uint32 wrapper to atomic counters, to
   ensure that no false sharing happens. This is a performance improvement.

3) Modified optimistic locks, for clarity.
   Pattern before:
     pfs_lock lock;
     m_lock.begin_optimistic_lock(&lock);
     m_lock.end_optimistic_lock(&lock);
   Pattern after:
     pfs_optimistic_state lock;
     m_lock.begin_optimistic_lock(&lock);
     m_lock.end_optimistic_lock(&lock);
   The new type, pfs_optimistic_state, better reflects that state
   information is used in a begin / end section. This provides better
   typing, for type safety. Adjusted all the code accordingly.

4) Modified pfs_lock allocation, for performance.
   Pattern before:
     m_lock.is_free();
     m_lock.free_to_dirty();
     m_lock.dirty_to_allocated();
   Total: 0 + 1 + 2 = 3 atomic operations.
   Note that free_to_dirty() could fail even for free locks, because the
   CAS could use a stale version number, making the code try yet another
   record.
   Pattern after:
     pfs_dirty_state dirty_state;
     m_lock.free_to_dirty(&dirty_state);
     m_lock.dirty_to_allocated(&dirty_state);
   Total: 2 + 1 = 3 atomic operations.
   The code is now guaranteed to detect free records, because
   free_to_dirty() performs an atomic load followed by a CAS.
   Adjusted all the code accordingly.

5) Modified pfs_lock deallocation, for performance.
   Pattern before:
     m_lock.allocated_to_free();
   Total: 2 atomic operations.
   Pattern after:
     m_lock.allocated_to_free();
   Total: 1 atomic operation.

6) Modified record updates, for performance.
   Pattern before:
     m_lock.allocated_to_dirty();
     m_lock.dirty_to_allocated();
   Total: 4 atomic operations.
   Pattern after:
     pfs_dirty_state dirty_state;
     m_lock.allocated_to_dirty(&dirty_state);
     m_lock.dirty_to_allocated(&dirty_state);
   Total: 2 atomic operations.
   Adjusted all the code accordingly.

7) Modified record peeks, for reliability.
   Pattern before: m_lock.is_populated() used a dirty read, causing
   spurious missing records.
   Pattern after: m_lock.is_populated() uses an atomic load, for
   correctness.
   This change is expected to resolve spurious test failures where some
   records in performance schema tables are sometimes missing, or some
   statistics (when computed on the fly) are sometimes underestimated.