Marc Alff authored
Bug#17766582 PERFORMANCE SCHEMA OVERHEAD IN PFS_LOCK

This fix is a general cleanup of code involving atomic operations in the
performance schema, to reduce overhead and improve code clarity.

Changes implemented:

1) Removed 'volatile' from the code.
   The C 'volatile' keyword has nothing to do with atomic operations and is
   confusing. This is a code cleanup.

2) Added the missing PFS_cacheline_uint32 wrapper to atomic counters, to
   ensure that no false sharing happens. This is a performance improvement.

3) Modified optimistic locks, for clarity.
   Pattern before:
     pfs_lock lock;
     m_lock.begin_optimistic_lock(&lock);
     m_lock.end_optimistic_lock(&lock);
   Pattern after:
     pfs_optimistic_state lock;
     m_lock.begin_optimistic_lock(&lock);
     m_lock.end_optimistic_lock(&lock);
   The new type, pfs_optimistic_state, better reflects that state
   information is used in a begin / end section. This provides better
   typing, for type safety. Adjusted all the code accordingly.

4) Modified pfs_lock allocation, for performance.
   Pattern before:
     m_lock.is_free();
     m_lock.free_to_dirty();
     m_lock.dirty_to_allocated();
   Total: 0 + 1 + 2 = 3 atomic operations.
   Note that free_to_dirty() could fail even for free locks, because the
   CAS could use a stale version number, making the code try yet another
   record.
   Pattern after:
     pfs_dirty_state dirty_state;
     m_lock.free_to_dirty(&dirty_state);
     m_lock.dirty_to_allocated(&dirty_state);
   Total: 2 + 1 = 3 atomic operations.
   The code is now guaranteed to detect free records, because
   free_to_dirty() performs an atomic load followed by a CAS.
   Adjusted all the code accordingly.

5) Modified pfs_lock deallocation, for performance.
   Pattern before:
     m_lock.allocated_to_free();
   Total: 2 atomic operations.
   Pattern after:
     m_lock.allocated_to_free();
   Total: 1 atomic operation.

6) Modified record updates, for performance.
   Pattern before:
     m_lock.allocated_to_dirty();
     m_lock.dirty_to_allocated();
   Total: 4 atomic operations.
   Pattern after:
     pfs_dirty_state dirty_state;
     m_lock.allocated_to_dirty(&dirty_state);
     m_lock.dirty_to_allocated(&dirty_state);
   Total: 2 atomic operations.
   Adjusted all the code accordingly.

7) Modified record peeks, for reliability.
   Pattern before: m_lock.is_populated() used a dirty read, causing
   spurious missing records.
   Pattern after: m_lock.is_populated() uses an atomic load, for
   correctness.
   This change is expected to resolve spurious test failures where some
   records in performance schema tables are sometimes missing, or some
   statistics (when computed on the fly) are sometimes underestimated.