storage/perfschema/pfs_lock.h · d9682c92f5455fce012b9207d1b9810f7f2fe187 · Rasoul Jahanshahi / Mysql Server

Sep 25, 2012

700ee866

Marc Alff authored Sep 25, 2012

Bug#14116386 PERFSCHEMA.TABLE_LOCK_AGGREGATE_GLOBAL_4U_3T FAILS ON MYSQL-TRUNK SPORADICALLY

This fix is a server code change that fixes an issue of threads spuriously
disappearing from the performance schema.

This issue randomly affects many test scripts, in particular scripts
that rely heavily on per-thread statistics.

Background information:

Test scripts are executed with mysql-test-run, a client.
The test client considers the server "done" executing a statement once
the server has finished writing all the bytes of the reply to the client
socket connection.
In terms of code, this happens after the last socket write event, which is
technically still part of the current statement execution.

However, the server is not "done" yet, and is not idle.
The server is still executing code, such as:
- more wait events, such as mutex locks,
- more stage events, such as "cleaning up",
- finishing the current statement instrumentation,
- starting an idle wait event.

Only when the server is blocked in an IDLE event can it be considered
in a stable state. Anything that happens between the last socket write
and the idle wait can potentially interfere with the queries executed
by the client script (in a different connection).
This scenario is very common for the test scripts involved, where a root
monitoring connection spies on a regular client connection.

Of particular interest, during that window of execution in the server, are
calls that maintain the PROCESSLIST_STATE and PROCESSLIST_INFO columns of
the performance_schema.threads table.
These columns are still being updated while the test client queries the
performance schema.

The problem is with the implementation of:
- set_thread_state_v1()
- set_thread_info_v1()
which, for a short period of time, flag the entire thread as dirty.

This code causes the thread to literally disappear and reappear later in:
- the threads table
- every aggregate table that iterates on threads
causing the test failures seen.
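The failure mode can be illustrated with a minimal C++ sketch. It is not the actual pfs_lock code: `RecordState`, `SketchThread`, `set_state_old()` and `count_visible()` are hypothetical names, and the `observe_mid_update` callback only simulates a concurrent reader (for example a scan of the threads table) that happens to run while the state is being written.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical, simplified record states; the names echo the free/dirty/
// allocated idea described in this commit, not the real pfs_lock layout.
enum class RecordState : std::uint32_t { Free, Dirty, Allocated };

struct SketchThread {
  std::atomic<RecordState> m_lock_state{RecordState::Free};  // stands in for m_lock
  std::atomic<const char *> m_processlist_state{""};
};

// Behavior being modeled: the whole record is flagged dirty while the
// state string is updated, then flagged allocated again.
template <typename Observer>
void set_state_old(SketchThread &t, const char *state, Observer observe_mid_update) {
  t.m_lock_state.store(RecordState::Dirty);      // thread hidden from readers...
  observe_mid_update();                          // simulates a reader running now
  t.m_processlist_state.store(state);
  t.m_lock_state.store(RecordState::Allocated);  // ...until here
}

// A reader such as a threads-table scan only reports allocated records,
// so a record caught in the dirty window is skipped entirely.
int count_visible(const std::vector<const SketchThread *> &threads) {
  int visible = 0;
  for (const SketchThread *t : threads) {
    if (t->m_lock_state.load() == RecordState::Allocated) {
      ++visible;
    }
  }
  return visible;
}
```

With two allocated threads, a reader that runs mid-update counts only one of them, and both are visible again once the update completes.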

This issue in fact has a significant impact, as these instrumentation points
are called multiple times during a statement execution: once at each
instrumented stage.

The fix is to:
- not touch PFS_thread::m_lock
- define a dedicated lock, PFS_thread::m_processlist_lock, to maintain the
integrity of the processlist state and info attributes.
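The idea of a dedicated lock can be sketched as follows, assuming a single writer (each thread updates only its own processlist attributes). `SketchThreadV2`, `SketchRow`, `set_state_new()` and `make_row()` are hypothetical names; the version counter is a seqlock-style stand-in for PFS_thread::m_processlist_lock, not its real implementation. The flag standing in for PFS_thread::m_lock is never touched by the state update.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: the record flag (thread lifetime) is left alone, and a
// separate version counter guards only the processlist columns.
struct SketchThreadV2 {
  std::atomic<bool> m_allocated{false};                   // stands in for m_lock
  std::atomic<std::uint32_t> m_processlist_version{0};    // stands in for m_processlist_lock
  std::atomic<const char *> m_processlist_state{""};
};

// Single writer (the owning thread): odd version means "update in progress".
template <typename Observer>
void set_state_new(SketchThreadV2 &t, const char *state, Observer observe_mid_update) {
  std::uint32_t v = t.m_processlist_version.load(std::memory_order_relaxed);
  t.m_processlist_version.store(v + 1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_release);
  observe_mid_update();  // simulates a concurrent reader running now
  t.m_processlist_state.store(state, std::memory_order_relaxed);
  t.m_processlist_version.store(v + 2, std::memory_order_release);  // even: stable
}

struct SketchRow {
  bool exists = false;       // is the thread row shown at all
  bool state_valid = false;  // was a consistent processlist state read
  const char *state = nullptr;
};

// Optimistic read: the row depends only on the lifetime flag; the processlist
// column is kept only if the version was even and unchanged across the copy.
SketchRow make_row(const SketchThreadV2 &t) {
  SketchRow row;
  if (!t.m_allocated.load(std::memory_order_acquire)) {
    return row;  // thread genuinely gone: no row
  }
  row.exists = true;
  std::uint32_t v1 = t.m_processlist_version.load(std::memory_order_acquire);
  const char *s = t.m_processlist_state.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  std::uint32_t v2 = t.m_processlist_version.load(std::memory_order_relaxed);
  if ((v1 & 1u) == 0 && v1 == v2) {
    row.state_valid = true;
    row.state = s;
  }
  return row;  // on conflict the row still exists; only the column is left empty
}
```

A reader that overlaps the update still returns a row; only the processlist column is reported as not valid for that read. Leaving the column empty rather than retrying is a choice made for this sketch.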

With this fix, a PFS_thread is not spuriously hidden each time the thread state
changes, making every aggregate based on threads more stable and accurate.
