    f82dcccc
    Bug#14111678 PERFSCHEMA.SOCKET_SUMMARY_BY_EVENT_NAME_FUNC FAILS ON PB2 SPORADICALLY · f82dcccc
    Marc Alff authored
    (a) In MySQL 5.5, the performance schema keeps global statistics in a generic
    way: only the wait time spent on an instrument is recorded, in a global array,
    global_instr_class_waits_array, which backs the table
    EVENTS_WAITS_GLOBAL_BY_EVENT_NAME.
    
    (b) In MySQL 5.6, some (but not all) instruments have been improved to keep more
    detailed statistics, which are stored in instrument-specific structures
    such as PFS_file_stat and PFS_socket_stat.
    
    Regardless of where statistics are counted, the implementation of table
    EVENTS_WAITS_GLOBAL_BY_EVENT_NAME must be consistent, so that:
    - (c) the memory buffers read to aggregate statistics, whether generic or
      per instrument type, in ::make_row()
    - (d) the memory buffers reset when TRUNCATE TABLE
      EVENTS_WAITS_GLOBAL_BY_EVENT_NAME is executed, in ::delete_all_rows()
    are the same.
    
    As found during the analysis of this bug report, this is not the case for
    socket io:
    - table_ews_global_by_event_name::make_socket_row() uses
      PFS_instance_wait_visitor, which reads from both the
      global_instr_class_waits_array array and PFS_socket_class.m_socket_stat
    - table_ews_global_by_event_name::delete_all_rows() only resets
      global_instr_class_waits_array.
    
    This leads to incoherent statistics reported in table
    EVENTS_WAITS_GLOBAL_BY_EVENT_NAME.
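
The mismatch between (c) and (d) can be illustrated with a minimal sketch. All
types and function names below are hypothetical, simplified stand-ins and are
not the actual performance schema code; the sketch only assumes one generic
buffer and one socket-specific buffer, as described above.

```cpp
#include <cstdint>

// Hypothetical stand-in for a wait statistic (count and total time).
struct WaitStat {
  std::uint64_t count = 0;
  std::uint64_t sum = 0;
  void aggregate(const WaitStat &other) {
    count += other.count;
    sum += other.sum;
  }
  void reset() {
    count = 0;
    sum = 0;
  }
};

// Hypothetical stand-in for the two places where socket waits are counted.
struct Buffers {
  WaitStat generic_waits;  // plays the role of global_instr_class_waits_array
  WaitStat socket_stat;    // plays the role of PFS_socket_class.m_socket_stat
};

// Read path (c): sums both buffers, as make_socket_row() is described to do.
WaitStat make_row(const Buffers &b) {
  WaitStat row;
  row.aggregate(b.generic_waits);
  row.aggregate(b.socket_stat);
  return row;
}

// Reset path (d) before the fix: only the generic buffer is cleared,
// so the socket-specific part survives a TRUNCATE.
void delete_all_rows_inconsistent(Buffers &b) { b.generic_waits.reset(); }

// Reset path that clears exactly the buffers the read path consumes.
void delete_all_rows_consistent(Buffers &b) {
  b.generic_waits.reset();
  b.socket_stat.reset();
}
```

With the inconsistent reset, a row built right after TRUNCATE still carries the
socket-specific counts, which matches the incoherent results described above.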
    
    The immediate bug is caused by the discrepancy between (c) and (d),
    but in general there is much confusion in the code created by the current
    mix of the two designs (a) and (b).
    
    Given that the performance schema is likely to maintain detailed statistics
    for each kind of instrument, which are by definition specific to each
    instrument type, design (a), which is only good enough to keep "wait"
    statistics, is abandoned in favor of (b).
    
    As a result, this patch completely removes the performance schema buffer
    global_instr_class_waits_array.
    Statistics for each sub type of instrument are kept in per instrument
    structures, such as:
    - PFS_mutex_stat
    - PFS_rwlock_stat
    - PFS_cond_stat
    
    For native instruments such as table io and table lock,
    global statistics are already kept in global variables (unchanged).
    
    For the "idle" native instrument, global statistics are now kept in
    global_idle_stat.
    
    With this change:
    - the overall design and code gain clarity
    - the overall structure is easier to extend for enhancements
    - the memory footprint is decreased, because unused / duplicate structures
      found in the existing code (PFS_instr::m_wait_stat) were removed.
    - the performance schema overhead is reduced, since less CPU is needed in
      destroy() functions for instruments, which now only update statistics in 1
      place instead of 2.
    
    The global aggregation is based on the aggregation by instance and class.
    For statistics by thread / account / user / host, the design (a) is
    maintained, as the aggregation by thread is orthogonal to the aggregation by
    instances.
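
As a rough sketch of how a global aggregation can be built from instance and
class statistics while destroy() updates a single place: the types and
functions below (MutexStat, MutexClass, MutexInstance, destroy_mutex,
global_row) are hypothetical illustrations and do not reproduce the server
source. The sketch assumes that a destroyed instance folds its statistics into
its class, and that a global row is the class total plus all live instances.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-instrument statistic.
struct MutexStat {
  std::uint64_t wait_count = 0;
  std::uint64_t wait_sum = 0;
  void aggregate(const MutexStat &other) {
    wait_count += other.wait_count;
    wait_sum += other.wait_sum;
  }
};

// Hypothetical instrument class: holds statistics of destroyed instances.
struct MutexClass {
  MutexStat stat;
};

// Hypothetical live instance of an instrument.
struct MutexInstance {
  MutexClass *klass;
  MutexStat stat;
};

// On destroy, the instance statistics are folded into exactly one place,
// the owning class, and the instance statistics are cleared.
void destroy_mutex(MutexInstance &m) {
  m.klass->stat.aggregate(m.stat);
  m.stat = MutexStat();
}

// Global row for a class: class-level statistics plus every live instance.
MutexStat global_row(const MutexClass &klass,
                     const std::vector<MutexInstance> &live) {
  MutexStat row = klass.stat;
  for (const MutexInstance &m : live) {
    if (m.klass == &klass) row.aggregate(m.stat);
  }
  return row;
}
```

In this sketch the global total is unchanged by destroying an instance, since
its counts move from the instance to the class rather than being stored twice.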