    3a32cf15
    BUG#32224917: ANALYZE TABLE TAKES TABLE LOCK DURING INDEX
                  STATS UPDATE, CAUSES QUERY PILEUP
    Karthik Kamath authored
    
    DESCRIPTION:
    ============
    Consider a scenario in which a long-running query is
    executing on a table. In another session, ANALYZE TABLE is
    executed concurrently on the same table. Any subsequent
    queries on that table will then wait for the long-running
    query to complete, causing a query pileup whose size
    depends on the duration of the long-running query.
    
    ANALYSIS:
    =========
    ANALYZE TABLE performs key distribution analysis and stores
    the distribution for the table or tables. This helps the
    optimizer to choose better execution plans for subsequent
    queries on the table.
    
    The optimizer uses key distribution statistics, which are
    stored in the rec_per_key arrays of KEY objects. Separate
    KEY objects are associated with TABLE and TABLE_SHARE
    objects. In practice, both kinds of KEY objects point to
    the same rec_per_key buffers allocated on the TABLE_SHARE's
    memory. This buffer is zero-initialized while the
    TABLE_SHARE for the table is constructed. Later, when the
    table is opened and the TABLE object is constructed, we
    call handler::open() (and thus handler::info() with the
    HA_STATUS_CONST flag), which populates the buffer with
    non-zero values from the SE data.
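
    The sharing described above can be illustrated with a
    small stand-alone C++ sketch. The types below are
    simplified stand-ins, not the actual server structures;
    only the aliasing of the rec_per_key buffer is modeled:

      // Toy model: TABLE and TABLE_SHARE KEY objects alias one
      // rec_per_key buffer that is owned by the share.
      #include <cstddef>
      #include <cstdio>
      #include <vector>

      struct KeySketch {
        unsigned long *rec_per_key;  // points into share's buffer
      };

      struct ShareSketch {
        // Zero-initialized when the "share" is constructed.
        std::vector<unsigned long> rec_per_key_buf;
        KeySketch key;
        explicit ShareSketch(std::size_t parts)
            : rec_per_key_buf(parts, 0) {
          key.rec_per_key = rec_per_key_buf.data();
        }
      };

      struct TableSketch {
        KeySketch key;
        explicit TableSketch(ShareSketch &share) {
          key.rec_per_key = share.key.rec_per_key;  // same buffer
        }
        // Stand-in for handler::open() calling handler::info()
        // with HA_STATUS_CONST and filling in SE statistics.
        void open_and_load_stats() { key.rec_per_key[0] = 42; }
      };

      int main() {
        ShareSketch share(1);
        TableSketch table(share);
        table.open_and_load_stats();
        // The share's KEY sees the value too: one shared buffer.
        std::printf("%lu\n", share.key.rec_per_key[0]);  // 42
      }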
    
    In addition to key distribution statistics, ANALYZE TABLE
    can also update other statistics, such as the table row
    count. These statistics are likewise loaded into the
    TABLE::file::stats structure when the table is opened.
    However, in most cases this structure is refreshed by
    calling handler::info() with the HA_STATUS_VARIABLE flag
    before being used.
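
    As a hedged illustration of that refresh pattern:
    HA_STATUS_VARIABLE below mirrors the real handler flag,
    while the classes are simplified stand-ins rather than
    the server's handler interface:

      #include <cstdio>

      // Simplified mirror of one handler flag value; the real
      // server defines several more.
      constexpr unsigned HA_STATUS_VARIABLE = 1;

      struct StatsSketch {
        unsigned long records = 0;  // row count estimate
      };

      struct HandlerSketch {
        StatsSketch stats;  // stand-in for TABLE::file::stats
        unsigned long se_row_count = 100;  // pretend SE value
        void info(unsigned flags) {
          if (flags & HA_STATUS_VARIABLE)
            stats.records = se_row_count;
        }
      };

      int main() {
        HandlerSketch h;             // stats loaded at "open"
        h.se_row_count = 250;        // SE stats changed later
        h.info(HA_STATUS_VARIABLE);  // refresh before reading,
                                     // as most call sites do
        std::printf("records=%lu\n", h.stats.records);  // 250
      }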
    
    To ensure that the rec_per_key buffer and the
    TABLE::file::stats structure for the table are updated
    after it runs, ANALYZE TABLE currently does two things.
    First, it removes all unused TABLE objects for the table
    from the table cache. Then it marks the TABLE_SHARE as
    old, to ensure that used TABLE objects for the table go
    away once they stop being used (the TABLE_SHARE is also
    deleted from the table definition cache after that). As a
    result, subsequent queries on the table create new
    TABLE_SHARE and TABLE objects, and the call to
    handler::open() fetches new stats from the SE.
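
    A toy model of this pre-fix scheme follows. The names
    and the version check are illustrative; the real table
    definition cache logic is more involved:

      #include <cstdio>

      struct ShareModel {
        unsigned long version = 1;  // stale value means "old"
        int used_tables = 2;  // TABLE objects in use by sessions
        void mark_old() { version = 0; }
        bool evictable() const { return used_tables == 0; }
      };

      int main() {
        ShareModel share;
        share.mark_old();  // step two of the old scheme
        // New openers must build a fresh share, but the old
        // one (and its used TABLE objects) lingers until every
        // session using the table is done.
        std::printf("evictable=%d\n", share.evictable());  // 0
      }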
    
    However, before this can happen we need to wait until all
    sessions using the table stop doing so (i.e. wait for the
    used TABLE objects to become unused and deleted, and for
    the old TABLE_SHARE to go away). This is the wait observed
    in the bug scenario: queries issued after ANALYZE TABLE
    wait for the existing queries on the table (started before
    or during ANALYZE TABLE) to finish executing. The longer
    the existing queries take, the longer the wait and thus
    the query pileup.
    
    FIX:
    ====
    TABLE_SHARE is no longer invalidated by ANALYZE TABLE.
    Instead, we invalidate all existing TABLE objects for the
    table by marking each TABLE as invalid. This ensures that
    the TABLE objects are not reused for subsequent queries,
    which in turn guarantees that we call handler::open() and
    handler::info() for those queries, fetching the new stats
    from the SE.

    The TABLE flag m_needs_reopen has been split into two
    members (a simplified sketch follows the list):
    1. m_invalid_stats, which marks a TABLE as invalid if it
    has outdated info in rec_per_key and handler stats. This
    flag is protected by the Table_cache::m_lock lock and can
    therefore be safely read and updated by different threads.
    2. m_invalid_dict, which marks a TABLE as invalid if the
    TABLE object has inconsistent information or if the
    handler might not allow certain operations. For example,
    we use this flag to avoid calling handler::reset() for
    partitioned InnoDB tables after the commit phase of the
    in-place ALTER TABLE API, and to force closing the table
    after REPAIR TABLE has failed during its prepare phase.
    Unlike the first flag, this one can only be updated by the
    thread which has the table open, but thanks to this it can
    be safely read by the same thread without extra locking.
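
    A minimal sketch of the split: the member names
    m_invalid_stats and m_invalid_dict come from the patch,
    while the class is a simplified stand-in for TABLE, with
    locking shown only as comments:

      struct TableInvalidFlags {
        // Stale rec_per_key / handler stats. Read and written
        // by different threads; the real member is protected
        // by Table_cache::m_lock.
        bool m_invalid_stats = false;

        // The TABLE object itself is inconsistent (e.g. after
        // the commit phase of in-place ALTER TABLE on
        // partitioned InnoDB tables, or after REPAIR TABLE
        // failed in its prepare phase). Written only by the
        // thread that has the table open, so that thread can
        // read it without extra locking.
        bool m_invalid_dict = false;

        // Roughly what the old m_needs_reopen flag answered.
        bool is_invalid() const {
          return m_invalid_stats || m_invalid_dict;
        }
      };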
    
    We considered an alternative solution of not invalidating
    TABLE objects and instead updating the shared rec_per_key
    buffer directly during ANALYZE TABLE. However, this would
    mean that the info in TABLE::file::stats would not be
    updated by ANALYZE TABLE, which might result in
    regressions, since there are places where we access
    TABLE::file::stats without a prior call to
    handler::info().
    
    Note that the fact that the shared rec_per_key buffer is
    concurrently updated and read without proper
    synchronization is not changed by this patch. This issue
    is to be solved separately.
    
    Approved by: Dmitry Lenev <dmitry.lenev@oracle.com>
    Approved by: Roy Lyseng <roy.lyseng@oracle.com>