Karthik Kamath authored
STATS UPDATE, CAUSES QUERY PILEUP

DESCRIPTION:
============
Consider a scenario with a long-running query on a table. In another
session, ANALYZE TABLE is executed concurrently on the same table. After
that, any subsequent query on the table waits for the long-running query
to complete. Depending on the duration of the long-running query, this
causes a query pileup.

ANALYSIS:
=========
ANALYZE TABLE performs a key distribution analysis and stores the
distribution for the table or tables. This helps the optimizer choose
better execution plans for subsequent queries on the table.

The optimizer uses key distribution statistics, which are stored in the
rec_per_key arrays in KEY objects. Separate KEY objects are associated
with the TABLE and TABLE_SHARE objects, but in practice both kinds of
KEY objects point to the same rec_per_key buffers allocated in the
TABLE_SHARE's memory. This buffer is zero-initialized while the
TABLE_SHARE for the table is constructed. Later, when the table is
opened and a TABLE object is constructed, we call handler::open() (and
thus handler::info() with the HA_STATUS_CONST flag), which populates the
buffer with non-zero values from the SE data.

In addition to key distribution statistics, ANALYZE TABLE can also
update other statistics, such as the table row count. These statistics
are likewise loaded into the TABLE::file::stats structure when the table
is opened. However, in most cases this structure is refreshed before
being used, by calling handler::info() with the HA_STATUS_VARIABLE flag.

To ensure that the rec_per_key buffer and the TABLE::file::stats
structure for the table are updated after it runs, ANALYZE TABLE
currently does two things. First, it removes all unused TABLE objects
for the table from the table cache. Second, it marks the TABLE_SHARE as
old, to ensure that used TABLE objects for the table go away once they
stop being used (the TABLE_SHARE is also deleted from the table
definition cache after that). As a result, subsequent queries on the
table create new TABLE_SHARE and TABLE objects, and the call to
handler::open() fetches fresh stats from the SE.

However, before this can happen, we must wait until all sessions using
the table stop doing so (i.e. wait for the used TABLE objects to become
unused and deleted, and for the old TABLE_SHARE to go away). This is the
wait observed in the bug scenario: queries issued after ANALYZE TABLE
wait for the queries already running on the table (started before or
during ANALYZE TABLE) to finish executing. The longer those existing
queries take to finish, the longer the wait, and thus the query pileup.
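To make the buffer sharing described in the ANALYSIS concrete, here is
a minimal, self-contained C++ sketch. It is not the server code: Share,
Handle, and open_and_load_stats() are hypothetical stand-ins for
TABLE_SHARE, TABLE, and the handler::open() /
handler::info(HA_STATUS_CONST) path.

  #include <cstddef>
  #include <cstdio>
  #include <vector>

  // Stand-in for TABLE_SHARE: owns the zero-initialized rec_per_key
  // storage shared by all handles of the table.
  struct Share {
    std::vector<unsigned long> rec_per_key;
    explicit Share(std::size_t key_parts) : rec_per_key(key_parts, 0) {}
  };

  // Stand-in for TABLE: its key metadata aliases the share's buffer
  // rather than owning a private copy.
  struct Handle {
    Share *share;
    unsigned long *rec_per_key;  // points into share->rec_per_key
    explicit Handle(Share *s)
        : share(s), rec_per_key(s->rec_per_key.data()) {}

    // Models handler::open() -> handler::info(HA_STATUS_CONST): fills
    // the shared buffer with (fake) estimates from the storage engine.
    void open_and_load_stats() {
      for (std::size_t i = 0; i < share->rec_per_key.size(); ++i)
        rec_per_key[i] = 100 * (i + 1);
    }
  };

  int main() {
    Share share(2);  // buffer starts zeroed, like a fresh TABLE_SHARE
    Handle h1(&share), h2(&share);
    h1.open_and_load_stats();
    // Both handles alias one buffer, so h2 observes h1's loaded values.
    std::printf("h2 rec_per_key[0] = %lu\n", h2.rec_per_key[0]);  // 100
    return 0;
  }

Because every handle aliases the share's buffer, statistics loaded
through one handle are visible through all others, which is also why
the unsynchronized concurrent access mentioned at the end of this
message is a concern.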
FIX:
====
TABLE_SHARE is no longer invalidated by ANALYZE TABLE. Instead, we
invalidate all existing TABLE objects for the table by marking each
TABLE as invalid. This ensures that we do not reuse the TABLE objects
for subsequent queries, which in turn makes sure that we call
handler::open() and handler::info() for those queries, fetching the new
stats from the SE.

The TABLE flag m_needs_reopen has been split into two members:

1. m_invalid_stats, which marks a TABLE as invalid when it has outdated
   info in rec_per_key and handler stats. This flag is protected by the
   Table_cache::m_lock lock and thus can safely be read and updated by
   different threads.

2. m_invalid_dict, which marks a TABLE as invalid when the TABLE object
   has inconsistent information or when the handler might not allow
   certain operations. For example, we use this flag to avoid calling
   handler::reset() for partitioned InnoDB tables after the commit phase
   of the in-place ALTER TABLE API, and to force closing the table after
   REPAIR TABLE has failed during its prepare phase. Unlike the first
   flag, this one can only be updated by the thread that has the table
   open, but thanks to this it can safely be read by the same thread
   without extra locking. (A sketch of this two-flag scheme appears at
   the end of this message.)

We considered an alternative solution: not invalidating TABLE objects,
and instead updating the shared rec_per_key buffer directly during
ANALYZE TABLE. However, this would mean that the info in
TABLE::file::stats is not updated by ANALYZE TABLE, which might result
in regressions, since there are places where we access
TABLE::file::stats without a prior call to handler::info().

Note that this patch does not change the fact that the shared
rec_per_key buffer is concurrently updated and read without proper
synchronization. That issue is to be solved separately.

Approved by: Dmitry Lenev <dmitry.lenev@oracle.com>
Approved by: Roy Lyseng <roy.lyseng@oracle.com>
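As referenced after the flag list above, here is a minimal,
self-contained sketch of the two-flag scheme. It is not the actual
server code: the Handle class, the global mutex, and the is_reusable()
check are assumptions standing in for TABLE, Table_cache::m_lock, and
the table-cache reuse logic.

  #include <cassert>
  #include <mutex>

  std::mutex table_cache_lock;  // stand-in for Table_cache::m_lock

  // Stand-in for TABLE carrying the two split invalidation flags.
  class Handle {
   public:
    // Set by another session (e.g. ANALYZE TABLE) under the cache lock,
    // so different threads can update and read the flag safely.
    void invalidate_stats() {
      std::lock_guard<std::mutex> guard(table_cache_lock);
      m_invalid_stats = true;
    }

    // Only the thread that has the table open touches this flag, so it
    // needs no locking.
    void invalidate_dict() { m_invalid_dict = true; }

    // A cached handle may be reused only if neither invalidation was
    // requested; otherwise the caller reopens the table, refreshing
    // rec_per_key and handler stats via handler::open()/handler::info().
    bool is_reusable() {
      std::lock_guard<std::mutex> guard(table_cache_lock);
      return !m_invalid_stats && !m_invalid_dict;
    }

   private:
    bool m_invalid_stats = false;  // outdated rec_per_key/handler stats
    bool m_invalid_dict = false;   // inconsistent TABLE or handler state
  };

  int main() {
    Handle h;
    assert(h.is_reusable());
    h.invalidate_stats();      // what ANALYZE TABLE would do here
    assert(!h.is_reusable());  // forces a reopen rather than a reuse
    return 0;
  }

The point the sketch illustrates is the locking split: cross-session
invalidation needs the shared cache lock, while dictionary-level
invalidation stays private to the opening thread and can therefore be
read by that thread without extra locking.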