Karthik Kamath authored
STATS UPDATE, CAUSES QUERY PILEUP

DESCRIPTION:
============
Consider a scenario with a long-running query on a table. In another
session, ANALYZE TABLE is executed concurrently on the same table. After
that, any subsequent query on the table waits for the long-running query
to complete. Depending on the duration of the long-running query, this
causes a query pileup.

ANALYSIS:
=========
ANALYZE TABLE performs a key distribution analysis and stores the
distribution for the table or tables. This helps the optimizer choose
better execution plans for subsequent queries on the table.

The optimizer uses key distribution statistics, which are stored in the
rec_per_key arrays in KEY objects. Separate KEY objects are associated
with the TABLE and TABLE_SHARE objects, but in practice both kinds of
KEY objects point to the same rec_per_key buffers allocated in the
TABLE_SHARE's memory. This buffer is zero-initialized while the
TABLE_SHARE for the table is constructed. Later, when the table is
opened and a TABLE object is constructed, we call handler::open() (and
thus handler::info() with the HA_STATUS_CONST flag), which populates the
buffer with non-zero values from the SE data.

In addition to key distribution statistics, ANALYZE TABLE can also
update other statistics, such as the table row count. These statistics
are likewise loaded into the TABLE::file::stats structure when the table
is opened. However, in most cases this structure is refreshed before
being used, by calling handler::info() with the HA_STATUS_VARIABLE flag.

To ensure that the rec_per_key buffer and the TABLE::file::stats
structure for the table are updated after it runs, ANALYZE TABLE
currently does two things. First, it removes all unused TABLE objects
for the table from the table cache. Second, it marks the TABLE_SHARE as
old, to ensure that used TABLE objects for the table go away once they
stop being used (the TABLE_SHARE is also deleted from the table
definition cache after that). As a result, subsequent queries on the
table create new TABLE_SHARE and TABLE objects, and the call to
handler::open() fetches fresh stats from the SE.

However, before this can happen, we must wait until all sessions using
the table stop doing so (i.e. wait for the used TABLE objects to become
unused and deleted, and for the old TABLE_SHARE to go away). This is the
wait observed in the bug scenario: queries issued after ANALYZE TABLE
wait for the queries already running on the table (started before or
during ANALYZE TABLE) to finish executing. The longer those existing
queries take to finish, the longer the wait, and thus the query pileup.
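To make the buffer sharing described in the ANALYSIS concrete, here is
a minimal, self-contained C++ sketch. It is not the server code: Share,
Handle, and open_and_load_stats() are hypothetical stand-ins for
TABLE_SHARE, TABLE, and the handler::open() /
handler::info(HA_STATUS_CONST) path.

  #include <cstddef>
  #include <cstdio>
  #include <vector>

  // Stand-in for TABLE_SHARE: owns the zero-initialized rec_per_key
  // storage shared by all handles of the table.
  struct Share {
    std::vector<unsigned long> rec_per_key;
    explicit Share(std::size_t key_parts) : rec_per_key(key_parts, 0) {}
  };

  // Stand-in for TABLE: its key metadata aliases the share's buffer
  // rather than owning a private copy.
  struct Handle {
    Share *share;
    unsigned long *rec_per_key;  // points into share->rec_per_key
    explicit Handle(Share *s)
        : share(s), rec_per_key(s->rec_per_key.data()) {}

    // Models handler::open() -> handler::info(HA_STATUS_CONST): fills
    // the shared buffer with (fake) estimates from the storage engine.
    void open_and_load_stats() {
      for (std::size_t i = 0; i < share->rec_per_key.size(); ++i)
        rec_per_key[i] = 100 * (i + 1);
    }
  };

  int main() {
    Share share(2);  // buffer starts zeroed, like a fresh TABLE_SHARE
    Handle h1(&share), h2(&share);
    h1.open_and_load_stats();
    // Both handles alias one buffer, so h2 observes h1's loaded values.
    std::printf("h2 rec_per_key[0] = %lu\n", h2.rec_per_key[0]);  // 100
    return 0;
  }

Because every handle aliases the share's buffer, statistics loaded
through one handle are visible through all others, which is also why
the unsynchronized concurrent access mentioned at the end of this
message is a concern.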
FIX:
====
TABLE_SHARE is no longer invalidated by ANALYZE TABLE. Instead, we
invalidate all existing TABLE objects for the table by marking each
TABLE as invalid. This ensures that we do not reuse the TABLE objects
for subsequent queries, which in turn makes sure that we call
handler::open() and handler::info() for those queries, fetching the new
stats from the SE.

The TABLE flag m_needs_reopen has been split into two members:

1. m_invalid_stats, which marks a TABLE as invalid when it has outdated
   info in rec_per_key and handler stats. This flag is protected by the
   Table_cache::m_lock lock and thus can safely be read and updated by
   different threads.

2. m_invalid_dict, which marks a TABLE as invalid when the TABLE object
   has inconsistent information or when the handler might not allow
   certain operations. For example, we use this flag to avoid calling
   handler::reset() for partitioned InnoDB tables after the commit phase
   of the in-place ALTER TABLE API, and to force closing the table after
   REPAIR TABLE has failed during its prepare phase. Unlike the first
   flag, this one can only be updated by the thread that has the table
   open, but thanks to this it can safely be read by the same thread
   without extra locking. (A sketch of this two-flag scheme appears at
   the end of this message.)

We considered an alternative solution: not invalidating TABLE objects,
and instead updating the shared rec_per_key buffer directly during
ANALYZE TABLE. However, this would mean that the info in
TABLE::file::stats is not updated by ANALYZE TABLE, which might result
in regressions, since there are places where we access
TABLE::file::stats without a prior call to handler::info().

Note that this patch does not change the fact that the shared
rec_per_key buffer is concurrently updated and read without proper
synchronization. That issue is to be solved separately.

Approved by: Dmitry Lenev <dmitry.lenev@oracle.com>
Approved by: Roy Lyseng <roy.lyseng@oracle.com>
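As referenced after the flag list above, here is a minimal,
self-contained sketch of the two-flag scheme. It is not the actual
server code: the Handle class, the global mutex, and the is_reusable()
check are assumptions standing in for TABLE, Table_cache::m_lock, and
the table-cache reuse logic.

  #include <cassert>
  #include <mutex>

  std::mutex table_cache_lock;  // stand-in for Table_cache::m_lock

  // Stand-in for TABLE carrying the two split invalidation flags.
  class Handle {
   public:
    // Set by another session (e.g. ANALYZE TABLE) under the cache lock,
    // so different threads can update and read the flag safely.
    void invalidate_stats() {
      std::lock_guard<std::mutex> guard(table_cache_lock);
      m_invalid_stats = true;
    }

    // Only the thread that has the table open touches this flag, so it
    // needs no locking.
    void invalidate_dict() { m_invalid_dict = true; }

    // A cached handle may be reused only if neither invalidation was
    // requested; otherwise the caller reopens the table, refreshing
    // rec_per_key and handler stats via handler::open()/handler::info().
    bool is_reusable() {
      std::lock_guard<std::mutex> guard(table_cache_lock);
      return !m_invalid_stats && !m_invalid_dict;
    }

   private:
    bool m_invalid_stats = false;  // outdated rec_per_key/handler stats
    bool m_invalid_dict = false;   // inconsistent TABLE or handler state
  };

  int main() {
    Handle h;
    assert(h.is_reusable());
    h.invalidate_stats();      // what ANALYZE TABLE would do here
    assert(!h.is_reusable());  // forces a reopen rather than a reuse
    return 0;
  }

The point the sketch illustrates is the locking split: cross-session
invalidation needs the shared cache lock, while dictionary-level
invalidation stays private to the opening thread and can therefore be
read by that thread without extra locking.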