Chaithra Gopalareddy authored
COLUMN FOR LIMIT QUERY

Problems:

a- In EXPLAIN, the "filtered" column could show values >100%.
b- QEP_shared::m_rowcount is often a copy of POSITION::rows_fetched, but in rare cases it is something else; it is not well-defined.
c- m_rowcount is used only by EXPLAIN, which itself has complex logic to decide what to show in "rows"/"filtered", depending on the access method.
d- Post-greedy-search optimizations (such as test_if_skip_sort_order, adjust_access_methods and make_join_select) change the access method without changing rows_fetched, or without changing POSITION::filter_effect. This leads to displaying an obsolete value in EXPLAIN.
e- test_if_skip_sort_order, when it switches to an index scan and sees that there is a small LIMIT, decides to show this limit as "rows" in EXPLAIN; this is the root cause of the values >100%. This assignment to QEP_shared::m_rowcount is also the only use of the variable in the phase before we have QEP_TABs, so it forces us to share the variable between JOIN_TAB and QEP_TAB.
f- For an index, table or range scan, rows_fetched is usually set by calculate_scan_cost (called by best_access_path) and thus includes filtering by constant conditions, while filter_effect does not include that filtering. make_join_readinfo() therefore had to do compensation calculations so that in EXPLAIN the user sees constant conditions in "filtered" and not in "rows". However, when rows_fetched had been modified by (d) it did not include constant-condition filtering, and the compensation calculation went wrong.

Fix:

- For (b), QEP_shared::m_rowcount is removed and rows_fetched is always used instead; as a consequence, the EXPLAIN logic is simplified and (c) is solved. A side effect of this simplification is that semijoin-materialized tmp tables show "filtered=100" instead of "filtered=0", which is an improvement.
- test_if_skip_sort_order() retains its logic, but we use rows_fetched instead of row_count.
  We also set filter_effect to the magic value -2, stating that the filter effect need not be calculated for the constant-condition filter; together with saner decision logic in make_join_readinfo(), this solves (e) and (a).
- Post-greedy-search logic now sets filter_effect to a magic value (-1) if it changes the access method; in make_join_readinfo() this triggers a complete recalculation of filter_effect if we are explaining the statement. This solves (d).
- Post-greedy-search logic no longer changes rows_fetched, so the constant-condition-filter compensation in make_join_readinfo() always works, as it is now certain that rows_fetched always comes from best_access_path: the new rows_fetched is determined from the latest access method (chosen by the post-greedy-search logic), and filter_effect is compensated. This solves (f).

If this fix is a problem for some reason, i.e. if post-greedy-search logic wants to set rows_fetched, then it is probably possible to do the filter_effect compensation right after greedy search, assuming post-greedy-search logic prefers to have, as input, rows_fetched without constant-condition filtering.

Result file changes are described below; IMHO they are all correct.

distinct.result:
  explain SELECT distinct a from t3 order by a desc limit 2;
  rows=3 becomes 204, because test_if_cheaper_ordering calculates the select limit wrongly. IMHO this is a problem that needs a fix in test_if_cheaper_ordering, as it calculates the select limit w.r.t. the estimate from the range optimizer. For this case the range optimizer gives an estimate of 3, which is taken to be the number of qualifying rows once the range is satisfied; but here it should have been 204, as there is no range in this query.

explain_for_connection*:
  The changes look reasonable, although some of them are difficult to inspect. The goal of these tests is to check equality between EXPLAIN and EXPLAIN FOR CONNECTION, so if both plans change it is OK. Some changes are an effect of filter recalculation being done only by normal EXPLAIN.
explain_other.result:
  The value shown using LIMIT was wrong. For the last change:
  1 SIMPLE t1 NULL range a a 11 NULL 3 100.00
  ("rows" was 1): this is because we simplified the calculation of rows in opt_explain.cc for dynamic range; there is no best value to show, as during execution we switch back and forth between an index scan (fanout 3) and a range scan (fanout 1).

filter_single_*:
  rows=LIMIT(=1) is correct.

innodb_mrr*:
  1 SIMPLE t1 NULL index ind_parent_id PRIMARY 4 NULL 7 57.14 Using where
  Yes, 57.14 is the filtering effect of IS NOT NULL (4 rows out of 7 match it, if you look at the table's data).
  Line 732: another effect of the EXPLAIN simplification, for ref-using-quick-with-filesort (see the comment in opt_explain.cc).

myisam_mrr*:
  See innodb_mrr.

subquery_all*:
  Again ref-using-quick-with-filesort.

subquery_sj*:
  0->100 is good; 26.53 becoming 19 is again the change for "range checked for each record".