Skip to content
  • Chaithra Gopalareddy's avatar
    86ce784c
    Bug #11747810: EXPLAIN EXTENDED SHOWS BOGUS VALUE FOR 'FILTERED' · 86ce784c
    Chaithra Gopalareddy authored
    	       COLUMN FOR LIMIT QUERY
    
    Problems:
    a- in EXPLAIN, "filtered" column could show values >100%.
    b- QEP_shared::m_rowcount is often a copy of
    POSITION::rows_fetched, but in rare cases it's something else; it's
    not well-defined
    c- it's only used by EXPLAIN, which itself has complex logic to decide
    what to show in "rows"/"filtered", depending on the access method.
    d- post-greedy-search optimizations (like test_if_skip_sort_order,
    adjust_access_methods and make_join_select)
    change the access method without changing rows_fetched, or without
    changing POSITION::filter_effect. Leads to displaying an obsolete
    value in EXPLAIN.
    e- test_if_skip_sort_order, when it switches to index scan, and sees
    there's a small limit, decides to show this limit as "rows" in
    EXPLAIN; it is the root cause of the values >100%.
    This assignment to QEP_shared::m_rowcount is also the only use
    of this variable in the phase before we have QEP_TABs, so it is
    forcing us to share this variable between JOIN_TAB and QEP_TAB.
    f- because for an index, table or range scan,
    rows_fetched is usually set by calculate_scan_cost (called by
    best_access_path), thus
    includes filtering by constant conditions and filter_effect
    doesn't include this filtering, make_join_readinfo() had to do
    compensation calculations so that in EXPLAIN the user sees constant
    conditions in "filtered" and not in "rows"; however, sometimes
    rows_fetched was modified by (d), which didn't include constant
    condition filtering, then the compensation calculation did wrong.
    
    Fix:
    - for (b) QEP_shared::m_rowcount is removed and rows_fetched is always
    used, instead; as a consequence, EXPLAIN logic is simplified and (c)
    solved. A side-effect of this simplification is that
    semijoin-materialized tmp tables show "filtered=100" instead of
    "filtered=0", which is an improvement.
    - test_if_skip_sort_order() retains its logic, but we use rows_fetched
    instead of row_count. We also set the filter_effect to magic value
    of -2 stating that filter effect need not be calcuated for
     constant-condtion-filter. make_join_readinfo(), with more sane
     decision logic, solving (e) and (a).
    - post-greedy-search logic now sets filter_effect to a magic value
    (=-1), if it changes the access method; in make_join_readinfo() this
    triggers a complete recalculation of filter_effect, if we're
    explaining the statement. Solves (d).
    - post-greedy-search logic doesn't anymore change rows_fetched, so
    that constant-condition-filter compensation in make_join_readinfo()
    always works, as it is now sure that rows_fetched is always from
    best_access_path: the new rows_fetched is determined depending on the
    latest access method (chosen by post-greedy-search logic), and
    filter_effect is compensated. Solving (f). IF this fix is a problem
    for some reason, if post-greedy-search logic wants to set
    rows_fetched, then it is probably possible to do the filter_effect
    compensation right after greedy-search. Assuming post-greedy-search
    logic prefers to have, as input, rows_fetched without constant
    condition.
    
    Result files changes are described below; IMHO they are all correct.
    
    distinct.result
    explain SELECT distinct a from t3 order by a desc limit 2;
    rows=3 becomes 204, as test_if_cheaper_ordering calcultes the
    select limit wrongly. IMHO it is the problem that needs a fix
    in test_if_cheaper_ordering as it calculates select limit w.r.t
    the estimate from range optimizer. For this case range optimizer
    gives an estimate of 3 which is thought to be the number of qualifying
    rows after range is satisfied. But in this case it should have been
    204 as there is no range in this query.
    
    explain_for_connection*
    changes look reasonable, however some of them are difficult to
    inspect; the goal of these tests are to check equality between EXPLAIN
    and EXPLAIN CONNECTION, so if both plans change it's ok.
    Some changes are an effect that filter recalculation is done only by
    normal EXPLAIN.
    
    explain_other.result
    using LIMIT was wrong. For the last change:
    1       SIMPLE  t1      NULL    range   a       a       11      NULL    3       100.00
    ("rows" was 1), it's because we simplified the calculation of rows in
    opt_explain.cc for dynamic range; there's no best value to show, as
    during execution we switch back and forth between index scan (fanout
    3) and range scan (fanout 1).
    
    filter_single_*
    rows=LIMIT(=1) is correct.
    
    innodb_mrr*
    1       SIMPLE  t1      NULL    index   ind_parent_id   PRIMARY 4       NULL    7       57.14   Using where
    yes 57.14 is the filtering effect of IS NOT NULL (4 rows out of 7
    match this, if you look at the table's data).
    Line 732: another effect of EXPLAIN simplification, for
    ref-using-quick-with-filesort (see comment in opt_explain.cc)
    
    myisam_mrr*
    see innodb_mrr
    
    subquery_all*:
    again ref-using-quick-with-filesort
    
    subquery_sj*
    0->100 is good; for 26.53 to 19 it's again the change for "range
    checked for each record".
    86ce784c
    Bug #11747810: EXPLAIN EXTENDED SHOWS BOGUS VALUE FOR 'FILTERED'
    Chaithra Gopalareddy authored
    	       COLUMN FOR LIMIT QUERY
    
    Problems:
    a- in EXPLAIN, "filtered" column could show values >100%.
    b- QEP_shared::m_rowcount is often a copy of
    POSITION::rows_fetched, but in rare cases it's something else; it's
    not well-defined
    c- it's only used by EXPLAIN, which itself has complex logic to decide
    what to show in "rows"/"filtered", depending on the access method.
    d- post-greedy-search optimizations (like test_if_skip_sort_order,
    adjust_access_methods and make_join_select)
    change the access method without changing rows_fetched, or without
    changing POSITION::filter_effect. Leads to displaying an obsolete
    value in EXPLAIN.
    e- test_if_skip_sort_order, when it switches to index scan, and sees
    there's a small limit, decides to show this limit as "rows" in
    EXPLAIN; it is the root cause of the values >100%.
    This assignment to QEP_shared::m_rowcount is also the only use
    of this variable in the phase before we have QEP_TABs, so it is
    forcing us to share this variable between JOIN_TAB and QEP_TAB.
    f- because for an index, table or range scan,
    rows_fetched is usually set by calculate_scan_cost (called by
    best_access_path), thus
    includes filtering by constant conditions and filter_effect
    doesn't include this filtering, make_join_readinfo() had to do
    compensation calculations so that in EXPLAIN the user sees constant
    conditions in "filtered" and not in "rows"; however, sometimes
    rows_fetched was modified by (d), which didn't include constant
    condition filtering, then the compensation calculation did wrong.
    
    Fix:
    - for (b) QEP_shared::m_rowcount is removed and rows_fetched is always
    used, instead; as a consequence, EXPLAIN logic is simplified and (c)
    solved. A side-effect of this simplification is that
    semijoin-materialized tmp tables show "filtered=100" instead of
    "filtered=0", which is an improvement.
    - test_if_skip_sort_order() retains its logic, but we use rows_fetched
    instead of row_count. We also set the filter_effect to magic value
    of -2 stating that filter effect need not be calcuated for
     constant-condtion-filter. make_join_readinfo(), with more sane
     decision logic, solving (e) and (a).
    - post-greedy-search logic now sets filter_effect to a magic value
    (=-1), if it changes the access method; in make_join_readinfo() this
    triggers a complete recalculation of filter_effect, if we're
    explaining the statement. Solves (d).
    - post-greedy-search logic doesn't anymore change rows_fetched, so
    that constant-condition-filter compensation in make_join_readinfo()
    always works, as it is now sure that rows_fetched is always from
    best_access_path: the new rows_fetched is determined depending on the
    latest access method (chosen by post-greedy-search logic), and
    filter_effect is compensated. Solving (f). IF this fix is a problem
    for some reason, if post-greedy-search logic wants to set
    rows_fetched, then it is probably possible to do the filter_effect
    compensation right after greedy-search. Assuming post-greedy-search
    logic prefers to have, as input, rows_fetched without constant
    condition.
    
    Result files changes are described below; IMHO they are all correct.
    
    distinct.result
    explain SELECT distinct a from t3 order by a desc limit 2;
    rows=3 becomes 204, as test_if_cheaper_ordering calcultes the
    select limit wrongly. IMHO it is the problem that needs a fix
    in test_if_cheaper_ordering as it calculates select limit w.r.t
    the estimate from range optimizer. For this case range optimizer
    gives an estimate of 3 which is thought to be the number of qualifying
    rows after range is satisfied. But in this case it should have been
    204 as there is no range in this query.
    
    explain_for_connection*
    changes look reasonable, however some of them are difficult to
    inspect; the goal of these tests are to check equality between EXPLAIN
    and EXPLAIN CONNECTION, so if both plans change it's ok.
    Some changes are an effect that filter recalculation is done only by
    normal EXPLAIN.
    
    explain_other.result
    using LIMIT was wrong. For the last change:
    1       SIMPLE  t1      NULL    range   a       a       11      NULL    3       100.00
    ("rows" was 1), it's because we simplified the calculation of rows in
    opt_explain.cc for dynamic range; there's no best value to show, as
    during execution we switch back and forth between index scan (fanout
    3) and range scan (fanout 1).
    
    filter_single_*
    rows=LIMIT(=1) is correct.
    
    innodb_mrr*
    1       SIMPLE  t1      NULL    index   ind_parent_id   PRIMARY 4       NULL    7       57.14   Using where
    yes 57.14 is the filtering effect of IS NOT NULL (4 rows out of 7
    match this, if you look at the table's data).
    Line 732: another effect of EXPLAIN simplification, for
    ref-using-quick-with-filesort (see comment in opt_explain.cc)
    
    myisam_mrr*
    see innodb_mrr
    
    subquery_all*:
    again ref-using-quick-with-filesort
    
    subquery_sj*
    0->100 is good; for 26.53 to 19 it's again the change for "range
    checked for each record".
Loading