Chaithra Gopalareddy authored
COLUMN FOR LIMIT QUERY

Problems:

a- In EXPLAIN, the "filtered" column could show values >100%.
b- QEP_shared::m_rowcount is often a copy of POSITION::rows_fetched, but in rare cases it is something else; it is not well-defined.
c- m_rowcount is used only by EXPLAIN, which itself has complex logic to decide what to show in "rows"/"filtered", depending on the access method.
d- Post-greedy-search optimizations (such as test_if_skip_sort_order, adjust_access_methods and make_join_select) change the access method without changing rows_fetched, or without changing POSITION::filter_effect. This leads to displaying an obsolete value in EXPLAIN.
e- test_if_skip_sort_order, when it switches to an index scan and sees that there is a small LIMIT, decides to show this limit as "rows" in EXPLAIN; this is the root cause of the values >100%. This assignment to QEP_shared::m_rowcount is also the only use of the variable in the phase before we have QEP_TABs, so it forces us to share the variable between JOIN_TAB and QEP_TAB.
f- For an index, table or range scan, rows_fetched is usually set by calculate_scan_cost (called by best_access_path) and thus includes filtering by constant conditions, while filter_effect does not include that filtering. make_join_readinfo() therefore had to do compensation calculations so that in EXPLAIN the user sees constant conditions in "filtered" and not in "rows". However, when rows_fetched had been modified by (d) it did not include constant-condition filtering, and the compensation calculation went wrong.

Fix:

- For (b), QEP_shared::m_rowcount is removed and rows_fetched is always used instead; as a consequence, the EXPLAIN logic is simplified and (c) is solved. A side effect of this simplification is that semijoin-materialized tmp tables show "filtered=100" instead of "filtered=0", which is an improvement.
- test_if_skip_sort_order() retains its logic, but we use rows_fetched instead of row_count.
  We also set filter_effect to the magic value -2, stating that the filter effect need not be calculated for the constant-condition filter; together with saner decision logic in make_join_readinfo(), this solves (e) and (a).
- Post-greedy-search logic now sets filter_effect to a magic value (-1) if it changes the access method; in make_join_readinfo() this triggers a complete recalculation of filter_effect if we are explaining the statement. This solves (d).
- Post-greedy-search logic no longer changes rows_fetched, so the constant-condition-filter compensation in make_join_readinfo() always works, as it is now certain that rows_fetched always comes from best_access_path: the new rows_fetched is determined from the latest access method (chosen by the post-greedy-search logic), and filter_effect is compensated. This solves (f).

If this fix is a problem for some reason, i.e. if post-greedy-search logic wants to set rows_fetched, then it is probably possible to do the filter_effect compensation right after greedy search, assuming post-greedy-search logic prefers to have, as input, rows_fetched without constant-condition filtering.

Result file changes are described below; IMHO they are all correct.

distinct.result:
  explain SELECT distinct a from t3 order by a desc limit 2;
  rows=3 becomes 204, because test_if_cheaper_ordering calculates the select limit wrongly. IMHO this is a problem that needs a fix in test_if_cheaper_ordering, as it calculates the select limit w.r.t. the estimate from the range optimizer. For this case the range optimizer gives an estimate of 3, which is taken to be the number of qualifying rows once the range is satisfied; but here it should have been 204, as there is no range in this query.

explain_for_connection*:
  The changes look reasonable, although some of them are difficult to inspect. The goal of these tests is to check equality between EXPLAIN and EXPLAIN FOR CONNECTION, so if both plans change it is OK. Some changes are an effect of filter recalculation being done only by normal EXPLAIN.
explain_other.result:
  The value shown using LIMIT was wrong. For the last change:
  1 SIMPLE t1 NULL range a a 11 NULL 3 100.00
  ("rows" was 1): this is because we simplified the calculation of rows in opt_explain.cc for dynamic range; there is no best value to show, as during execution we switch back and forth between an index scan (fanout 3) and a range scan (fanout 1).

filter_single_*:
  rows=LIMIT(=1) is correct.

innodb_mrr*:
  1 SIMPLE t1 NULL index ind_parent_id PRIMARY 4 NULL 7 57.14 Using where
  Yes, 57.14 is the filtering effect of IS NOT NULL (4 rows out of 7 match it, if you look at the table's data).
  Line 732: another effect of the EXPLAIN simplification, for ref-using-quick-with-filesort (see the comment in opt_explain.cc).

myisam_mrr*:
  See innodb_mrr.

subquery_all*:
  Again ref-using-quick-with-filesort.

subquery_sj*:
  0->100 is good; 26.53 becoming 19 is again the change for "range checked for each record".