-
Oystein Grovlen authored
After the introduction of filtering estimation for conditions (WL#6635), the expected number of rows selected from a table may be between 0 and 1. Some of our algorithms are not prepared for this. One observed effect is that semi-join DuplicateWeedout plans include tables in the duplicate-eliminating range that are not relevant to the semi-join (so-called nt tables). This happens because extending the range with a table where the estimated number of rows are less than one make the cost go down. This patch solves this issue by ensuring that filtering estimates always gives at least one row. I think there are several reasons to do this instead of trying to adapt our code to work with values below 1: - WL#6635 already put such a lower limit on the filter effect of a single condition. Hence, values lower than 1 may only happen where there are multiple conditions on a table. However, I think the same reasoning could be made in case of multiple conditions. That is, filtering estimates should assume at least 1 row regardless of number of conditions. - The assumption when computing filtering effect of multiple conditions is that conditions are non-correlated. In actual databases, this is normally not the case. Hence, the filtering estimates of multiple conditions will often be lower than what is actual the case. - I think it is more important to get an optimized query plan for the cases where the result is non-empty than for the case when no records are returned. With very low numbers, one risk that the join order has very little impact on the total cost of the query. This means that for the cases when there is actual a match, the join order may not be optimal. - It is not trivial to adjust the optimizer code to cope with values lower than 1. Note: The existing limitation on filter, "filter*fanout >= 0.05", in calculate_condition_filter() is kept. I am not convinced this limitation makes sense, but dropping it had negative effects on DBT-3 query 8. This patch does not contain a new test case. The "How-to-repeat" part of the bug report consists of an existing test case in subquery.inc, and the subquery_sj_dupsweed.test shows that an nt table is not longer included in the duplicate-producing range. (See first hunk of subquery_sj_dupsweedout.result). File comments: (In addition comes result files where all changes are changes in filter and row estimates. That is, there is no plan changes.) sql/sql_planner.cc Make sure calculate_condition_filter() never returns a number that corresponds to less than 1 row in the given table. mysql-test/include/explain_json.inc Added some rows to a table of a test case to get a plan which use materialization. According to comment in the test, this is what was intended when test case was added. mysql-test/include/subquery_sj.inc Added some rows to a table of a test case to get same plan as before. mysql-test/r/derived.result Join order has changed, but this should not be relevant for the objective of the test. mysql-test/r/explain_json_all.result More records in table leads to different cost estimates. Test case now use materialization which was the original intention mysql-test/r/explain_json_none.result More records in table leads to different cost estimates, but no plan change. mysql-test/r/greedy_optimizer.result Query plans show different join order. New join order is expected since tables with lowest filter estimate comes first. (Previously, filter estimates were equal for the tables in question.) mysql-test/r/greedy_search.result One test case has a slight increase in number of partial query plans evaluated. mysql-test/r/partition_explicit_prune.result Changes in number of handler accesses. All changed numbers are lower than previously. (Neither the values before nor after match what comments in test says is exepected.) mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_bka.result mysql-test/r/subquery_sj_all_bkaunique.result @@ -3406,12 +3406,13 @@ Changes from DupsWeedout => MatScan. According to comment this test case is a regression test case for MatScan. @@ -4067,12 +4068,12 @@ According to comment in test, this test case is to reproduce an issue with MatLookup. MatLookup plan is still used in subquery_sj_mat.test. I doubt that it is possible to get MatLookup by default as long as condition filtering is used. @@ -4383,9 +4388,9 @@ Added some rows to one table to get same plan as before. @@ -6215,11 +6220,11 @@ New plan is the same plan as when the test case was pushed @@ -9675,11 +9680,11 @@ Different join order from before, but neither old nor new results have plans that are similar to original plan. mysql-test/r/subquery_sj_all_bka_nixbnl.result Changes are similar to subquery_sj_all.result except: @@ -4298,11 +4299,12 @@ @@ -4434,11 +4440,12 @@ @@ -4542,11 +4549,12 @@ @@ -4632,11 +4640,12 @@ @@ -4740,11 +4749,12 @@ DupsWeedout => MatScan. According to test comment, MatLookup is what must be avoided @@ -6224,10 +6234,10 @@ No longer DupsWeedout that covers both semi-join nests, but different algorithms for the two. This is a result of DupsWeedout no longer including more tables than necessary. (Note that the tests where BNL is allowed has FirstMatch for both.) mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_bka.result mysql-test/r/subquery_sj_dupsweed_bkaunique.result Differences from subquery_sj_all.result: @@ -3378,12 +3378,12 @@ This is the result change proves the point of this patch. New plan does no longer include nt table in the temporary table range. @@ -6190,10 +6194,10 @@ Different join order, but subquery_sj_all.test covers the original plan mysql-test/r/subquery_sj_dupsweed_bka_nixbnl.result Changes that differ from subquery_sj_dupsweed_bka.result are changes to use same plan as subquery_sj_dupsweed_bka.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_bka.result mysql-test/r/subquery_sj_firstmatch_bka_nixbnl.result mysql-test/r/subquery_sj_firstmatch_bkaunique.result @@ -7066,11 +7070,11 @@ @@ -7112,32 +7116,6 @@ New join order, but but subquery_sj_all.test covers the old plan @@ -6173,8 +6177,8 @@ New plan is came as query plan when test case was pushed @@ -11311,17 +11324,46 @@ Table with increased filter estimate comes later in join order mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_bka.result mysql-test/r/subquery_sj_loosescan_bka_nixbnl.result mysql-test/r/subquery_sj_loosescan_bkaunique.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_bka.result mysql-test/r/subquery_sj_mat_bka_nixbnl.result mysql-test/r/subquery_sj_mat_bkaunique.result mysql-test/r/subquery_sj_mat_nosj.result Results differences is only in tests that use DupsWeedout because LooseScan or Materialization is not applicable. Plans may be different from subquery_sj_dupsweedout.test since these tests have turned off DuplicateWeedout and that will effect plan pruning. mysql-test/r/view.result Tests get different join order due to different filter estimates for one table. This should be OK since this does not seem to be a regression test case.
Oystein Grovlen authoredAfter the introduction of filtering estimation for conditions (WL#6635), the expected number of rows selected from a table may be between 0 and 1. Some of our algorithms are not prepared for this. One observed effect is that semi-join DuplicateWeedout plans include tables in the duplicate-eliminating range that are not relevant to the semi-join (so-called nt tables). This happens because extending the range with a table where the estimated number of rows are less than one make the cost go down. This patch solves this issue by ensuring that filtering estimates always gives at least one row. I think there are several reasons to do this instead of trying to adapt our code to work with values below 1: - WL#6635 already put such a lower limit on the filter effect of a single condition. Hence, values lower than 1 may only happen where there are multiple conditions on a table. However, I think the same reasoning could be made in case of multiple conditions. That is, filtering estimates should assume at least 1 row regardless of number of conditions. - The assumption when computing filtering effect of multiple conditions is that conditions are non-correlated. In actual databases, this is normally not the case. Hence, the filtering estimates of multiple conditions will often be lower than what is actual the case. - I think it is more important to get an optimized query plan for the cases where the result is non-empty than for the case when no records are returned. With very low numbers, one risk that the join order has very little impact on the total cost of the query. This means that for the cases when there is actual a match, the join order may not be optimal. - It is not trivial to adjust the optimizer code to cope with values lower than 1. Note: The existing limitation on filter, "filter*fanout >= 0.05", in calculate_condition_filter() is kept. I am not convinced this limitation makes sense, but dropping it had negative effects on DBT-3 query 8. This patch does not contain a new test case. The "How-to-repeat" part of the bug report consists of an existing test case in subquery.inc, and the subquery_sj_dupsweed.test shows that an nt table is not longer included in the duplicate-producing range. (See first hunk of subquery_sj_dupsweedout.result). File comments: (In addition comes result files where all changes are changes in filter and row estimates. That is, there is no plan changes.) sql/sql_planner.cc Make sure calculate_condition_filter() never returns a number that corresponds to less than 1 row in the given table. mysql-test/include/explain_json.inc Added some rows to a table of a test case to get a plan which use materialization. According to comment in the test, this is what was intended when test case was added. mysql-test/include/subquery_sj.inc Added some rows to a table of a test case to get same plan as before. mysql-test/r/derived.result Join order has changed, but this should not be relevant for the objective of the test. mysql-test/r/explain_json_all.result More records in table leads to different cost estimates. Test case now use materialization which was the original intention mysql-test/r/explain_json_none.result More records in table leads to different cost estimates, but no plan change. mysql-test/r/greedy_optimizer.result Query plans show different join order. New join order is expected since tables with lowest filter estimate comes first. (Previously, filter estimates were equal for the tables in question.) mysql-test/r/greedy_search.result One test case has a slight increase in number of partial query plans evaluated. mysql-test/r/partition_explicit_prune.result Changes in number of handler accesses. All changed numbers are lower than previously. (Neither the values before nor after match what comments in test says is exepected.) mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_bka.result mysql-test/r/subquery_sj_all_bkaunique.result @@ -3406,12 +3406,13 @@ Changes from DupsWeedout => MatScan. According to comment this test case is a regression test case for MatScan. @@ -4067,12 +4068,12 @@ According to comment in test, this test case is to reproduce an issue with MatLookup. MatLookup plan is still used in subquery_sj_mat.test. I doubt that it is possible to get MatLookup by default as long as condition filtering is used. @@ -4383,9 +4388,9 @@ Added some rows to one table to get same plan as before. @@ -6215,11 +6220,11 @@ New plan is the same plan as when the test case was pushed @@ -9675,11 +9680,11 @@ Different join order from before, but neither old nor new results have plans that are similar to original plan. mysql-test/r/subquery_sj_all_bka_nixbnl.result Changes are similar to subquery_sj_all.result except: @@ -4298,11 +4299,12 @@ @@ -4434,11 +4440,12 @@ @@ -4542,11 +4549,12 @@ @@ -4632,11 +4640,12 @@ @@ -4740,11 +4749,12 @@ DupsWeedout => MatScan. According to test comment, MatLookup is what must be avoided @@ -6224,10 +6234,10 @@ No longer DupsWeedout that covers both semi-join nests, but different algorithms for the two. This is a result of DupsWeedout no longer including more tables than necessary. (Note that the tests where BNL is allowed has FirstMatch for both.) mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_bka.result mysql-test/r/subquery_sj_dupsweed_bkaunique.result Differences from subquery_sj_all.result: @@ -3378,12 +3378,12 @@ This is the result change proves the point of this patch. New plan does no longer include nt table in the temporary table range. @@ -6190,10 +6194,10 @@ Different join order, but subquery_sj_all.test covers the original plan mysql-test/r/subquery_sj_dupsweed_bka_nixbnl.result Changes that differ from subquery_sj_dupsweed_bka.result are changes to use same plan as subquery_sj_dupsweed_bka.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_bka.result mysql-test/r/subquery_sj_firstmatch_bka_nixbnl.result mysql-test/r/subquery_sj_firstmatch_bkaunique.result @@ -7066,11 +7070,11 @@ @@ -7112,32 +7116,6 @@ New join order, but but subquery_sj_all.test covers the old plan @@ -6173,8 +6177,8 @@ New plan is came as query plan when test case was pushed @@ -11311,17 +11324,46 @@ Table with increased filter estimate comes later in join order mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_bka.result mysql-test/r/subquery_sj_loosescan_bka_nixbnl.result mysql-test/r/subquery_sj_loosescan_bkaunique.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_bka.result mysql-test/r/subquery_sj_mat_bka_nixbnl.result mysql-test/r/subquery_sj_mat_bkaunique.result mysql-test/r/subquery_sj_mat_nosj.result Results differences is only in tests that use DupsWeedout because LooseScan or Materialization is not applicable. Plans may be different from subquery_sj_dupsweedout.test since these tests have turned off DuplicateWeedout and that will effect plan pruning. mysql-test/r/view.result Tests get different join order due to different filter estimates for one table. This should be OK since this does not seem to be a regression test case.
Loading