    Bug#20701585 SEMI-JOIN DUPLICATE ELIMINATION GIVES LOWER COST THAN NO DUPLICATE ELIMINATION
    Oystein Grovlen authored
    After the introduction of filtering estimation for conditions
    (WL#6635), the expected number of rows selected from a table may be
    between 0 and 1.  Some of our algorithms are not prepared for this.
    One observed effect is that semi-join DuplicateWeedout plans include
    tables in the duplicate-eliminating range that are not relevant to the
    semi-join (so-called nt tables).  This happens because extending the
    range with a table whose estimated number of rows is less than one
    makes the cost go down.
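
    To illustrate (with made-up numbers and a deliberately simplified cost
    formula, not the server's actual cost model), the sketch below treats
    the work done inside the duplicate-eliminating range as proportional
    to the product of the filtered row estimates of the tables in the
    range.  Pulling in an unrelated table whose filtered estimate is 0.5
    rows multiplies that product by 0.5, so the extended range looks
    cheaper:

      // Illustration only: toy cost comparison, not MySQL code.
      #include <cstdio>

      int main() {
        const double sj_inner_rows = 10.0;  // filtered rows from the semi-join table
        const double nt_rows       = 100.0; // rows in the unrelated (nt) table
        const double nt_filter     = 0.005; // combined filter => 0.5 expected rows

        // Range covers only the semi-join table: ~10 temp-table writes.
        const double weedout_rows_without_nt = sj_inner_rows;
        // Range extended with the nt table: 10 * 100 * 0.005 = 5 writes,
        // i.e. the estimate drops because 100 * 0.005 < 1.
        const double weedout_rows_with_nt = sj_inner_rows * nt_rows * nt_filter;

        std::printf("without nt table: %.1f\n", weedout_rows_without_nt);
        std::printf("with nt table:    %.1f\n", weedout_rows_with_nt);
        return 0;
      }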
    
    This patch solves this issue by ensuring that filtering estimates
    always give at least one row.  I think there are several reasons to
    do this instead of trying to adapt our code to work with values below
    1:
    
     - WL#6635 already put such a lower limit on the filter effect of a
       single condition.  Hence, values lower than 1 can only happen when
       there are multiple conditions on a table.  However, I think the
       same reasoning applies in the case of multiple conditions.  That
       is, filtering estimates should assume at least 1 row regardless of
       the number of conditions.
    
     - The assumption when computing the filtering effect of multiple
       conditions is that the conditions are non-correlated, so their
       selectivities are simply multiplied (see the sketch after this
       list).  In actual databases, this is normally not the case.
       Hence, the filtering estimates of multiple conditions will often
       be lower than what is actually the case.
    
     - I think it is more important to get an optimized query plan for the
       cases where the result is non-empty than for the case when no
       records are returned.  With very low numbers, one risks that the
       join order has very little impact on the total cost of the query.
       This means that for the cases when there is actually a match, the
       join order may not be optimal.
    
     - It is not trivial to adjust the optimizer code to cope with values
       lower than 1.
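
    As a worked illustration of the second point (made-up numbers, a
    hypothetical standalone calculation, not server code): two
    independent filters of 0.1 each respect the per-condition 1-row
    limit on a 50-row table, but their product does not:

      // Illustration only: how multiplying independent condition filters
      // can push the expected row count for a table below one row.
      #include <cstdio>

      int main() {
        const double table_rows = 50.0;
        const double filter_c1  = 0.1;   // alone: 50 * 0.1 = 5 rows (>= 1)
        const double filter_c2  = 0.1;   // alone: 50 * 0.1 = 5 rows (>= 1)

        const double combined = filter_c1 * filter_c2;  // 0.01
        const double est_rows = table_rows * combined;  // 0.5 rows (< 1)

        std::printf("combined filter %.3f => %.1f expected rows\n",
                    combined, est_rows);
        return 0;
      }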
    
    Note: The existing limitation on filter, "filter*fanout >= 0.05", in
    calculate_condition_filter() is kept.  I am not convinced this
    limitation makes sense, but dropping it had negative effects on DBT-3
    query 8.
    
    This patch does not contain a new test case.  The "How-to-repeat" part
    of the bug report consists of an existing test case in subquery.inc,
    and subquery_sj_dupsweed.test shows that an nt table is no longer
    included in the duplicate-producing range.  (See first hunk of
    subquery_sj_dupsweedout.result).
    
    File comments:
    (In addition, there are result files where all changes are changes in
    filter and row estimates.  That is, there are no plan changes.)
    
    sql/sql_planner.cc
      Make sure calculate_condition_filter() never returns a number that
      corresponds to less than 1 row in the given table.
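
      A minimal sketch of that lower bound (hypothetical helper, not the
      actual calculate_condition_filter() code, which also keeps the
      pre-existing "filter*fanout >= 0.05" limit mentioned above):

        #include <algorithm>

        // 'filter' is the combined condition filter in (0, 1]; 'rows' is
        // the estimated number of rows read from the table before
        // filtering is applied.
        static double apply_one_row_lower_bound(double filter, double rows) {
          if (rows < 1.0) return filter;  // degenerate estimate, leave as-is
          // rows * filter >= 1  <=>  filter >= 1 / rows
          return std::min(1.0, std::max(filter, 1.0 / rows));
        }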
    
    mysql-test/include/explain_json.inc
      Added some rows to a table of a test case to get a plan which uses
      materialization.  According to a comment in the test, this is what
      was intended when the test case was added.
    
    mysql-test/include/subquery_sj.inc
      Added some rows to a table of a test case to get the same plan as before.
    
    mysql-test/r/derived.result
      The join order has changed, but this should not be relevant to the
      objective of the test.
    
    mysql-test/r/explain_json_all.result
      More records in the table lead to different cost estimates.
      The test case now uses materialization, which was the original intention.
    
    mysql-test/r/explain_json_none.result
      More records in the table lead to different cost estimates, but no plan change.
    
    mysql-test/r/greedy_optimizer.result
      Query plans show a different join order.  The new join order is
      expected since tables with the lowest filter estimate come first.
      (Previously, the filter estimates were equal for the tables in question.)
    
    mysql-test/r/greedy_search.result
      One test case has a slight increase in the number of partial query
      plans evaluated.
    
    mysql-test/r/partition_explicit_prune.result
      Changes in the number of handler accesses.  All changed numbers are
      lower than previously.  (Neither the values before nor after match
      what the comments in the test say is expected.)
    
    mysql-test/r/subquery_sj_all.result
    mysql-test/r/subquery_sj_all_bka.result
    mysql-test/r/subquery_sj_all_bkaunique.result
      @@ -3406,12 +3406,13 @@
        Changes from DupsWeedout => MatScan.  According to the comment, this
        test case is a regression test case for MatScan.
      @@ -4067,12 +4068,12 @@
        According to a comment in the test, this test case is meant to
        reproduce an issue with MatLookup.  A MatLookup plan is still used in
        subquery_sj_mat.test.  I doubt that it is possible to get MatLookup by
        default as long as condition filtering is used.
      @@ -4383,9 +4388,9 @@
        Added some rows to one table to get the same plan as before.
      @@ -6215,11 +6220,11 @@
        New plan is the same plan as when the test case was pushed
      @@ -9675,11 +9680,11 @@
        Different join order from before, but neither the old nor the new
        results have plans that are similar to the original plan.
    
    mysql-test/r/subquery_sj_all_bka_nixbnl.result
      Changes are similar to subquery_sj_all.result except:
      @@ -4298,11 +4299,12 @@
      @@ -4434,11 +4440,12 @@
      @@ -4542,11 +4549,12 @@
      @@ -4632,11 +4640,12 @@
      @@ -4740,11 +4749,12 @@
        DupsWeedout => MatScan.  According to the test comment, MatLookup is
        what must be avoided.
      @@ -6224,10 +6234,10 @@
        No longer a DupsWeedout that covers both semi-join nests, but
        different algorithms for the two.  This is a result of DupsWeedout
        no longer including more tables than necessary.  (Note that the
        tests where BNL is allowed have FirstMatch for both.)
    
    mysql-test/r/subquery_sj_dupsweed.result
    mysql-test/r/subquery_sj_dupsweed_bka.result
    mysql-test/r/subquery_sj_dupsweed_bkaunique.result
      Differences from subquery_sj_all.result:
      @@ -3378,12 +3378,12 @@
        This is the result change that proves the point of this patch.  The
        new plan no longer includes the nt table in the temporary table range.
      @@ -6190,10 +6194,10 @@
        Different join order, but subquery_sj_all.test covers the original plan
    
    mysql-test/r/subquery_sj_dupsweed_bka_nixbnl.result
      The changes that differ from those in subquery_sj_dupsweed_bka.result
      are changes to use the same plan as subquery_sj_dupsweed_bka.result.
    
    mysql-test/r/subquery_sj_firstmatch.result
    mysql-test/r/subquery_sj_firstmatch_bka.result
    mysql-test/r/subquery_sj_firstmatch_bka_nixbnl.result
    mysql-test/r/subquery_sj_firstmatch_bkaunique.result
      @@ -7066,11 +7070,11 @@
      @@ -7112,32 +7116,6 @@
        New join order, but subquery_sj_all.test covers the old plan.
      @@ -6173,8 +6177,8 @@
        The new plan is the same as the query plan when the test case was pushed.
      @@ -11311,17 +11324,46 @@
        The table with the increased filter estimate comes later in the join order.
    
    mysql-test/r/subquery_sj_loosescan.result
    mysql-test/r/subquery_sj_loosescan_bka.result
    mysql-test/r/subquery_sj_loosescan_bka_nixbnl.result
    mysql-test/r/subquery_sj_loosescan_bkaunique.result
    mysql-test/r/subquery_sj_mat.result
    mysql-test/r/subquery_sj_mat_bka.result
    mysql-test/r/subquery_sj_mat_bka_nixbnl.result
    mysql-test/r/subquery_sj_mat_bkaunique.result
    mysql-test/r/subquery_sj_mat_nosj.result
      Result differences are only in tests that use DupsWeedout because
      LooseScan or Materialization is not applicable.  Plans may be
      different from subquery_sj_dupsweedout.test since these tests have
      turned off DuplicateWeedout, and that will affect plan pruning.
    
    mysql-test/r/view.result
      Tests get a different join order due to different filter estimates
      for one table.  This should be OK since this does not seem to be a
      regression test case.