include/my_base.h · 124c7ab1d6f914637521fd4463a993aa73403513 · Rasoul Jahanshahi / Mysql Server

Feb 04, 2019

Bug#29240516: Count on nullable expression may return wrong result · a16d420a

Roy Lyseng authored Feb 04, 2019

Bug#28857990: Different result for COUNT with EXISTS subquery

This bugfix includes some refactoring of the function opt_sum_query(),
to clearify the logic and hence facilitate bug fixing.

Function opt_sum_query() is renamed to optimize_aggregated_query().

The result of the function is replaced with the well-known pattern
for success/error, and a four-valued enum for the optimization outcome.
The four outcomes are as follows:
-  AGGR_COMPLETE: All aggregates were evaluated
-  AGGR_REGULAR:  Aggregates not fully evaluated, regular execution required
-  AGGR_DELAYED:  Aggregates not fully evaluated, execute with ha_records()
-  AGGR_EMPTY:    Source tables empty, aggregates are NULL or 0 (for COUNT)

Extensive comments have been added.

The logic for handling optimization of aggregate functions is now
controlled by two boolean variables: aggr_impossible and aggr_delayed:
When either of these is true, regular optimization and execution is
expected afterwards, the difference is that if aggr_delayed is true,
some COUNT operations are still not evaluated, and must be evaluated
using ha_records().

count is renamed as row_count, and is now always dependent on
have_exact_count (hence no special values are needed).

outer_tables is renamed to inner_tables, since it tracks the inner tables
of an outer join.

The query block of the join being optimized is passed to the function,
avoiding to pass SELECT_LEX::leaf_tables, and making available other
properties of the query block.

We now keep track of possible optimizations for all SELECT expressions.
This avoids the final loop that might adjust select_count, and it
implicitly fixes bug#29240516 (see below). The variable aggr_impossible
is set to false initially, and whenever an expression that cannot be
evaluated is encountered, it is set to true. The exception is delayed
COUNT operations: If one is encountered, aggr_delayed is set true.
After all expressions have been analyzed, we are left with one of three
outcomes:
- If aggr_impossible is true, all expressions were not evaluated, and
  regular optimization will commence.
- Otherwise, if aggr_delayed is true, all expressions are either already
  evaluated, or will be evaluated by ha_records() call in execution phase.
- Otherwise, all expressions have been evaluated, and optimization is
  shortcut.
(This does not include the fourth possible successful outcome,
 AGGR_EMPTY, which may cause a premature exit from the function before
 all expressions are analyzed).

The problem in bug#29240516 was that a COUNT on a nullable expression
followed by a COUNT on a non-nullable expression would get the
same result as the latter (ie the number of rows in the source tables).
The reason for this was that the first COUNT operation let const_result
become false, because it would need a table scan, but the next COUNT
would override by setting the output argument select_count to true.
There is a later loop over all fields to cancel out this, but it only
deals with MIN and MAX operations.

Now, the flag aggr_impossible is set when the first COUNT is encountered,
and this prevents the ha_records() count optimization from being used,
and the optimization outcome is set to AGGR_REGULAR.

Also re-recorded a wrong result in main.temptable_basic.

The problem in bug#28857990 was that the COUNT on a semi-join would be
the product of the row count of the input tables, as if the tables
were inner-joined. However, the semi-joined tables would not contribute
additional rows to the total count. The problem is eliminated by
disabling COUNT optimization for query blocks containing semi-join.

Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>

a16d420a

Bug#29240516: Count on nullable expression may return wrong result

Roy Lyseng authored Feb 04, 2019

Bug#28857990: Different result for COUNT with EXISTS subquery

This bugfix includes some refactoring of the function opt_sum_query(),
to clearify the logic and hence facilitate bug fixing.

Function opt_sum_query() is renamed to optimize_aggregated_query().

The result of the function is replaced with the well-known pattern
for success/error, and a four-valued enum for the optimization outcome.
The four outcomes are as follows:
-  AGGR_COMPLETE: All aggregates were evaluated
-  AGGR_REGULAR:  Aggregates not fully evaluated, regular execution required
-  AGGR_DELAYED:  Aggregates not fully evaluated, execute with ha_records()
-  AGGR_EMPTY:    Source tables empty, aggregates are NULL or 0 (for COUNT)

Extensive comments have been added.

The logic for handling optimization of aggregate functions is now
controlled by two boolean variables: aggr_impossible and aggr_delayed:
When either of these is true, regular optimization and execution is
expected afterwards, the difference is that if aggr_delayed is true,
some COUNT operations are still not evaluated, and must be evaluated
using ha_records().

count is renamed as row_count, and is now always dependent on
have_exact_count (hence no special values are needed).

outer_tables is renamed to inner_tables, since it tracks the inner tables
of an outer join.

The query block of the join being optimized is passed to the function,
avoiding to pass SELECT_LEX::leaf_tables, and making available other
properties of the query block.

We now keep track of possible optimizations for all SELECT expressions.
This avoids the final loop that might adjust select_count, and it
implicitly fixes bug#29240516 (see below). The variable aggr_impossible
is set to false initially, and whenever an expression that cannot be
evaluated is encountered, it is set to true. The exception is delayed
COUNT operations: If one is encountered, aggr_delayed is set true.
After all expressions have been analyzed, we are left with one of three
outcomes:
- If aggr_impossible is true, all expressions were not evaluated, and
  regular optimization will commence.
- Otherwise, if aggr_delayed is true, all expressions are either already
  evaluated, or will be evaluated by ha_records() call in execution phase.
- Otherwise, all expressions have been evaluated, and optimization is
  shortcut.
(This does not include the fourth possible successful outcome,
 AGGR_EMPTY, which may cause a premature exit from the function before
 all expressions are analyzed).

The problem in bug#29240516 was that a COUNT on a nullable expression
followed by a COUNT on a non-nullable expression would get the
same result as the latter (ie the number of rows in the source tables).
The reason for this was that the first COUNT operation let const_result
become false, because it would need a table scan, but the next COUNT
would override by setting the output argument select_count to true.
There is a later loop over all fields to cancel out this, but it only
deals with MIN and MAX operations.

Now, the flag aggr_impossible is set when the first COUNT is encountered,
and this prevents the ha_records() count optimization from being used,
and the optimization outcome is set to AGGR_REGULAR.

Also re-recorded a wrong result in main.temptable_basic.

The problem in bug#28857990 was that the COUNT on a semi-join would be
the product of the row count of the input tables, as if the tables
were inner-joined. However, the semi-joined tables would not contribute
additional rows to the total count. The problem is eliminated by
disabling COUNT optimization for query blocks containing semi-join.

Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>