-
Roy Lyseng authored
Bug#28857990: Different result for COUNT with EXISTS subquery This bugfix includes some refactoring of the function opt_sum_query(), to clearify the logic and hence facilitate bug fixing. Function opt_sum_query() is renamed to optimize_aggregated_query(). The result of the function is replaced with the well-known pattern for success/error, and a four-valued enum for the optimization outcome. The four outcomes are as follows: - AGGR_COMPLETE: All aggregates were evaluated - AGGR_REGULAR: Aggregates not fully evaluated, regular execution required - AGGR_DELAYED: Aggregates not fully evaluated, execute with ha_records() - AGGR_EMPTY: Source tables empty, aggregates are NULL or 0 (for COUNT) Extensive comments have been added. The logic for handling optimization of aggregate functions is now controlled by two boolean variables: aggr_impossible and aggr_delayed: When either of these is true, regular optimization and execution is expected afterwards, the difference is that if aggr_delayed is true, some COUNT operations are still not evaluated, and must be evaluated using ha_records(). count is renamed as row_count, and is now always dependent on have_exact_count (hence no special values are needed). outer_tables is renamed to inner_tables, since it tracks the inner tables of an outer join. The query block of the join being optimized is passed to the function, avoiding to pass SELECT_LEX::leaf_tables, and making available other properties of the query block. We now keep track of possible optimizations for all SELECT expressions. This avoids the final loop that might adjust select_count, and it implicitly fixes bug#29240516 (see below). The variable aggr_impossible is set to false initially, and whenever an expression that cannot be evaluated is encountered, it is set to true. The exception is delayed COUNT operations: If one is encountered, aggr_delayed is set true. After all expressions have been analyzed, we are left with one of three outcomes: - If aggr_impossible is true, all expressions were not evaluated, and regular optimization will commence. - Otherwise, if aggr_delayed is true, all expressions are either already evaluated, or will be evaluated by ha_records() call in execution phase. - Otherwise, all expressions have been evaluated, and optimization is shortcut. (This does not include the fourth possible successful outcome, AGGR_EMPTY, which may cause a premature exit from the function before all expressions are analyzed). The problem in bug#29240516 was that a COUNT on a nullable expression followed by a COUNT on a non-nullable expression would get the same result as the latter (ie the number of rows in the source tables). The reason for this was that the first COUNT operation let const_result become false, because it would need a table scan, but the next COUNT would override by setting the output argument select_count to true. There is a later loop over all fields to cancel out this, but it only deals with MIN and MAX operations. Now, the flag aggr_impossible is set when the first COUNT is encountered, and this prevents the ha_records() count optimization from being used, and the optimization outcome is set to AGGR_REGULAR. Also re-recorded a wrong result in main.temptable_basic. The problem in bug#28857990 was that the COUNT on a semi-join would be the product of the row count of the input tables, as if the tables were inner-joined. However, the semi-joined tables would not contribute additional rows to the total count. The problem is eliminated by disabling COUNT optimization for query blocks containing semi-join. Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Roy Lyseng authoredBug#28857990: Different result for COUNT with EXISTS subquery This bugfix includes some refactoring of the function opt_sum_query(), to clearify the logic and hence facilitate bug fixing. Function opt_sum_query() is renamed to optimize_aggregated_query(). The result of the function is replaced with the well-known pattern for success/error, and a four-valued enum for the optimization outcome. The four outcomes are as follows: - AGGR_COMPLETE: All aggregates were evaluated - AGGR_REGULAR: Aggregates not fully evaluated, regular execution required - AGGR_DELAYED: Aggregates not fully evaluated, execute with ha_records() - AGGR_EMPTY: Source tables empty, aggregates are NULL or 0 (for COUNT) Extensive comments have been added. The logic for handling optimization of aggregate functions is now controlled by two boolean variables: aggr_impossible and aggr_delayed: When either of these is true, regular optimization and execution is expected afterwards, the difference is that if aggr_delayed is true, some COUNT operations are still not evaluated, and must be evaluated using ha_records(). count is renamed as row_count, and is now always dependent on have_exact_count (hence no special values are needed). outer_tables is renamed to inner_tables, since it tracks the inner tables of an outer join. The query block of the join being optimized is passed to the function, avoiding to pass SELECT_LEX::leaf_tables, and making available other properties of the query block. We now keep track of possible optimizations for all SELECT expressions. This avoids the final loop that might adjust select_count, and it implicitly fixes bug#29240516 (see below). The variable aggr_impossible is set to false initially, and whenever an expression that cannot be evaluated is encountered, it is set to true. The exception is delayed COUNT operations: If one is encountered, aggr_delayed is set true. After all expressions have been analyzed, we are left with one of three outcomes: - If aggr_impossible is true, all expressions were not evaluated, and regular optimization will commence. - Otherwise, if aggr_delayed is true, all expressions are either already evaluated, or will be evaluated by ha_records() call in execution phase. - Otherwise, all expressions have been evaluated, and optimization is shortcut. (This does not include the fourth possible successful outcome, AGGR_EMPTY, which may cause a premature exit from the function before all expressions are analyzed). The problem in bug#29240516 was that a COUNT on a nullable expression followed by a COUNT on a non-nullable expression would get the same result as the latter (ie the number of rows in the source tables). The reason for this was that the first COUNT operation let const_result become false, because it would need a table scan, but the next COUNT would override by setting the output argument select_count to true. There is a later loop over all fields to cancel out this, but it only deals with MIN and MAX operations. Now, the flag aggr_impossible is set when the first COUNT is encountered, and this prevents the ha_records() count optimization from being used, and the optimization outcome is set to AGGR_REGULAR. Also re-recorded a wrong result in main.temptable_basic. The problem in bug#28857990 was that the COUNT on a semi-join would be the product of the row count of the input tables, as if the tables were inner-joined. However, the semi-joined tables would not contribute additional rows to the total count. The problem is eliminated by disabling COUNT optimization for query blocks containing semi-join. Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Loading