Skip to content
  • Roy Lyseng's avatar
    a16d420a
    Bug#29240516: Count on nullable expression may return wrong result · a16d420a
    Roy Lyseng authored
    Bug#28857990: Different result for COUNT with EXISTS subquery
    
    This bugfix includes some refactoring of the function opt_sum_query(),
    to clearify the logic and hence facilitate bug fixing.
    
    Function opt_sum_query() is renamed to optimize_aggregated_query().
    
    The result of the function is replaced with the well-known pattern
    for success/error, and a four-valued enum for the optimization outcome.
    The four outcomes are as follows:
    -  AGGR_COMPLETE: All aggregates were evaluated
    -  AGGR_REGULAR:  Aggregates not fully evaluated, regular execution required
    -  AGGR_DELAYED:  Aggregates not fully evaluated, execute with ha_records()
    -  AGGR_EMPTY:    Source tables empty, aggregates are NULL or 0 (for COUNT)
    
    Extensive comments have been added.
    
    The logic for handling optimization of aggregate functions is now
    controlled by two boolean variables: aggr_impossible and aggr_delayed:
    When either of these is true, regular optimization and execution is
    expected afterwards, the difference is that if aggr_delayed is true,
    some COUNT operations are still not evaluated, and must be evaluated
    using ha_records().
    
    count is renamed as row_count, and is now always dependent on
    have_exact_count (hence no special values are needed).
    
    outer_tables is renamed to inner_tables, since it tracks the inner tables
    of an outer join.
    
    The query block of the join being optimized is passed to the function,
    avoiding to pass SELECT_LEX::leaf_tables, and making available other
    properties of the query block.
    
    We now keep track of possible optimizations for all SELECT expressions.
    This avoids the final loop that might adjust select_count, and it
    implicitly fixes bug#29240516 (see below). The variable aggr_impossible
    is set to false initially, and whenever an expression that cannot be
    evaluated is encountered, it is set to true. The exception is delayed
    COUNT operations: If one is encountered, aggr_delayed is set true.
    After all expressions have been analyzed, we are left with one of three
    outcomes:
    - If aggr_impossible is true, all expressions were not evaluated, and
      regular optimization will commence.
    - Otherwise, if aggr_delayed is true, all expressions are either already
      evaluated, or will be evaluated by ha_records() call in execution phase.
    - Otherwise, all expressions have been evaluated, and optimization is
      shortcut.
    (This does not include the fourth possible successful outcome,
     AGGR_EMPTY, which may cause a premature exit from the function before
     all expressions are analyzed).
    
    The problem in bug#29240516 was that a COUNT on a nullable expression
    followed by a COUNT on a non-nullable expression would get the
    same result as the latter (ie the number of rows in the source tables).
    The reason for this was that the first COUNT operation let const_result
    become false, because it would need a table scan, but the next COUNT
    would override by setting the output argument select_count to true.
    There is a later loop over all fields to cancel out this, but it only
    deals with MIN and MAX operations.
    
    Now, the flag aggr_impossible is set when the first COUNT is encountered,
    and this prevents the ha_records() count optimization from being used,
    and the optimization outcome is set to AGGR_REGULAR.
    
    Also re-recorded a wrong result in main.temptable_basic.
    
    The problem in bug#28857990 was that the COUNT on a semi-join would be
    the product of the row count of the input tables, as if the tables
    were inner-joined. However, the semi-joined tables would not contribute
    additional rows to the total count. The problem is eliminated by
    disabling COUNT optimization for query blocks containing semi-join.
    
    Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
    a16d420a
    Bug#29240516: Count on nullable expression may return wrong result
    Roy Lyseng authored
    Bug#28857990: Different result for COUNT with EXISTS subquery
    
    This bugfix includes some refactoring of the function opt_sum_query(),
    to clearify the logic and hence facilitate bug fixing.
    
    Function opt_sum_query() is renamed to optimize_aggregated_query().
    
    The result of the function is replaced with the well-known pattern
    for success/error, and a four-valued enum for the optimization outcome.
    The four outcomes are as follows:
    -  AGGR_COMPLETE: All aggregates were evaluated
    -  AGGR_REGULAR:  Aggregates not fully evaluated, regular execution required
    -  AGGR_DELAYED:  Aggregates not fully evaluated, execute with ha_records()
    -  AGGR_EMPTY:    Source tables empty, aggregates are NULL or 0 (for COUNT)
    
    Extensive comments have been added.
    
    The logic for handling optimization of aggregate functions is now
    controlled by two boolean variables: aggr_impossible and aggr_delayed:
    When either of these is true, regular optimization and execution is
    expected afterwards, the difference is that if aggr_delayed is true,
    some COUNT operations are still not evaluated, and must be evaluated
    using ha_records().
    
    count is renamed as row_count, and is now always dependent on
    have_exact_count (hence no special values are needed).
    
    outer_tables is renamed to inner_tables, since it tracks the inner tables
    of an outer join.
    
    The query block of the join being optimized is passed to the function,
    avoiding to pass SELECT_LEX::leaf_tables, and making available other
    properties of the query block.
    
    We now keep track of possible optimizations for all SELECT expressions.
    This avoids the final loop that might adjust select_count, and it
    implicitly fixes bug#29240516 (see below). The variable aggr_impossible
    is set to false initially, and whenever an expression that cannot be
    evaluated is encountered, it is set to true. The exception is delayed
    COUNT operations: If one is encountered, aggr_delayed is set true.
    After all expressions have been analyzed, we are left with one of three
    outcomes:
    - If aggr_impossible is true, all expressions were not evaluated, and
      regular optimization will commence.
    - Otherwise, if aggr_delayed is true, all expressions are either already
      evaluated, or will be evaluated by ha_records() call in execution phase.
    - Otherwise, all expressions have been evaluated, and optimization is
      shortcut.
    (This does not include the fourth possible successful outcome,
     AGGR_EMPTY, which may cause a premature exit from the function before
     all expressions are analyzed).
    
    The problem in bug#29240516 was that a COUNT on a nullable expression
    followed by a COUNT on a non-nullable expression would get the
    same result as the latter (ie the number of rows in the source tables).
    The reason for this was that the first COUNT operation let const_result
    become false, because it would need a table scan, but the next COUNT
    would override by setting the output argument select_count to true.
    There is a later loop over all fields to cancel out this, but it only
    deals with MIN and MAX operations.
    
    Now, the flag aggr_impossible is set when the first COUNT is encountered,
    and this prevents the ha_records() count optimization from being used,
    and the optimization outcome is set to AGGR_REGULAR.
    
    Also re-recorded a wrong result in main.temptable_basic.
    
    The problem in bug#28857990 was that the COUNT on a semi-join would be
    the product of the row count of the input tables, as if the tables
    were inner-joined. However, the semi-joined tables would not contribute
    additional rows to the total count. The problem is eliminated by
    disabling COUNT optimization for query blocks containing semi-join.
    
    Reviewed by: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Loading