Skip to content
  • Sreeharsha Ramanavarapu's avatar
    85a15390
    Bug #23049975: GROUP BY SENDING DIFFERENT RESULTS BETWEEN · 85a15390
    Sreeharsha Ramanavarapu authored
                   5.6 AND 5.7
    
    ISSUE:
    ------
    This issue occurs in the following scenario:
    1) A subquery in the SELECT list of a query.
    2) The subquery contains a where condition which
       references a value in the outer query.
    3) The outer query contains a group by which requires the
       creation of a temporary table.
    
    An example query is:
    SELECT (SELECT t3.t3_val
            FROM t3
            WHERE t3.t3_t1_val = t1.t1_id) as CMS,
           t2.t2_val, t1.t1_id
    FROM t1 INNER JOIN t2 ON t1.join_val = t2.join_val
    WHERE t1.t1_id = 1 GROUP BY t2.t2_val;
    
    
    The following is the common behavior in 5.6 and trunk:
    a) A temp table is created with a field for t1_id. This
       field is supposed to refer to the outer query's value
       of t1.t1_id and copy it when there is a new value.
    b) The temp table's t1.t1_id is in-turn used to copy it's
       value into the field of t3.t3_t1_val. This will check
       for the value in the index on t3_t1_val.
    
    Steps followed in 5.6:
    1) QEP_tmp_table::put_record copies the value from t1.t1_id
       into the temporary table's t1_id.
    2) While evaluating the inner query, the
       "subselect_single_select_engine::exec" also contains
       optimize phase of the join. This will copy the value
       from the temporary table to the field t3.t3_t1_val.
    3) The value copied into t3_t1_val will be used to check the
       values in the index.
    
    Steps followed in 5.7:
    1) Optimization of inner subquery occurs before the execute
       phase. Value from the temp table is attempted to be
       copied into t3_t1_val. Since there is no value in the
       temp table at this stage, we end up copying a null value.
    2) QEP_tmp_table::put_record copies the value from t1.t1_id
       into the temporary table's t1_id.
    3) Unlike in 5.6, "subselect_single_select_engine::exec"
       doesn't want to optimize since that step has already
       happend.
    4) Value in t3_t1_val (which is null) is used to search the
       index and this fails.
    
    
    Notice that the steps 1 and 2 have been flipped in 5.6 and
    5.7. Removing the optimize phase from
    "subselect_single_select_engine::exec" has introduced this
    problem.
    
    This was changed as part of "WL#6369: EXPLAIN for other
    thread".
    
    SOLUTION:
    ---------
    The problem is when generating the ref-key for the subquery
    evaluation. The source for the ref-key is an outer reference
    to table t1, which is a const value, thus the value is
    available in the row buffer for t1. However, the copy
    function looks in the result buffer, which has not been
    populated yet since this occurs during optimization phase.
    
    Currently the value used by t3_t1_val is considered as a
    constant. Replace result_field with field in the Item_field
    representing the ref-key source. It makes the corresponding
    temp_table field redundant.
    85a15390
    Bug #23049975: GROUP BY SENDING DIFFERENT RESULTS BETWEEN
    Sreeharsha Ramanavarapu authored
                   5.6 AND 5.7
    
    ISSUE:
    ------
    This issue occurs in the following scenario:
    1) A subquery in the SELECT list of a query.
    2) The subquery contains a where condition which
       references a value in the outer query.
    3) The outer query contains a group by which requires the
       creation of a temporary table.
    
    An example query is:
    SELECT (SELECT t3.t3_val
            FROM t3
            WHERE t3.t3_t1_val = t1.t1_id) as CMS,
           t2.t2_val, t1.t1_id
    FROM t1 INNER JOIN t2 ON t1.join_val = t2.join_val
    WHERE t1.t1_id = 1 GROUP BY t2.t2_val;
    
    
    The following is the common behavior in 5.6 and trunk:
    a) A temp table is created with a field for t1_id. This
       field is supposed to refer to the outer query's value
       of t1.t1_id and copy it when there is a new value.
    b) The temp table's t1.t1_id is in-turn used to copy it's
       value into the field of t3.t3_t1_val. This will check
       for the value in the index on t3_t1_val.
    
    Steps followed in 5.6:
    1) QEP_tmp_table::put_record copies the value from t1.t1_id
       into the temporary table's t1_id.
    2) While evaluating the inner query, the
       "subselect_single_select_engine::exec" also contains
       optimize phase of the join. This will copy the value
       from the temporary table to the field t3.t3_t1_val.
    3) The value copied into t3_t1_val will be used to check the
       values in the index.
    
    Steps followed in 5.7:
    1) Optimization of inner subquery occurs before the execute
       phase. Value from the temp table is attempted to be
       copied into t3_t1_val. Since there is no value in the
       temp table at this stage, we end up copying a null value.
    2) QEP_tmp_table::put_record copies the value from t1.t1_id
       into the temporary table's t1_id.
    3) Unlike in 5.6, "subselect_single_select_engine::exec"
       doesn't want to optimize since that step has already
       happend.
    4) Value in t3_t1_val (which is null) is used to search the
       index and this fails.
    
    
    Notice that the steps 1 and 2 have been flipped in 5.6 and
    5.7. Removing the optimize phase from
    "subselect_single_select_engine::exec" has introduced this
    problem.
    
    This was changed as part of "WL#6369: EXPLAIN for other
    thread".
    
    SOLUTION:
    ---------
    The problem is when generating the ref-key for the subquery
    evaluation. The source for the ref-key is an outer reference
    to table t1, which is a const value, thus the value is
    available in the row buffer for t1. However, the copy
    function looks in the result buffer, which has not been
    populated yet since this occurs during optimization phase.
    
    Currently the value used by t3_t1_val is considered as a
    constant. Replace result_field with field in the Item_field
    representing the ref-key source. It makes the corresponding
    temp_table field redundant.
Loading