Skip to content
  • Knut Anders Hatlen's avatar
    985e788e
    Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY · 985e788e
    Knut Anders Hatlen authored
                  ON LEFT SIDE OF IN
    
    Problem:
    
    If the left expression of an IN expression is a row subquery that does
    not access any tables, an assertion could be raised in debug builds,
    and wrong results could be returned in release builds.
    
    The assert failure was introduced by the fix for bug#20729351, but the
    wrong results were seen even before that fix.
    
    Analysis:
    
    The root cause for these problems is an inconsistency in how
    nullability is represented. For Item_row, the maybe_null flag tells if
    any of the columns in the row could be NULL. For scalar subqueries and
    row subqueries represented by Item_singlerow_subselect, maybe_null has
    this meaning:
    
    - If the subquery is scalar, maybe_null == true means that the
      returned column could be NULL, or that the subquery could return an
      empty result (which would make the scalar subquery evaluate to
      NULL). This is consistent with how maybe_null is used in Item_row.
    
    - If the subquery returns multiple columns (a row subquery) and it is
      not a UNION, maybe_null == true means that the subquery could return
      empty results. And, conversely, maybe_null == false means that it
      could not return empty results. It does not say anything about the
      nullability of the columns. This is not consistent with how
      maybe_null is used in Item_row and in scalar subqueries.
    
    - If the subquery is a row query that is a UNION, maybe_null == true
      means that it accesses tables, whereas maybe_null == false means
      that it does not access any tables.
    
    Take these two equivalent statements:
    
      SELECT (NULL, NULL) IN ...
    
      SELECT (SELECT NULL, NULL) IN ...
    
    In the former, the Item_row that represents the left operand of the IN
    expression, has maybe_null == true. In the latter, the
    Item_singlerow_subselect that represents the left operand, has
    maybe_null == false.
    
    Or take these two equivalent statements:
    
      SELECT (SELECT 1, 2 WHERE FALSE) IN ...
    
      SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...
    
    In the former of these, the Item_singlerow_subselect has maybe_null ==
    true. In the latter, maybe_null is false.
    
    These inconsistencies confuse the code that transforms IN expressions
    to EXISTS expressions in the following way:
    
    Item_in_subselect::row_value_transformer() sets up a data structure
    (pushed_cond_guards) that might be needed in the synthetic predicates
    that are created later. It only does this if left_expr->maybe_null is
    true. However, the data structure is needed if any of the values in
    the left expression could be NULL, and this is not what
    left_expr->maybe_null tells us if left_expr is an
    Item_singlerow_subselect, only if it is an Item_row.
    
    Because of this, it might skip setting up pushed_cond_guards even
    though it will be needed later. When it is attempted used later, and
    it is found that it is NULL, it results in an assertion failure or a
    wrong result.
    
    Before bug#20729351, all the columns returned by the row subquery
    (represented by Item_cache objects) had maybe_null set to false, even
    if the value could be NULL. This was clearly wrong, and it caused
    wrong results in some cases. The assert wasn't seen, though.
    
    After bug#20729351, all the columns returned by the row subquery have
    maybe_null pessimistically set to true. This made the queries take a
    slightly different code path, which still produced wrong results, but
    now also raise an assert failure.
    
    Solution:
    
    This patch fixes these problems by making the maybe_null flag in
    Item_singlerow_subselect have a value that is more consistent. Now it
    is false if and only if we know for sure that it is never going to
    return empty results, and none of the returned columns are nullable.
    
    It also makes the maybe_null flag of the individual columns in the row
    subquery more accurate. Instead of being always false/true as they
    were before/after 20729351, the maybe_null flag now tells the actual
    nullability of the column (if the column could have the value NULL, or
    if the subquery could return an empty result, maybe_null is true).
    
    Additionally, testing of this fix exposed another, earlier wrong
    result regression, caused by "WL#6369: EXPLAIN for other thread." A
    query such as
    
      SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)
    
    started returning 0 instead of 1 after WL#6369, and it started raising
    an assert failure after bug#20729351.
    
    The problem is very similar to bug#22090717, in that the optimization
    of the IN expression happens earlier now than before. It tries to read
    the values returned from the UNION subquery before the subquery has
    been evaluated. The patch enhances the fix for bug#22090717 with an
    extra check in Item_in_subselect::mark_as_outer() to prevent reading
    of data from a subquery before the subquery has been evaluated, except
    in the case where the subquery returns a basic constant (a literal).
    985e788e
    Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY
    Knut Anders Hatlen authored
                  ON LEFT SIDE OF IN
    
    Problem:
    
    If the left expression of an IN expression is a row subquery that does
    not access any tables, an assertion could be raised in debug builds,
    and wrong results could be returned in release builds.
    
    The assert failure was introduced by the fix for bug#20729351, but the
    wrong results were seen even before that fix.
    
    Analysis:
    
    The root cause for these problems is an inconsistency in how
    nullability is represented. For Item_row, the maybe_null flag tells if
    any of the columns in the row could be NULL. For scalar subqueries and
    row subqueries represented by Item_singlerow_subselect, maybe_null has
    this meaning:
    
    - If the subquery is scalar, maybe_null == true means that the
      returned column could be NULL, or that the subquery could return an
      empty result (which would make the scalar subquery evaluate to
      NULL). This is consistent with how maybe_null is used in Item_row.
    
    - If the subquery returns multiple columns (a row subquery) and it is
      not a UNION, maybe_null == true means that the subquery could return
      empty results. And, conversely, maybe_null == false means that it
      could not return empty results. It does not say anything about the
      nullability of the columns. This is not consistent with how
      maybe_null is used in Item_row and in scalar subqueries.
    
    - If the subquery is a row query that is a UNION, maybe_null == true
      means that it accesses tables, whereas maybe_null == false means
      that it does not access any tables.
    
    Take these two equivalent statements:
    
      SELECT (NULL, NULL) IN ...
    
      SELECT (SELECT NULL, NULL) IN ...
    
    In the former, the Item_row that represents the left operand of the IN
    expression, has maybe_null == true. In the latter, the
    Item_singlerow_subselect that represents the left operand, has
    maybe_null == false.
    
    Or take these two equivalent statements:
    
      SELECT (SELECT 1, 2 WHERE FALSE) IN ...
    
      SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...
    
    In the former of these, the Item_singlerow_subselect has maybe_null ==
    true. In the latter, maybe_null is false.
    
    These inconsistencies confuse the code that transforms IN expressions
    to EXISTS expressions in the following way:
    
    Item_in_subselect::row_value_transformer() sets up a data structure
    (pushed_cond_guards) that might be needed in the synthetic predicates
    that are created later. It only does this if left_expr->maybe_null is
    true. However, the data structure is needed if any of the values in
    the left expression could be NULL, and this is not what
    left_expr->maybe_null tells us if left_expr is an
    Item_singlerow_subselect, only if it is an Item_row.
    
    Because of this, it might skip setting up pushed_cond_guards even
    though it will be needed later. When it is attempted used later, and
    it is found that it is NULL, it results in an assertion failure or a
    wrong result.
    
    Before bug#20729351, all the columns returned by the row subquery
    (represented by Item_cache objects) had maybe_null set to false, even
    if the value could be NULL. This was clearly wrong, and it caused
    wrong results in some cases. The assert wasn't seen, though.
    
    After bug#20729351, all the columns returned by the row subquery have
    maybe_null pessimistically set to true. This made the queries take a
    slightly different code path, which still produced wrong results, but
    now also raise an assert failure.
    
    Solution:
    
    This patch fixes these problems by making the maybe_null flag in
    Item_singlerow_subselect have a value that is more consistent. Now it
    is false if and only if we know for sure that it is never going to
    return empty results, and none of the returned columns are nullable.
    
    It also makes the maybe_null flag of the individual columns in the row
    subquery more accurate. Instead of being always false/true as they
    were before/after 20729351, the maybe_null flag now tells the actual
    nullability of the column (if the column could have the value NULL, or
    if the subquery could return an empty result, maybe_null is true).
    
    Additionally, testing of this fix exposed another, earlier wrong
    result regression, caused by "WL#6369: EXPLAIN for other thread." A
    query such as
    
      SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)
    
    started returning 0 instead of 1 after WL#6369, and it started raising
    an assert failure after bug#20729351.
    
    The problem is very similar to bug#22090717, in that the optimization
    of the IN expression happens earlier now than before. It tries to read
    the values returned from the UNION subquery before the subquery has
    been evaluated. The patch enhances the fix for bug#22090717 with an
    extra check in Item_in_subselect::mark_as_outer() to prevent reading
    of data from a subquery before the subquery has been evaluated, except
    in the case where the subquery returns a basic constant (a literal).
Loading