mysql-test/include/subquery_mat.inc · mysql-8.0.0 · Rasoul Jahanshahi / Mysql Server

Dec 02, 2015

Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY · 985e788e

Knut Anders Hatlen authored Dec 02, 2015

              ON LEFT SIDE OF IN

Problem:

If the left expression of an IN expression is a row subquery that does
not access any tables, an assertion could be raised in debug builds,
and wrong results could be returned in release builds.

The assert failure was introduced by the fix for bug#20729351, but the
wrong results were seen even before that fix.

Analysis:

The root cause for these problems is an inconsistency in how
nullability is represented. For Item_row, the maybe_null flag tells if
any of the columns in the row could be NULL. For scalar subqueries and
row subqueries represented by Item_singlerow_subselect, maybe_null has
this meaning:

- If the subquery is scalar, maybe_null == true means that the
  returned column could be NULL, or that the subquery could return an
  empty result (which would make the scalar subquery evaluate to
  NULL). This is consistent with how maybe_null is used in Item_row.

- If the subquery returns multiple columns (a row subquery) and it is
  not a UNION, maybe_null == true means that the subquery could return
  empty results. And, conversely, maybe_null == false means that it
  could not return empty results. It does not say anything about the
  nullability of the columns. This is not consistent with how
  maybe_null is used in Item_row and in scalar subqueries.

- If the subquery is a row query that is a UNION, maybe_null == true
  means that it accesses tables, whereas maybe_null == false means
  that it does not access any tables.

Take these two equivalent statements:

  SELECT (NULL, NULL) IN ...

  SELECT (SELECT NULL, NULL) IN ...

In the former, the Item_row that represents the left operand of the IN
expression, has maybe_null == true. In the latter, the
Item_singlerow_subselect that represents the left operand, has
maybe_null == false.

Or take these two equivalent statements:

  SELECT (SELECT 1, 2 WHERE FALSE) IN ...

  SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...

In the former of these, the Item_singlerow_subselect has maybe_null ==
true. In the latter, maybe_null is false.

These inconsistencies confuse the code that transforms IN expressions
to EXISTS expressions in the following way:

Item_in_subselect::row_value_transformer() sets up a data structure
(pushed_cond_guards) that might be needed in the synthetic predicates
that are created later. It only does this if left_expr->maybe_null is
true. However, the data structure is needed if any of the values in
the left expression could be NULL, and this is not what
left_expr->maybe_null tells us if left_expr is an
Item_singlerow_subselect, only if it is an Item_row.

Because of this, it might skip setting up pushed_cond_guards even
though it will be needed later. When it is attempted used later, and
it is found that it is NULL, it results in an assertion failure or a
wrong result.

Before bug#20729351, all the columns returned by the row subquery
(represented by Item_cache objects) had maybe_null set to false, even
if the value could be NULL. This was clearly wrong, and it caused
wrong results in some cases. The assert wasn't seen, though.

After bug#20729351, all the columns returned by the row subquery have
maybe_null pessimistically set to true. This made the queries take a
slightly different code path, which still produced wrong results, but
now also raise an assert failure.

Solution:

This patch fixes these problems by making the maybe_null flag in
Item_singlerow_subselect have a value that is more consistent. Now it
is false if and only if we know for sure that it is never going to
return empty results, and none of the returned columns are nullable.

It also makes the maybe_null flag of the individual columns in the row
subquery more accurate. Instead of being always false/true as they
were before/after 20729351, the maybe_null flag now tells the actual
nullability of the column (if the column could have the value NULL, or
if the subquery could return an empty result, maybe_null is true).

Additionally, testing of this fix exposed another, earlier wrong
result regression, caused by "WL#6369: EXPLAIN for other thread." A
query such as

  SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)

started returning 0 instead of 1 after WL#6369, and it started raising
an assert failure after bug#20729351.

The problem is very similar to bug#22090717, in that the optimization
of the IN expression happens earlier now than before. It tries to read
the values returned from the UNION subquery before the subquery has
been evaluated. The patch enhances the fix for bug#22090717 with an
extra check in Item_in_subselect::mark_as_outer() to prevent reading
of data from a subquery before the subquery has been evaluated, except
in the case where the subquery returns a basic constant (a literal).

985e788e

Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY

Knut Anders Hatlen authored Dec 02, 2015

              ON LEFT SIDE OF IN

Problem:

If the left expression of an IN expression is a row subquery that does
not access any tables, an assertion could be raised in debug builds,
and wrong results could be returned in release builds.

The assert failure was introduced by the fix for bug#20729351, but the
wrong results were seen even before that fix.

Analysis:

The root cause for these problems is an inconsistency in how
nullability is represented. For Item_row, the maybe_null flag tells if
any of the columns in the row could be NULL. For scalar subqueries and
row subqueries represented by Item_singlerow_subselect, maybe_null has
this meaning:

- If the subquery is scalar, maybe_null == true means that the
  returned column could be NULL, or that the subquery could return an
  empty result (which would make the scalar subquery evaluate to
  NULL). This is consistent with how maybe_null is used in Item_row.

- If the subquery returns multiple columns (a row subquery) and it is
  not a UNION, maybe_null == true means that the subquery could return
  empty results. And, conversely, maybe_null == false means that it
  could not return empty results. It does not say anything about the
  nullability of the columns. This is not consistent with how
  maybe_null is used in Item_row and in scalar subqueries.

- If the subquery is a row query that is a UNION, maybe_null == true
  means that it accesses tables, whereas maybe_null == false means
  that it does not access any tables.

Take these two equivalent statements:

  SELECT (NULL, NULL) IN ...

  SELECT (SELECT NULL, NULL) IN ...

In the former, the Item_row that represents the left operand of the IN
expression, has maybe_null == true. In the latter, the
Item_singlerow_subselect that represents the left operand, has
maybe_null == false.

Or take these two equivalent statements:

  SELECT (SELECT 1, 2 WHERE FALSE) IN ...

  SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...

In the former of these, the Item_singlerow_subselect has maybe_null ==
true. In the latter, maybe_null is false.

These inconsistencies confuse the code that transforms IN expressions
to EXISTS expressions in the following way:

Item_in_subselect::row_value_transformer() sets up a data structure
(pushed_cond_guards) that might be needed in the synthetic predicates
that are created later. It only does this if left_expr->maybe_null is
true. However, the data structure is needed if any of the values in
the left expression could be NULL, and this is not what
left_expr->maybe_null tells us if left_expr is an
Item_singlerow_subselect, only if it is an Item_row.

Because of this, it might skip setting up pushed_cond_guards even
though it will be needed later. When it is attempted used later, and
it is found that it is NULL, it results in an assertion failure or a
wrong result.

Before bug#20729351, all the columns returned by the row subquery
(represented by Item_cache objects) had maybe_null set to false, even
if the value could be NULL. This was clearly wrong, and it caused
wrong results in some cases. The assert wasn't seen, though.

After bug#20729351, all the columns returned by the row subquery have
maybe_null pessimistically set to true. This made the queries take a
slightly different code path, which still produced wrong results, but
now also raise an assert failure.

Solution:

This patch fixes these problems by making the maybe_null flag in
Item_singlerow_subselect have a value that is more consistent. Now it
is false if and only if we know for sure that it is never going to
return empty results, and none of the returned columns are nullable.

It also makes the maybe_null flag of the individual columns in the row
subquery more accurate. Instead of being always false/true as they
were before/after 20729351, the maybe_null flag now tells the actual
nullability of the column (if the column could have the value NULL, or
if the subquery could return an empty result, maybe_null is true).

Additionally, testing of this fix exposed another, earlier wrong
result regression, caused by "WL#6369: EXPLAIN for other thread." A
query such as

  SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)

started returning 0 instead of 1 after WL#6369, and it started raising
an assert failure after bug#20729351.

The problem is very similar to bug#22090717, in that the optimization
of the IN expression happens earlier now than before. It tries to read
the values returned from the UNION subquery before the subquery has
been evaluated. The patch enhances the fix for bug#22090717 with an
extra check in Item_in_subselect::mark_as_outer() to prevent reading
of data from a subquery before the subquery has been evaluated, except
in the case where the subquery returns a basic constant (a literal).