-
Knut Anders Hatlen authored
ON LEFT SIDE OF IN Problem: If the left expression of an IN expression is a row subquery that does not access any tables, an assertion could be raised in debug builds, and wrong results could be returned in release builds. The assert failure was introduced by the fix for bug#20729351, but the wrong results were seen even before that fix. Analysis: The root cause for these problems is an inconsistency in how nullability is represented. For Item_row, the maybe_null flag tells if any of the columns in the row could be NULL. For scalar subqueries and row subqueries represented by Item_singlerow_subselect, maybe_null has this meaning: - If the subquery is scalar, maybe_null == true means that the returned column could be NULL, or that the subquery could return an empty result (which would make the scalar subquery evaluate to NULL). This is consistent with how maybe_null is used in Item_row. - If the subquery returns multiple columns (a row subquery) and it is not a UNION, maybe_null == true means that the subquery could return empty results. And, conversely, maybe_null == false means that it could not return empty results. It does not say anything about the nullability of the columns. This is not consistent with how maybe_null is used in Item_row and in scalar subqueries. - If the subquery is a row query that is a UNION, maybe_null == true means that it accesses tables, whereas maybe_null == false means that it does not access any tables. Take these two equivalent statements: SELECT (NULL, NULL) IN ... SELECT (SELECT NULL, NULL) IN ... In the former, the Item_row that represents the left operand of the IN expression, has maybe_null == true. In the latter, the Item_singlerow_subselect that represents the left operand, has maybe_null == false. Or take these two equivalent statements: SELECT (SELECT 1, 2 WHERE FALSE) IN ... SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ... In the former of these, the Item_singlerow_subselect has maybe_null == true. In the latter, maybe_null is false. These inconsistencies confuse the code that transforms IN expressions to EXISTS expressions in the following way: Item_in_subselect::row_value_transformer() sets up a data structure (pushed_cond_guards) that might be needed in the synthetic predicates that are created later. It only does this if left_expr->maybe_null is true. However, the data structure is needed if any of the values in the left expression could be NULL, and this is not what left_expr->maybe_null tells us if left_expr is an Item_singlerow_subselect, only if it is an Item_row. Because of this, it might skip setting up pushed_cond_guards even though it will be needed later. When it is attempted used later, and it is found that it is NULL, it results in an assertion failure or a wrong result. Before bug#20729351, all the columns returned by the row subquery (represented by Item_cache objects) had maybe_null set to false, even if the value could be NULL. This was clearly wrong, and it caused wrong results in some cases. The assert wasn't seen, though. After bug#20729351, all the columns returned by the row subquery have maybe_null pessimistically set to true. This made the queries take a slightly different code path, which still produced wrong results, but now also raise an assert failure. Solution: This patch fixes these problems by making the maybe_null flag in Item_singlerow_subselect have a value that is more consistent. Now it is false if and only if we know for sure that it is never going to return empty results, and none of the returned columns are nullable. It also makes the maybe_null flag of the individual columns in the row subquery more accurate. Instead of being always false/true as they were before/after 20729351, the maybe_null flag now tells the actual nullability of the column (if the column could have the value NULL, or if the subquery could return an empty result, maybe_null is true). Additionally, testing of this fix exposed another, earlier wrong result regression, caused by "WL#6369: EXPLAIN for other thread." A query such as SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2) started returning 0 instead of 1 after WL#6369, and it started raising an assert failure after bug#20729351. The problem is very similar to bug#22090717, in that the optimization of the IN expression happens earlier now than before. It tries to read the values returned from the UNION subquery before the subquery has been evaluated. The patch enhances the fix for bug#22090717 with an extra check in Item_in_subselect::mark_as_outer() to prevent reading of data from a subquery before the subquery has been evaluated, except in the case where the subquery returns a basic constant (a literal).
Knut Anders Hatlen authoredON LEFT SIDE OF IN Problem: If the left expression of an IN expression is a row subquery that does not access any tables, an assertion could be raised in debug builds, and wrong results could be returned in release builds. The assert failure was introduced by the fix for bug#20729351, but the wrong results were seen even before that fix. Analysis: The root cause for these problems is an inconsistency in how nullability is represented. For Item_row, the maybe_null flag tells if any of the columns in the row could be NULL. For scalar subqueries and row subqueries represented by Item_singlerow_subselect, maybe_null has this meaning: - If the subquery is scalar, maybe_null == true means that the returned column could be NULL, or that the subquery could return an empty result (which would make the scalar subquery evaluate to NULL). This is consistent with how maybe_null is used in Item_row. - If the subquery returns multiple columns (a row subquery) and it is not a UNION, maybe_null == true means that the subquery could return empty results. And, conversely, maybe_null == false means that it could not return empty results. It does not say anything about the nullability of the columns. This is not consistent with how maybe_null is used in Item_row and in scalar subqueries. - If the subquery is a row query that is a UNION, maybe_null == true means that it accesses tables, whereas maybe_null == false means that it does not access any tables. Take these two equivalent statements: SELECT (NULL, NULL) IN ... SELECT (SELECT NULL, NULL) IN ... In the former, the Item_row that represents the left operand of the IN expression, has maybe_null == true. In the latter, the Item_singlerow_subselect that represents the left operand, has maybe_null == false. Or take these two equivalent statements: SELECT (SELECT 1, 2 WHERE FALSE) IN ... SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ... In the former of these, the Item_singlerow_subselect has maybe_null == true. In the latter, maybe_null is false. These inconsistencies confuse the code that transforms IN expressions to EXISTS expressions in the following way: Item_in_subselect::row_value_transformer() sets up a data structure (pushed_cond_guards) that might be needed in the synthetic predicates that are created later. It only does this if left_expr->maybe_null is true. However, the data structure is needed if any of the values in the left expression could be NULL, and this is not what left_expr->maybe_null tells us if left_expr is an Item_singlerow_subselect, only if it is an Item_row. Because of this, it might skip setting up pushed_cond_guards even though it will be needed later. When it is attempted used later, and it is found that it is NULL, it results in an assertion failure or a wrong result. Before bug#20729351, all the columns returned by the row subquery (represented by Item_cache objects) had maybe_null set to false, even if the value could be NULL. This was clearly wrong, and it caused wrong results in some cases. The assert wasn't seen, though. After bug#20729351, all the columns returned by the row subquery have maybe_null pessimistically set to true. This made the queries take a slightly different code path, which still produced wrong results, but now also raise an assert failure. Solution: This patch fixes these problems by making the maybe_null flag in Item_singlerow_subselect have a value that is more consistent. Now it is false if and only if we know for sure that it is never going to return empty results, and none of the returned columns are nullable. It also makes the maybe_null flag of the individual columns in the row subquery more accurate. Instead of being always false/true as they were before/after 20729351, the maybe_null flag now tells the actual nullability of the column (if the column could have the value NULL, or if the subquery could return an empty result, maybe_null is true). Additionally, testing of this fix exposed another, earlier wrong result regression, caused by "WL#6369: EXPLAIN for other thread." A query such as SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2) started returning 0 instead of 1 after WL#6369, and it started raising an assert failure after bug#20729351. The problem is very similar to bug#22090717, in that the optimization of the IN expression happens earlier now than before. It tries to read the values returned from the UNION subquery before the subquery has been evaluated. The patch enhances the fix for bug#22090717 with an extra check in Item_in_subselect::mark_as_outer() to prevent reading of data from a subquery before the subquery has been evaluated, except in the case where the subquery returns a basic constant (a literal).
Loading