Skip to content
  • Guilhem Bichot's avatar
    dc80b26d
    WL#4245 Convert NOT IN and NOT EXISTS in WHERE/ON to an antijoin · dc80b26d
    Guilhem Bichot authored
    Bug#29027883 INCORRECT RESULT OF LEFT JOIN
    Bug#29540230 LEFT JOIN + IS NOT TRUE CONDITION YIELDS WRONG RESULT
    
    The goal of this WL is to rewrite
    
    SELECT ... FROM ot WHERE oe NOT IN (SELECT ie FROM it1, it2 ...)
    
    to
    
    SELECT ... FROM ot ANTIJOIN (it1, it2 ...) ON oe=ie;
    
    ("merging" the subquery).
    It is like a semijoin except that rows of "ot" which have a match are
    rejected.
    The antijoin is internally handled as a special LEFT JOIN:
    
    SELECT ... FROM ot LEFT JOIN (it1, it2 ...) ON oe=ie
    WHERE right-side row IS NULL COMPLEMENTED;
    
    So a post-filter condition, "IS NULL COMPLEMENTED", is added to it1.
    As an optimization, a "Not Exists" tag is added to "it1": when a row
    of it1 has matched there is no need to test other rows of it.
    
    We also have the possibility to use Materialization-lookup: (it1,it2)
    is stored into a tmp table, which gets the IS NULL COMPLEMENTED
    condition and the "Not Exists" tag.
    
    Duplicate weedout is also possible. Loosescan and Materialization-scan
    are not because a row of 'ot' must always be emitted.
    
    Antijoin is not possible, if there are NULLs on either side of
    the NOT IN. For NOT EXISTS it's always possible, as it's equivalent to
    WHERE 1 NOT IN (SELECT 1 FROM ...).
    
    antijoin is visible in the rewritten query shown by EXPLAIN, in the
    optimizer trace, and through the "Not Exists" tag on the table in
    EXPLAIN.
    
    Two bugs found during development and posing problems were fixed
    (references above).
    
    Good extras:
    - Materialization-lookup strategy was previously forbidden for semijoins
    inside LEFT JOIN, now it is allowed.
    - WHERE oe IN (SELECT ie) IS TRUE
    didn't allow semijoin (as IN wasn't seen as top-level), now it does
    - WHERE oe IN (SELECT ie) IS NOT TRUE,
    allows antijoin (NULLs cause no problems then).
    - WHERE oe IN (SELECT ie) IS FALSE allow antijoin if there are no
    NULLables.
    
    A limitation: semijoin/antijoin are not done if containing an
    antijoin (they thus remain subqueries).
    
    top_level_item() is renamed to apply_is_true().
    is_top_level_item() is renamed to ignore_unknown().
    
    Item_func_true/false are deleted, their logic goes into parent Item_func_truth.
    The Bool_test enum (BOOL_IS_TRUE/etc) is used in Item_func_truth
    instead of boolean members.
    IS NULL and IS NOT NULL are not included into this change.
    
    The code converting
    NOT_NULLABLE_TEMPORAL_COLUMN IS NULL
    to
    NOT_NULLABLE_TEMPORAL_COLUMN='0000-00-00'
    is moved from optimization phase to Item_func_isnull::fix_fields.
    
    neg_transformer()s are changed to truth_transformer()s: on top of
    being able to push down NOT, we can also push down IS [NOT]
    TRUE/FALSE. And such transformers are added to Item_func_truth.
    So
    NOT (x IS TRUE) becomes
    x IS NOT TRUE (no NOT item)
    And
    (expr IS TRUE) IS FALSE becomes
    expr IS NOT TRUE (no IS FALSE item).
    This reduces the depth of the Item tree.
    
    If an item has no truth_transformer(), as a fallback it gets wrapped
    into a real Item_func_not/Item_func_truth.
    
    Thanks to the truth_transformer() Item_in_subselect gets the signal
    that it's, for example, inside IS NOT TRUE (and can thus do antijoin).
    This is done by a new function change_truth_value_of_expression(); which
    also does the job of negate_condition() which is suppressed from code.
    It calls truth_transformer(), which, likewise, also does the job of
    neg_transformer() which is suppressed.
    
    "::print()" functions of subquery items now take care of printing the "NOT" and
    "IS etc" that they absorbed.
    
    COALESCE(x_1,x_2,...) is marked as non-nullable as long as one x_i is
    not nullable.
    
    Approved by: Roy Lyseng <roy.lyseng@oracle.com>
    dc80b26d
    WL#4245 Convert NOT IN and NOT EXISTS in WHERE/ON to an antijoin
    Guilhem Bichot authored
    Bug#29027883 INCORRECT RESULT OF LEFT JOIN
    Bug#29540230 LEFT JOIN + IS NOT TRUE CONDITION YIELDS WRONG RESULT
    
    The goal of this WL is to rewrite
    
    SELECT ... FROM ot WHERE oe NOT IN (SELECT ie FROM it1, it2 ...)
    
    to
    
    SELECT ... FROM ot ANTIJOIN (it1, it2 ...) ON oe=ie;
    
    ("merging" the subquery).
    It is like a semijoin except that rows of "ot" which have a match are
    rejected.
    The antijoin is internally handled as a special LEFT JOIN:
    
    SELECT ... FROM ot LEFT JOIN (it1, it2 ...) ON oe=ie
    WHERE right-side row IS NULL COMPLEMENTED;
    
    So a post-filter condition, "IS NULL COMPLEMENTED", is added to it1.
    As an optimization, a "Not Exists" tag is added to "it1": when a row
    of it1 has matched there is no need to test other rows of it.
    
    We also have the possibility to use Materialization-lookup: (it1,it2)
    is stored into a tmp table, which gets the IS NULL COMPLEMENTED
    condition and the "Not Exists" tag.
    
    Duplicate weedout is also possible. Loosescan and Materialization-scan
    are not because a row of 'ot' must always be emitted.
    
    Antijoin is not possible, if there are NULLs on either side of
    the NOT IN. For NOT EXISTS it's always possible, as it's equivalent to
    WHERE 1 NOT IN (SELECT 1 FROM ...).
    
    antijoin is visible in the rewritten query shown by EXPLAIN, in the
    optimizer trace, and through the "Not Exists" tag on the table in
    EXPLAIN.
    
    Two bugs found during development and posing problems were fixed
    (references above).
    
    Good extras:
    - Materialization-lookup strategy was previously forbidden for semijoins
    inside LEFT JOIN, now it is allowed.
    - WHERE oe IN (SELECT ie) IS TRUE
    didn't allow semijoin (as IN wasn't seen as top-level), now it does
    - WHERE oe IN (SELECT ie) IS NOT TRUE,
    allows antijoin (NULLs cause no problems then).
    - WHERE oe IN (SELECT ie) IS FALSE allow antijoin if there are no
    NULLables.
    
    A limitation: semijoin/antijoin are not done if containing an
    antijoin (they thus remain subqueries).
    
    top_level_item() is renamed to apply_is_true().
    is_top_level_item() is renamed to ignore_unknown().
    
    Item_func_true/false are deleted, their logic goes into parent Item_func_truth.
    The Bool_test enum (BOOL_IS_TRUE/etc) is used in Item_func_truth
    instead of boolean members.
    IS NULL and IS NOT NULL are not included into this change.
    
    The code converting
    NOT_NULLABLE_TEMPORAL_COLUMN IS NULL
    to
    NOT_NULLABLE_TEMPORAL_COLUMN='0000-00-00'
    is moved from optimization phase to Item_func_isnull::fix_fields.
    
    neg_transformer()s are changed to truth_transformer()s: on top of
    being able to push down NOT, we can also push down IS [NOT]
    TRUE/FALSE. And such transformers are added to Item_func_truth.
    So
    NOT (x IS TRUE) becomes
    x IS NOT TRUE (no NOT item)
    And
    (expr IS TRUE) IS FALSE becomes
    expr IS NOT TRUE (no IS FALSE item).
    This reduces the depth of the Item tree.
    
    If an item has no truth_transformer(), as a fallback it gets wrapped
    into a real Item_func_not/Item_func_truth.
    
    Thanks to the truth_transformer() Item_in_subselect gets the signal
    that it's, for example, inside IS NOT TRUE (and can thus do antijoin).
    This is done by a new function change_truth_value_of_expression(); which
    also does the job of negate_condition() which is suppressed from code.
    It calls truth_transformer(), which, likewise, also does the job of
    neg_transformer() which is suppressed.
    
    "::print()" functions of subquery items now take care of printing the "NOT" and
    "IS etc" that they absorbed.
    
    COALESCE(x_1,x_2,...) is marked as non-nullable as long as one x_i is
    not nullable.
    
    Approved by: Roy Lyseng <roy.lyseng@oracle.com>
Loading