    3ea5ec75
    Bug#32244631: Illegal mix of collations (ascii_general_ci,implicit) · 3ea5ec75
    Roy Lyseng authored
    Bug#32501472: Concatenation may create invalid UTF8 strings
    Bug#24847620: Illegal ASCII values accepted into ASCII columns
    Bug#30746908: Assertion `(tlen % 4) == 0' in my_strnncollsp_utf32
    
    The basic problem to be solved here is bug 32244631 which is a
    regression from WL#9384. The problem is a comparison between a column
    and a system variable. Before WL#9384, the system variable was deemed
    a const value, but due to this worklog, it is now deemed
    const_for_execution. The collation aggregation performed by
    a comparison operation operates differently on these two types, so
    that we get an error with incompatible character sets.
    This is due to setting up Item_func_conv wrapper object with a "safe"
    parameter, which fails in the latter case.
    
    It may be possible to adjust the use of the "safe" parameter, but
    there is also a good reason to ditch the Item_func_conv object
    altogether. Instead, the executor can perform dynamic conversion of
    character strings during execution, and only issue an error message
    when that conversion fails. One example is comparison of a latin1
    string and a utf8 string, and the collation aggregator has decided
    that the comparison should be using a latin1 collation. If the utf8
    string contains only latin1 characters, we will now allow this
    comparison. We will report an error only if the utf8 string contains
    characters outside the latin1 character set.
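
    The runtime behaviour described above can be illustrated with a small
    standalone sketch. This is not the server code: utf8_to_latin1() and
    compare_as_latin1() are hypothetical helpers, latin1 is simplified to
    ISO-8859-1 (code points U+0000..U+00FF), and a plain byte-wise
    comparison stands in for a real collation.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not MySQL code): convert UTF-8 to latin1, treating
// latin1 as ISO-8859-1. Returns std::nullopt if the input is malformed or
// contains a character outside U+0000..U+00FF.
std::optional<std::string> utf8_to_latin1(const std::string &in) {
  std::string out;
  for (std::size_t i = 0; i < in.size();) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    if (b0 < 0x80) {  // single-byte (ASCII) character
      out.push_back(static_cast<char>(b0));
      i += 1;
      continue;
    }
    // Only two-byte sequences led by 0xC2 or 0xC3 encode U+0080..U+00FF.
    if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in.size()) {
      const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
      if ((b1 & 0xC0) == 0x80) {  // valid continuation byte
        out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
        i += 2;
        continue;
      }
    }
    return std::nullopt;  // not representable in latin1, or malformed
  }
  return out;
}

// Hypothetical comparison: the conversion happens at "execution time", and
// failure is reported (as std::nullopt) only when the conversion fails.
std::optional<int> compare_as_latin1(const std::string &latin1_value,
                                     const std::string &utf8_value) {
  const std::optional<std::string> converted = utf8_to_latin1(utf8_value);
  if (!converted) return std::nullopt;  // would surface as an error
  return latin1_value.compare(*converted);
}
```

    With these helpers, comparing the latin1 byte 0xE9 against the utf8
    bytes 0xC3 0xA9 succeeds and reports equality, while a utf8 value
    containing U+20AC (bytes 0xE2 0x82 0xAC) yields the error case.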
    
    There are also other benefits of removing Item_func_conv objects:
    These objects are added during preparation, thus some properties
    such as filter properties for histograms are currently not propagated
    through Item_func_conv objects. Without these objects, we get improved
    statistics for histogram calculation.
    
    The function agg_item_set_converter() is modified with a new argument
    only_consts: When this is true, do not add conversion objects for
    items that are const_for_execution. This option is chosen for
    comparison operators. However, string-producing functions such as
    concatenations still have this argument as false, meaning that
    conversion objects are still used.
    
    The function Arg_comparator::compare_string() is enhanced so that
    arguments with character set different from the comparison character
    set are converted before the comparison. Another fix is also applied
    to this function: Instead of adjusting String objects before and after
    comparison, function sortcmp() is replaced with the underlying function
    strnncollsp(), which operates on the attributes of the String objects.
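
    To illustrate comparing directly on pointer and length attributes
    without adjusting the operands, here is a minimal sketch of a
    space-padded comparison. compare_pad_space() is a hypothetical
    function using a plain byte-wise ordering; it only approximates the
    padding behaviour of a strnncollsp()-style call and is not the
    collation handler itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical sketch: compare two byte strings as if the shorter one were
// padded with spaces, without modifying either operand.
// Returns a negative value, zero, or a positive value.
int compare_pad_space(std::string_view a, std::string_view b) {
  const std::size_t common = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < common; ++i) {
    const unsigned char ca = static_cast<unsigned char>(a[i]);
    const unsigned char cb = static_cast<unsigned char>(b[i]);
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  // Compare the tail of the longer string against implicit spaces.
  const bool a_longer = a.size() > b.size();
  const std::string_view tail =
      a_longer ? a.substr(common) : b.substr(common);
  for (const char ch : tail) {
    const unsigned char c = static_cast<unsigned char>(ch);
    if (c != ' ') {
      const int sign = c < ' ' ? -1 : 1;
      return a_longer ? sign : -sign;
    }
  }
  return 0;
}
```

    Because the function only reads its arguments, there is nothing to
    restore after the comparison.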
    
    Related to this problem is also a problem with concatenation and other
    string-producing operations, which may generate characters outside
    the expected character set when one of the arguments is a binary string
    (bug 32501472). This problem is fixed by adding calls to
    is_valid_string() for all arguments that are binary strings and should
    be interpreted as a restricted character set. All such arguments are
    now evaluated with a new function Item_str_func::eval_string_arg().
    The function handles simple copy, character set conversion and
    character set validation, as required by metadata and query properties.
    
    The third bug handled here is handling of ASCII characters beyond the
    7-bit characters. These are invalid, and have already been rejected,
    except when the source is a binary string (bug 21774967 started issuing
    warnings for such characters). We now reject such strings, even when
    coming from binary strings. The characters are now filtered out in
    is_valid_string(), and the code to issue warnings formerly placed in
    field_well_formed_copy_nchars() is now removed.
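
    The 7-bit rule can be sketched as follows. The two functions are
    hypothetical stand-ins for illustration only; they are not the actual
    is_valid_string() or my_well_formed_len_ascii() implementations, and
    their signatures are assumptions.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch: length of the leading prefix that consists only of
// 7-bit ASCII bytes (values 0x00..0x7F).
std::size_t ascii_well_formed_length(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size() && static_cast<unsigned char>(s[i]) < 0x80) ++i;
  return i;
}

// A string is valid ASCII only if the well-formed prefix covers all of it.
bool is_valid_ascii(std::string_view s) {
  return ascii_well_formed_length(s) == s.size();
}
```

    Under this rule a binary string containing the byte 0x80 or above is
    rejected when it is interpreted as ASCII, regardless of its source.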
    
    Bug#30746908 is fixed because we no longer need to generate
    Item_func_conv objects for comparisons.
    
    convert_zerofill_number_to_string() needed an adjustment: Strings that
    are substituted for zero-filled numbers have derivation set to NUMERIC,
    so that they are accepted by the new asserts in concatenation functions.
    
    Finally, a problem with the return value of the function
    my_well_formed_len_ascii() is fixed.
    
    There are several test changes due to this fix. Here are some
    explanations:
    
    - main.subquery_sj_*
    
      Conversion node is removed from column t1.var_10_latin.
      Conversion is performed in comparison function.
    
    - main.subquery_sj_innodb_*
    
      Conversion node is removed from column t2.c1.
    
    - main.myisam_mrr_*
    
      Conversion node is removed from column t3.c1.
    
    - opt_trace.general2_no_prot
    
      Conversion node is removed from column table2.col_varchar_10_latin1_key.
      Because of this, range analysis was made possible.
    
    - main.func_concat
    
      Conversion node is removed from concat expression.
    
    - main.join_outer*
    
      Conversion node is removed from column t6b.col_varchar_10_latin1_key.
    
    - main.range_*
    
      Character CHAR(128), which is incompatible with UTF8, is replaced
      with CHAR(127).
    
    - main.opt_hints_join_order
    
      An index was enabled when Item_func_conv object was removed.
      (However, it is rejected later due to being in the wrong character set).
    
    - main.derived
    
      Row count estimate is improved when Item_func_conv object is removed.
    
    - main.ctype_ucs
    
      Because the character set check is delayed until runtime, one error
      message is changed and one query now succeeds.
    
      In another case with non-standard connection character set, an
      index lookup against a DATE column is restored.
    
    - i_main.subquery-bug24595581
    
      Since character set check is now delayed until execution, test case succeeds.
    
    - main.insert_update
    
      Added errors when rejecting invalid ASCII strings for insert and update.
      Added errors when rejecting invalid ASCII strings for comparison.
    
    - main.hash_join
    
      Conversion node is removed for column t2.col1. In addition, row estimate
      is corrected since it does not have to deal with the conversion node.
    
    Change-Id: I6623e9d6939754d614a2786ec436bde3a952f8c9