Roy Lyseng authored
Bug#32501472: Concatenation may create invalid UTF8 strings
Bug#24847620: Illegal ASCII values accepted into ASCII columns
Bug#30746908: Assertion `(tlen % 4) == 0' in my_strnncollsp_utf32

The basic problem to be solved here is bug 32244631, which is a
regression from WL#9384. The problem is a comparison between a column
and a system variable. Before WL#9384, the system variable was deemed a
const value, but due to this worklog, it is now deemed
const_for_execution. The collation aggregation performed by a
comparison operation operates differently on these two types, so that
we get an error with incompatible character sets. This is due to
setting up the Item_func_conv wrapper object with a "safe" parameter,
which fails in the latter case.

It may be possible to adjust the use of the "safe" parameter; however,
there is also a good reason to ditch the Item_func_conv object
altogether. Instead, the executor can perform dynamic conversion of
character strings during execution, and only issue an error message
when that conversion fails.

One example is a comparison of a latin1 string and a utf8 string, where
the collation aggregator has decided that the comparison should use a
latin1 collation. If the utf8 string contains only latin1 characters,
we will now allow this comparison. We will report an error only if the
utf8 string contains characters outside the latin1 character set.

There are also other benefits to removing Item_func_conv objects:
These objects are added during preparation, and some properties, such
as filter properties for histograms, are currently not propagated
through Item_func_conv objects. Without these objects, we get improved
statistics for histogram calculation.

The function agg_item_set_converter() is modified with a new argument
only_consts: When this is true, conversion objects are not added for
items that are const_for_execution. This option is chosen for
comparison operators.
However, string-producing functions such as concatenations still have
this argument as false, meaning that conversion objects are still used.

The function Arg_comparator::compare_string() is enhanced so that
arguments with a character set different from the comparison character
set are converted before the comparison. Another fix is also applied to
this function: Instead of adjusting String objects before and after
comparison, function sortcmp() is replaced with the underlying function
strnncollsp(), which operates on the attributes of the String objects.

Related to this problem is a problem with concatenation and other
string-producing operations, which may generate characters outside the
expected character set when one of the arguments is a binary string
(bug 32501472). This problem is fixed by adding calls to
is_valid_string() for all arguments that are binary strings and should
be interpreted as a restricted character set. All such arguments are
now evaluated with a new function Item_str_func::eval_string_arg().
The function handles simple copy, character set conversion and
character set validation, as required by metadata and query properties.

The third bug handled here is the handling of ASCII characters beyond
the 7-bit characters. These are invalid, and have already been
rejected, except when the source is a binary string (bug 21774967
started issuing warnings for such characters). We now reject such
strings, even when coming from binary strings. The characters are now
filtered out in is_valid_string(), and the code to issue warnings
formerly placed in field_well_formed_copy_nchars() is now removed.

Bug#30746908 is fixed because we no longer need to generate
Item_func_conv objects for comparisons.

convert_zerofill_number_to_string() needed an adjustment: Strings that
are substituted for zero-filled numbers have derivation set to NUMERIC,
so that they are accepted by the new asserts in concatenation
functions.
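As an illustration only, the runtime behaviour described above (convert
the argument to the comparison character set, fail only when a
character cannot be converted, and reject bytes above 0x7F as ASCII)
can be sketched in standalone C++17. This is not the server code: the
helper names is_valid_ascii(), utf8_to_latin1() and
compare_latin1_with_utf8() are hypothetical, latin1 is assumed to be
plain ISO-8859-1, and a byte-wise comparison stands in for a real
collation function such as strnncollsp().

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: true if every byte is a 7-bit ASCII value.
bool is_valid_ascii(const std::string &s) {
  for (unsigned char c : s) {
    if (c > 0x7F) return false;
  }
  return true;
}

// Hypothetical helper: convert UTF-8 to ISO-8859-1 (assumed latin1).
// Returns std::nullopt if the input is malformed or contains a
// character outside U+0000..U+00FF.
std::optional<std::string> utf8_to_latin1(const std::string &in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {  // single-byte (ASCII) character
      out.push_back(static_cast<char>(c));
      continue;
    }
    // Lead bytes 0xC2 and 0xC3 encode exactly U+0080..U+00FF.
    if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
      const unsigned char d = static_cast<unsigned char>(in[i + 1]);
      if ((d & 0xC0) == 0x80) {  // valid continuation byte
        out.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
        ++i;
        continue;
      }
    }
    return std::nullopt;  // not representable in latin1
  }
  return out;
}

// Hypothetical comparison: the utf8 argument is converted at execution
// time. std::nullopt marks the case where an error would be reported.
std::optional<int> compare_latin1_with_utf8(const std::string &latin1,
                                            const std::string &utf8) {
  const std::optional<std::string> converted = utf8_to_latin1(utf8);
  if (!converted) return std::nullopt;
  return latin1.compare(*converted);  // byte-wise, not a real collation
}
```

With this sketch, comparing latin1 "caf\xE9" with utf8 "caf\xC3\xA9"
succeeds and compares equal, while a utf8 argument containing the euro
sign (U+20AC) yields the error case instead of a result.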

Finally, a problem with the return value of the function
my_well_formed_len_ascii() is fixed.

There are several test changes due to this fix. Here are some
explanations:

- main.subquery_sj_*
  Conversion node is removed from column t1.var_10_latin.
  Conversion is performed in comparison function.

- main.subquery_sj_innodb_*
  Conversion node is removed from column t2.c1.

- main.myisam_mrr_*
  Conversion node is removed from column t3.c1.

- opt_trace.general2_no_prot
  Conversion node is removed from column
  table2.col_varchar_10_latin1_key.
  Because of this, range analysis was made possible.

- main.func_concat
  Conversion node is removed from concat expression.

- main.join_outer*
  Conversion node is removed from column
  t6b.col_varchar_10_latin1_key.

- main.range_*
  Character CHAR(128) incompatible with UTF8 replaced with CHAR(127).

- main.opt_hints_join_order
  An index was enabled when Item_func_conv object was removed.
  (However, it is rejected later due to being in the wrong character
  set.)

- main.derived
  Row count estimate is improved when Item_func_conv object is removed.

- main.ctype_ucs
  Because the character set check is delayed until runtime, one error
  message is changed and one query now succeeds.
  In another case with non-standard connection character set, an index
  lookup against a DATE column is restored.

- i_main.subquery-bug24595581
  Since character set check is now delayed until execution, test case
  succeeds.

- main.insert_update
  Added errors when rejecting invalid ASCII strings for insert and
  update.
  Added errors when rejecting invalid ASCII strings for comparison.

- main.hash_join
  Conversion node is removed for column t2.col1. In addition, row
  estimate is corrected since it does not have to deal with the
  conversion node.

Change-Id: I6623e9d6939754d614a2786ec436bde3a952f8c9