-
Kailasnath Nagarkar authored
RESPECTED ISSUE: group_concat function returns data in bytes upto group_concat_max_len. However, output of the select query when used with union is shorter in terms of number of characters than the query without union. Also, there is difference in metadata for the two queries. Case 1: select group_concat(distinct ' ', col1) from table. In this case data is returned upto group_concat_max_len bytes. Metadata: When this select query is used to create table, the datatype for the column created for group_concat() is text. This is because, Item_func_group_concat::make_string_field() function calculates maximum number of characters based on maximum byte length divided by mbminlen. Case 2: select group_concat(distinct ' ', col1) from table as c union select '' as c; In this case also group_concat returns data upto group_concat_max_len bytes. However, in this case, Field_varstring::store() gets called. This function calls field_well_formed_copy_nchars(). The nchars parameter passed to this fucntion is field_length/mbmaxlen. Therefore the output is shorter. (e.g. for utf8 1024/3 = 341 characters) Metadata: When this select query is used to create table, the datatype for the column created for group_concat() is VARCHAR(341). This is because, Item::make_string_field() function calculates maximum number of characters based on maximum byte length divided by mbmaxlen. In both the cases, if maximum number of characters exceed 512 characters limit, then TEXT field is created, else VARCHAR field is created. However, note that, to calculate maximum number of characters, in case 1, mbminlen is used, whereas in case 2, mbmaxlen is used. Therefore, there is inconsistency in metadata for the two queries. SOLUTION: max_length for group concat was set to group_concat_max_len. Changed this to first calculate max chars using mbminlen and then set max_length accounting for mbmaxlen for max chars. So, max_length = (group_concat_max_len/mbminlen) * mbmaxlen Added new variable group_concat_max_len to class Item_func_group_concat. This variable will hold actual value of group_concat_max_len. The output of group_concat will be restricted to this value instead of max_length. As mentioned above, instead of mbmaxlen, we now use mbminlen to determine maximum number of characters. This provides ability to store more characters in case of charsets that support variable length characters in case the data has characters that have length less than mbmaxlen. The change in metadata for first test is since group_concat_max_len is 400, and charset is utf8mb4, mbmaxlen is 4, hence length is 1600. The change in metadata for second test is since we are now using set_data_type_blob() which sets type as MYSQL_TYPE_LONG_BLOB (251).
Kailasnath Nagarkar authoredRESPECTED ISSUE: group_concat function returns data in bytes upto group_concat_max_len. However, output of the select query when used with union is shorter in terms of number of characters than the query without union. Also, there is difference in metadata for the two queries. Case 1: select group_concat(distinct ' ', col1) from table. In this case data is returned upto group_concat_max_len bytes. Metadata: When this select query is used to create table, the datatype for the column created for group_concat() is text. This is because, Item_func_group_concat::make_string_field() function calculates maximum number of characters based on maximum byte length divided by mbminlen. Case 2: select group_concat(distinct ' ', col1) from table as c union select '' as c; In this case also group_concat returns data upto group_concat_max_len bytes. However, in this case, Field_varstring::store() gets called. This function calls field_well_formed_copy_nchars(). The nchars parameter passed to this fucntion is field_length/mbmaxlen. Therefore the output is shorter. (e.g. for utf8 1024/3 = 341 characters) Metadata: When this select query is used to create table, the datatype for the column created for group_concat() is VARCHAR(341). This is because, Item::make_string_field() function calculates maximum number of characters based on maximum byte length divided by mbmaxlen. In both the cases, if maximum number of characters exceed 512 characters limit, then TEXT field is created, else VARCHAR field is created. However, note that, to calculate maximum number of characters, in case 1, mbminlen is used, whereas in case 2, mbmaxlen is used. Therefore, there is inconsistency in metadata for the two queries. SOLUTION: max_length for group concat was set to group_concat_max_len. Changed this to first calculate max chars using mbminlen and then set max_length accounting for mbmaxlen for max chars. So, max_length = (group_concat_max_len/mbminlen) * mbmaxlen Added new variable group_concat_max_len to class Item_func_group_concat. This variable will hold actual value of group_concat_max_len. The output of group_concat will be restricted to this value instead of max_length. As mentioned above, instead of mbmaxlen, we now use mbminlen to determine maximum number of characters. This provides ability to store more characters in case of charsets that support variable length characters in case the data has characters that have length less than mbmaxlen. The change in metadata for first test is since group_concat_max_len is 400, and charset is utf8mb4, mbmaxlen is 4, hence length is 1600. The change in metadata for second test is since we are now using set_data_type_blob() which sets type as MYSQL_TYPE_LONG_BLOB (251).
Loading