Skip to content
  • Kailasnath Nagarkar's avatar
    460dc442
    Bug #25050090: LIMITATION @@GROUP_CONCAT_MAX_LEN IS NOT · 460dc442
    Kailasnath Nagarkar authored
                   RESPECTED
    
    ISSUE:
    group_concat function returns data in bytes upto
    group_concat_max_len.
    However, output of the select query when used with union
    is shorter in terms of number of characters than the query
    without union.
    
    Also, there is difference in metadata for the two queries.
    
    Case 1: select group_concat(distinct ' ', col1)
            from table.
    
    In this case data is returned upto group_concat_max_len bytes.
    
    Metadata: When this select query is used to create table,
    the datatype for the column created for group_concat() is
    text. This is because, Item_func_group_concat::make_string_field()
    function calculates maximum number of characters based on
    maximum byte length divided by mbminlen.
    
    Case 2: select group_concat(distinct ' ', col1)
            from table as c
            union
            select '' as c;
    
    In this case also group_concat returns data upto
    group_concat_max_len bytes.
    However, in this case, Field_varstring::store()
    gets called.
    This function calls field_well_formed_copy_nchars().
    The nchars parameter passed to this fucntion is
    field_length/mbmaxlen.
    Therefore the output is shorter.
    (e.g. for utf8 1024/3 = 341 characters)
    
    Metadata: When this select query is used to create table,
    the datatype for the column created for group_concat() is
    VARCHAR(341). This is because, Item::make_string_field()
    function calculates maximum number of characters based on
    maximum byte length divided by mbmaxlen.
    
    In both the cases, if maximum number of characters exceed 512
    characters limit, then TEXT field is created, else VARCHAR
    field is created.
    
    However, note that, to calculate maximum number of
    characters, in case 1,  mbminlen is used, whereas in
    case 2, mbmaxlen is used.
    
    Therefore, there is inconsistency in metadata for the two
    queries.
    
    SOLUTION:
    max_length for group concat was set to group_concat_max_len.
    Changed this to first calculate max chars using mbminlen
    and then set max_length accounting for mbmaxlen for max chars.
    So, max_length = (group_concat_max_len/mbminlen) * mbmaxlen
    Added new variable group_concat_max_len to class Item_func_group_concat.
    This variable will hold actual value of group_concat_max_len.
    The output of group_concat will be restricted to this value instead
    of max_length.
    
    As mentioned above, instead of mbmaxlen, we now use mbminlen
    to determine maximum number of characters.
    This provides ability to store more characters in case of
    charsets that support variable length characters in case the
    data has characters that have length less than mbmaxlen.
    
    
    
    The change in metadata for first test is since group_concat_max_len
    is 400, and charset is utf8mb4, mbmaxlen is 4, hence length is 1600.
    
    The change in metadata for second test is since we are now using
    set_data_type_blob() which sets type as MYSQL_TYPE_LONG_BLOB (251).
    460dc442
    Bug #25050090: LIMITATION @@GROUP_CONCAT_MAX_LEN IS NOT
    Kailasnath Nagarkar authored
                   RESPECTED
    
    ISSUE:
    group_concat function returns data in bytes upto
    group_concat_max_len.
    However, output of the select query when used with union
    is shorter in terms of number of characters than the query
    without union.
    
    Also, there is difference in metadata for the two queries.
    
    Case 1: select group_concat(distinct ' ', col1)
            from table.
    
    In this case data is returned upto group_concat_max_len bytes.
    
    Metadata: When this select query is used to create table,
    the datatype for the column created for group_concat() is
    text. This is because, Item_func_group_concat::make_string_field()
    function calculates maximum number of characters based on
    maximum byte length divided by mbminlen.
    
    Case 2: select group_concat(distinct ' ', col1)
            from table as c
            union
            select '' as c;
    
    In this case also group_concat returns data upto
    group_concat_max_len bytes.
    However, in this case, Field_varstring::store()
    gets called.
    This function calls field_well_formed_copy_nchars().
    The nchars parameter passed to this fucntion is
    field_length/mbmaxlen.
    Therefore the output is shorter.
    (e.g. for utf8 1024/3 = 341 characters)
    
    Metadata: When this select query is used to create table,
    the datatype for the column created for group_concat() is
    VARCHAR(341). This is because, Item::make_string_field()
    function calculates maximum number of characters based on
    maximum byte length divided by mbmaxlen.
    
    In both the cases, if maximum number of characters exceed 512
    characters limit, then TEXT field is created, else VARCHAR
    field is created.
    
    However, note that, to calculate maximum number of
    characters, in case 1,  mbminlen is used, whereas in
    case 2, mbmaxlen is used.
    
    Therefore, there is inconsistency in metadata for the two
    queries.
    
    SOLUTION:
    max_length for group concat was set to group_concat_max_len.
    Changed this to first calculate max chars using mbminlen
    and then set max_length accounting for mbmaxlen for max chars.
    So, max_length = (group_concat_max_len/mbminlen) * mbmaxlen
    Added new variable group_concat_max_len to class Item_func_group_concat.
    This variable will hold actual value of group_concat_max_len.
    The output of group_concat will be restricted to this value instead
    of max_length.
    
    As mentioned above, instead of mbmaxlen, we now use mbminlen
    to determine maximum number of characters.
    This provides ability to store more characters in case of
    charsets that support variable length characters in case the
    data has characters that have length less than mbmaxlen.
    
    
    
    The change in metadata for first test is since group_concat_max_len
    is 400, and charset is utf8mb4, mbmaxlen is 4, hence length is 1600.
    
    The change in metadata for second test is since we are now using
    set_data_type_blob() which sets type as MYSQL_TYPE_LONG_BLOB (251).
Loading