    Bug #24823885: PERFORMANCE REGRESSION WHEN CHANGING CHARACTER SET TO UTF8MB4 · e0315768
    Steinar H. Gunderson authored
    Increase max_length_for_sort_data from 1024 to 4096.
    
    This parameter controls the threshold for when we stop sorting full rows,
    and instead just sort the sort key plus a row ID, and then go back to the
    table to pick up those rows afterwards.
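
The threshold described above can be illustrated with a minimal sketch. This is a simplified model and not the server's actual code: the function name `choose_sort_strategy`, the strategy labels, and the use of `<=` at the boundary are all assumptions made for illustration.

```python
# Hypothetical, simplified model of the threshold decision described above.
# The function name, return labels, and inclusive comparison are illustrative
# assumptions; they are not taken from the MySQL source.

def choose_sort_strategy(estimated_row_bytes: int, max_length_for_sort_data: int) -> str:
    """Pick how to sort, given the estimated size of one full row."""
    if estimated_row_bytes <= max_length_for_sort_data:
        # Small enough: carry the whole row through the sort.
        return "full_rows"
    # Too large: sort only the key plus a row ID, then go back to the
    # table afterwards to pick up each row.
    return "key_plus_row_id"


# A row estimated at 2040 bytes falls on different sides of the old and new limits.
old_choice = choose_sort_strategy(2040, 1024)
new_choice = choose_sort_strategy(2040, 4096)
```

With these illustrative inputs, the old limit of 1024 selects the key-plus-row-ID path, while the new limit of 4096 keeps the full row in the sort.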
    
    Since this parameter was introduced and got its current value in 2003,
    several important things have happened:
    
     - Computers have gotten more RAM, so that sort buffers can be larger.
     - We have switched from MyISAM to InnoDB, where picking out a row by ID
       is much more expensive.
     - Unicode collations have become commonplace, and for them we grossly
       overestimate row size in the typical case: we assume every string is
       completely filled with maximum-length UTF-8 characters, whereas a
       typical string is more like 10–20% full and consists of ASCII.
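
To make the overestimate concrete, here is a rough arithmetic sketch. The column width of 255 characters and the 15% fill figure are assumptions chosen for illustration (the fill figure sits inside the 10–20% range mentioned above), and the helper names are hypothetical. The only fixed fact used is that utf8mb4 needs at most 4 bytes per character.

```python
# Rough arithmetic sketch of the size overestimate for a utf8mb4 string column.
# The column width (255 chars) and fill fraction (15%) are illustrative assumptions.

UTF8MB4_MAX_BYTES_PER_CHAR = 4  # utf8mb4 encodes a character in at most 4 bytes


def worst_case_bytes(declared_chars: int) -> int:
    """Size assumed when every character takes the maximum number of bytes."""
    return declared_chars * UTF8MB4_MAX_BYTES_PER_CHAR


def typical_ascii_bytes(declared_chars: int, fill_fraction: float) -> int:
    """Size of a partially filled ASCII string (1 byte per character)."""
    return round(declared_chars * fill_fraction)


declared = 255
estimate = worst_case_bytes(declared)          # 255 * 4 = 1020 bytes
typical = typical_ascii_bytes(declared, 0.15)  # about 38 bytes
two_columns = 2 * estimate                     # 2040 bytes
```

Under these assumptions a single such column is estimated at 1020 bytes, so two of them (2040 bytes) already exceed a limit of 1024 while fitting comfortably under 4096, even though the typical stored data is only a few dozen bytes per column.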
    
    Thus, increase this value; the value chosen is somewhat arbitrary and would
    benefit from benchmarks across a wide variety of real loads, but it is
    a step in the right direction.
    
    sysbench result goes from 7080 -> 9078 tps (+28.2%).
    
    Change-Id: I031ded33e5a18ca903b4549b5692563137672408