    e0e13703
    Bug #30400985: SORT SHORT BLOBS AS ADDON FIELDS
    Steinar H. Gunderson authored
    Currently, including any kind of blob as payload in an ordering operation makes
    us revert to sorting row IDs only instead of complete rows. This means we need
    a second pass to fetch the actual rows from disk after the sort itself has
    completed.
    
    Most blobs, however, arise from JSON and GEOMETRY columns (we implement them
    internally using LONGBLOB), and even though they can theoretically be 4 GB,
    they almost never are. Thus, loosen up the restriction and make them into
    packed addons. We do the same for 1- and 2-byte blobs (tinyblob, blob),
    while leaving 3- and 4-byte blobs (mediumblob, longblob) with the existing
    behavior.
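
    As an illustration (the table and column names below are hypothetical;
    only the blob widths matter), a sort carrying the payload columns of the
    first table can now sort complete rows, while the second still falls
    back to sorting row IDs:

      CREATE TABLE t_short_blobs (
        id INTEGER PRIMARY KEY,
        t TINYBLOB,    -- 1-byte length prefix: now a packed addon
        b BLOB         -- 2-byte length prefix: now a packed addon
      );
      CREATE TABLE t_long_blobs (
        id INTEGER PRIMARY KEY,
        m MEDIUMBLOB,  -- 3-byte length prefix: unchanged behavior
        l LONGBLOB     -- 4-byte length prefix: unchanged behavior
      );

      SELECT t, b FROM t_short_blobs ORDER BY id + 1;  -- sorts complete rows
      SELECT m, l FROM t_long_blobs ORDER BY id + 1;   -- sorts row IDs only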
    
    Note that in the case of multi-megabyte JSON or GEOMETRY fields and small sort
    buffers, we will now return "out of memory" at runtime instead of sorting
    them. However, the user can increase the sort buffer size, and it is generally
    not considered unreasonable to ask the user for a sort buffer large enough to
    hold a few of their JSON rows (just like we require a sort buffer large enough
    to hold a few VARCHAR(60000) rows if the table has any), especially as we've
    never given a guarantee to the contrary.
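
    If a query does hit that out-of-memory condition, the usual remedy is to
    raise the per-session sort buffer; the value below is only illustrative:

      -- Make room for a handful of large JSON/GEOMETRY rows in the sort
      -- buffer; 16 MiB is an arbitrary example, not a recommendation.
      SET SESSION sort_buffer_size = 16 * 1024 * 1024;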
    
    A single-threaded benchmark sorting a JSON column (as a passive field, i.e.,
    not sorting _on_ a JSON column) showed a significant speed increase (+83% qps).
    
    The benchmark was run using the sysbench JSON schema, i.e.:
    
      CREATE TABLE sbtestNN (
        id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
        doc JSON NOT NULL, PRIMARY KEY (id));
    
    The JSON column set up by sysbench is 255 bytes long in text format. The query
    used is a slight modification of a standard sysbench JSON query:
    
      SELECT JSON_EXTRACT(doc,'$.name') FROM sbtestNN WHERE id BETWEEN ? AND ? ORDER BY (id+1)
    
    The id+1 is there to force a sort that includes the JSON column (without
    the addition, we'd simply scan along the primary key). In the old code,
    this sorts row IDs (requiring us to go down to the table in the second
    phase), but in the new code, it sorts the entire row.
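
    To verify which strategy a given query picked, the filesort summary in
    the optimizer trace is one option; its "sort_mode" field reports whether
    row IDs or packed additional fields were sorted (the table name and id
    range below just mirror the benchmark's):

      SET SESSION optimizer_trace = 'enabled=on';
      SELECT JSON_EXTRACT(doc,'$.name') FROM sbtest1
        WHERE id BETWEEN 1 AND 100 ORDER BY (id+1);
      SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
      SET SESSION optimizer_trace = 'enabled=off';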
    
    This deprecates the system variable max_length_for_sort_data (not to be
    confused with max_sort_length, which governs truncation length for PAD SPACE
    and BLOB sort keys); it is not used for anything anymore.
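
    Assuming the server follows the usual pattern for deprecated system
    variables, setting it is still accepted but now only draws a deprecation
    warning, e.g.:

      -- Still settable, but it no longer affects sorting; expect a
      -- deprecation warning rather than any change in behavior.
      SET SESSION max_length_for_sort_data = 4096;
      SHOW WARNINGS;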
    
    Change-Id: I5f74d81cbc2815b8b539d8da92285499fda25e33