Steinar H. Gunderson authored
Currently, including any kind of blob as payload in an ordering operation makes us revert to sorting row IDs only instead of complete rows. This means we need a second pass to fetch the actual rows from disk after the sort itself has completed. Most blobs, however, arise from JSON and GEOMETRY columns (we implement them internally using LONGBLOB), and even though they can theoretically be 4 GB, they almost never are. Thus, loosen up the restriction and make them into packed addons. We do the same for 1- and 2-byte blobs (TINYBLOB, BLOB), while leaving 3- and 4-byte blobs (MEDIUMBLOB, LONGBLOB) with the existing behavior.

Note that in the case of multi-megabyte JSON or GEOMETRY fields and small sort buffers, we will now return "out of memory" at runtime instead of sorting them. However, the user can increase the sort buffer size, and it is generally not considered unreasonable to ask for a sort buffer large enough to hold a few of their JSON rows (just as we require a sort buffer large enough to hold a few VARCHAR(60000) rows if the table has any), especially as we have never given a guarantee to the contrary.

A single-threaded benchmark sorting a JSON column (as a passive field, i.e., not sorting _on_ a JSON column) showed a significant speed increase (+83% qps). The benchmark was run using the sysbench JSON schema, i.e.:

  CREATE TABLE sbtestNN (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    doc JSON NOT NULL,
    PRIMARY KEY (id)
  );

The JSON column set up by sysbench is 255 bytes long in text format. The query used is a slight modification of a standard sysbench JSON query:

  SELECT JSON_EXTRACT(doc, '$.name')
  FROM sbtestNN
  WHERE id BETWEEN ? AND ?
  ORDER BY (id + 1)

The id+1 is there to force a sort that includes the JSON column (without the addition, we would simply scan along the primary key). In the old code, this sorts row IDs (requiring us to go back to the table in the second phase), but in the new code, it sorts the entire row.
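The mitigation mentioned above can be sketched as follows. sort_buffer_size is the real MySQL system variable governing the per-sort buffer; the 8 MiB value and the BETWEEN range are arbitrary example choices, not recommendations from this change:

```sql
-- Sketch: if sorting rows with large JSON/GEOMETRY payloads now fails
-- with an out-of-memory error during the sort, enlarge the session's
-- sort buffer (8 MiB here is an arbitrary example value) and retry.
SET SESSION sort_buffer_size = 8 * 1024 * 1024;

-- Re-run the sorting query; the complete rows (JSON payload included)
-- can now be kept as packed addons in the larger buffer.
SELECT JSON_EXTRACT(doc, '$.name')
FROM sbtestNN
WHERE id BETWEEN 1 AND 1000
ORDER BY (id + 1);
```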
This deprecates the system variable max_length_for_sort_data (not to be confused with max_sort_length, which governs truncation length for PAD SPACE and BLOB sort keys); it is no longer used for anything.

Change-Id: I5f74d81cbc2815b8b539d8da92285499fda25e33
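As a hedged illustration of the deprecation: MySQL generally emits a warning when a deprecated system variable is set, so existing configurations keep working. The exact warning text depends on the server version and is not part of this change:

```sql
-- Setting the now-unused variable still succeeds, but a server that
-- has deprecated it should report a deprecation warning.
SET SESSION max_length_for_sort_data = 4096;
SHOW WARNINGS;
```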