Steinar H. Gunderson authored
Currently, including any kind of blob as payload in an ordering operation makes us revert to sorting row IDs only instead of complete rows. This means we need a second pass to fetch the actual rows from disk after the sort itself has completed. Most blobs, however, arise from JSON and GEOMETRY columns (we implement them internally using LONGBLOB), and even though they can theoretically be 4 GB, they almost never are. Thus, loosen up the restriction and make them into packed addons. We do the same for 1- and 2-byte blobs (TINYBLOB, BLOB), while leaving 3- and 4-byte blobs (MEDIUMBLOB, LONGBLOB) with the existing behavior.

Note that in the case of multi-megabyte JSON or GEOMETRY fields and small sort buffers, we will now return "out of memory" at runtime instead of sorting them. However, the user can increase the sort buffer size, and it is generally not considered unreasonable to ask for a sort buffer large enough to hold a few of their JSON rows (just as we require a sort buffer large enough to hold a few VARCHAR(60000) rows if the table has any), especially as we have never given a guarantee to the contrary.

A single-threaded benchmark sorting a JSON column (as a passive field, i.e., not sorting _on_ a JSON column) showed a significant speed increase (+83% qps). The benchmark was run using the sysbench JSON schema, i.e.:

  CREATE TABLE sbtestNN (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    doc JSON NOT NULL,
    PRIMARY KEY (id)
  );

The JSON column set up by sysbench is 255 bytes long in text format. The query used is a slight modification of a standard sysbench JSON query:

  SELECT JSON_EXTRACT(doc, '$.name')
  FROM sbtestNN
  WHERE id BETWEEN ? AND ?
  ORDER BY (id + 1)

The id+1 is there to force a sort that includes the JSON column (without the addition, we would simply scan along the primary key). In the old code, this sorts row IDs (requiring us to go back to the table in the second phase), but in the new code, it sorts the entire row.
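The mitigation mentioned above can be sketched as follows. sort_buffer_size is the real MySQL system variable governing the per-sort buffer; the 8 MiB value and the BETWEEN range are arbitrary example choices, not recommendations from this change:

```sql
-- Sketch: if sorting rows with large JSON/GEOMETRY payloads now fails
-- with an out-of-memory error during the sort, enlarge the session's
-- sort buffer (8 MiB here is an arbitrary example value) and retry.
SET SESSION sort_buffer_size = 8 * 1024 * 1024;

-- Re-run the sorting query; the complete rows (JSON payload included)
-- can now be kept as packed addons in the larger buffer.
SELECT JSON_EXTRACT(doc, '$.name')
FROM sbtestNN
WHERE id BETWEEN 1 AND 1000
ORDER BY (id + 1);
```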
This deprecates the system variable max_length_for_sort_data (not to be confused with max_sort_length, which governs truncation length for PAD SPACE and BLOB sort keys); it is no longer used for anything.

Change-Id: I5f74d81cbc2815b8b539d8da92285499fda25e33
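As a hedged illustration of the deprecation: MySQL generally emits a warning when a deprecated system variable is set, so existing configurations keep working. The exact warning text depends on the server version and is not part of this change:

```sql
-- Setting the now-unused variable still succeeds, but a server that
-- has deprecated it should report a deprecation warning.
SET SESSION max_length_for_sort_data = 4096;
SHOW WARNINGS;
```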