Author: Erik Froseth
If the hash table in a hash join becomes full during a query, the hash
join writes both inputs out to temporary files on disk in order to
process them in smaller chunks. This means that both inputs are read to
the end before the join starts producing output rows. This can be
sub-optimal if the query has a small LIMIT; the output rows could be
produced a lot earlier if we did not read both inputs to the end first.
The test case provided in the bug report is a query with 13 levels of
nested hash join, where each hash join is a Cartesian product. As the
hash joins start writing both inputs to disk, the query ends up taking
a long time and consuming a lot of disk space.

To mitigate this problem, the hash join is not allowed to spill to disk
if there is a LIMIT in the query. When not spilling to disk, the hash
join works like this:

1. Read as many rows as possible from the build input into an
   in-memory hash table.
2. When the hash table is full (we have reached the limit set by the
   system variable join_buffer_size), start reading from the beginning
   of the probe input, probing for matches in the hash table. Output a
   row for each match found.
3. When the probe input is empty, see if there are any remaining rows
   in the build input. If so, clear the in-memory hash table and go to
   step 1, continuing from the build input where we stopped the last
   time. If not, the join is done.

This lets the hash join potentially produce rows a lot earlier, which
dramatically reduces execution time in certain cases (a sketch of the
strategy follows below).

There are, however, two cases where we always allow spilling to disk
even if the query contains a LIMIT: when the query has either grouping
or sorting. In those cases, the iterator above us will most likely
consume the entire result set anyway.

Change-Id: I481208c0fe2f5b2f437a1647ec90af6650c5514f
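To make the chunked in-memory strategy concrete, here is a minimal,
self-contained C++ sketch. It is not the actual MySQL implementation;
every name in it (Row, Input, InMemoryHashJoin, max_rows_in_hash_table)
is a hypothetical stand-in, with max_rows_in_hash_table playing the
role of join_buffer_size.

#include <cstddef>
#include <cstdio>
#include <optional>
#include <unordered_map>
#include <vector>

struct Row {
  long long key;          // join key
  long long payload = 0;  // stand-in for the remaining columns
};

// Hypothetical input abstraction: Read() returns rows until the input
// is exhausted; Rewind() restarts reading from the beginning (needed
// for the probe input, which is scanned once per build chunk).
struct Input {
  std::vector<Row> rows;
  std::size_t pos = 0;
  std::optional<Row> Read() {
    if (pos >= rows.size()) return std::nullopt;
    return rows[pos++];
  }
  void Rewind() { pos = 0; }
  bool Exhausted() const { return pos >= rows.size(); }
};

// Joins `build` and `probe` on Row::key without ever spilling to disk.
template <typename EmitFn>
void InMemoryHashJoin(Input &build, Input &probe,
                      std::size_t max_rows_in_hash_table, EmitFn emit) {
  while (!build.Exhausted()) {
    // Step 1: fill the in-memory hash table, continuing from where we
    // stopped in the build input the last time around.
    std::unordered_multimap<long long, Row> hash_table;
    do {
      std::optional<Row> row = build.Read();
      if (!row) break;
      hash_table.emplace(row->key, *row);
    } while (hash_table.size() < max_rows_in_hash_table);

    // Step 2: scan the probe input from the beginning and emit a
    // joined row for every match. Rows reach the caller (and any
    // LIMIT) here, possibly long before both inputs are fully read.
    probe.Rewind();
    for (std::optional<Row> row = probe.Read(); row; row = probe.Read()) {
      auto range = hash_table.equal_range(row->key);
      for (auto it = range.first; it != range.second; ++it) {
        emit(it->second, *row);
      }
    }
    // Step 3: the hash table is dropped at the end of this scope, and
    // we loop back to step 1 for the next build chunk, if any remains.
  }
}

// Usage: a hash table capped at 2 rows forces two build chunks, so the
// probe input is scanned twice.
int main() {
  Input build{{{1, 10}, {2, 20}, {1, 30}}};
  Input probe{{{1, 100}, {3, 300}, {2, 200}}};
  InMemoryHashJoin(build, probe, /*max_rows_in_hash_table=*/2,
                   [](const Row &b, const Row &p) {
                     std::printf("match: key=%lld build=%lld probe=%lld\n",
                                 b.key, b.payload, p.payload);
                   });
}

Note how the probe input is rewound once per build chunk: with N build
chunks the probe input is scanned N times, which is the price paid for
never touching disk. With a small LIMIT, the first chunk typically
produces enough rows that the extra scans never happen.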