Erik Froseth authored
This patch adds hash joins to MySQL. We have chosen to implement hash join with spill to disk if the input(s) become too big to fit in memory. We start out by trying to do everything in memory. If we run out of memory, we partition each input into smaller partitions, also called hash join chunks. Each pair of partitions is then processed as a normal in-memory hash join. The amount of memory available is controlled by the variable "join_buffer_size". A simplified sketch of this partitioning scheme follows below.

In this round, we replace block-nested loop (BNL), both with and without join conditions, with hash join whenever possible. That is, if the join condition is an equi-join where one side of the condition refers to one input of the join and the other side refers to the other input, and the plan can be executed using the iterator execution engine, we replace BNL with hash join (a sketch of this eligibility check also follows below). This version of hash join only supports inner joins; outer joins will be added at a later stage. There have been no changes to the join optimizer, so the optimizer still produces execution plans assuming it is going to execute a BNL. This should clearly be changed, but that is left to a later patch/worklog. Note that we rely on the new iterator execution engine to execute hash joins, which means that any query executed with the old execution engine cannot use hash join.

We have benchmarked hash join against both block-nested loop and index lookups. Hash join is faster than BNL in almost all cases, especially with larger inputs; in many queries, hash join completes in seconds where BNL is still running after several hours. Compared with index lookups, we have seen cases in DBT-3 where an execution plan with hash join is on par with or faster than a nested loop with index lookups. However, the differences there are not that big, and there are many cases where an index lookup is faster than hash join.

Change-Id: Ib3b3daaa8abc93567f99bf36bd1ef2c7633368a7
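
Below is a minimal, self-contained C++ sketch of the spill strategy described above: partition both inputs by hashing the join key, then join each pair of chunks with a regular in-memory hash join. All names here (Row, Chunk, PartitionInput, kNumChunks) are illustrative assumptions, not MySQL's actual types or APIs, and the chunks are kept in memory where the real implementation writes them to chunk files on disk.

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Row {
  std::string join_key;  // value of the equi-join column
  std::string payload;   // remaining columns, opaque here
};

// One "hash join chunk": the rows from one input that hash to a given
// partition index.
using Chunk = std::vector<Row>;

constexpr size_t kNumChunks = 8;  // illustrative; a real system sizes this

// Split an input into kNumChunks partitions by hashing the join key.
// Rows with equal keys always land in the same partition index, so each
// build/probe chunk pair can later be joined independently.
std::vector<Chunk> PartitionInput(const std::vector<Row> &input) {
  std::vector<Chunk> chunks(kNumChunks);
  for (const Row &row : input) {
    const size_t idx = std::hash<std::string>{}(row.join_key) % kNumChunks;
    chunks[idx].push_back(row);  // the real code spills these to disk
  }
  return chunks;
}

// Classic in-memory hash join of one build chunk against one probe chunk.
void InMemoryHashJoin(
    const Chunk &build, const Chunk &probe,
    const std::function<void(const Row &, const Row &)> &emit) {
  std::unordered_multimap<std::string, const Row *> table;
  for (const Row &row : build) table.emplace(row.join_key, &row);
  for (const Row &row : probe) {
    auto [begin, end] = table.equal_range(row.join_key);
    for (auto it = begin; it != end; ++it) emit(*it->second, row);
  }
}

// Spill path: partition both inputs the same way, then join chunk pairs.
void HashJoinWithSpill(
    const std::vector<Row> &build_input, const std::vector<Row> &probe_input,
    const std::function<void(const Row &, const Row &)> &emit) {
  const auto build_chunks = PartitionInput(build_input);
  const auto probe_chunks = PartitionInput(probe_input);
  for (size_t i = 0; i < kNumChunks; ++i) {
    InMemoryHashJoin(build_chunks[i], probe_chunks[i], emit);
  }
}

Because rows with equal join keys always hash to the same partition index, joining build chunk i against probe chunk i is sufficient; no cross-chunk pairs ever need to be considered.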
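The eligibility test described above (an equi-join where each side of the condition refers to a different input of the join) can be sketched with table bitmaps. Again, this is a hedged illustration under assumed names (Condition, IsHashJoinCondition); the real check lives in the server's condition-analysis code, and table_map is used here simply as a bitmap with one bit per table.

#include <cstdint>

using table_map = uint64_t;  // bitmap of tables, one bit per table

struct Condition {
  bool is_equality;        // true for "<expr> = <expr>"
  table_map left_tables;   // tables referenced by the left operand
  table_map right_tables;  // tables referenced by the right operand
};

// A condition can drive a hash join if it is an equality where one operand
// refers only to tables in the left join input and the other refers only to
// tables in the right input (in either order).
bool IsHashJoinCondition(const Condition &cond, table_map left_input,
                         table_map right_input) {
  if (!cond.is_equality) return false;
  const bool straight = (cond.left_tables & ~left_input) == 0 &&
                        (cond.right_tables & ~right_input) == 0;
  const bool swapped = (cond.left_tables & ~right_input) == 0 &&
                       (cond.right_tables & ~left_input) == 0;
  // Each operand must reference at least one table; otherwise it is a
  // constant and not a useful join condition.
  return cond.left_tables != 0 && cond.right_tables != 0 &&
         (straight || swapped);
}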