Skip to content
  • Erik Froseth's avatar
    06d27cbe
    WL#2241 Implement hash join · 06d27cbe
    Erik Froseth authored
    This patch adds hash joins to MySQL. We have chosen to implement hash join
    with spill to disk if the input(s) become too big to fit into memory. We
    start out by trying to do everything in memory. If we run out of memory,
    we partition each input into smaller partitions, also called hash join chunk.
    Each pair of partitions is then processed as a normal in-memory hash join.
    The amount of memory available is controlled by the variable "join_buffer_size".
    
    In this round, we will replace block-nested loop (BNL) (both with and
    without join conditions) with hash join whenever possible. That is,
    if the join condition is an equi-join where each side of the condition
    refers to each side of the join, and the plan can be executed using the
    iterator execution engine, we will replace BNL with hash join.
    This version of hash join only supports inner joins, so outer joins
    will be available at a later stage.
    
    There have been no changes to the join optimizer, so the optimizer is
    still making execution plans thinking it is going to execute a BNL. This
    should clearly be changed, but that is left to a later patch/worklog.
    
    Note that we are relying on the new iterator execution engine to execute
    hash joins. This means that any query that executes using the old
    execution engine won't be able to execute hash joins.
    
    We have benchmarked hash join against both block-nested loop and index
    lookups. Hash join is faster than BNL in almost all cases, especially
    with larger inputs; in many queries, hash join completes in seconds
    where BNL is still running after several hours. Compared with
    index lookups, we have seen cases in DBT-3 where an execution plan with
    hash join is both on par and faster than doing a nested loop with index
    lookup. But the differences here are not that big, and there are many
    cases where an index lookup is faster than hash join.
    
    Change-Id: Ib3b3daaa8abc93567f99bf36bd1ef2c7633368a7
    06d27cbe
    WL#2241 Implement hash join
    Erik Froseth authored
    This patch adds hash joins to MySQL. We have chosen to implement hash join
    with spill to disk if the input(s) become too big to fit into memory. We
    start out by trying to do everything in memory. If we run out of memory,
    we partition each input into smaller partitions, also called hash join chunk.
    Each pair of partitions is then processed as a normal in-memory hash join.
    The amount of memory available is controlled by the variable "join_buffer_size".
    
    In this round, we will replace block-nested loop (BNL) (both with and
    without join conditions) with hash join whenever possible. That is,
    if the join condition is an equi-join where each side of the condition
    refers to each side of the join, and the plan can be executed using the
    iterator execution engine, we will replace BNL with hash join.
    This version of hash join only supports inner joins, so outer joins
    will be available at a later stage.
    
    There have been no changes to the join optimizer, so the optimizer is
    still making execution plans thinking it is going to execute a BNL. This
    should clearly be changed, but that is left to a later patch/worklog.
    
    Note that we are relying on the new iterator execution engine to execute
    hash joins. This means that any query that executes using the old
    execution engine won't be able to execute hash joins.
    
    We have benchmarked hash join against both block-nested loop and index
    lookups. Hash join is faster than BNL in almost all cases, especially
    with larger inputs; in many queries, hash join completes in seconds
    where BNL is still running after several hours. Compared with
    index lookups, we have seen cases in DBT-3 where an execution plan with
    hash join is both on par and faster than doing a nested loop with index
    lookup. But the differences here are not that big, and there are many
    cases where an index lookup is faster than hash join.
    
    Change-Id: Ib3b3daaa8abc93567f99bf36bd1ef2c7633368a7
Loading