sql/hash_join_chunk.h · mysql-8.0.21 · Rasoul Jahanshahi / Mysql Server

Dec 18, 2019

WL#13377 Add support for hash outer, anti and semi join [8/8, outer join] · 3e1f79bc

Erik Froseth authored Dec 18, 2019

This patch implements support for hash outer join.

The right table is assigned as the build input, using the join attribute as the
hash table key. The left table is then assigned as the probe input. For each row
from the probe input, we do a lookup in the hash table using the join attribute.
For each matching row found in the hash table, output the joined row. If no
matching row is found, a null-complemented row is returned.

If the hash join spills to disk, we cannot output NULL-complemented rows before
the probe row is read back from disk; a probe row can match a row from the build
input we haven't seen yet (it's been written out to disk because the hash table
was full). So when writing a probe row out to disk, we add one byte in front of
the row that works as a match flag. When we read the probe row back from chunk
file later, we can output a NULL-complemented row _only_ if the probe row did
not have a matching row in the hash table before it was written out to disk.

Mapping existing execution plans to hash outer join is tricky due to the way
MySQL sets up outer join plans. Consider the following query:

  SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.i = t3.i) ON t2.i = t1.i;

Traditionally, MySQL tries to collect conditions in multi-equality conditions.
For the query above, it means that t1.i, t2.i and t3.i will be collected in
one single multi-equality condition during optimization, namely
(t1.i = t2.i = t3.i). When creating the query plan, the optimizer will then
create equality conditions (x = y) by traversing the multiple equality condition
(t1.i, t2.i, t3.i) and combining each of the fields with a field from the
table the optimizer think it will evaluate first (in the old executor with
nested loop, this is the same as the first table in QEP_TAB order).
As a result, the above query ends up looking like this after optimization:

  SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t1.i = t3.i) ON t2.i = t1.i;

Note that t2.i = t3.i has been changed to t1.i = t3.i. In a nested loop, this
works just fine (the executor will read the tables from left to right in the
order t1, t2, t3). But with hash join, this breaks down; the join condition for
't2 LEFT JOIN t3' will refer to the unavailable table 't1':

       left hash join <-- t2.i = t1.i
       |            |
      t1       left hash join <-- t1.i = t3.i !!
               |            |
              t2            t3

Note that if a user tries to execute the rewritten query above, the server will
return "Unknown column 't1.i'". In order to reverse this, we must go through
each join condition and see if we have any fields that refers to tables we
cannot reach. Each unreachable field must be replaced with a field from the
multi-equality that the hash join actually can reach. Since the multi-equality
is lost at this point, this patch now saves these so that we can perform this
operation when hash join is set up. Note that this can also happen even if the
column is wrapped inside a function, such as '... ON ABS(t2.i) = t3.i'.

Traditional EXPLAIN and EXPLAIN FORMAT=json sends a normalized version of the
query as a warning to the client. The query is printed after the iterator tree
is constructed, meaning that the normalized query will contain the reversal of
column substitution that happens when creating a hash join iterator. As a result
of this, some tests in MTR will print different join columns if the query is to
be executed by hash join.

Change-Id: I138bf992ef5dfc083ce9f91a0ab126b35855bb0b

3e1f79bc

WL#13377 Add support for hash outer, anti and semi join [8/8, outer join]

Erik Froseth authored Dec 18, 2019

This patch implements support for hash outer join.

The right table is assigned as the build input, using the join attribute as the
hash table key. The left table is then assigned as the probe input. For each row
from the probe input, we do a lookup in the hash table using the join attribute.
For each matching row found in the hash table, output the joined row. If no
matching row is found, a null-complemented row is returned.

If the hash join spills to disk, we cannot output NULL-complemented rows before
the probe row is read back from disk; a probe row can match a row from the build
input we haven't seen yet (it's been written out to disk because the hash table
was full). So when writing a probe row out to disk, we add one byte in front of
the row that works as a match flag. When we read the probe row back from chunk
file later, we can output a NULL-complemented row _only_ if the probe row did
not have a matching row in the hash table before it was written out to disk.

Mapping existing execution plans to hash outer join is tricky due to the way
MySQL sets up outer join plans. Consider the following query:

  SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.i = t3.i) ON t2.i = t1.i;

Traditionally, MySQL tries to collect conditions in multi-equality conditions.
For the query above, it means that t1.i, t2.i and t3.i will be collected in
one single multi-equality condition during optimization, namely
(t1.i = t2.i = t3.i). When creating the query plan, the optimizer will then
create equality conditions (x = y) by traversing the multiple equality condition
(t1.i, t2.i, t3.i) and combining each of the fields with a field from the
table the optimizer think it will evaluate first (in the old executor with
nested loop, this is the same as the first table in QEP_TAB order).
As a result, the above query ends up looking like this after optimization:

  SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t1.i = t3.i) ON t2.i = t1.i;

Note that t2.i = t3.i has been changed to t1.i = t3.i. In a nested loop, this
works just fine (the executor will read the tables from left to right in the
order t1, t2, t3). But with hash join, this breaks down; the join condition for
't2 LEFT JOIN t3' will refer to the unavailable table 't1':

       left hash join <-- t2.i = t1.i
       |            |
      t1       left hash join <-- t1.i = t3.i !!
               |            |
              t2            t3

Note that if a user tries to execute the rewritten query above, the server will
return "Unknown column 't1.i'". In order to reverse this, we must go through
each join condition and see if we have any fields that refers to tables we
cannot reach. Each unreachable field must be replaced with a field from the
multi-equality that the hash join actually can reach. Since the multi-equality
is lost at this point, this patch now saves these so that we can perform this
operation when hash join is set up. Note that this can also happen even if the
column is wrapped inside a function, such as '... ON ABS(t2.i) = t3.i'.

Traditional EXPLAIN and EXPLAIN FORMAT=json sends a normalized version of the
query as a warning to the client. The query is printed after the iterator tree
is constructed, meaning that the normalized query will contain the reversal of
column substitution that happens when creating a hash join iterator. As a result
of this, some tests in MTR will print different join columns if the query is to
be executed by hash join.

Change-Id: I138bf992ef5dfc083ce9f91a0ab126b35855bb0b