-
Erik Froseth authored
This patch implements support for hash outer join. The right table is assigned as the build input, using the join attribute as the hash table key. The left table is then assigned as the probe input. For each row from the probe input, we do a lookup in the hash table using the join attribute. For each matching row found in the hash table, output the joined row. If no matching row is found, a null-complemented row is returned. If the hash join spills to disk, we cannot output NULL-complemented rows before the probe row is read back from disk; a probe row can match a row from the build input we haven't seen yet (it's been written out to disk because the hash table was full). So when writing a probe row out to disk, we add one byte in front of the row that works as a match flag. When we read the probe row back from chunk file later, we can output a NULL-complemented row _only_ if the probe row did not have a matching row in the hash table before it was written out to disk. Mapping existing execution plans to hash outer join is tricky due to the way MySQL sets up outer join plans. Consider the following query: SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.i = t3.i) ON t2.i = t1.i; Traditionally, MySQL tries to collect conditions in multi-equality conditions. For the query above, it means that t1.i, t2.i and t3.i will be collected in one single multi-equality condition during optimization, namely (t1.i = t2.i = t3.i). When creating the query plan, the optimizer will then create equality conditions (x = y) by traversing the multiple equality condition (t1.i, t2.i, t3.i) and combining each of the fields with a field from the table the optimizer think it will evaluate first (in the old executor with nested loop, this is the same as the first table in QEP_TAB order). As a result, the above query ends up looking like this after optimization: SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t1.i = t3.i) ON t2.i = t1.i; Note that t2.i = t3.i has been changed to t1.i = t3.i. In a nested loop, this works just fine (the executor will read the tables from left to right in the order t1, t2, t3). But with hash join, this breaks down; the join condition for 't2 LEFT JOIN t3' will refer to the unavailable table 't1': left hash join <-- t2.i = t1.i | | t1 left hash join <-- t1.i = t3.i !! | | t2 t3 Note that if a user tries to execute the rewritten query above, the server will return "Unknown column 't1.i'". In order to reverse this, we must go through each join condition and see if we have any fields that refers to tables we cannot reach. Each unreachable field must be replaced with a field from the multi-equality that the hash join actually can reach. Since the multi-equality is lost at this point, this patch now saves these so that we can perform this operation when hash join is set up. Note that this can also happen even if the column is wrapped inside a function, such as '... ON ABS(t2.i) = t3.i'. Traditional EXPLAIN and EXPLAIN FORMAT=json sends a normalized version of the query as a warning to the client. The query is printed after the iterator tree is constructed, meaning that the normalized query will contain the reversal of column substitution that happens when creating a hash join iterator. As a result of this, some tests in MTR will print different join columns if the query is to be executed by hash join. Change-Id: I138bf992ef5dfc083ce9f91a0ab126b35855bb0b
Erik Froseth authoredThis patch implements support for hash outer join. The right table is assigned as the build input, using the join attribute as the hash table key. The left table is then assigned as the probe input. For each row from the probe input, we do a lookup in the hash table using the join attribute. For each matching row found in the hash table, output the joined row. If no matching row is found, a null-complemented row is returned. If the hash join spills to disk, we cannot output NULL-complemented rows before the probe row is read back from disk; a probe row can match a row from the build input we haven't seen yet (it's been written out to disk because the hash table was full). So when writing a probe row out to disk, we add one byte in front of the row that works as a match flag. When we read the probe row back from chunk file later, we can output a NULL-complemented row _only_ if the probe row did not have a matching row in the hash table before it was written out to disk. Mapping existing execution plans to hash outer join is tricky due to the way MySQL sets up outer join plans. Consider the following query: SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.i = t3.i) ON t2.i = t1.i; Traditionally, MySQL tries to collect conditions in multi-equality conditions. For the query above, it means that t1.i, t2.i and t3.i will be collected in one single multi-equality condition during optimization, namely (t1.i = t2.i = t3.i). When creating the query plan, the optimizer will then create equality conditions (x = y) by traversing the multiple equality condition (t1.i, t2.i, t3.i) and combining each of the fields with a field from the table the optimizer think it will evaluate first (in the old executor with nested loop, this is the same as the first table in QEP_TAB order). As a result, the above query ends up looking like this after optimization: SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t1.i = t3.i) ON t2.i = t1.i; Note that t2.i = t3.i has been changed to t1.i = t3.i. In a nested loop, this works just fine (the executor will read the tables from left to right in the order t1, t2, t3). But with hash join, this breaks down; the join condition for 't2 LEFT JOIN t3' will refer to the unavailable table 't1': left hash join <-- t2.i = t1.i | | t1 left hash join <-- t1.i = t3.i !! | | t2 t3 Note that if a user tries to execute the rewritten query above, the server will return "Unknown column 't1.i'". In order to reverse this, we must go through each join condition and see if we have any fields that refers to tables we cannot reach. Each unreachable field must be replaced with a field from the multi-equality that the hash join actually can reach. Since the multi-equality is lost at this point, this patch now saves these so that we can perform this operation when hash join is set up. Note that this can also happen even if the column is wrapped inside a function, such as '... ON ABS(t2.i) = t3.i'. Traditional EXPLAIN and EXPLAIN FORMAT=json sends a normalized version of the query as a warning to the client. The query is printed after the iterator tree is constructed, meaning that the normalized query will contain the reversal of column substitution that happens when creating a hash join iterator. As a result of this, some tests in MTR will print different join columns if the query is to be executed by hash join. Change-Id: I138bf992ef5dfc083ce9f91a0ab126b35855bb0b
Loading