mysql-test/include/desc_index.inc · 98b2ccb470de120d36bc4a623c814cdfded958ec · Rasoul Jahanshahi / Mysql Server

Oct 09, 2020

WL #14333: SELECT DISTINCT for hypergraph join optimizer · d960e0d2

Steinar H. Gunderson authored Oct 09, 2020

Implement support for SELECT DISTINCT in the hypergraph join optimizer.
It works the same way as in the existing optimizer (at least since the
switch to the iterator executor):

 - If there's a GROUP BY and DISTINCT together, DISTINCT is almost
   always folded into the GROUP BY (this happens before the optimizer).
 - DISTINCT is implemented as a sort with duplicate removal.
 - If there's DISTINCT and ORDER BY together, we attempt to fold the
   ORDER BY into the DISTINCT if possible, to save on the number of
   sorts.
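
The last two points can be illustrated with a small sketch. This is not
the server's code; the names `distinct_by_sort` and `fold_order_by` are
hypothetical, and the sketch assumes ascending ORDER BY on plain columns
and rows represented as dictionaries holding only the selected columns.

```python
from itertools import groupby


def distinct_by_sort(rows, key_columns):
    """Sort rows on key_columns and keep one row per distinct key."""
    def key(row):
        return tuple(row[c] for c in key_columns)

    ordered = sorted(rows, key=key)
    # After sorting, duplicates are adjacent, so one pass removes them.
    return [next(group) for _, group in groupby(ordered, key=key)]


def fold_order_by(distinct_columns, order_by_columns):
    """Return a DISTINCT sort key that also satisfies ORDER BY, or None.

    Simplifying assumption: folding is possible only when every ORDER BY
    column is also a DISTINCT column. The ORDER BY columns then become
    the prefix of the sort key, so a single sort serves both purposes.
    """
    order_by_columns = list(dict.fromkeys(order_by_columns))
    if not set(order_by_columns) <= set(distinct_columns):
        return None  # Two separate sorts would be needed.
    rest = [c for c in distinct_columns if c not in order_by_columns]
    return order_by_columns + rest


# SELECT DISTINCT a, b FROM t ORDER BY b;
rows = [
    {"a": 2, "b": 1},
    {"a": 1, "b": 2},
    {"a": 2, "b": 1},
    {"a": 1, "b": 1},
]
sort_key = fold_order_by(["a", "b"], ["b"])
result = distinct_by_sort(rows, sort_key)
```

Because the folded key starts with the ORDER BY columns, the output of
the duplicate-removing sort is already in the requested order and no
second sort is needed.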

We also have exactly the same issues with row IDs as before: if the
DISTINCT sort needs row IDs (e.g. because it wants to sort a long blob),
the ORDER BY sort needs row IDs as well. And if we have a sort for
DISTINCT, we need a streaming step (or a materialization, in the rare
case where row IDs are involved) after any grouping, just as for
ORDER BY. All of this requires a slightly convoluted setup where we
create the Filesort objects for both sorts before we actually start
making iterators.
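
A rough sketch of why both sorts have to be planned up front, under
stated assumptions: `SortPlan` and `plan_sorts` are hypothetical names,
not server classes, and folding of ORDER BY into DISTINCT is ignored
here. The point is only that the row-ID decision for one sort affects
the other, and that both affect what is placed after grouping.

```python
from dataclasses import dataclass


@dataclass
class SortPlan:
    purpose: str         # "DISTINCT" or "ORDER BY"
    needs_row_ids: bool  # e.g. True when sorting a long blob


def plan_sorts(has_distinct, has_order_by, has_grouping,
               distinct_needs_row_ids=False,
               order_by_needs_row_ids=False):
    """Decide on all sorts before any iterator would be created."""
    sorts = []
    if has_distinct:
        sorts.append(SortPlan("DISTINCT", distinct_needs_row_ids))
    if has_order_by:
        # If the DISTINCT sort needs row IDs, the ORDER BY sort does too.
        inherited = has_distinct and distinct_needs_row_ids
        sorts.append(SortPlan("ORDER BY", order_by_needs_row_ids or inherited))

    # A sort placed after grouping needs the grouped rows streamed, or
    # materialized in the rare case where row IDs are involved.
    step_after_grouping = None
    if has_grouping and sorts:
        if any(s.needs_row_ids for s in sorts):
            step_after_grouping = "materialize"
        else:
            step_after_grouping = "stream"
    return sorts, step_after_grouping


sorts, step = plan_sorts(has_distinct=True, has_order_by=True,
                         has_grouping=True, distinct_needs_row_ids=True)
```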

Change-Id: Id4b64c39d1a1468e488d5d3b2c068d6bbb797967
