    WL #14333: SELECT DISTINCT for hypergraph join optimizer · d960e0d2
    Steinar H. Gunderson authored
    Implement support for SELECT DISTINCT in the hypergraph join optimizer.
    This works in the same fashion as it does in the existing optimizer (at
    least after we changed to the iterator executor):
    
     - If there's a GROUP BY and DISTINCT together, DISTINCT is almost
       always folded into the GROUP BY (this happens before the optimizer).
     - DISTINCT is implemented as a sort with duplicate removal.
     - If there's DISTINCT and ORDER BY together, we attempt to fold the
       ORDER BY into the DISTINCT if possible, to save on the number of
       sorts.
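
As an illustration of the last two points, here is a minimal standalone sketch, not the server code. The names `Row`, `SortDistinct` and `FoldOrderByIntoDistinct` are hypothetical, rows are plain integer vectors, and only ascending order is handled. It assumes the fold is possible when every ORDER BY column is also a DISTINCT column, which is a simplification of whatever conditions the optimizer actually checks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row type for this sketch: one int per column.
using Row = std::vector<int>;

// Sort on the columns in key_order, then drop adjacent duplicates.
// key_order must list every DISTINCT column, so that equal rows end up
// next to each other after the sort.
std::vector<Row> SortDistinct(std::vector<Row> rows,
                              const std::vector<std::size_t> &key_order) {
  auto less = [&key_order](const Row &a, const Row &b) {
    for (std::size_t col : key_order) {
      if (a[col] != b[col]) return a[col] < b[col];
    }
    return false;
  };
  auto equal = [&key_order](const Row &a, const Row &b) {
    for (std::size_t col : key_order) {
      if (a[col] != b[col]) return false;
    }
    return true;
  };
  std::sort(rows.begin(), rows.end(), less);
  rows.erase(std::unique(rows.begin(), rows.end(), equal), rows.end());
  return rows;
}

// Try to build a single sort order that serves both DISTINCT and ORDER BY:
// ORDER BY columns first, then the remaining DISTINCT columns.
// Returns false if some ORDER BY column is not a DISTINCT column, in which
// case this sketch gives up and two separate sorts would be needed.
bool FoldOrderByIntoDistinct(const std::vector<std::size_t> &distinct_cols,
                             const std::vector<std::size_t> &order_by_cols,
                             std::vector<std::size_t> *key_order) {
  key_order->clear();
  for (std::size_t col : order_by_cols) {
    if (std::find(distinct_cols.begin(), distinct_cols.end(), col) ==
        distinct_cols.end()) {
      return false;
    }
    // Skip ORDER BY columns that are already part of the key.
    if (std::find(key_order->begin(), key_order->end(), col) ==
        key_order->end()) {
      key_order->push_back(col);
    }
  }
  for (std::size_t col : distinct_cols) {
    if (std::find(key_order->begin(), key_order->end(), col) ==
        key_order->end()) {
      key_order->push_back(col);
    }
  }
  return true;
}
```

With DISTINCT on columns 0 and 1 and ORDER BY on column 1, the folded key is (1, 0), so one call to `SortDistinct` both removes duplicates and yields the requested order.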
    
    We also have exactly the same issues with row IDs as before: if the
    DISTINCT sort needs row IDs (e.g. because it wants to sort a long blob),
    the ORDER BY sort also needs to have row IDs. And if we have a sort for
    DISTINCT, we need a streaming step (or a materialization, in the rare
    case where row IDs are needed) after any grouping, just as for ORDER BY.
    All of this requires a slightly convoluted setup where we create the
    Filesort objects for both sorts before we actually start making iterators.
    
    Change-Id: Id4b64c39d1a1468e488d5d3b2c068d6bbb797967