Fix single_distinct_to_groupby for arbitrary expressions #1519
Which issue does this PR close?
This addresses the bug in `single_distinct_to_groupby.rs` described in #1512.
Right now the `SingleDistinctToGroupBy` optimizer rule only works on literal or column expressions. For more complex expressions (like binary operators), the optimization breaks because the left/right operands are looked up in the derived schema, where the relevant fields do not exist. This adds an alias to the group by expression and uses the alias to convert the outer expression to a column expression.
As an example, consider the query `SELECT fn1(DISTINCT 2 * col), fn2(DISTINCT 2 * col) FROM t`. Previously, the rewritten plan kept the binary expression `2 * col` in the outer aggregates. This breaks, since converting the outer binary expression into a physical plan requires the subquery schema to have `col` as a column. This change instead aliases the group by expression in the subquery, so the outer aggregates refer to the alias as a plain column.
What changes are included in this PR?
This just includes the logic fix and some tests that would have broken before.
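To make the alias rewrite concrete, here is a minimal, self-contained sketch. It does not use DataFusion's types: the `Expr` enum, the `rewrite_single_distinct` function, and the `GROUP_ALIAS` constant are all hypothetical stand-ins, and the alias name in the real rule may differ. It only illustrates the idea of aliasing the group by expression and pointing the outer aggregates at that alias.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary { left: Box<Expr>, op: char, right: Box<Expr> },
    Alias(Box<Expr>, String),
    // An aggregate call such as fn1(DISTINCT arg).
    Agg { name: String, distinct: bool, arg: Box<Expr> },
}

// Hypothetical constant alias; the name used by the actual rule is an assumption.
const GROUP_ALIAS: &str = "alias1";

/// Rewrites aggregates of the form `fn(DISTINCT arg)` that all share one `arg`.
/// Returns the aliased inner group by expression and the outer aggregates,
/// which reference the alias as a plain column. Returns `None` when the
/// input does not match that shape.
fn rewrite_single_distinct(aggs: &[Expr]) -> Option<(Expr, Vec<Expr>)> {
    let mut group_expr: Option<Expr> = None;
    let mut outer = Vec::with_capacity(aggs.len());
    for agg in aggs {
        match agg {
            Expr::Agg { name, distinct: true, arg } => {
                // Every DISTINCT aggregate must use the same argument expression.
                let same = group_expr.as_ref().map_or(true, |e| e == &**arg);
                if !same {
                    return None;
                }
                if group_expr.is_none() {
                    group_expr = Some((**arg).clone());
                }
                // The outer aggregate is no longer DISTINCT and reads the alias.
                outer.push(Expr::Agg {
                    name: name.clone(),
                    distinct: false,
                    arg: Box::new(Expr::Column(GROUP_ALIAS.to_string())),
                });
            }
            _ => return None,
        }
    }
    // Inner side: `arg AS alias1`, used as the group by expression.
    let inner = Expr::Alias(Box::new(group_expr?), GROUP_ALIAS.to_string());
    Some((inner, outer))
}
```

For the example query, the inner expression would be `2 * col AS alias1` and the outer aggregates would become `fn1(alias1)` and `fn2(alias1)`, so the outer plan never needs to resolve `col` against the subquery schema.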
Alternatives considered
We could also just disable this optimization for expressions that are not literals or columns.
Input needed
Using a constant for the alias feels a little questionable, but I think there's no risk of collisions based on how `is_single_distinct_agg` works. Open to suggestions though.
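As a rough illustration of why a constant alias might be safe, the sketch below assumes the guard only lets the rule fire when every aggregate is DISTINCT and all of them share exactly one argument expression, so a single alias per rewritten aggregation would suffice. This is an assumption about the check's behavior, not a copy of `is_single_distinct_agg`; the `AggCall` type and `is_single_distinct` function are hypothetical.

```rust
use std::collections::HashSet;

// Toy aggregate call; the argument is kept as a display string for simplicity.
#[derive(Debug, Clone)]
struct AggCall {
    name: &'static str,
    distinct: bool,
    arg: &'static str,
}

/// Assumed guard: true only when every aggregate is DISTINCT and all of them
/// share one argument expression, so exactly one alias is ever needed.
fn is_single_distinct(aggs: &[AggCall]) -> bool {
    let mut args = HashSet::new();
    for agg in aggs {
        if !agg.distinct {
            return false;
        }
        args.insert(agg.arg);
    }
    args.len() == 1
}
```

Under this assumption, `fn1(DISTINCT 2 * col), fn2(DISTINCT 2 * col)` passes the guard, while a query mixing `DISTINCT 2 * col` with `DISTINCT col` does not, so the rewrite never introduces two aliases in the same aggregation.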