Skip to content

Conversation

@james727
Copy link
Contributor

@james727 james727 commented Jan 3, 2022

Which issue does this PR close?

This addresses the bug in single_distinct_to_groupby.rs described in #1512

Right now the SingleDistinctToGroupBy optimizer rule only works on literal or column expressions. For more complex expressions (like binary operators), the optimization breaks due to searching for the right/left operands in the derived schema - where the relevant fields do not exist. This adds an alias to the group by expression and uses the alias to convert the outer expression to a column expression.

As an example - consider the query SELECT fn1(DISTINCT 2 * col), fn2(DISTINCT 2 * col) FROM t. Previously this was being rewritten as something like:

SELECT fn1(2 * col), fn2(2 * col)
  FROM (
    SELECT 2 * col FROM t GROUP BY 1
  )

This breaks, since converting the outer binary expression into a physical plan requires the subquery schema to have col as a column. This change rewrites the plan with an alias, as follows:

SELECT fn1(alias1), fn2(alias1)
  FROM (
    SELECT 2 * col AS alias1 FROM t GROUP BY 1
  )

What changes are included in this PR?

This just includes the logic fix and some tests that would have broken before.

Alternatives considered

We could also just disable this optimization for non literal/column expressions.

Input needed

Using a constant for the alias feels a little questionable but I think there's no risk of collisions based on how is_single_distinct_agg works. Open to suggestions though.

@james727 james727 force-pushed the fix_optimize_distinct branch 3 times, most recently from 3167492 to 3cd8699 Compare January 3, 2022 22:00
@james727 james727 force-pushed the fix_optimize_distinct branch from 3cd8699 to af623db Compare January 4, 2022 00:48
@alamb
Copy link
Contributor

alamb commented Jan 4, 2022

Thanks @james727 -- I'll take a look at this tomorrow!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great -- thank you so much @james727 -- great first contribution

@alamb alamb merged commit 2fae23f into apache:master Jan 5, 2022
@alamb
Copy link
Contributor

alamb commented Jan 5, 2022

Thanks again @james727

@james727 james727 deleted the fix_optimize_distinct branch January 5, 2022 14:48
@james727
Copy link
Contributor Author

james727 commented Jan 5, 2022

No problem, glad it's useful. Appreciate the review.

@alamb alamb added the bug Something isn't working label Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants