[SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in RewriteDistinctAggregates
#37825
…ction children to Spark strategies
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -527,8 +527,10 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {` |
| | | |
| | | `  val (functionsWithDistinct, functionsWithoutDistinct) =` |
| | | `    aggregateExpressions.partition(_.isDistinct)` |
| | | `- if (functionsWithDistinct.map(` |
| | | `-   _.aggregateFunction.children.filterNot(_.foldable).toSet).distinct.length > 1) {` |
| | | `+ val distinctAggChildSets = functionsWithDistinct.map { ae =>` |
| | | `+   ExpressionSet(ae.aggregateFunction.children.filterNot(_.foldable))` |
| | | `+ }.distinct` |
| | | `+ if (distinctAggChildSets.length > 1) {` |
| | | `    // This is a sanity check. We should not reach here when we have multiple distinct` |
| | | ``    // column sets. Our `RewriteDistinctAggregates` should take care this case.`` |
| | | `    throw new IllegalStateException(` |
| | | `@@ -560,7 +562,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {` |
| | | `  // [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is disallowed because those two distinct` |
| | | `  // aggregates have different column expressions.` |
| | | `  val distinctExpressions =` |
| | | `-   functionsWithDistinct.head.aggregateFunction.children.filterNot(_.foldable)` |
| | | `+   functionsWithDistinct.flatMap(` |
| | | `+     _.aggregateFunction.children.filterNot(_.foldable)).distinct` |
| | | |
| | | `  val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>` |
| | | ``    // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here`` |
| | | ``    // because `distinctExpressions` is not extracted during logical phase.`` |
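To illustrate why the sanity check in the first hunk moves from `toSet` to `ExpressionSet`, here is a minimal, Spark-free sketch. It assumes a toy expression type (`Expr`, `Col`, `Lit`, `Add`) and a hypothetical `canonical` helper that stands in for Catalyst's canonicalization; none of these are Spark's real classes, and the `b + 1` versus `1 + b` pair is only an illustrative input. A plain `Set` compares expressions structurally, so the two children count as different groups, while comparing canonical forms collapses them into one.

```scala
object SemanticGroupingSketch {
  // Toy expression tree; a stand-in for Catalyst's Expression, not the real class.
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Hypothetical canonical form: order the operands of the commutative Add
  // deterministically so that `b + 1` and `1 + b` become the same tree.
  def canonical(e: Expr): Expr = e match {
    case Add(l, r) =>
      val ops = Seq(canonical(l), canonical(r)).sortBy(_.toString)
      Add(ops(0), ops(1))
    case other => other
  }

  // Structural grouping, analogous to `children.toSet` in the removed lines.
  def structuralGroups(children: Seq[Seq[Expr]]): Int =
    children.map(_.toSet).distinct.length

  // Semantic grouping, analogous to wrapping the children in an ExpressionSet.
  def semanticGroups(children: Seq[Seq[Expr]]): Int =
    children.map(_.map(canonical).toSet).distinct.length

  def main(args: Array[String]): Unit = {
    val bPlusOne = Add(Col("b"), Lit(1))
    val onePlusB = Add(Lit(1), Col("b"))
    // Children of two distinct aggregates, e.g. COUNT(DISTINCT b + 1) and SUM(DISTINCT 1 + b).
    val distinctChildren: Seq[Seq[Expr]] = Seq(Seq(bPlusOne), Seq(onePlusB))
    println(structuralGroups(distinctChildren)) // prints 2
    println(semanticGroups(distinctChildren))   // prints 1
  }
}
```

With structural comparison the sanity check would see two child sets and throw, even though both aggregates operate on the same value; with canonical comparison it sees one.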
Sorry, I'm a little confused. Why do we change this here? Since all children are semantically equivalent, we can just pick the first distinct function. If we need to look up the child later, we should make sure the lookup uses `ExpressionSet`.

Ah yes, possibly I was too. I had not read all of `planAggregateWithOneDistinct` yet, and I see the creation of `rewrittenDistinctFunctions`, where I can possibly take advantage of semantic equivalence.
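
One way the idea discussed in this thread could look is sketched below: take the children of the first distinct function only, and key the later lookup by canonical form so that semantically equivalent children of the other functions still resolve. This is a toy model under stated assumptions, not the code in `planAggregateWithOneDistinct`; the types `Expr`, `Col`, `Lit`, `Add`, `Attr` and the helpers `canonical`, `buildLookup`, and `rewrite` are all hypothetical.

```scala
object CanonicalLookupSketch {
  // Toy expression tree; a stand-in for Catalyst's Expression, not the real class.
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  // Hypothetical stand-in for the attribute that carries a distinct column's value.
  final case class Attr(name: String) extends Expr

  // Hypothetical canonical form: sort the operands of the commutative Add.
  def canonical(e: Expr): Expr = e match {
    case Add(l, r) =>
      val ops = Seq(canonical(l), canonical(r)).sortBy(_.toString)
      Add(ops(0), ops(1))
    case other => other
  }

  // Build the lookup from the first distinct function's children only,
  // keyed by canonical form rather than by the original tree.
  def buildLookup(headChildren: Seq[Expr]): Map[Expr, Attr] =
    headChildren.zipWithIndex.map { case (child, i) =>
      canonical(child) -> Attr(s"distinct_$i")
    }.toMap

  // Rewrite a child of any distinct function; unknown children are left as-is.
  def rewrite(child: Expr, lookup: Map[Expr, Attr]): Expr =
    lookup.getOrElse(canonical(child), child)

  def main(args: Array[String]): Unit = {
    val lookup = buildLookup(Seq(Add(Col("b"), Lit(1))))
    println(rewrite(Add(Lit(1), Col("b")), lookup)) // prints Attr(distinct_0)
    println(rewrite(Col("c"), lookup))              // prints Col(c)
  }
}
```

Keying by the original tree instead would miss `1 + b` in this example, which is the failure mode the reviewer's `ExpressionSet` remark points at.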