[SPARK-12740] [SPARK-13932] support grouping()/grouping_id() in having/order clause #12235
cc @cloud-fan
    }
  }

  private def isAggregateExpression(e: Expression): Boolean = {
do you need this method anymore? It's just a simple isInstanceOf now
Overall LGTM
Test build #55201 has finished for PR 12235 at commit
@marmbrus Could you also take a quick look at this one?
Test build #55229 has finished for PR 12235 at commit
- val groupingIdName: String = "grouping__id"
+ // The attribute name used by Hive, which has different result than Spark, deprecated.
+ val hiveGroupingIdName: String = "grouping__id"
+ val groupingIdName: String = "spark_grouping_id"
Can you explain what's going on here?
"grouping__id" came from Hive, but unfortunately its implementation is wrong; see https://issues.apache.org/jira/browse/HIVE-12833. So we deprecated it in favor of the standard function grouping_id() as the public API. "spark_grouping_id" is a virtual column that is used only internally.
LGTM
Merged into master, thanks!
What changes were proposed in this pull request?
This PR adds support for using grouping()/grouping_id() in HAVING and ORDER BY clauses.
Each resolved grouping()/grouping_id() call is replaced by the unresolved virtual attribute "spark_grouping_id", which is then resolved by ResolveMissingAttribute.
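The replacement step described above can be illustrated with a small, self-contained toy model. This is only a sketch: the case classes below (`Column`, `Grouping`, `GroupingId`, `UnresolvedAttr`, and so on) are hypothetical stand-ins, not Spark's Catalyst classes, and the later resolution by ResolveMissingAttribute is not modeled. It assumes the grouping id is a bit mask in which the first GROUP BY column occupies the most significant bit and a set bit means the column is aggregated away in that row.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst expression classes.
object GroupingRewriteSketch {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Grouping(child: Column) extends Expr            // grouping(col)
  final case class GroupingId(children: Seq[Column]) extends Expr  // grouping_id(cols)
  final case class UnresolvedAttr(name: String) extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class ShiftRight(left: Expr, right: Expr) extends Expr
  final case class BitAnd(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Name of the internal virtual column, as introduced in this PR.
  val groupingIdName: String = "spark_grouping_id"

  // Replace grouping()/grouping_id() with expressions over the virtual attribute.
  def rewrite(e: Expr, groupBy: Seq[Column]): Expr = e match {
    case Grouping(col) =>
      val idx = groupBy.indexOf(col)
      require(idx >= 0, s"${col.name} is not a grouping column")
      // Assumed bit layout: first GROUP BY column is the most significant bit.
      val shift = groupBy.length - 1 - idx
      BitAnd(ShiftRight(UnresolvedAttr(groupingIdName), IntLit(shift)), IntLit(1))
    case GroupingId(cols) =>
      require(cols.isEmpty || cols == groupBy, "grouping_id() arguments must match GROUP BY")
      UnresolvedAttr(groupingIdName)
    case ShiftRight(l, r)  => ShiftRight(rewrite(l, groupBy), rewrite(r, groupBy))
    case BitAnd(l, r)      => BitAnd(rewrite(l, groupBy), rewrite(r, groupBy))
    case GreaterThan(l, r) => GreaterThan(rewrite(l, groupBy), rewrite(r, groupBy))
    case other             => other
  }

  // Evaluate an integer-valued rewritten expression for a row with grouping id `gid`.
  def eval(e: Expr, gid: Int): Int = e match {
    case UnresolvedAttr(`groupingIdName`) => gid
    case IntLit(v)                        => v
    case ShiftRight(l, r)                 => eval(l, gid) >> eval(r, gid)
    case BitAnd(l, r)                     => eval(l, gid) & eval(r, gid)
    case other                            => sys.error(s"cannot evaluate $other")
  }

  def main(args: Array[String]): Unit = {
    val a = Column("a")
    val b = Column("b")
    // Models "GROUP BY a, b ... HAVING grouping(b) > 0".
    val having = GreaterThan(Grouping(b), IntLit(0))
    println(rewrite(having, Seq(a, b)))
  }
}
```

In this model, a HAVING or ORDER BY expression no longer mentions grouping() after the rewrite; it only refers to the virtual attribute, so it can be handled like any other reference to a column that is missing from the SELECT list.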
This PR also fixes HAVING clauses that access a grouping column that is not present in the SELECT clause, for example:
How was this patch tested?
Add new tests.