-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24035][SQL] SQL syntax for Pivot #21187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #89950 has finished for PR 21187 at commit
|
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.sql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the infra for SQLQueryTestSuite?
|
LGTM thanks for doing this! |
|
Would be great to make Notice that Oracle's PIVOT when used on XML doesn't require list of values either - |
|
Thank you, @Tagar, for you comment! I think by saying "making FOR section optional", you actually mean to support "IN ANY". |
|
@maryannxue, yep, "IN ANY" is what I meant. I missed it was already in the description - thanks for clarifying this. |
|
In-any-subquery in Pivot can be implemented like what we did in the other parts, but let us first leave it as the future item. @maryannxue Maybe you can create a JIRA. The Oracle-compatible syntax in this PR is good enough. Thanks for your work! @maryannxue |
| FULL: 'FULL'; | ||
| NATURAL: 'NATURAL'; | ||
| ON: 'ON'; | ||
| PIVOT: 'PIVOT'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the keywords you added here to nonReserved (line 723)? Also update the suite TableIdentifierParserSuite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll update TableIndentifierParserSuite. I believe I've added them to nonReserved already. Did I miss something?
| aggregates: Seq[Expression], | ||
| child: LogicalPlan) extends UnaryNode { | ||
| child: LogicalPlan, | ||
| groupByExprsImplicit: Boolean = false) extends UnaryNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add add the parameter descriptions like what we did in GroupingSets (line 663-675)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just change groupByExprs: Seq[NamedExpression] -> groupByExprs: Option[Seq[NamedExpression]]? Then, we do not need this extra flag. In the pattern matching, we just need to do something like Some, or None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I hesitated for a while... trying to figure out which way it would be clearer, using Option or an extra flag. I did not have a preference, so I'll do Option if you think it's better.
| child: LogicalPlan) extends UnaryNode { | ||
| child: LogicalPlan, | ||
| groupByExprsImplicit: Boolean = false) extends UnaryNode { | ||
| override lazy val resolved = false // Pivot will be replaced after being resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a negative test case to show which errors we could issue when Pivot can't be resolved? If the error does not make sense to the end users, we can improve the function checkAnalysis in the trait CheckAnalysis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ResolvePivot rule, the Pivot node is guaranteed to be resolved and replaced, either with a Project or an Aggregate, except when either of the two errors occur:
- Child expressions or child node of
Pivotcannot be resolved, which is a general Analyzer error and not specific to Pivot. I'll add a test case for this anyway. - Pivot's
aggregatesare not really aggregate expressions (not guaranteed by the Parser). I have added this check inResolvePivotrule, and the last test in "pivot.sql" is dedicated for this check.
So here, marking Pivot's resolved field as false is to stop its parent from reference-resolving or star-expansion and wait till it has been replaced, otherwise the star-expansion will be incorrect.
|
@gatorsmile "In-any-subquery in Pivot can be implemented like what we did in the other parts", can you make this clearer? The Pivot's "IN" values are special coz they will later become the columns of the relation. |
|
Test build #90012 has finished for PR 21187 at commit
|
|
retest this please |
|
Test build #90024 has finished for PR 21187 at commit
|
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one minor comment.
| override lazy val resolved = false // Pivot will be replaced after being resolved. | ||
| override def output: Seq[Attribute] = | ||
| groupByExprsOpt.getOrElse(Seq.empty).map(_.toAttribute) ++ aggregates match { | ||
| case agg :: Nil => pivotValues.map(value => AttributeReference(value.toString, agg.dataType)()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: indent issues. Let us just clean the code at the same time.
override def output: Seq[Attribute] = {
val pivotAgg = aggregates match {
case agg :: Nil =>
pivotValues.map(value => AttributeReference(value.toString, agg.dataType)())
case _ =>
pivotValues.flatMap { value =>
aggregates.map(agg => AttributeReference(value + "_" + agg.sql, agg.dataType)())
}
}
groupByExprsOpt.getOrElse(Seq.empty).map(_.toAttribute) ++ pivotAgg
}
Yes. The In subquery needs to be executed before/during query analysis stage. Thus, it is different. |
|
Thank you, @gatorsmile, for the review and comments! I have opened SPARK-24162, SPARK-24163 and SPARK-24164 as follow-up improvements for this issue. Please feel free to assign them to me. |
|
Test build #90084 has finished for PR 21187 at commit
|
|
Test build #90094 has finished for PR 21187 at commit
|
|
retest this please |
|
Test build #90144 has finished for PR 21187 at commit
|
|
Thanks for your fast and great work! Merged to master. |
What changes were proposed in this pull request?
Add SQL support for Pivot according to Pivot grammar defined by Oracle (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm) with some simplifications, based on our existing functionality and limitations for Pivot at the backend:
The code changes are:
How was this patch tested?
A new test suite PivotSuite is added.