-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-15677][SQL] Query with scalar sub-query in the SELECT list throws UnsupportedOperationException #13418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nsupportedOperationException.
|
LGTM CC @hvanhovell @davies @cloud-fan |
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case Project(projectList, LocalRelation(output, data)) => | ||
| case p @ Project(projectList, LocalRelation(output, data)) | ||
| if !p.expressions.exists(ScalarSubquery.hasScalarSubquery) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more general to check unvaluable expressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davies Thank you for the comment. I am looking into generalizing the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davies Sorry for the delay in replying. I am new to the Spark code. I've looked at Unevaluable expressions. My findings are that checking for Unevaluable expressions would be too general since a lot of expressions mix in this trait. For example, AttributeReference is one of them. If we explicitly check for Unevaluable expressions, a simple query of the form "select c1 from t1"
would be regressed. Let me know I misunderstood your requirement. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AttributeReference is the only exception, it will be replaced to BoundReference when create an Projection, we could have a special case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to catch Unevaluable and special case AttributeReference
|
Test build #3033 has finished for PR 13418 at commit
|
| } | ||
|
|
||
| def hasScalarSubquery(e: Expression): Boolean = { | ||
| e.find { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.find(_.isInstanceOf[ScalarSubquery]).isDefined
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin Thank you for the review. I aligned my code to the existing implementation. But I can replace the method call with your suggestion.
|
@gatorsmile @davies @rxin @cloud-fan I've incorporated the comments. Please advise. Thank you. |
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case Project(projectList, LocalRelation(output, data)) => | ||
| case p @ Project(projectList, LocalRelation(output, data)) | ||
| if !p.expressions.exists(hasUnevaluableExpr) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.expressions is just the projectList
|
@cloud-fan I replaced p.expressions with projectList. Thanks. |
| Array(Row("two")) | ||
| ) | ||
|
|
||
| checkAnswer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to create a new test case for it
|
LGTM except one test comment. |
|
@cloud-fan I moved the unit tests to a new test case. Thank you. |
| ) | ||
| } | ||
|
|
||
| test("SPARK-15677: Scalar sub-query in Select list against a DataFrame generated query") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should mention that this bug only exists in local relation?
|
Thank you @cloud-fan. I mentioned the local relations in the test case description and move the test cases under withTempTable. |
|
LGTM pending jenkins |
|
@cloud-fan Thank you. |
|
whoops triggered a build on this one... sorry about that |
|
Test build #3045 has finished for PR 13418 at commit
|
…ows UnsupportedOperationException
## What changes were proposed in this pull request?
Queries with scalar sub-query in the SELECT list run against a local, in-memory relation throw
UnsupportedOperationException exception.
Problem repro:
```SQL
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
scala> sql("select (select min(c1) from t2) from t1").show()
java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#62 []
at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:215)
at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:62)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$37.applyOrElse(Optimizer.scala:1473)
```
The problem is specific to local, in memory relations. It is caused by rule ConvertToLocalRelation, which attempts to push down
a scalar-subquery expression to the local tables.
The solution prevents the rule to apply if Project references scalar subqueries.
## How was this patch tested?
Added regression tests to SubquerySuite.scala
Author: Ioana Delaney <[email protected]>
Closes #13418 from ioana-delaney/scalarSubV2.
(cherry picked from commit 9e2eb13)
Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, merging to master and 2.0! |
|
@cloud-fan @gatorsmile @davies @rxin @hvanhovell Thank you all. This was my first PR! |
What changes were proposed in this pull request?
Queries with scalar sub-query in the SELECT list run against a local, in-memory relation throw
UnsupportedOperationException exception.
Problem repro:
The problem is specific to local, in memory relations. It is caused by rule ConvertToLocalRelation, which attempts to push down
a scalar-subquery expression to the local tables.
The solution prevents the rule to apply if Project references scalar subqueries.
How was this patch tested?
Added regression tests to SubquerySuite.scala