Skip to content

Conversation

@ioana-delaney
Copy link
Contributor

What changes were proposed in this pull request?

Queries with scalar sub-query in the SELECT list run against a local, in-memory relation throw
UnsupportedOperationException exception.

Problem repro:

scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
scala> sql("select (select min(c1) from t2) from t1").show()

java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#62 []
  at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:215)
  at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:62)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$37.applyOrElse(Optimizer.scala:1473)

The problem is specific to local, in memory relations. It is caused by rule ConvertToLocalRelation, which attempts to push down
a scalar-subquery expression to the local tables.

The solution prevents the rule to apply if Project references scalar subqueries.

How was this patch tested?

Added regression tests to SubquerySuite.scala

@gatorsmile
Copy link
Member

LGTM CC @hvanhovell @davies @cloud-fan

def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case Project(projectList, LocalRelation(output, data)) =>
case p @ Project(projectList, LocalRelation(output, data))
if !p.expressions.exists(ScalarSubquery.hasScalarSubquery) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more general to check unvaluable expressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davies Thank you for the comment. I am looking into generalizing the condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davies Sorry for the delay in replying. I am new to the Spark code. I've looked at Unevaluable expressions. My findings are that checking for Unevaluable expressions would be too general since a lot of expressions mix in this trait. For example, AttributeReference is one of them. If we explicitly check for Unevaluable expressions, a simple query of the form "select c1 from t1"
would be regressed. Let me know I misunderstood your requirement. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think AttributeReference is the only exception, it will be replaced to BoundReference when create an Projection, we could have a special case for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to catch Unevaluable and special case AttributeReference

@SparkQA
Copy link

SparkQA commented May 31, 2016

Test build #3033 has finished for PR 13418 at commit faded1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

def hasScalarSubquery(e: Expression): Boolean = {
e.find {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.find(_.isInstanceOf[ScalarSubquery]).isDefined

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin Thank you for the review. I aligned my code to the existing implementation. But I can replace the method call with your suggestion.

@ioana-delaney
Copy link
Contributor Author

@gatorsmile @davies @rxin @cloud-fan I've incorporated the comments. Please advise. Thank you.

def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case Project(projectList, LocalRelation(output, data)) =>
case p @ Project(projectList, LocalRelation(output, data))
if !p.expressions.exists(hasUnevaluableExpr) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.expressions is just the projectList

@ioana-delaney
Copy link
Contributor Author

@cloud-fan I replaced p.expressions with projectList. Thanks.

Array(Row("two"))
)

checkAnswer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to create a new test case for it

@cloud-fan
Copy link
Contributor

LGTM except one test comment.

@ioana-delaney
Copy link
Contributor Author

@cloud-fan I moved the unit tests to a new test case. Thank you.

)
}

test("SPARK-15677: Scalar sub-query in Select list against a DataFrame generated query") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should mention that this bug only exists in local relation?

@ioana-delaney
Copy link
Contributor Author

Thank you @cloud-fan. I mentioned the local relations in the test case description and move the test cases under withTempTable.

@cloud-fan
Copy link
Contributor

LGTM pending jenkins

@ioana-delaney
Copy link
Contributor Author

@cloud-fan Thank you.

@hvanhovell
Copy link
Contributor

whoops triggered a build on this one... sorry about that

@SparkQA
Copy link

SparkQA commented Jun 3, 2016

Test build #3045 has finished for PR 13418 at commit 77166bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 9e2eb13 Jun 3, 2016
asfgit pushed a commit that referenced this pull request Jun 3, 2016
…ows UnsupportedOperationException

## What changes were proposed in this pull request?
Queries with scalar sub-query in the SELECT list run against a local, in-memory relation throw
UnsupportedOperationException exception.

Problem repro:
```SQL
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
scala> sql("select (select min(c1) from t2) from t1").show()

java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#62 []
  at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:215)
  at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:62)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$37.applyOrElse(Optimizer.scala:1473)
```
The problem is specific to local, in memory relations. It is caused by rule ConvertToLocalRelation, which attempts to push down
a scalar-subquery expression to the local tables.

The solution prevents the rule to apply if Project references scalar subqueries.

## How was this patch tested?
Added regression tests to SubquerySuite.scala

Author: Ioana Delaney <[email protected]>

Closes #13418 from ioana-delaney/scalarSubV2.

(cherry picked from commit 9e2eb13)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

thanks, merging to master and 2.0!

@ioana-delaney
Copy link
Contributor Author

@cloud-fan @gatorsmile @davies @rxin @hvanhovell Thank you all. This was my first PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants