Skip to content

Conversation

@ravipesala
Copy link
Contributor

This PR supports subqueries in preicates 'in' clause. The queries will be transformed to the LeftSemi join as mentioned below.

Case 1 Uncorelated queries

-- original query
select C
from R1
where R1.A in (Select B from R2)
-- rewritten query
Select C
from R1 left semijoin R2 on R1.A = R2.B

Case 2 Corelated queries

-- original query
select C
from R1
where R1.A in (Select B from R2 where R1.X = R2.Y)
-- rewritten query
select C
from R1 left semi join
(select B, R2.Y as sq1_col0 from R2) sq1
on R1.X = sq1.sq1_col0 and R1.A = sq1.B

Restriction : Alias need to be used as we convert it into join queries.
Complete specification is available in https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf

@ravipesala ravipesala force-pushed the SPARK-4226 branch 2 times, most recently from 3f63e1c to 59dfab5 Compare November 21, 2014 17:17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what the arguments are here? It's not clear to me why we need exp (i.e., why can we just get it from the output of child. Also, style-wise I'd avoid abbreviation when not necessary and child is kind of an odd name given that its a different type of tree. Finally, should this be a LeafExpression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comments.
Here exp is like predicate value. For example
SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b) . In this exp is a.key and child is subquery.
Now I have updated the names of them and added the documentation.

@marmbrus
Copy link
Contributor

marmbrus commented Dec 2, 2014

Cool feature! Thanks for working on this.

@marmbrus
Copy link
Contributor

marmbrus commented Dec 2, 2014

ok to test

@SparkQA
Copy link

SparkQA commented Dec 2, 2014

Test build #24010 has finished for PR 3249 at commit 59dfab5.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(exp: Expression, child: LogicalPlan) extends Expression

@scwf
Copy link
Contributor

scwf commented Dec 2, 2014

Hi @ravipesala, can you rebase this PR

@ravipesala
Copy link
Contributor Author

Rebased with master. And fixed comments

@SparkQA
Copy link

SparkQA commented Dec 3, 2014

Test build #24090 has finished for PR 3249 at commit 353b86b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression

@SparkQA
Copy link

SparkQA commented Dec 3, 2014

Test build #24091 has finished for PR 3249 at commit 0f5cc3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression

@SparkQA
Copy link

SparkQA commented Dec 5, 2014

Test build #24177 has finished for PR 3249 at commit ea4e121.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression

@SparkQA
Copy link

SparkQA commented Dec 5, 2014

Test build #24178 has finished for PR 3249 at commit d62887e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space before {.

Also I would consider doing this in two steps to avoid depending on transform for side effects: a collect to get the list and then a transform to replace with true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Done in two steps.

@marmbrus
Copy link
Contributor

Sorry for the delay reviewing this, but I finally had time to take a more thorough look at the code. I think this will be a really cool feature to have but I think there is some work that still needs to be done in the analysis rule. At a high level, I think what needs to be done is as follows: make sure the sub tree is already analyzed, which should simplify some things. Second, when possible we should be working with attributes in a way that uses expression ids (i.e. using AttributeSet and AttributeMap when possible).

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28083 has finished for PR 3249 at commit 7653eee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(subquery: LogicalPlan) extends Expression
    • logError("User class threw exception: " + cause.getMessage, cause)

@chenghao-intel
Copy link
Contributor

@ravipesala, do you have any idea how to implement the NOT IN? I believe we should consider how to implement the NOT IN when doing IN, or should they come within the same PR?

BTW, can you also enable the hive compatible test like subquery_in.q or subquery_in_having.q if you think that's also supported in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making the Subquery as a fake expression, a better idea probably create a new logical plan like

SubQueryIn(left: LogicalPlan, nested: LogicalPlan, isNotIn:Boolean)

That's also how I implement the EXISTS at https://github.com/apache/spark/pull/4812/files#diff-9a11e98e8f4bd1c4bb18ca6a7a7b8948R262

@chenghao-intel
Copy link
Contributor

Thank you @ravipesala for implementing this, however, this PR probably involve some unnecessary join condition transformation, probably you need to understand the rule of pushing down the join filter / condition first. Sorry, please correct me if I misunderstood something.

@ravipesala
Copy link
Contributor Author

@chenghao-intel Thank you for reviewing it.I will go through your comments and fix it. And regarding not in case we can use left outer join . I will try to add to same PR.

@marmbrus
Copy link
Contributor

marmbrus commented Apr 3, 2015

Sorry for letting this languish. What is the status here and how does this relate to #4812?

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35706 has finished for PR 3249 at commit 7653eee.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class SubqueryExpression(subquery: LogicalPlan) extends Expression

@andrewor14
Copy link
Contributor

@ravipesala can you answer @marmbrus' question and/or rebase this to master so we can decide how to proceed with it?

@andrewor14
Copy link
Contributor

Also cc @chenghao-intel who wrote the similar patch #4812

@rxin
Copy link
Contributor

rxin commented Dec 31, 2015

I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!

@asfgit asfgit closed this in 7b4452b Dec 31, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants