Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented May 7, 2021

What changes were proposed in this pull request?

This PR makes the below case work well.

select a b from values(1) t(a) distribute by a;
== Parsed Logical Plan ==
'RepartitionByExpression ['a]
+- 'Project ['a AS b#42]
   +- 'SubqueryAlias t
      +- 'UnresolvedInlineTable [a], [List(1)]

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns: [b]; line 1 pos 62;
'RepartitionByExpression ['a]
+- Project [a#48 AS b#42]
   +- SubqueryAlias t
      +- LocalRelation [a#48]

Why are the changes needed?

bugfix

Does this PR introduce any user-facing change?

yes, the original attributes can be used in distribute by / cluster by and hints like /*+ REPARTITION(3, c) */

How was this patch tested?

new tests

@github-actions github-actions bot added the SQL label May 7, 2021
@yaooqinn
Copy link
Member Author

yaooqinn commented May 7, 2021

cc @cloud-fan @maropu @HyukjinKwon thanks for reviewing

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42753/

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42753/

@cloud-fan
Copy link
Contributor

is this a regression? It seems to me that this should fail, as it's similar to sql("select a b from values(1) t(a)").repartitionBy("a")

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42756/

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42756/

@yaooqinn
Copy link
Member Author

yaooqinn commented May 7, 2021

is this a regression?

this is not a regression, but hive supports it.

It seems to me that this should fail, as it's similar to sql("select a b from values(1) t(a)").repartitionBy("a")

For resolving attributes, distribute by and cluster by clauses should behave the same as sort by and group by?

@yaooqinn yaooqinn changed the title [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by May 7, 2021
@yaooqinn yaooqinn changed the title [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint May 7, 2021
@SparkQA
Copy link

SparkQA commented May 7, 2021

Test build #138231 has finished for PR 32465 at commit 0c711e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2021

Test build #138234 has finished for PR 32465 at commit 03ed3a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @yaooqinn .
As you mentioned, this is not a regression and didn't work before. I believe this should be merged as a new improvement to Apache Spark 3.2.0.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition May 7, 2021
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint May 7, 2021
}
}

test("SPARK-35331: Fix resolving original expression in RepartitionByExpression after aliased") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, could you add a test case for HINT explicitly?

@dongjoon-hyun
Copy link
Member

Thank you for adding a test, @yaooqinn .

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42795/

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42795/

@yaooqinn
Copy link
Member Author

yaooqinn commented May 8, 2021

Thank you for adding a test, @yaooqinn .

It's my pleasure

@yaooqinn
Copy link
Member Author

yaooqinn commented May 8, 2021

Hi, @yaooqinn .
As you mentioned, this is not a regression and didn't work before. I believe this should be merged as a new improvement to Apache Spark 3.2.0.

OK~

@SparkQA
Copy link

SparkQA commented May 8, 2021

Test build #138273 has finished for PR 32465 at commit 2ce0fb6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42810/

@SparkQA
Copy link

SparkQA commented May 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42810/

@dongjoon-hyun
Copy link
Member

Merged to master. Thank you, @yaooqinn and @cloud-fan .

@SparkQA
Copy link

SparkQA commented May 8, 2021

Test build #138288 has finished for PR 32465 at commit 5163262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn yaooqinn deleted the SPARK-35331 branch October 25, 2024 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants