Skip to content

Conversation

@cfmcgrady
Copy link
Contributor

What changes were proposed in this pull request?

Use UnresolvedHint.resolved = child.resolved instead UnresolvedHint.resolved = false, then the plan contains UnresolvedHint child can be optimized by rule in batch Resolution.

For instance, before this pr, the following plan can't be optimized by ResolveReferences.

!'Project [*]                                      
 +- SubqueryAlias __auto_generated_subquery_name  
    +- UnresolvedHint use_hash                     
       +- Project [42 AS 42#10]                    
          +- OneRowRelation

Why are the changes needed?

fix hint in subquery bug

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New test.

@github-actions github-actions bot added the SQL label Jun 9, 2021
@cfmcgrady
Copy link
Contributor Author

cc @cloud-fan @maropu

extends UnaryNode {

override lazy val resolved: Boolean = false
override lazy val resolved: Boolean = child.resolved
Copy link
Contributor

@cloud-fan cloud-fan Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some comments to explain the reason: we need it to be resolved so that the analyzer can continue to analyze the rest of the query plan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make up for this change, let's add a check in CheckAnalysis and fail if the plan still contains UnresolvedHint after analysis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok for me to add this check.
And note that in the build-in analyzer, all of UnresolvedHint will be removed by batch Remove Unresolved Hints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it's just for sanity check, to make sure UnresolvedHint shouldn't exist after analysis.

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44142/

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1/3.0!

@cloud-fan cloud-fan closed this in 5280f02 Jun 10, 2021
cloud-fan pushed a commit that referenced this pull request Jun 10, 2021
…query

Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.

For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
 +- SubqueryAlias __auto_generated_subquery_name
    +- UnresolvedHint use_hash
       +- Project [42 AS 42#10]
          +- OneRowRelation
```

fix hint in subquery bug

No.

New test.

Closes #32841 from cfmcgrady/SPARK-35673.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Jun 10, 2021
…query

Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.

For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
 +- SubqueryAlias __auto_generated_subquery_name
    +- UnresolvedHint use_hash
       +- Project [42 AS 42#10]
          +- OneRowRelation
```

fix hint in subquery bug

No.

New test.

Closes #32841 from cfmcgrady/SPARK-35673.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
@SparkQA
Copy link

SparkQA commented Jun 10, 2021

Test build #139615 has finished for PR 32841 at commit 94d22b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jun 11, 2021

late lgtm. Thank you for this fix, @cfmcgrady .

jdcasale pushed a commit to palantir/spark that referenced this pull request Jun 22, 2021
…query

Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.

For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
 +- SubqueryAlias __auto_generated_subquery_name
    +- UnresolvedHint use_hash
       +- Project [42 AS 42#10]
          +- OneRowRelation
```

fix hint in subquery bug

No.

New test.

Closes apache#32841 from cfmcgrady/SPARK-35673.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…query

Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.

For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
 +- SubqueryAlias __auto_generated_subquery_name
    +- UnresolvedHint use_hash
       +- Project [42 AS 42#10]
          +- OneRowRelation
```

fix hint in subquery bug

No.

New test.

Closes apache#32841 from cfmcgrady/SPARK-35673.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
@AngersZhuuuu
Copy link
Contributor

This issue was introduced by #25746
Looks good.

fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…query

Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.

For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
 +- SubqueryAlias __auto_generated_subquery_name
    +- UnresolvedHint use_hash
       +- Project [42 AS 42#10]
          +- OneRowRelation
```

fix hint in subquery bug

No.

New test.

Closes apache#32841 from cfmcgrady/SPARK-35673.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Nov 15, 2022
### What changes were proposed in this pull request?

Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`.

### Why are the changes needed?

```
CREATE TABLE t1(c1 bigint) USING PARQUET;
CREATE TABLE t2(c2 bigint) USING PARQUET;
SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;
```

failed with msg:
```
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
  at scala.collection.Iterator.exists(Iterator.scala:969)
  at scala.collection.Iterator.exists$(Iterator.scala:967)
  at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
  at scala.collection.IterableLike.exists(IterableLike.scala:79)
  at scala.collection.IterableLike.exists$(IterableLike.scala:78)
  at scala.collection.AbstractIterable.exists(Iterable.scala:56)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
```

But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)`

### Does this PR introduce _any_ user-facing change?

yes, fix regression from 3.3.1.

Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since #32841, then #37758 trigger this bug.

### How was this patch tested?

add test

Closes #38662 from ulysses-you/hint.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Nov 15, 2022
Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`.

```
CREATE TABLE t1(c1 bigint) USING PARQUET;
CREATE TABLE t2(c2 bigint) USING PARQUET;
SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;
```

failed with msg:
```
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
  at scala.collection.Iterator.exists(Iterator.scala:969)
  at scala.collection.Iterator.exists$(Iterator.scala:967)
  at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
  at scala.collection.IterableLike.exists(IterableLike.scala:79)
  at scala.collection.IterableLike.exists$(IterableLike.scala:78)
  at scala.collection.AbstractIterable.exists(Iterable.scala:56)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
```

But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)`

yes, fix regression from 3.3.1.

Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since #32841, then #37758 trigger this bug.

add test

Closes #38662 from ulysses-you/hint.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a9bf5d2)
Signed-off-by: Wenchen Fan <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`.

### Why are the changes needed?

```
CREATE TABLE t1(c1 bigint) USING PARQUET;
CREATE TABLE t2(c2 bigint) USING PARQUET;
SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;
```

failed with msg:
```
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
  at scala.collection.Iterator.exists(Iterator.scala:969)
  at scala.collection.Iterator.exists$(Iterator.scala:967)
  at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
  at scala.collection.IterableLike.exists(IterableLike.scala:79)
  at scala.collection.IterableLike.exists$(IterableLike.scala:78)
  at scala.collection.AbstractIterable.exists(Iterable.scala:56)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
```

But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)`

### Does this PR introduce _any_ user-facing change?

yes, fix regression from 3.3.1.

Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since apache#32841, then apache#37758 trigger this bug.

### How was this patch tested?

add test

Closes apache#38662 from ulysses-you/hint.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants