-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery. #32841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| extends UnaryNode { | ||
|
|
||
| override lazy val resolved: Boolean = false | ||
| override lazy val resolved: Boolean = child.resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some comments to explain the reason: we need it to be resolved so that the analyzer can continue to analyze the rest of the query plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make up for this change, let's add a check in CheckAnalysis and fail if the plan still contains UnresolvedHint after analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok for me to add this check.
And note that in the build-in analyzer, all of UnresolvedHint will be removed by batch Remove Unresolved Hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it's just for sanity check, to make sure UnresolvedHint shouldn't exist after analysis.
|
ok to test |
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
thanks, merging to master/3.1/3.0! |
…query
Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.
For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
+- UnresolvedHint use_hash
+- Project [42 AS 42#10]
+- OneRowRelation
```
fix hint in subquery bug
No.
New test.
Closes #32841 from cfmcgrady/SPARK-35673.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
…query
Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.
For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
+- UnresolvedHint use_hash
+- Project [42 AS 42#10]
+- OneRowRelation
```
fix hint in subquery bug
No.
New test.
Closes #32841 from cfmcgrady/SPARK-35673.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
|
Test build #139615 has finished for PR 32841 at commit
|
|
late lgtm. Thank you for this fix, @cfmcgrady . |
…query
Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.
For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
+- UnresolvedHint use_hash
+- Project [42 AS 42#10]
+- OneRowRelation
```
fix hint in subquery bug
No.
New test.
Closes apache#32841 from cfmcgrady/SPARK-35673.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…query
Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.
For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
+- UnresolvedHint use_hash
+- Project [42 AS 42#10]
+- OneRowRelation
```
fix hint in subquery bug
No.
New test.
Closes apache#32841 from cfmcgrady/SPARK-35673.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
|
This issue was introduced by #25746 |
…query
Use `UnresolvedHint.resolved = child.resolved` instead `UnresolvedHint.resolved = false`, then the plan contains `UnresolvedHint` child can be optimized by rule in batch `Resolution`.
For instance, before this pr, the following plan can't be optimized by `ResolveReferences`.
```
!'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
+- UnresolvedHint use_hash
+- Project [42 AS 42#10]
+- OneRowRelation
```
fix hint in subquery bug
No.
New test.
Closes apache#32841 from cfmcgrady/SPARK-35673.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5280f02)
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`. ### Why are the changes needed? ``` CREATE TABLE t1(c1 bigint) USING PARQUET; CREATE TABLE t2(c2 bigint) USING PARQUET; SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2; ``` failed with msg: ``` org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005) at scala.collection.Iterator.exists(Iterator.scala:969) at scala.collection.Iterator.exists$(Iterator.scala:967) at scala.collection.AbstractIterator.exists(Iterator.scala:1431) at scala.collection.IterableLike.exists(IterableLike.scala:79) at scala.collection.IterableLike.exists$(IterableLike.scala:78) at scala.collection.AbstractIterable.exists(Iterable.scala:56) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005) ``` But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)` ### Does this PR introduce _any_ user-facing change? yes, fix regression from 3.3.1. Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since #32841, then #37758 trigger this bug. ### How was this patch tested? add test Closes #38662 from ulysses-you/hint. Authored-by: ulysses-you <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`. ``` CREATE TABLE t1(c1 bigint) USING PARQUET; CREATE TABLE t2(c2 bigint) USING PARQUET; SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2; ``` failed with msg: ``` org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005) at scala.collection.Iterator.exists(Iterator.scala:969) at scala.collection.Iterator.exists$(Iterator.scala:967) at scala.collection.AbstractIterator.exists(Iterator.scala:1431) at scala.collection.IterableLike.exists(IterableLike.scala:79) at scala.collection.IterableLike.exists$(IterableLike.scala:78) at scala.collection.AbstractIterable.exists(Iterable.scala:56) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005) ``` But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)` yes, fix regression from 3.3.1. Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since #32841, then #37758 trigger this bug. add test Closes #38662 from ulysses-you/hint. Authored-by: ulysses-you <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit a9bf5d2) Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`. ### Why are the changes needed? ``` CREATE TABLE t1(c1 bigint) USING PARQUET; CREATE TABLE t2(c2 bigint) USING PARQUET; SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2; ``` failed with msg: ``` org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005) at scala.collection.Iterator.exists(Iterator.scala:969) at scala.collection.Iterator.exists$(Iterator.scala:967) at scala.collection.AbstractIterator.exists(Iterator.scala:1431) at scala.collection.IterableLike.exists(IterableLike.scala:79) at scala.collection.IterableLike.exists$(IterableLike.scala:78) at scala.collection.AbstractIterable.exists(Iterable.scala:56) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005) at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005) ``` But before just a warning: `WARN HintErrorLogger: Unrecognized hint: hash(t2)` ### Does this PR introduce _any_ user-facing change? yes, fix regression from 3.3.1. Note, the root reason is we mark `UnresolvedHint` is resolved if child is resolved since apache#32841, then apache#37758 trigger this bug. ### How was this patch tested? add test Closes apache#38662 from ulysses-you/hint. Authored-by: ulysses-you <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Use
UnresolvedHint.resolved = child.resolvedinsteadUnresolvedHint.resolved = false, then the plan containsUnresolvedHintchild can be optimized by rule in batchResolution.For instance, before this pr, the following plan can't be optimized by
ResolveReferences.Why are the changes needed?
fix hint in subquery bug
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New test.