-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala #32439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #138144 has finished for PR 32439 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138170 has finished for PR 32439 at commit
|
|
Test build #138168 has finished for PR 32439 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for review. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138175 has finished for PR 32439 at commit
|
|
Test build #138180 has finished for PR 32439 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138186 has finished for PR 32439 at commit
|
|
|
||
| def apply(plan: LogicalPlan): LogicalPlan = removeProjectBeforeFilter(plan transform { | ||
| def apply(plan: LogicalPlan): LogicalPlan = removeProjectBeforeFilter( | ||
| plan.transformWithPruning(AlwaysProcess.fn, ruleId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed sth. from the previous commit, but what difference is there between regular transform and transformWithPruning(AlwaysProcess.fn, ...) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transform internally calls
transformWithPruning(AlwaysProcess.fn, UnknownRuleId) ...
The argument comments are here:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Lines 434 to 441 in e08c40f
| * @param cond a Lambda expression to prune tree traversals. If `cond.apply` returns false | |
| * on a TreeNode T, skips processing T and its subtree; otherwise, processes | |
| * T and its subtree recursively. | |
| * @param ruleId is a unique Id for `rule` to prune unnecessary tree traversals. When it is | |
| * UnknownRuleId, no pruning happens. Otherwise, if `rule` (with id `ruleId`) | |
| * has been marked as in effective on a TreeNode T, skips processing T and its | |
| * subtree. Do not pass it if the rule is not purely functional and reads a | |
| * varying initial state for different invocations. |
So here,
plan.transformWithPruning(AlwaysProcess.fn, ruleId) means there's no pruning based on TreePattern bits, but there's pruning based on ruleIds (if the rule is known to ineffective on a tree instance T, it will be skipped next time when it is invoked on the same tree instance T).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks!
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
| case _ => RoundRobinPartitioning(numPartitions) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
gengliangwang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Test build #138365 has finished for PR 32439 at commit
|
maryannxue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138408 has finished for PR 32439 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138419 has finished for PR 32439 at commit
|
|
Test build #138439 has finished for PR 32439 at commit
|
|
Thanks, merging to master |
|
Hi, @gengliangwang and @sigmod . |
|
@dongjoon-hyun Thanks! @sigmod and I are looking at it. We will revert this if we can't resolve it shortly. |
### What changes were proposed in this pull request? After merging #32439, there is flaky error from the Github action job "Java 11 build with Maven": ``` Error: ## Exception when compiling 473 sources to /home/runner/work/spark/spark/sql/catalyst/target/scala-2.12/classes java.lang.StackOverflowError scala.reflect.internal.Trees.itransform(Trees.scala:1376) scala.reflect.internal.Trees.itransform$(Trees.scala:1374) scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28) scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28) scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563) scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51) ``` We can resolve it by increasing the stack size of JVM to 256M. The container for Github action jobs has 7G memory so this should be fine. ### Why are the changes needed? Fix flaky test failure in Java 11 build test ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Github action test Closes #32521 from gengliangwang/increaseStackSize. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>

What changes were proposed in this pull request?
Added the following TreePattern enums:
Added tree traversal pruning to the following rules in Optimizer.scala:
Why are the changes needed?
Reduce the number of tree traversals and hence improve the query compilation latency.
perf diff:
Rule name | Total Time (baseline) | Total Time (experiment) | experiment/baseline
RemoveRedundantAggregates | 51290766 | 67070477 | 1.31
RemoveNoopOperators | 192371141 | 196631275 | 1.02
RemoveNoopUnion | 49222561 | 43266681 | 0.88
LimitPushDown | 40885185 | 21672646 | 0.53
ColumnPruning | 2003406120 | 1285562149 | 0.64
CollapseRepartition | 40648048 | 72646515 | 1.79
OptimizeRepartition | 37813850 | 20600803 | 0.54
OptimizeWindowFunctions | 174426904 | 46741409 | 0.27
CollapseWindow | 38959957 | 24542426 | 0.63
TransposeWindow | 33533191 | 20414930 | 0.61
InferFiltersFromGenerate | 21758688 | 15597344 | 0.72
InferFiltersFromConstraints | 518009794 | 493282321 | 0.95
CombineUnions | 67694022 | 70550382 | 1.04
CombineFilters | 35265060 | 29005424 | 0.82
EliminateSorts | 57025509 | 19795776 | 0.35
PruneFilters | 433964815 | 465579200 | 1.07
EliminateLimits | 44275393 | 24476859 | 0.55
DecimalAggregates | 83143172 | 28816090 | 0.35
ReplaceDistinctWithAggregate | 21783760 | 18287489 | 0.84
ReplaceIntersectWithSemiJoin | 22311271 | 16566393 | 0.74
ReplaceExceptWithAntiJoin | 23838520 | 16588808 | 0.70
RewriteExceptAll | 32750296 | 29421957 | 0.90
RewriteIntersectAll | 29760454 | 21243599 | 0.71
RemoveLiteralFromGroupExpressions | 28151861 | 25270947 | 0.90
RemoveRepetitionFromGroupExpressions | 29587030 | 23447041 | 0.79
OptimizeLimitZero | 18081943 | 15597344 | 0.86
Accumulated | 4129959311 | 3112676285 | 0.75
How was this patch tested?
Existing tests.