[SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE #37612
Conversation
Force-pushed from 23eb828 to e744b4e.
cc @wangyum @zsxwing @cloud-fan @maryannxue if you have time to review
Force-pushed from e744b4e to 00e8406.
as `EnsureRequirements` can only optimize out user-specified repartition with `HashPartitioning`
Is this no longer true?
I only removed the "with `HashPartitioning`" part, since we are going to check all partitionings that come from a repartition.
But if `EnsureRequirements` can't remove a user-specified repartition that is not `HashPartitioning`, we don't need to change this, do we?
That is right, but then the comment would not describe the following code accurately. I changed the comment to:
// Note, there are two cases where a user-specified repartition can be optimized out:
// 1. `EnsureRequirements` can only optimize out a user-specified repartition with
//    `HashPartitioning`.
// 2. `AQEOptimizer` can optimize out a user-specified repartition with any `Partitioning`,
//    e.g. by converting an empty relation to a local relation.
Resolved review thread (outdated): sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Do we have a test case to cover this, or is it just a safeguard for now?
I think it's just a safeguard; the error message is logged by `logOnLevel`, which is debug by default, in `reOptimize`.
It's a little overkill to validate the partitioning again after `EnsureRequirements`, so I removed that part of the code.
Force-pushed from 239ec08 to bd73540.
It should be clear now. This PR only did two things:
Suggested change:
- // `DeserializeToObjectExec` is used by Spark internal e.g. `Dataset.rdd`.
+ // `DeserializeToObjectExec` is used by Spark internally e.g. `Dataset.rdd`.
Suggested change:
- // This is conflict with AQE framework since we may add shuffle back during re-optimize
+ // This conflicts with AQE framework since we may add shuffle back during re-optimize
Shall we fix `PropagateEmptyRelationBase` instead? I don't think we can optimize out a `Repartition` in a way that breaks user expectations. The change here only covers AQE, and I think this is a problem for non-AQE as well.
Or, we can still optimize out the `Repartition`, but we should assign a `Partitioning` to `LocalRelation`. Ideally, empty data can have any partitioning.
> Shall we fix `PropagateEmptyRelationBase` instead? I don't think we can optimize out a `Repartition` in a way that breaks user expectations. The change here only covers AQE, and I think this is a problem for non-AQE as well.

If we do not optimize out the repartition on top of an empty relation, we also cannot optimize the other plans on top of that repartition, which may cause a regression. We should only preserve the final repartition, which can affect the final output partitioning, as `requiredDistribution` does.

> Or, we can still optimize out the `Repartition`, but we should assign a `Partitioning` to `LocalRelation`. Ideally, empty data can have any partitioning.

I actually thought about that, but it's not easy. If we assign a `Partitioning` to `LocalRelation`, we also need to consider how to propagate that `Partitioning` through other plans, otherwise it is of little use. Then we would need to care about the output partitioning of every logical plan, which should not be expected since partitioning is a physical concept.
OK, let me make my proposal clear: let's not optimize out a repartition if it's the root node, or below a Project/Filter, in any case. What do you think?
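The proposal above can be sketched on a toy plan ADT. This is a self-contained illustration of the idea only; the names mirror Catalyst operators, but none of this is Spark's actual implementation:

```scala
// Minimal stand-ins for Catalyst logical operators (illustration only).
sealed trait Plan
case class LocalRelation(numRows: Int) extends Plan
case class Repartition(numPartitions: Int, child: Plan) extends Plan
case class Project(child: Plan) extends Plan
case class Filter(child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan

object PreserveRootRepartition {
  // The user-facing repartition: the root node, looking through
  // Project/Filter, per the proposal above.
  def userFacingRepartition(plan: Plan): Option[Repartition] = plan match {
    case r: Repartition => Some(r)
    case Project(child) => userFacingRepartition(child)
    case Filter(child)  => userFacingRepartition(child)
    case _              => None
  }

  // Empty-relation propagation that keeps the user-facing repartition
  // but still removes repartitions buried deeper in the plan.
  def propagateEmpty(root: Plan): Plan = {
    val keep = userFacingRepartition(root)
    def rewrite(p: Plan): Plan = p match {
      case r @ Repartition(n, child) if keep.exists(_ eq r) =>
        Repartition(n, rewrite(child))
      case Repartition(_, LocalRelation(0)) => LocalRelation(0)
      case Project(c)                       => Project(rewrite(c))
      case Filter(c)                        => Filter(rewrite(c))
      case Limit(n, c)                      => Limit(n, rewrite(c))
      case other                            => other
    }
    rewrite(root)
  }
}
```

Under this sketch, `propagateEmpty` preserves a root `Repartition` (even when it sits under a `Project` or `Filter`), while a repartition below any other node, such as a `Limit`, is still folded into the empty relation.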
Skipped optimizing the root user-specified repartition in 5466289.
Can we make a new PR for this change? It's not only related to AQE.
OK, I will send a new PR for the non-AQE part.
How can a leaf node be a root repartition?
It is a `LogicalQueryStage` in AQE, since the repartition has been planned into a shuffle.
Is it possible to move the `apply` method to the base class, so that we can share more code?
Force-pushed from 5466289 to db00f58.
Why is this not retained? A repartition under a `Project` should be respected.
The test was outdated; fixed it in 048e781bcaf5f78d28666c5af16745c6a93cc08c.
Resolved review thread (outdated): sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Force-pushed from db00f58 to 048e781.
This is for the AQE side to tag the `TreePattern`. cc @cloud-fan
Do we still need this comment change?
Is it another bug? The number of partitions is not respected.
It's a quirk of `RangePartitioner`; I'm not sure it is a bug. `RangePartitioner` samples the data to decide the range partition bounds, and the bounds will be empty if the data is empty. Then the output partition number is 1:

def numPartitions: Int = rangeBounds.length + 1
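The behavior described above can be illustrated with a self-contained sketch of the bounds arithmetic. The names are hypothetical and the sampling is simplified; this is not Spark's actual `RangePartitioner`, which samples RDD partitions:

```scala
// Toy version of RangePartitioner's partition-count logic: the number of
// output partitions is derived from the sampled range bounds, not taken
// directly from the requested partition count.
object RangeBoundsSketch {
  // With no data there is nothing to sample, so the bounds array is empty.
  def computeBounds(sample: Seq[Int], requested: Int): Array[Int] =
    if (sample.isEmpty) Array.empty[Int]
    else {
      val sorted = sample.sorted
      // Pick requested - 1 roughly evenly spaced boundary values.
      (1 until requested)
        .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / requested)))
        .toArray
    }

  // Mirrors `def numPartitions: Int = rangeBounds.length + 1` quoted above.
  def numPartitions(bounds: Array[Int]): Int = bounds.length + 1
}
```

So for empty data the bounds are empty and the partition count collapses to 1, regardless of how many partitions were requested.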
Force-pushed from 087aba7 to 20ab8d4.
  DisableUnnecessaryBucketedScan,
- OptimizeSkewedJoin(ensureRequirements)
+ OptimizeSkewedJoin(ensureRequirements),
+ AdjustShuffleExchangePosition
Shall we put it right after `ensureRequirements`?
addressed
  testData.select("key").collect().toSeq)

- assert(spark.emptyDataFrame.coalesce(1).rdd.partitions.size === 0)
+ assert(spark.emptyDataFrame.coalesce(1).rdd.partitions.size === 1)
one more failed test
@ulysses-you can you update the PR description? I think this is ready to merge.
Thank you @cloud-fan, updated.
Thanks, merging to master/3.3! (The last commit only adds comments.)
### What changes were proposed in this pull request?

- Support getting the user-specified root repartition through `DeserializeToObjectExec`.
- Skip the optimize-empty rule for the root repartition when it is user-specified.
- Add a new rule `AdjustShuffleExchangePosition` to adjust the shuffle we add back, so that we can restore the shuffle safely.

### Why are the changes needed?

AQE cannot completely respect the user-specified repartition. The main reasons are:

1. The AQE optimizer converts empty relations to local relations, which does not preserve the partitioning info.
2. The AQE `requiredDistribution` machinery only restores the repartition and does not support looking through `DeserializeToObjectExec`.

After the fix, `spark.range(0).repartition(5).rdd.getNumPartitions` should be 5.

### Does this PR introduce _any_ user-facing change?

Yes, it ensures the user-specified distribution.

### How was this patch tested?

Added tests.

Closes #37612 from ulysses-you/output-partition.

Lead-authored-by: ulysses-you <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 801ca25)
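The idea behind `AdjustShuffleExchangePosition` can be sketched with a toy plan ADT. This is a simplified illustration under the assumption that the rule's job is to keep the deserialize node on top of a shuffle that AQE adds back; it is not Spark's actual rule:

```scala
// Toy physical-plan nodes (illustration only).
sealed trait PhysicalPlan
case class ShuffleExchange(child: PhysicalPlan) extends PhysicalPlan
case class DeserializeToObject(child: PhysicalPlan) extends PhysicalPlan
case object Scan extends PhysicalPlan

// If a shuffle was added back on top of the deserialize node (which must
// stay the root, e.g. for Dataset.rdd), swap the two so the shuffle sits
// below deserialization; otherwise leave the plan alone.
def adjust(plan: PhysicalPlan): PhysicalPlan = plan match {
  case ShuffleExchange(DeserializeToObject(child)) =>
    DeserializeToObject(ShuffleExchange(child))
  case other => other
}
```

With this sketch, a restored shuffle above `DeserializeToObject` is pushed beneath it, so the object-producing node remains the root of the plan.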