
Conversation

@ulysses-you
Contributor

@ulysses-you ulysses-you commented Aug 22, 2022

What changes were proposed in this pull request?

  • Support detecting the user-specified root repartition through DeserializeToObjectExec
  • Skip the empty-relation optimization for the root repartition when it is user-specified
  • Add a new rule AdjustShuffleExchangePosition to adjust the shuffle we add back, so that the shuffle can be restored safely.

Why are the changes needed?

AQE cannot completely respect a user-specified repartition. The main reasons are:

  1. the AQE optimizer converts an empty plan to a local relation, which does not preserve the partitioning info
  2. the AQE requiredDistribution machinery only restores the repartition, and it cannot look through DeserializeToObjectExec

After the fix:
spark.range(0).repartition(5).rdd.getNumPartitions now returns 5.
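The empty-relation issue can be illustrated with a toy Python sketch. The class names and the `propagate_empty` function below are illustrative assumptions, not Spark's actual internals: the old behavior collapses a repartition over an empty child into a local relation and loses `numPartitions`, while the fix preserves a user-specified repartition at the root.

```python
from dataclasses import dataclass

@dataclass
class LocalRelation:
    rows: list

@dataclass
class Repartition:
    num_partitions: int
    user_specified: bool
    child: object

def is_empty(plan):
    return isinstance(plan, LocalRelation) and not plan.rows

def propagate_empty(plan, is_root=True):
    # Old behavior: a repartition over an empty child collapses to a
    # LocalRelation, dropping the partitioning info.
    # Fixed behavior: a user-specified repartition at the root survives,
    # so its numPartitions still reaches the final RDD.
    if isinstance(plan, Repartition) and is_empty(plan.child):
        if is_root and plan.user_specified:
            return plan
        return LocalRelation([])
    return plan
```

In this model, `propagate_empty(Repartition(5, True, LocalRelation([])))` keeps the partition count of 5, mirroring `spark.range(0).repartition(5)`, while a non-root or non-user-specified repartition still collapses.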

Does this PR introduce any user-facing change?

Yes, the user-specified distribution is now respected.

How was this patch tested?

Added tests.

@github-actions github-actions bot added the SQL label Aug 22, 2022
@ulysses-you ulysses-you marked this pull request as draft August 22, 2022 11:24
@ulysses-you ulysses-you force-pushed the output-partition branch 3 times, most recently from 23eb828 to e744b4e Compare August 23, 2022 03:16
@ulysses-you ulysses-you marked this pull request as ready for review August 23, 2022 03:17
@ulysses-you
Contributor Author

cc @wangyum @zsxwing @cloud-fan @maryannxue if you have time to review

Contributor

as `EnsureRequirements` can only optimize out user-specified repartition with `HashPartitioning`

This is not true any more?

Contributor Author

I only removed the with HashPartitioning part, since we are going to check all partitionings that come from a repartition.

Contributor

But if EnsureRequirements can't remove a user-specified repartition that is not HashPartitioning, we don't need to change this here?

Contributor Author

That is right, but then it would not describe the following code accurately. I changed the comment to:

// Note, here are two cases of how user-specified repartition can be optimized out:
// 1. `EnsureRequirements` can only optimize out user-specified repartition with
//    `HashPartitioning`.
// 2. `AQEOptimizer` can optimize out user-specified repartition with all `Partitioning`,
//     e.g. convert empty to local relation.

Contributor

do we have a test case to cover this, or is it just a safeguard for now?

Contributor Author

I think it's just a safeguard; the error message is logged by logOnLevel, which is at debug level by default, in reOptimize.

Contributor Author

It's a little overkill to validate the partitioning again after EnsureRequirements, so I removed this part of the code.

@ulysses-you
Contributor Author

It should be clear now. This PR only does two things:

  1. Only apply AQE for the children of DeserializeToObjectExec
  2. Check all partitionings for requiredDistribution so we have a chance to add the shuffle back
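The first point can be sketched with a toy plan walker. The names below (`DeserializeToObject`, `find_root_repartition`, etc.) are hypothetical stand-ins, not the real SparkPlan API: deserialization nodes are treated as transparent when searching from the root for a user-specified repartition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scan:
    pass

@dataclass
class DeserializeToObject:
    child: object

@dataclass
class Repartition:
    num_partitions: int
    user_specified: bool
    child: object

def find_root_repartition(plan) -> Optional[Repartition]:
    # Look through deserialization nodes (e.g. what Dataset.rdd inserts)
    # so a repartition right below them still counts as the root one.
    while isinstance(plan, DeserializeToObject):
        plan = plan.child
    if isinstance(plan, Repartition) and plan.user_specified:
        return plan
    return None
```

With this model, a plan shaped like `DeserializeToObject(Repartition(5, ...))` still yields a root repartition of 5 partitions, which is the `Dataset.rdd` case the PR targets.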

Contributor

Suggested change
// `DeserializeToObjectExec` is used by Spark internal e.g. `Dataset.rdd`.
// `DeserializeToObjectExec` is used by Spark internally e.g. `Dataset.rdd`.

Contributor

Suggested change
// This is conflict with AQE framework since we may add shuffle back during re-optimize
// This conflicts with AQE framework since we may add shuffle back during re-optimize

Contributor

shall we fix PropagateEmptyRelationBase instead? I don't think we should optimize out a Repartition when doing so breaks user expectations. The change here only covers AQE, and I think this is a problem for non-AQE as well.

Contributor

Or, we can still optimize out Repartition, but we should assign a Partitioning to the LocalRelation. Ideally, empty data can have any partitioning.

Contributor Author

shall we fix PropagateEmptyRelationBase instead? I don't think we should optimize out a Repartition when doing so breaks user expectations. The change here only covers AQE, and I think this is a problem for non-AQE as well.

if we do not optimize the repartition on top of an empty relation, we also cannot optimize other plans on top of that repartition, which may become a regression. We should only preserve the final repartition, which can affect the final output partitioning, as requiredDistribution does.

Or, we can still optimize out Repartition, but we should assign a Partitioning to the LocalRelation. Ideally, empty data can have any partitioning.

I actually thought about that, but it's not easy. If we want to assign a Partitioning to LocalRelation, we also need to consider how to propagate that Partitioning through other plans, otherwise it is of little use. Then we would need to care about the output partitioning of every logical plan, which should not be expected since partitioning is a physical concept.

Contributor

OK, let me make my proposal clear: let's not optimize out a repartition if it's the root node, or below a Project/Filter, in any case. What do you think?

Contributor Author

skipped optimizing the root user-specified repartition in 5466289

Contributor

can we make a new PR for this change? It's not only related to AQE.

Contributor Author

ok, I will send a new PR for the non-AQE part

Contributor

how can a leaf node be a root repartition?

Contributor Author

it is a LogicalQueryStage in AQE, since the repartition has been planned as a shuffle

Contributor

is it possible to move the apply method to the base class, so that we can share more code?

Contributor

why is this not retained? A repartition under a Project should be respected.

Contributor Author

the tests were outdated; fixed them in 048e781bcaf5f78d28666c5af16745c6a93cc08c

Contributor Author

this is for AQE side to tag TreePattern. cc @cloud-fan

Contributor

do we still need this comment change?

Contributor

is this another bug? the number of partitions is not respected.

Contributor Author

it's a quirk of RangePartitioner; I'm not sure it is a bug. RangePartitioner samples the data to decide the range partition bounds, and the bounds will be empty if the data is empty, so the output partition number is 1.

def numPartitions: Int = rangeBounds.length + 1
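The quoted `numPartitions` behavior is easy to model. Below is a simplified Python sketch (the bounds computation is an assumption for illustration, not RangePartitioner's actual sampling logic) showing why empty input always collapses to a single partition.

```python
def range_bounds(sampled_rows, requested_partitions):
    # Sampling an empty dataset yields no rows, so there is nothing to
    # split on and the bounds come back empty.
    if not sampled_rows:
        return []
    rows = sorted(sampled_rows)
    step = max(1, len(rows) // requested_partitions)
    # Simplified: pick at most requested_partitions - 1 split points.
    return rows[step::step][: requested_partitions - 1]

def num_partitions(bounds):
    # Mirrors the quoted line: numPartitions = rangeBounds.length + 1
    return len(bounds) + 1
```

With enough data the model yields the requested partition count, but for empty data `range_bounds` returns `[]` and `num_partitions` is 1, regardless of how many partitions were requested.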

DisableUnnecessaryBucketedScan,
OptimizeSkewedJoin(ensureRequirements)
OptimizeSkewedJoin(ensureRequirements),
AdjustShuffleExchangePosition
Contributor

shall we put it right after ensureRequirements?

Contributor Author

addressed

testData.select("key").collect().toSeq)

assert(spark.emptyDataFrame.coalesce(1).rdd.partitions.size === 0)
assert(spark.emptyDataFrame.coalesce(1).rdd.partitions.size === 1)
Contributor Author

one more failed test

@cloud-fan
Contributor

@ulysses-you can you update the PR description? I think this is ready to merge.

@ulysses-you
Contributor Author

thank you @cloud-fan, updated

@cloud-fan
Contributor

cloud-fan commented Sep 14, 2022

thanks, merging to master/3.3! (the last commit only adds comments)

@cloud-fan cloud-fan closed this in 801ca25 Sep 14, 2022
cloud-fan added a commit that referenced this pull request Sep 14, 2022
…n AQE

Closes #37612 from ulysses-you/output-partition.

Lead-authored-by: ulysses-you <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 801ca25)
Signed-off-by: Wenchen Fan <[email protected]>
@ulysses-you ulysses-you deleted the output-partition branch September 14, 2022 01:29