
Conversation

@cloud-fan
Contributor

What changes were proposed in this pull request?

We missed the fact that submitting a shuffle or broadcast query stage can be heavy, as it needs to submit subqueries and wait for the results. This blocks the AQE loop and hurts the parallelism of AQE.

This PR fixes the problem by using shuffle/broadcast's own thread pool to wait for subqueries and other preparations.

This PR also re-implements #45234 to avoid submitting the shuffle job if the query has failed and all query stages need to be cancelled.
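
As a rough illustration of the idea (a minimal sketch with made-up names such as `exchangePool`, `waitForSubqueries`, and `submitShuffleJob`, not the code in this PR): the heavy preparation runs as a `Future` on the exchange's own pool, so the AQE loop thread only schedules the work and never blocks on it.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object ShuffleStageSketch {
  // Stand-in for the exchange's dedicated pool; the real pool and its size
  // come from Spark's configs, not from this sketch.
  private val exchangePool: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Hypothetical placeholders for the heavy preparation (subquery execution)
  // and the actual shuffle job submission.
  private def waitForSubqueries(): Unit = Thread.sleep(1000)
  private def submitShuffleJob(): Unit = println("shuffle job submitted")

  // The AQE loop only obtains a Future; the waiting happens on the
  // exchange's own pool instead of the AQE loop thread.
  def materialize(): Future[Unit] = Future {
    waitForSubqueries()
    submitShuffleJob()
  }(exchangePool)
}
```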

Why are the changes needed?

better parallelism for AQE

Does this PR introduce any user-facing change?

no

How was this patch tested?

new test case

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions bot added the SQL label Jul 30, 2024
```diff
   */
  final def cancelBroadcastJob(reason: Option[String]): Unit = {
-    if (!this.relationFuture.isDone) {
+    if (isMaterializationStarted() && !this.relationFuture.isDone) {
```
Contributor Author

I do not re-implement broadcast cancellation here, as we need more refactoring to move the creation of the `Future` into `BroadcastExchangeLike`.

@cloud-fan
Contributor Author

cc @ulysses-you @yaooqinn

```scala
      .version("4.0.0")
      .intConf
      .checkValue(thres => thres > 0 && thres <= 1024, "The threshold must be in (0,1024].")
      .createWithDefault(1024)
```
Member

@yaooqinn Jul 30, 2024

Can you explain why we pick this number? It might create memory pressure on the driver

Contributor Author

The shuffle async job just waits for other work (subquery expression execution) to finish, which is very lightweight. The broadcast async job executes a query and collects the result on the driver, which is very heavy. That's why we can give much larger parallelism to the shuffle async jobs. In our benchmarks we found this number to be reasonably good for TPC.
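
For illustration, a pool bounded by such a threshold could look like the sketch below (illustrative only; `boundedCachedPool` and `maxThreads` are made-up names standing in for the new config, and the real code would go through Spark's own thread-pool utilities):

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadFactory, ThreadPoolExecutor, TimeUnit}

object ShufflePoolSketch {
  // `maxThreads` stands in for the (0, 1024] threshold discussed above.
  def boundedCachedPool(maxThreads: Int): ThreadPoolExecutor = {
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "shuffle-exchange-sketch")
        t.setDaemon(true) // daemon threads so an idle pool never blocks driver exit
        t
      }
    }
    // Threads are created on demand up to maxThreads, extra tasks queue up,
    // and idle threads are reclaimed after 60 seconds, so a high cap mostly
    // costs memory only while threads are alive and parked on a wait.
    val pool = new ThreadPoolExecutor(
      maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable](), factory)
    pool.allowCoreThreadTimeOut(true)
    pool
  }
}
```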

Member

Is there a correlation with the number of system cores?

Contributor Author

I don't think so; `BROADCAST_EXCHANGE_MAX_THREAD_THRESHOLD` is also way larger than the number of the driver's system cores.

Member

I'm not sure whether this parameter has anything to do with SPARK-49091, or whether that issue was just caused by SPARK-41914, which the reporter pointed to.

Also cc @wangyum

Member

Update: SPARK-49091 is not related

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan closed this in f01eafd Aug 5, 2024
```diff
   }

-  test("SPARK-47148: AQE should avoid to materialize ShuffleQueryStage on the cancellation") {
+  test("SPARK-47148: AQE should avoid to submit shuffle job on cancellation") {
```
Contributor

This test case seems to be a bit unstable:

```
[info] - SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 94 milliseconds)
[info]   "Multiple failures in stage materialization." did not contain "coalesce test error" (AdaptiveQueryExecSuite.scala:939)

- SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED ***
  "[SCALAR_SUBQUERY_TOO_MANY_ROWS] More than one row returned by a subquery used as an expression. SQLSTATE: 21000
  == SQL (line 1, position 12) ==
  SELECT id, (SELECT slow_udf() FROM range(2)) FROM range(5)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  " did not contain "coalesce test error" (AdaptiveQueryExecSuite.scala:939)
```

any good solutions?

Contributor Author

I think it's because the slow_udf is not slow enough and the shuffle stage was submitted too early. Can you try to increase the sleep time in slow_udf and see if it fixes the problem?
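
For reference, the kind of registration involved looks roughly like this (illustrative only; the suite's actual UDF body and sleep duration may differ, and a `SparkSession` named `spark` is assumed to be in scope):

```scala
// Hypothetical sketch of the deliberately slow UDF used by the test; the
// sleep must be long enough that the injected failure arrives before the
// shuffle stage is submitted.
spark.udf.register("slow_udf", () => {
  Thread.sleep(5000)
  1
})
```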

@dongjoon-hyun
Member

To @cloud-fan and @LuciferYang, I made a follow-up PR to fix the `Multiple failures in stage materialization` failure. I also hit the same issue multiple times in CI.

```
[info] - SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 94 milliseconds)
[info]   "Multiple failures in stage materialization." did not contain "coalesce test error" (AdaptiveQueryExecSuite.scala:939)
```

dongjoon-hyun added a commit that referenced this pull request Oct 16, 2024
…error case

### What changes were proposed in this pull request?

This PR aims to fix a flaky test by handling `_LEGACY_ERROR_TEMP_2235` (the multiple-failures exception) in addition to the single-exception case.

### Why are the changes needed?

After merging
- #47533

The following failures were reported multiple times in the PR and today.
- https://github.com/apache/spark/actions/runs/11358629880/job/31593568476
- https://github.com/apache/spark/actions/runs/11367718498/job/31621128680
- https://github.com/apache/spark/actions/runs/11360602982/job/31598792247
```
[info] - SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 92 milliseconds)
[info]   "Multiple failures in stage materialization." did not contain "coalesce test error" (AdaptiveQueryExecSuite.scala:939)
```

The root cause is that `AdaptiveSparkPlanExec.cleanUpAndThrowException` can throw two types of exceptions: when there are multiple errors, `_LEGACY_ERROR_TEMP_2235` is thrown instead of the single original exception. The test case needs to handle this as well.

https://github.com/apache/spark/blob/bcfe62b9988f9b00c23de0b71acc1c6170edee9e/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L843-L850

https://github.com/apache/spark/blob/bcfe62b9988f9b00c23de0b71acc1c6170edee9e/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L1916-L1921
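
Roughly, the handling needs the following shape (a sketch only, not the exact assertion in this PR; `containsMessage` and `caughtException` are made-up names): check the exception's own message, its cause chain, and its suppressed exceptions, so the assertion passes whether the single error or the aggregated one is thrown.

```scala
// Sketch: search an exception, its causes, and its suppressed exceptions for
// the expected error fragment, covering both the single-failure and the
// "Multiple failures in stage materialization" cases.
def containsMessage(t: Throwable, fragment: String): Boolean = {
  if (t == null) {
    false
  } else {
    (t.getMessage != null && t.getMessage.contains(fragment)) ||
      containsMessage(t.getCause, fragment) ||
      t.getSuppressed.exists(containsMessage(_, fragment))
  }
}

// Hypothetical usage inside the test:
// assert(containsMessage(caughtException, "coalesce test error"))
```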

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48498 from dongjoon-hyun/SPARK-49057.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>