Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 16, 2024

What changes were proposed in this pull request?

This PR aims to fix a flaky test by handling _LEGACY_ERROR_TEMP_2235(multiple failures exception) in addition to the single exception.

Why are the changes needed?

After merging

The following failures were reported multiple times in the PR and today.

[info] - SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 92 milliseconds)
[info]   "Multiple failures in stage materialization." did not contain "coalesce test error" (AdaptiveQueryExecSuite.scala:939)

The root cause is that AdaptiveSparkPlanExec.cleanUpAndThrowException throws two types of exceptions. When there are multiple errors, _LEGACY_ERROR_TEMP_2235 is thrown. We need to handle this too in the test case.

val e = if (originalErrors.size == 1) {
originalErrors.head
} else {
val se = QueryExecutionErrors.multiFailuresInStageMaterializationError(originalErrors.head)
originalErrors.tail.foreach(se.addSuppressed)
se
}
throw e

def multiFailuresInStageMaterializationError(error: Throwable): Throwable = {
new SparkException(
errorClass = "_LEGACY_ERROR_TEMP_2235",
messageParameters = Map.empty,
cause = error)
}

Does this PR introduce any user-facing change?

No, this is a test-only change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Oct 16, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49057][SQL][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 case [SPARK-49057][SQL][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 case Oct 16, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49057][SQL][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 case [SPARK-49057][SQL][TESTS][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 case Oct 16, 2024
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 16, 2024

cc @cloud-fan , @ulysses-you , @yaooqinn , @LuciferYang from #47533

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49057][SQL][TESTS][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 case [SPARK-49057][SQL][TESTS][FOLLOWUP] Handle _LEGACY_ERROR_TEMP_2235 error case Oct 16, 2024
Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

}
assert(error.getMessage() contains "coalesce test error")
assert(error.getMessage().contains("coalesce test error") ||
error.getMessage().contains("Multiple failures in stage materialization"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Multiple failures in stage materialization, we put each stage's failure into Exception#suppressed. Shall we find the coalesce test error in the suppressed errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, @cloud-fan .

@dongjoon-hyun
Copy link
Member Author

Thank you, @LuciferYang and @cloud-fan . Could you review once more? I addressed all comments.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 16, 2024

Could you review this too, @viirya ? This is a little very flaky in these days in CIs. Multiple times a day.

@dongjoon-hyun
Copy link
Member Author

Thank you, @LuciferYang , @cloud-fan , @viirya . Let me merge this to stabilize our CIs.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-49057 branch October 16, 2024 18:27
@ulysses-you
Copy link
Contributor

late lgtm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants