[SPARK-17644] [CORE] Do not add failedStages when abortStage for fetch failure #15213
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -2112,6 +2112,8 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou` |
| | | `  test("The failed stage never resubmitted due to abort stage in another thread") {` |
| | | `  implicit val executorContext = ExecutionContext` |
| | | `  .fromExecutorService(Executors.newFixedThreadPool(5))` |
| | | `+ val duration = 60.seconds` |
| | | `+` |
| | | `  val f1 = Future {` |
| | | `  try {` |
| | | `  val rdd1 = sc.makeRDD(Array(1, 2, 3, 4), 2).map(x => (x, 1)).groupByKey()` |
| | | `@@ -2125,10 +2127,10 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou` |
| | | `  }.count()` |
| | | `  } catch {` |
| | | `  case e: Throwable =>` |
| | | `- logInfo("expected abort stage: " + e.getMessage)` |
| | | `+ logInfo("expected abort stage1: " + e.getMessage)` |
| | | `  }` |
| | | `  }` |
| | | `- Thread.sleep(10000)` |
| | | `+ ThreadUtils.awaitResult(f1, duration)` |
| | | `  val f2 = Future {` |
| | | `  try {` |
| | | `  val rdd2 = sc.makeRDD(Array(1, 2, 3, 4), 2).map(x => (x, 1)).groupByKey()` |
| | | `@@ -2142,11 +2144,9 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou` |
| | | `  }.count()` |
| | | `  } catch {` |
| | | `  case e: Throwable =>` |
| | | `- println("expected abort stage2: " + e.getMessage)` |
| | | `+ logInfo("expected abort stage2: " + e.getMessage)` |
| | | `  }` |
| | | `  }` |
| | | `-` |
| | | `- val duration = 60.seconds` |
| | | `  ThreadUtils.awaitResult(f2, duration)` |
| | | `  }` |
A couple of changes for this test:
(a) There was already some discussion about naming earlier, and the PR title has been updated. This test should be renamed as well, since multiple threads really have nothing to do with it.
(b) I'd prefer that tests be named to indicate the positive behavior we want to verify. So, given the above, I'd suggest a name like "After one stage is aborted for too many failed attempts, subsequent stages still behave correctly on fetch failures".
(c) The duplicated code can be cleaned up. When I first read the code, I was looking for differences between the two calls, so though it's only one copy-paste, the intent is a lot clearer if it appears just once.
(d) I think it would be nice to also include, at the end, a job which succeeds after a fetch failure (3 jobs total). Unfortunately this is a bit of a pain to do in a test right now, since you don't have access to `stageAttemptId`, but you can do it with something like this: ... `rdd1.map { case (x, _) if (x == 1) && FailThisAttempt._fail.getAndSet(false) => throw new FetchFailedException(BlockManagerId("1", "1", 1), shuffleHandle.shuffleId, 0, 0, "test")` ... with helper
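
The helper referred to in (d) is cut off in the comment, so the following is only a sketch of how such a one-shot failure flag might look, not the reviewer's actual code. It assumes `FailThisAttempt` is a plain object holding an `AtomicBoolean` named `_fail`. To keep the example runnable without Spark on the classpath, `SimulatedFetchFailure` stands in for `FetchFailedException`, and `runWithRetry` is a hypothetical stand-in for the scheduler resubmitting a stage.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Assumed shape of the helper: a process-wide flag that stays true
// until the first failure has been injected.
object FailThisAttempt {
  val _fail = new AtomicBoolean(true)
}

// Stand-in for FetchFailedException (hypothetical, for illustration only).
final class SimulatedFetchFailure(msg: String) extends RuntimeException(msg)

object OneShotFailureDemo {
  // Throws only the first time an element equal to 1 is seen.
  // getAndSet(false) atomically returns the old value and clears the flag,
  // so every later attempt passes through unchanged.
  def process(x: Int): Int = {
    if (x == 1 && FailThisAttempt._fail.getAndSet(false)) {
      throw new SimulatedFetchFailure("test")
    }
    x
  }

  // Minimal retry loop standing in for stage resubmission.
  // Returns the mapped data and the number of attempts it took.
  def runWithRetry(data: Seq[Int], maxAttempts: Int): (Seq[Int], Int) = {
    var attempt = 0
    while (attempt < maxAttempts) {
      attempt += 1
      try {
        return (data.map(process), attempt)
      } catch {
        case _: SimulatedFetchFailure => // swallow the failure and retry
      }
    }
    throw new IllegalStateException(s"gave up after $maxAttempts attempts")
  }

  def main(args: Array[String]): Unit = {
    val (result, attempts) = runWithRetry(Seq(1, 2, 3, 4), maxAttempts = 4)
    println(s"result=$result attempts=$attempts")
  }
}
```

With the flag initially set, the first attempt fails on the element `1` and the second succeeds, which mirrors the "job succeeds after a fetch failure" case. In a real Spark test the flag would have to be reset between tests, and the closure would need to reach it from executor threads; local mode makes that straightforward because everything runs in one JVM.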