Skip to content

Conversation

@jiangxb1987
Copy link
Contributor

@jiangxb1987 jiangxb1987 commented Aug 1, 2018

What changes were proposed in this pull request?

Kill all running tasks when a task in a barrier stage fail in the middle. TaskScheduler.cancelTasks() will also fail the job, so we implemented a new method killAllTaskAttempts() to just kill all running tasks of a stage without cancel the stage/job.

How was this patch tested?

Add new test cases in TaskSchedulerImplSuite.

@holdensmagicalunicorn
Copy link

@jiangxb1987, thanks! I am a bot who has found some folks who might be able to help with the review:@gatorsmile, @mateiz and @kayousterhout

interruptThread: Boolean,
reason: String): Unit = synchronized {
logInfo(s"Killing all running tasks in stage $stageId: $reason")
taskSetsByStageIdAndAttempt.get(stageId).foreach { attempts =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some dup code and we dropped the useful comments from cancelTasks. It would be great if we move the common code here with comment and let cancelTasks call this method.

@mengxr
Copy link
Contributor

mengxr commented Aug 1, 2018

@jiangxb1987 Could you add a test to the new method?

@SparkQA
Copy link

SparkQA commented Aug 1, 2018

Test build #93882 has finished for PR 21943 at commit 97bfba8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2018

Test build #93893 has finished for PR 21943 at commit b5fca7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

@mengxr Updated and added test cases, PTAL!

def submitTasks(taskSet: TaskSet): Unit

// Cancel a stage.
// Kill all the tasks in a stage and fail the stage and all the jobs that depend on the stage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it guaranteed to work for any backend like YARN, Mesos, K8s?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment to note that if the backend doesn't support kill a task then the method shall throw UnsupportedOperationException.

@mengxr
Copy link
Contributor

mengxr commented Aug 2, 2018

LGTM

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93951 has finished for PR 21943 at commit 7cca33f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93945 has finished for PR 21943 at commit bc77a8e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

retest this please

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93969 has finished for PR 21943 at commit 7cca33f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93968 has finished for PR 21943 at commit 7cca33f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 2754157 Aug 2, 2018
@jiangxb1987 jiangxb1987 deleted the killAllTasks branch August 2, 2018 13:01
@SparkQA
Copy link

SparkQA commented Aug 2, 2018

Test build #93996 has finished for PR 21943 at commit 7cca33f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants