-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23365][CORE] Do not adjust num executors when killing idle executors. #20604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -533,7 +533,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| schedulerBackend match { | ||
| case b: ExecutorAllocationClient => | ||
| Some(new ExecutorAllocationManager( | ||
| schedulerBackend.asInstanceOf[ExecutorAllocationClient], listenerBus, _conf)) | ||
| schedulerBackend.asInstanceOf[ExecutorAllocationClient], listenerBus, _conf, | ||
| _env.blockManager.master)) | ||
| case _ => | ||
| None | ||
| } | ||
|
|
@@ -1632,6 +1633,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| * :: DeveloperApi :: | ||
| * Request that the cluster manager kill the specified executors. | ||
| * | ||
| * This is not supported when dynamic allocation is turned on. | ||
| * | ||
| * @note This is an indication to the cluster manager that the application wishes to adjust | ||
| * its resource usage downwards. If the application wishes to replace the executors it kills | ||
| * through this method with new ones, it should follow up explicitly with a call to | ||
|
|
@@ -1643,7 +1646,10 @@ class SparkContext(config: SparkConf) extends Logging { | |
| def killExecutors(executorIds: Seq[String]): Boolean = { | ||
| schedulerBackend match { | ||
| case b: ExecutorAllocationClient => | ||
| b.killExecutors(executorIds, replace = false, force = true).nonEmpty | ||
| require(executorAllocationManager.isEmpty, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a developer api, so probably ok, but this is a change in behavior. Is it just not possible to support this with dynamic allocation?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would calling this mean with dynamic allocation on? Note this api explicitly says its meant to adjust resource usage downwards. If you've got just one executor, and then you kill it, should your app sit with 0 executors? Or even if you've got 10 executors, and you kill one -- when is dynamic allocation allowed to bump the total back up? I can't think of useful clear semantics for this (though this is not necessary to fix the bug, I could pull this out and move to a discussion in a new jira)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure why you'd use this with dynamic allocation, but it's been possible in the past. It's probably ok to change this though.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, @squito , I'm quite questioned about the cases:
if app sit with 0 executors, then pending tasks increase, which lead to
for this case, to be honest, I really do not get your point. But, it must blame my poor English. And, what will happens if we use this method without see these several lines in Set Actually, I think this series methods, including WDYT?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My point in general is that the semantics of combining
Thats true -- but only when pending tasks increase. But if you've got 0 executors, how do you expect pending tasks to increase? That would only happen when another taskset gets submitted, but with no executors your spark program will probably just be blocked. In the other case, I'm just trying to point out strange interactions between user control and dynamic allocation control. Imagine this sequence: Dynamic Allocation: 1000 tasks, so 1000 executors
hmm, from a quick look, I think you're right. it doesn't seem that using
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @squito , thanks for your reply.
And for And I checked
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @squito any thoughts? |
||
| "killExecutors() unsupported with Dynamic Allocation turned on") | ||
| b.killExecutors(executorIds, adjustTargetNumExecutors = true, countFailures = false, | ||
| force = true).nonEmpty | ||
| case _ => | ||
| logWarning("Killing executors is not supported by current scheduler.") | ||
| false | ||
|
|
@@ -1681,7 +1687,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| private[spark] def killAndReplaceExecutor(executorId: String): Boolean = { | ||
| schedulerBackend match { | ||
| case b: ExecutorAllocationClient => | ||
| b.killExecutors(Seq(executorId), replace = true, force = true).nonEmpty | ||
| b.killExecutors(Seq(executorId), adjustTargetNumExecutors = false, countFailures = true, | ||
| force = true).nonEmpty | ||
| case _ => | ||
| logWarning("Killing executors is not supported by current scheduler.") | ||
| false | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little confused about this parameter.
If
force = false, it's a no op. And all call sites I've seen seem to set this parameter tofalse. So is there something I'm missing?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, I was supposed to set
countFailures = trueinsc.killAndReplaceExecutors, thanks for catching that.