SPARK-4214. With dynamic allocation, avoid outstanding requests for more... #3204
Test build #23210 has started for PR 3204 at commit
This entire class is designed not to rely on the internals of the scheduler; if you look, it gets all of its state through the listener. Can you just ask the listener for totalPendingTasks?
I noticed that, but couldn't figure out a better way to pass this information through. SparkListener can give us the total pending tasks at the start of a stage, but, as far as I could tell, only the TaskSetManagers know the pending tasks as a stage progresses. Reconstituting this info from onTaskStart and onTaskEnd events isn't easy because tasks may need to be resubmitted. Am I missing anything?
Hey @sryza, it's possible to estimate the size of the queue pretty well from the data in ExecutorAllocationListener; we would just ignore the effect of speculated and re-submitted tasks (I think this is a small enough margin that it's not a big deal). I think you can just add a function called getPendingTasks to ExecutorAllocationListener that goes through each stage and subtracts the number of distinct task indices started from the number of tasks in the stage. This would retain the isolation of these two components and sacrifice only a small amount of accuracy.
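For illustration, the estimate described in this comment could be sketched roughly as follows. This is a Python stand-in, not the Scala code in the patch: the class `PendingTaskTracker` and its method names are hypothetical, and only the idea (tasks in each stage minus distinct task indices started) comes from the comment.

```python
from collections import defaultdict


class PendingTaskTracker:
    """Hypothetical stand-in for listener-side state; not Spark's actual class."""

    def __init__(self):
        # stage id -> total number of tasks in that stage
        self.stage_num_tasks = {}
        # stage id -> distinct task indices that have started at least once
        self.started_indices = defaultdict(set)

    def on_stage_submitted(self, stage_id, num_tasks):
        self.stage_num_tasks[stage_id] = num_tasks

    def on_task_start(self, stage_id, task_index):
        # A set means speculated or resubmitted attempts of the same index
        # are counted once, which is the approximation the comment accepts.
        self.started_indices[stage_id].add(task_index)

    def on_stage_completed(self, stage_id):
        self.stage_num_tasks.pop(stage_id, None)
        self.started_indices.pop(stage_id, None)

    def total_pending_tasks(self):
        # Per stage: tasks in the stage minus distinct indices started.
        return sum(
            num_tasks - len(self.started_indices.get(stage_id, ()))
            for stage_id, num_tasks in self.stage_num_tasks.items()
        )
```

Because started indices are kept in a set, a second attempt at an already-started index does not change the estimate, so resubmitted tasks are slightly undercounted.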
Test build #23210 has finished for PR 3204 at commit
Test FAILed.
This could become 0 if spark.task.cpus > spark.executor.cores, and you're dividing by this later.
This would be a misconfiguration because executors wouldn't be able to fit a single task. Will add an exception to make it fail earlier.
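A minimal sketch of the fail-early check discussed here, written in Python for illustration. The function name `tasks_per_executor` and the error text are assumptions; only the two settings, `spark.executor.cores` and `spark.task.cpus`, come from the thread.

```python
def tasks_per_executor(executor_cores: int, task_cpus: int) -> int:
    """Return how many tasks fit on one executor, rejecting bad configs early.

    Hypothetical helper: executor_cores stands for spark.executor.cores and
    task_cpus for spark.task.cpus.
    """
    if task_cpus <= 0:
        raise ValueError("spark.task.cpus must be positive")
    if executor_cores < task_cpus:
        # Integer division would give 0 here, and a later division by that
        # value would fail, so raise a clear error up front instead.
        raise ValueError(
            "spark.executor.cores must not be less than spark.task.cpus"
        )
    return executor_cores // task_cpus
```

With 8 executor cores and 2 CPUs per task this yields 4, while 1 core with 2 CPUs per task raises immediately rather than producing a zero divisor.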
Force-pushed from 49fc525 to e163271.
Updated patch addresses review comments.
Test build #23350 has started for PR 3204 at commit
Test build #23350 has finished for PR 3204 at commit
Test PASSed.
This is unclear. We should add that the latter means we don't need more executors than the number of pending tasks.
I would call this maxNumExecutorsToAdd
Force-pushed from c4ed549 to 13b53df.
LGTM pending the comment about rounding up.
Test build #23391 has started for PR 3204 at commit
Cool, just added that as well.
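The rounding-up point and the "no more executors than pending tasks need" cap can be sketched like this in Python. It is an illustration under assumptions, not the merged Scala code: both function names and the `outstanding` parameter (executors already running or requested) are hypothetical.

```python
def max_executors_needed(pending_tasks: int, tasks_per_executor: int) -> int:
    """Executors needed to run all pending tasks at once, rounded up."""
    if tasks_per_executor <= 0:
        raise ValueError("tasks_per_executor must be positive")
    # Ceiling division in integer arithmetic: 10 tasks at 4 per executor
    # need 3 executors, not the 2 that truncating division would give.
    return (pending_tasks + tasks_per_executor - 1) // tasks_per_executor


def cap_executors_to_add(requested: int, outstanding: int,
                         pending_tasks: int, tasks_per_executor: int) -> int:
    """Limit a request so the total never exceeds what pending tasks need."""
    needed = max_executors_needed(pending_tasks, tasks_per_executor)
    max_num_executors_to_add = max(0, needed - outstanding)
    return min(requested, max_num_executors_to_add)
```

Without rounding up, a remainder of pending tasks smaller than one executor's capacity would never trigger a request for the executor needed to run them.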
Test build #23392 has started for PR 3204 at commit
Test build #23391 has finished for PR 3204 at commit
Test PASSed.
Alright, I'm merging this into master and 1.2. Thanks @sryza.
Test build #23392 has finished for PR 3204 at commit
Test PASSed.
…ore... ... executors than pending tasks need.

WIP. Still need to add and fix tests.

Author: Sandy Ryza <[email protected]>

Closes #3204 from sryza/sandy-spark-4214 and squashes the following commits:

35cf0e0 [Sandy Ryza] Add comment
13b53df [Sandy Ryza] Review feedback
067465f [Sandy Ryza] Whitespace fix
6ae080c [Sandy Ryza] Add tests and get num pending tasks from ExecutorAllocationListener
531e2b6 [Sandy Ryza] SPARK-4214. With dynamic allocation, avoid outstanding requests for more executors than pending tasks need.

(cherry picked from commit ad42b28)
Signed-off-by: Andrew Or <[email protected]>