-
Notifications
You must be signed in to change notification settings - Fork 29k
Add a note about jobs running in FIFO order in the default pool #20881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not actually correct. There is no reason why you can't define a default pool that uses FAIR scheduling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean that the second sentence is incorrect? I drew that conclusion based from empirical observations +
spark/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala
Lines 109 to 117 in 992447f
However, I might very well be missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to be missing a few somethings: 1) You can define your own default pool that does FAIR scheduling within that pool, so blanket statements about "the" default pool are dangerous; 2)
spark.scheduler.modecontrols the setup of the rootPool, not the scheduling within any pool; 3) If you don't define your own pool with a name corresponding to the DEFAULT_POOL_NAME (i.e. "default"), then you are going to get a default construction of "default", which does use FIFO scheduling within that pool.So, item 2) effectively means that
spark.scheduler.modecontrols whether fair scheduling is possible at all, and it also defines the kind of scheduling that is used among the shedulable entities contained in the root pool -- i.e. among the scheduling pools nested within rootPool. One of those nested pools will be DEFAULT_POOL_NAME/"default", which will use FIFO scheduling for schedulable entities within that pool if you haven't defined it to use fair scheduling.If you just want one scheduling pool that does fair scheduling among its schedulable entities, then you need to set
spark.scheduler.modeto "FAIR" in your SparkConf and also define in the pool configuration file a "default" pool to use schedulingMode FAIR. You could alternatively define such a fair-scheduling-inside pool named something other than "default" and then make sure that all of your jobs get assigned to that pool.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks @markhamstra I think I grasp what's going on now. Some form of your comment would be a useful addition to the documentation; rationale being that there seems to be a (common?) misunderstanding about how to schedule jobs in a
FAIRway, e.g. https://stackoverflow.com/a/37882686/2813687, or myself trying to do this leading to this very PR. After reading your comment, the current documentation makes sense, and obviously this PR is incorrect (at the very least it doesn't underscore all the caveats/config knobs at play here). I'll take another look at improving the doc such that the actual behavior is obvious to Spark users not familiar with Spark scheduling nitty gritty who merely want to run a few jobs concurrently.