[SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend #20640
Closed
IgorBerman wants to merge 14 commits into apache:master from IgorBerman:SPARK-19755-Blacklist-is-always-active-for-MesosCoarseGrainedSchedulerBackend
+36
−16
Commits
636959a Removed hardcoded blacklist functionality, must be controled by Black… (antiout)
f09faf7 Removed hardcoded blacklist functionality, must be controled by Black… (antiout)
e2ddc1b SPARK-19755 declining offers from blacklisted slave by BlacklistTracker (IgorBerman)
66ed5af [SPARK-19755][Mesos] reverting logging on mesos slave task(executor) … (IgorBerman)
a7ff8cc [SPARK-19755][Mesos] rewording log message and changing it to error l… (IgorBerman)
2c47271 [SPARK-19755][Mesos] adding comment regarding failures of mesos task … (IgorBerman)
5eda874 [SPARK-19755][Mesos] specifying that it's mesos task failing (IgorBerman)
104d44f Merge branch 'master' into SPARK-19755-Blacklist-is-always-active-for… (IgorBerman)
95ca22c SPARK-19755 fixing merge (IgorBerman)
cb8cb57 SPARK-19755 adding jira to testcase name (IgorBerman)
83cabff SPARK-19755 removing duplicated line (IgorBerman)
2fc0288 alternative (dongjoon-hyun)
04f931d Recover comments (dongjoon-hyun)
dbe7d18 Merge pull request #1 from dongjoon-hyun/PR-20640-2 (IgorBerman)
[SPARK-19755][Mesos] adding comment regarding failures of mesos task …
…failures and linking to relevant jira
commit 2c47271176b82e4859667ede9bb02b28b8fba50e
I just want to make really sure everybody understands the big change in behavior here -- nodeBlacklist() currently only gets updated based on failures in Spark tasks. If a Mesos task fails to even start -- that is, if a Spark executor fails to launch on a node -- nodeBlacklist does not get updated. So you could have a node that is misconfigured somehow, and after this change you might end up repeatedly trying to launch executors on it, with the executor failing to start each time. That holds even if you have blacklisting on.

This is SPARK-16630 for the non-Mesos case. That is being actively worked on now -- however, the work there will probably have to be YARN-specific, so there will still be follow-up work to get the same thing for Mesos after that is in.
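To make the concern concrete, here is a toy model of the behavior described above. It is only a sketch, not Spark's code: `ToyBlacklistTracker`, `simulate_offers`, and the threshold of 2 are hypothetical names and values chosen for illustration. The one property it mirrors is that the blacklist is fed by task failures only, so an executor that never starts leaves no trace.

```python
class ToyBlacklistTracker:
    """Hypothetical stand-in for a node blacklist driven only by task failures."""

    def __init__(self, max_task_failures_per_node=2):  # threshold is an assumption
        self.max_task_failures_per_node = max_task_failures_per_node
        self.task_failures = {}

    def record_task_failure(self, node):
        self.task_failures[node] = self.task_failures.get(node, 0) + 1

    def node_blacklist(self):
        return {
            node
            for node, count in self.task_failures.items()
            if count >= self.max_task_failures_per_node
        }


def simulate_offers(tracker, node, executor_starts, rounds):
    """Return how many times we try to launch an executor on `node`."""
    attempts = 0
    for _ in range(rounds):
        if node in tracker.node_blacklist():
            continue  # offer declined: node is blacklisted
        attempts += 1
        if executor_starts:
            # The executor came up, ran a task, and the task failed.
            tracker.record_task_failure(node)
        # Otherwise the executor never started, so no task failure is
        # recorded and the blacklist never learns about this node.
    return attempts


broken = simulate_offers(ToyBlacklistTracker(), "misconfigured-node",
                         executor_starts=False, rounds=5)
flaky = simulate_offers(ToyBlacklistTracker(), "flaky-node",
                        executor_starts=True, rounds=5)
print(broken, flaky)
```

In this model the node whose tasks fail is dropped after two attempts, while the node whose executor never launches is retried on every offer.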
@squito sounds reasonable. In the meantime we have to deal with a limitation on the Mesos side, where the value is hardcoded. So we can move forward with this incrementally.
Maybe comment on this in the code here and add a JIRA for tracking?
This check looks a little late. Can we decline faster, without calculating everything first?
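A rough sketch of what declining earlier could look like, written as a standalone model rather than against the actual backend. The names `handle_offers` and `expensive_check`, and the offer dictionaries, are all hypothetical. The idea is to test blacklist membership first and skip the costlier evaluation for those offers.

```python
def handle_offers(offers, blacklisted_hosts, evaluate_offer):
    """Split offers into accepted and declined ids, checking the blacklist first."""
    accepted, declined = [], []
    for offer in offers:
        if offer["hostname"] in blacklisted_hosts:
            # Cheap check up front: decline without evaluating resources.
            declined.append(offer["id"])
            continue
        if evaluate_offer(offer):
            accepted.append(offer["id"])
        else:
            declined.append(offer["id"])
    return accepted, declined


calls = []  # records which offers reached the expensive evaluation


def expensive_check(offer):
    """Stand-in for the resource and constraint calculations."""
    calls.append(offer["id"])
    return offer["cpus"] >= 1


offers = [
    {"id": "o1", "hostname": "host-a", "cpus": 4},
    {"id": "o2", "hostname": "host-b", "cpus": 4},
    {"id": "o3", "hostname": "host-c", "cpus": 0},
]
accepted, declined = handle_offers(offers, {"host-b"}, expensive_check)
print(accepted, declined, calls)
```

Here the offer from the blacklisted host never reaches `expensive_check`, which is the saving the comment is asking about.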