[SPARK-17454][MESOS] Use Mesos disk resources for executors. #23758
Conversation
cc @aboten
@mgummelt, would you please help review this change?
Thank you for your first contribution, @clems4ever. Currently, @HeartSaVioR is working on #23743 for Apache Spark 3.0.0. This PR seems to need to wait for that.
The PR is merged now. Could you rebase and follow the new style, @clems4ever?
Hello @dongjoon-hyun, sure. Thank you for noticing.
Force-pushed: 705c823 to 0ad7902
@dongjoon-hyun, it's done.
docs/running-on-mesos.md (outdated)
If it's at the default of 0, no behavior changes, right? I.e., it's not required to set this in general, only in the case you cite?
Well, as you can see in https://github.com/criteo-forks/mesos/blob/3de5efba936c8b7bd1bf88c2fd05006a93271b73/src/common/http.cpp#L725, the Mesos API returns a default value of 0 if no disk is provided.
So as far as I'm concerned it should be OK, but since you asked, let me fix it to avoid providing any disk in the TaskInfo when it is not specified in the conf. That way we can be sure Spark remains compatible if the behavior changes on the Mesos side (i.e., if "no value" stops being equivalent to 0).
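The approach described above can be sketched as follows. This is a hypothetical model, not the PR's actual code: `Resource` here is a plain stand-in case class rather than the real Mesos protobuf, and `buildResources` is an illustrative helper.

```scala
// Hypothetical sketch: only attach a disk resource when the user configured
// one, so Spark does not depend on Mesos treating "absent" and "0" as
// equivalent. Resource is a stand-in for the Mesos protobuf message.
case class Resource(name: String, amount: Double)

def buildResources(cpus: Double, mem: Double, disk: Option[Double]): List[Resource] = {
  val base = List(Resource("cpus", cpus), Resource("mem", mem))
  // Omit the disk resource entirely when it is unset or zero.
  base ++ disk.filter(_ > 0).map(d => Resource("disk", d))
}
```

With this shape, an unset disk setting produces a TaskInfo with no disk resource at all, which is the safer contract regardless of the Mesos default.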
I actually protected the code by inserting the disk amount only if it is > 0. But anyway, I think it's better to distinguish between None and 0, because then the Mesos community can change their mind later without any impact on Spark.
Fix is coming in a minute.
Force-pushed: 0ad7902 to fdca59a
@srowen, I updated the PR to address your comment.
Test build #4605 has finished for PR 23758 at commit
```scala
private val useFetcherCache = conf.get(ENABLE_FETCHER_CACHE)

private val maxGpus = conf.get(MAX_GPUS)
private val diskPerExecutor = conf.get(EXECUTOR_DISK)
```
I think this throws an exception if not set, and it's optional. You can either check .contains here and make this an Option, or use a default value of 0 or something else that indicates no reservation.
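The reviewer's Option-based suggestion can be modeled like this. It is a minimal sketch using a plain `Map` in place of Spark's real conf machinery, and the key name `spark.mesos.executor.disk` is illustrative, not necessarily the PR's actual config key.

```scala
// Minimal model of reading the disk setting as an Option rather than
// assuming it is always present. A value of 0 is treated the same as unset.
def diskPerExecutor(conf: Map[String, String]): Option[Double] =
  conf.get("spark.mesos.executor.disk").map(_.toDouble).filter(_ > 0)
```

Downstream code then branches on `isDefined` instead of risking an exception on an absent key.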
```scala
  res.asScala.filter(_.getName == name).map(_.getScalar.getValue).sum
}

def resourceExists(res: JList[Resource], name: String): Boolean = {
```
If this is only used in test code, I'd move it there.
```scala
var (remainingResources, resourcesToUse) = (nonPortResources,
  cpuResourcesToUse ++ memResourcesToUse ++ portResourcesToUse ++ gpuResourcesToUse)

if (taskDisk.isDefined) {
```
This is nice and safe. I think it's OK to be consistent with how GPUs are handled -- which may mean it's good to copy your approach for the GPU config too. You can use a default of 0 for disk as well (I suppose that's nice, since setting it to 0 should mean the same as not setting it).
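The remainingResources/resourcesToUse split in the diff above can be sketched for the disk case as follows. This is an illustrative model with assumed names (`partitionDisk`), not the PR's actual code; the real implementation works on Mesos Resource protobufs.

```scala
// Sketch of consuming a disk request from an offered amount, mirroring the
// (remaining, toUse) split in the scheduler. With a default of 0, an unset
// request consumes nothing from the offer.
def partitionDisk(offeredDisk: Double, requestedDisk: Double): (Double, Double) = {
  val used = math.min(offeredDisk, requestedDisk)
  (offeredDisk - used, used) // (remaining in the offer, amount to use)
}
```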
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
What changes were proposed in this pull request?
Before this change, there was no way to allocate a given amount of disk when using the Mesos scheduler. That is good enough when using the default isolation options, but not when enabling the XFS isolator with a hard limit in order to properly isolate all containers. In that case, the executor is killed by Mesos during the download of the Spark executor archive.
Therefore, this change introduces a Mesos-specific configuration flag to declare the amount of disk required by the executors, preventing Mesos from killing the container because the XFS hard limit has been exceeded.
How was this patch tested?
I added 3 unit tests and tested my built version of Spark against a real Mesos cluster.