Conversation

@clems4ever

What changes were proposed in this pull request?

Before this change, there was no way to allocate a given amount of
disk when using the Mesos scheduler. That is good enough with the default
isolation options, but not when the XFS isolator is enabled with a hard limit
in order to properly isolate all containers: in that case, the executor is
killed by Mesos while downloading the Spark executor archive.

This change therefore introduces a Mesos-specific configuration flag to
declare the amount of disk required by the executors, preventing Mesos from
killing the container because the XFS hard limit has been exceeded.
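
For illustration, the gist of the change can be sketched in plain Scala. The `Resource` case class and function names below are stand-ins for the Mesos protobuf types, not the actual Spark/Mesos API:

```scala
// Stand-in type: Mesos models resources as named scalar values.
case class Resource(name: String, amount: Double)

// Sketch: build an executor's resource list, declaring "disk" only when the
// (hypothetical) per-executor disk setting is present and positive, so the
// default behavior is unchanged when the flag is not set.
def executorResources(cpus: Double, memMb: Double,
                      diskMb: Option[Double]): List[Resource] = {
  val base = List(Resource("cpus", cpus), Resource("mem", memMb))
  base ++ diskMb.filter(_ > 0).map(mb => Resource("disk", mb))
}
```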

How was this patch tested?

I added 3 unit tests and tested my built version of Spark against a real Mesos cluster.

@clems4ever
Author

cc @aboten

@clems4ever
Author

@mgummelt , would you please help review this change?

@dongjoon-hyun
Member

Thank you for your first contribution, @clems4ever.

Currently, @HeartSaVioR is working on #23743 for Apache Spark 3.0.0. This PR seems to need to wait for that and to migrate to ConfigEntry for Spark 3.0.0, too.

@dongjoon-hyun
Member

The PR is merged now. Could you rebase and follow the new style, @clems4ever ?

@clems4ever
Author

Hello @dongjoon-hyun , sure. Thank you for noticing.

@clems4ever
Author

@dongjoon-hyun , it's done.

Member

If it's at the default of 0, no behavior changes right? i.e. it's not required to set this in general, only in the case you cite?

Author

Well, as you can see in https://github.com/criteo-forks/mesos/blob/3de5efba936c8b7bd1bf88c2fd05006a93271b73/src/common/http.cpp#L725, the Mesos API returns a default value of 0 if no disk is provided.

So as far as I'm concerned it should be ok, but since you asked, let me fix it so that no disk is provided in the TaskInfo when none is specified in the conf. That way we can be sure that Spark remains compatible if the behavior changes on the Mesos side (i.e., if "no value" stops being equivalent to 0).

Author

I actually protected the code by inserting the disk amount only if it is > 0. But I still think it's better to distinguish between None and 0, because then the Mesos community can change their mind later without any impact on Spark.
Fix is coming in a minute.
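
The None-vs-0 distinction being discussed can be sketched like this (illustrative names and a stand-in `Resource` type, not the actual patch):

```scala
// Stand-in for a Mesos scalar resource.
case class Resource(name: String, amount: Double)

// "Not configured" stays None and, like an explicit 0, produces no disk
// resource in the TaskInfo; only a positive amount is ever declared.
def diskResource(configuredMb: Option[Double]): Option[Resource] =
  configuredMb.filter(_ > 0).map(mb => Resource("disk", mb))
```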

@clems4ever
Author

@srowen , I updated the PR to address your comment.

@SparkQA

SparkQA commented Mar 11, 2019

Test build #4605 has finished for PR 23758 at commit fdca59a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private val useFetcherCache = conf.get(ENABLE_FETCHER_CACHE)

private val maxGpus = conf.get(MAX_GPUS)
private val diskPerExecutor = conf.get(EXECUTOR_DISK)
Member

I think this throws an exception if not set, and it's optional. You can either check .contains here and make this an Option, or have a default value of '0' or something that would indicate no reservation.
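
Both alternatives mentioned here can be sketched over a plain `Map` standing in for `SparkConf` (the key name is hypothetical, chosen only for illustration):

```scala
// Stand-in for SparkConf; in Spark this would be a ConfigEntry lookup.
val conf: Map[String, String] = Map("spark.mesos.executor.disk" -> "512")

// Alternative 1: keep the setting optional, so "unset" is representable as None.
val diskOpt: Option[Double] = conf.get("spark.mesos.executor.disk").map(_.toDouble)

// Alternative 2: fall back to a sentinel default of 0 meaning "no reservation".
val diskOrZero: Double = diskOpt.getOrElse(0.0)
```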

res.asScala.filter(_.getName == name).map(_.getScalar.getValue).sum
}

def resourceExists(res: JList[Resource], name: String): Boolean = {
Member

If this is only used in test code, I'd move it there.
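
If the helper does move into the test suite, its usage might look like this self-contained sketch, with a plain case class standing in for the Mesos `Resource` protobufs:

```scala
// Stand-in for the Mesos Resource protobuf.
case class Resource(name: String, amount: Double)

// Same shape as the helpers under review, over a plain List instead of JList.
def resourceSum(res: List[Resource], name: String): Double =
  res.filter(_.name == name).map(_.amount).sum

def resourceExists(res: List[Resource], name: String): Boolean =
  res.exists(_.name == name)

// Test-style usage: the task's resources should carry the requested disk amount.
val taskResources =
  List(Resource("cpus", 1.0), Resource("mem", 1024.0), Resource("disk", 42.0))
assert(resourceExists(taskResources, "disk"))
assert(resourceSum(taskResources, "disk") == 42.0)
```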

var (remainingResources, resourcesToUse) = (nonPortResources,
cpuResourcesToUse ++ memResourcesToUse ++ portResourcesToUse ++ gpuResourcesToUse)

if (taskDisk.isDefined) {
Member

This is nice and safe. I think it's OK to be consistent with how GPUs are handled -- which may mean it's good to copy your approach for GPU config too. You can use a default of 0 for disk too (I suppose that's nice as if someone sets it to 0, that should have a similar meaning as not set).

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

github-actions bot commented Jan 2, 2020

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Jan 2, 2020
@github-actions github-actions bot closed this Jan 3, 2020