Skip to content

Conversation

@sryza
Copy link
Contributor

@sryza sryza commented Nov 11, 2014

... executors than pending tasks need.

WIP. Still need to add and fix tests.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23210 has started for PR 3204 at commit 49fc525.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire class is designed to not rely on internals of the scheduler - if you look it gets all of it state through the listener. Can you just ask for totalPendingTasks from the listener?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that, but couldn't figure out a better way to pass this through information through. SparkListener can give us the total pending tasks at the start of a stage, but, as far as I could tell, only the TaskSetManagers know the pending tasks as a stage progresses. Reconstituting this info from onTaskStart and onTaskEnd events isn't easy because tasks can need to be resubmitted. Am I missing anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sryza it's possible to estimate the size of the queue pretty well from the data in ExecutorAllocationListener, we would just ignore the effect of speculated and re-submitted tasks (and this I think is a small enough margin that it's not a big deal). I think you can just add a function called getPendingTasks to ExecutorAllocationListener and it would go through each stage and subtract the number of distinct indices started from the number of tasks in the stage. This would retain the isolation of these two components and only sacrifice a small amount of accuracy.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23210 has finished for PR 3204 at commit 49fc525.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23210/
Test FAILed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could become 0 if spark.task.cpus > spark.executor.cores, and you're dividing by this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a misconfiguration because executors wouldn't be able to fit a single task. Will add an exception to make it fail earlier.

@sryza
Copy link
Contributor Author

sryza commented Nov 14, 2014

Updated patch addresses review comments

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23350 has started for PR 3204 at commit c4ed549.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23350 has finished for PR 3204 at commit c4ed549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23350/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear. We should add that the latter means we don't need more executors than the number of pending tasks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this maxNumExecutorsToAdd

@andrewor14
Copy link
Contributor

LGTM pending the comment about rounding up

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23391 has started for PR 3204 at commit 13b53df.

  • This patch merges cleanly.

@sryza
Copy link
Contributor Author

sryza commented Nov 14, 2014

Cool, just added that as well

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23392 has started for PR 3204 at commit 35cf0e0.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23391 has finished for PR 3204 at commit 13b53df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23391/
Test PASSed.

@andrewor14
Copy link
Contributor

Alright I'm merging this into master and 1.2 thanks @sryza.

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23392 has finished for PR 3204 at commit 35cf0e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23392/
Test PASSed.

asfgit pushed a commit that referenced this pull request Nov 15, 2014
…ore...

... executors than pending tasks need.

WIP. Still need to add and fix tests.

Author: Sandy Ryza <[email protected]>

Closes #3204 from sryza/sandy-spark-4214 and squashes the following commits:

35cf0e0 [Sandy Ryza] Add comment
13b53df [Sandy Ryza] Review feedback
067465f [Sandy Ryza] Whitespace fix
6ae080c [Sandy Ryza] Add tests and get num pending tasks from ExecutorAllocationListener
531e2b6 [Sandy Ryza] SPARK-4214. With dynamic allocation, avoid outstanding requests for more executors than pending tasks need.

(cherry picked from commit ad42b28)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in ad42b28 Nov 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants