SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator #3765
Conversation
Test build #24715 has started for PR 3765 at commit
Test build #24715 has finished for PR 3765 at commit
Test FAILed.
Test build #24794 has started for PR 3765 at commit
Test build #24794 has finished for PR 3765 at commit
Test FAILed.
Updated patch fixes the broken test, adds more comments, and simplifies even further. It also removes support for requesting containers based on locality, as this has been both inaccessible and internally broken for a while (the last time it worked was 0.9). I think it will be advantageous to start from a clean baseline and then revisit the approach when working on SPARK-4352.
Test build #24798 has started for PR 3765 at commit
Test build #24798 has finished for PR 3765 at commit
Test PASSed.
Test build #24847 has started for PR 3765 at commit
Test build #24847 has finished for PR 3765 at commit
Test PASSed.
Is this value changed in the tests at all, or is it that for all tests we don't really want to launch containers? If the latter, you could just use Utils.isTesting.
The value differs between tests. E.g. in YarnClusterSuite we do want to launch containers, but in YarnAllocatorSuite, we don't.
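For illustration, here is a minimal Scala sketch of the pattern being discussed; the class and flag names are hypothetical, not Spark's actual YarnAllocator code. A per-instance flag lets one suite launch containers for real while another suppresses launching, which a process-wide check like Utils.isTesting could not distinguish.

```scala
// Minimal sketch (hypothetical names, not Spark's actual YarnAllocator): a
// per-instance flag decides whether allocated containers are actually launched,
// so one test suite can launch them while another skips launching entirely.
class ContainerLauncher(launchContainers: Boolean) {
  def onContainersAllocated(containerIds: Seq[String]): Unit = {
    if (launchContainers) {
      containerIds.foreach(id => println(s"launching executor in container $id"))
    } else {
      println(s"test mode: skipping launch of ${containerIds.size} containers")
    }
  }
}

object ContainerLauncherExample {
  def main(args: Array[String]): Unit = {
    // A suite that does not want real containers would construct it this way.
    new ContainerLauncher(launchContainers = false).onContainersAllocated(Seq("c1", "c2"))
  }
}
```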
@tgravescs are you able to take a look at this?
I think maybe this should be maxExecutors += requestedTotal, because maxExecutors is the sum of current executors, including both running and pending executors.
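A hedged sketch of the accounting being discussed follows; the field names are illustrative, not Spark's actual ones. If maxExecutors already counts both running and pending executors, then moving to a new requested total means filing only the difference as additional container requests.

```scala
// Illustrative executor accounting (names are assumptions for this example).
object ExecutorRequestMath {
  def main(args: Array[String]): Unit = {
    val running = 4                      // executors currently running
    val pending = 2                      // container requests filed but not yet granted
    val maxExecutors = running + pending

    val requestedTotal = 10              // new target handed to requestTotalExecutors
    val additionalRequests = math.max(0, requestedTotal - maxExecutors)

    println(s"file $additionalRequests additional container requests")  // prints 4
  }
}
```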
@sryza did you run this through a bunch of manual tests as well? Try cases like a container dying or being killed before all of the initial ones are allocated.
this isn't used anymore - you can remove it
Force-pushed from 86896b0 to 74f56dd
Test build #25850 has started for PR 3765 at commit
Test build #25850 has finished for PR 3765 at commit
Test PASSed.
@tgravescs, uploaded a new patch that addresses your review comments. I just ran a bunch of manual tests on a 6-node cluster, including
I noticed that YARN's
Test build #25857 has started for PR 3765 at commit
Test build #25857 has finished for PR 3765 at commit
Test PASSed.
Looks good. Thanks @sryza
SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator

The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark.

In service of this, it does a few things:
* Uses AMRMClient APIs for matching containers to requests.
* Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again.
* Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant.
* Adds tests for basic YarnAllocator functionality.
* Breaks up the allocateResources method, which was previously nearly 300 lines.
* A little bit of stylistic cleanup.
* Fixes a bug that causes three times the requests to be filed when preferred host locations are given.

The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is:
* If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received.

This seems like a weird way to avoid bunching. E.g., it does nothing to avoid bunching when we don't request containers on particular nodes.

Author: Sandy Ryza <[email protected]>

Closes apache#3765 from sryza/sandy-spark-1714 and squashes the following commits:

32a5942 [Sandy Ryza] Muffle RackResolver logs
74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors
60ea4bd [Sandy Ryza] Fix scalastyle
ca35b53 [Sandy Ryza] Simplify further
e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite
257acf3 [Sandy Ryza] Remove locality stuff and more cleanup
59a3c5e [Sandy Ryza] Take out rack stuff
5f72fd5 [Sandy Ryza] Further documentation and cleanup
89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator
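For readers unfamiliar with the YARN client library, the sketch below shows the basic AMRMClient request/match/remove cycle that the PR leans on. It is a minimal illustration rather than Spark's actual YarnAllocator code, it would only run inside a live YARN ApplicationMaster, and the resource sizes, priority, and locality choices are assumptions for the example.

```scala
import scala.collection.JavaConverters._

import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, Priority, Resource, ResourceRequest}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
import org.apache.hadoop.yarn.conf.YarnConfiguration

object AmrmClientSketch {
  def main(args: Array[String]): Unit = {
    val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
    amClient.init(new YarnConfiguration())
    amClient.start()
    amClient.registerApplicationMaster("localhost", 0, "")

    // Ask for three 1 GB / 1 vcore containers with no node or rack preference,
    // so the requests are tracked under the ANY resource name.
    val capability = Resource.newInstance(1024, 1)
    val priority = Priority.newInstance(1)
    (1 to 3).foreach { _ =>
      amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority))
    }

    // Heartbeat to the ResourceManager; allocated containers arrive in the response.
    val allocated = amClient.allocate(0.1f).getAllocatedContainers.asScala
    allocated.foreach { container =>
      // Let AMRMClient match the container back to an outstanding request...
      val matching = amClient.getMatchingRequests(
        container.getPriority, ResourceRequest.ANY, container.getResource)
      // ...and remove that request so the same ask is not re-sent on the next heartbeat.
      if (!matching.isEmpty && !matching.get(0).isEmpty) {
        amClient.removeContainerRequest(matching.get(0).iterator().next())
      }
    }

    amClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "")
    amClient.stop()
  }
}
```

The remove step is the crux of the change described above: without it, a container that has already been used would be requested again on the next allocate heartbeat.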