SPARK-3837. Warn when YARN kills containers for exceeding memory limits #2744
|
Can one of the admins verify this patch? |
|
QA tests have started for PR 2744 at commit
|
|
QA tests have finished for PR 2744 at commit
|
|
Test PASSed. |
|
Jenkins, test this please. |
Any reason we don't print the diagnostics message here also?
af11185 to 858a268
|
Updated patch incorporates review comments |
|
Looks good. Thanks, @sryza! |
|
This patch seems to have broken the build. See https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/821/console for an example. |
|
I'll look into it
|
|
I have a commit that just comments out the test at https://github.com/shivaram/spark-1/compare/fix-yarn-build?expand=1, but if you have a better fix, we can use that. |
|
@sryza I came across your PR while searching for ways to debug PySpark YARN containers that exceed YARN memory limits. This change will be very helpful. What are your thoughts on dumping the container process listing as well, like YARN MR tasks do? That's killer for figuring out which process in your container is out of control, as with PySpark, where you have a JVM and Python processes to deal with. Example: |
|
@jdanbrown that seems reasonable. Mind filing a JIRA for it? |
I triggered the issue and verified the message gets printed on a pseudo-distributed cluster.
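For readers skimming the thread, here is a minimal Python sketch of the kind of check discussed above: turning a completed container's exit status and diagnostics into a warning. It is not the code from this patch. The two exit-status values are assumed to match YARN's `ContainerExitStatus` constants, and the function name, regex, and diagnostics wording are hypothetical, so verify them against your Hadoop version.

```python
import re

# Assumed to match YARN's ContainerExitStatus.KILLED_EXCEEDED_VMEM / _PMEM;
# check these against the Hadoop version you run.
KILLED_EXCEEDED_VMEM = -103
KILLED_EXCEEDED_PMEM = -104

# Hypothetical pattern for the usage portion of a diagnostics string,
# e.g. "2.1 GB of 2 GB physical memory used".
_USAGE = re.compile(
    r"([\d.]+ [KMGT]?B) of ([\d.]+ [KMGT]?B) (physical|virtual) memory used"
)


def memory_kill_warning(exit_status, diagnostics):
    """Return a warning string if the container was killed for memory, else None."""
    if exit_status == KILLED_EXCEEDED_PMEM:
        kind = "physical"
    elif exit_status == KILLED_EXCEEDED_VMEM:
        kind = "virtual"
    else:
        return None

    # Pull out the usage figures for the limit that was exceeded, if present.
    usage = ""
    for match in _USAGE.finditer(diagnostics):
        if match.group(3) == kind:
            usage = f" {match.group(1)} of {match.group(2)} {kind} memory used."
            break

    # Always append the raw diagnostics so nothing is lost if parsing fails.
    return (
        f"Container killed by YARN for exceeding {kind} memory limits."
        f"{usage} Diagnostics: {diagnostics}"
    )
```

Including the raw diagnostics alongside the parsed figures follows the review question earlier in the thread: if the message format differs from what the pattern expects, the full text is still logged.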