-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12316] Wait a minutes to avoid cycle calling. #10475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #48321 has finished for PR 10475 at commit
|
|
CC: @harishreedharan @SaintBacchus could you add test cases for this change? |
|
This only work in cluster, but this is easy to reproduce in cluster
|
|
please see question in jira |
|
@SaintBacchus posting question here as well; you say "endless cycle call" do you mean the application master hangs? It seems like it should throw and if the application is done it should just exit anyway since the AM is just calling stop on it. I just want to clarify what is happening because I assume even if you wait a minute you could still hit the same condition once when its tearing down. |
|
Since the call to the executorUpdaterRunnable.run() is after we checked to see if new credentials were there I think this is ok to do. We just tried to update and they didn't so waiting 1 minute seems reasonable. Ideally we probably have some sort of number of retry logic in there where after a certain period we would just give up and kill the executor but since the case we've seen is shutdown I think its ok to just do this for now. |
| sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials) | ||
| if (timeFromNowToRenewal <= 0) { | ||
| executorUpdaterRunnable.run() | ||
| // Wait a minutes to avoid cycle calling this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this comment to be a bit more clear something like:
We just checked for new credentials but none were there, wait a minute and retry. This handles the shutdown case where the staging directory may have been removed.
|
Test build #51455 has finished for PR 10475 at commit
|
|
lgtm. leaving it open for a bit to see if anyone else has comments. |
|
I was going to merge into branch 1.6 but they are supposed to be starting 1.6.1 so I'll wait and get this in after that. |
When application end, AM will clean the staging dir. But if the driver trigger to update the delegation token, it will can't find the right token file and then it will endless cycle call the method 'updateCredentialsIfRequired'. Then it lead driver StackOverflowError. https://issues.apache.org/jira/browse/SPARK-12316 Author: huangzhaowei <[email protected]> Closes #10475 from SaintBacchus/SPARK-12316. (cherry picked from commit 5fcf4c2) Signed-off-by: Tom Graves <[email protected]>
When application end, AM will clean the staging dir.
But if the driver trigger to update the delegation token, it will can't find the right token file and then it will endless cycle call the method 'updateCredentialsIfRequired'.
Then it lead driver StackOverflowError.
https://issues.apache.org/jira/browse/SPARK-12316