@SaintBacchus
Contributor

When the application ends, the AM cleans up the staging dir.
But if the driver then triggers a delegation token update, it can't find the right token file and calls the method 'updateCredentialsIfRequired' in an endless cycle.
This leads to a StackOverflowError in the driver.
https://issues.apache.org/jira/browse/SPARK-12316
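The failure mode described above can be modeled in a few lines. This is a hypothetical, simplified sketch, not Spark's actual updater class: `latestCredentialsFile` and `timeFromNowToRenewal` are illustrative stand-ins. Because no new token file can be found and the renewal point is already in the past, each call re-enters the method synchronously through the runnable until the stack is exhausted.

```scala
// Hypothetical, simplified model of the SPARK-12316 recursion.
// This is NOT Spark's code; the helpers below are illustrative stand-ins.
object RecursionDemo {
  var calls = 0

  // Stand-in for listing token files in the staging dir: after the AM has
  // removed the staging dir, nothing is ever found.
  def latestCredentialsFile(): Option[String] = None

  // Stand-in for the renewal-time computation: the current tokens were never
  // refreshed, so the renewal point is already in the past (<= 0).
  def timeFromNowToRenewal(): Long = -1L

  lazy val executorUpdaterRunnable: Runnable = new Runnable {
    override def run(): Unit = updateCredentialsIfRequired()
  }

  def updateCredentialsIfRequired(): Unit = {
    calls += 1
    latestCredentialsFile().foreach { path =>
      println(s"Reading new delegation tokens from $path") // never reached here
    }
    if (timeFromNowToRenewal() <= 0) {
      // Buggy pattern: retry synchronously. Every attempt adds stack frames
      // and nothing ever breaks the cycle.
      executorUpdaterRunnable.run()
    }
  }

  // Returns true if the synchronous retry loop overflowed the stack.
  def triggersStackOverflow(): Boolean =
    try { updateCredentialsIfRequired(); false }
    catch { case _: StackOverflowError => true }

  def main(args: Array[String]): Unit = {
    val overflowed = triggersStackOverflow()
    println(s"overflowed=$overflowed after $calls calls")
  }
}
```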

@SparkQA

SparkQA commented Dec 25, 2015

Test build #48321 has finished for PR 10475 at commit b1ba56b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member

sarutak commented Dec 28, 2015

CC: @harishreedharan

@SaintBacchus could you add test cases for this change?

@SaintBacchus
Contributor Author

This can only be reproduced on a cluster, but it is easy to reproduce there:

  1. Start a yarn-client Spark application.
  2. Remove the staging dir after the AM has written the token to HDFS but before the driver has read it.

@andrewor14
Contributor

@tgravescs
Contributor

Please see my question in JIRA.

@tgravescs
Contributor

@SaintBacchus, posting the question here as well:

You say "endless cycle call": do you mean the application master hangs? It seems like it should throw, and if the application is done it should just exit anyway, since the AM is just calling stop on it. I just want to clarify what is happening, because I assume that even if you wait a minute you could still hit the same condition once while it's tearing down.

@tgravescs
Contributor

Since the call to executorUpdaterRunnable.run() comes after we have already checked whether new credentials were there, I think this is OK to do. We just tried to update and there were no new credentials, so waiting 1 minute seems reasonable.
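A sketch of the approach discussed here: instead of calling the runnable again on the current stack, hand it back to a scheduled executor with a delay, so each attempt returns before the next one starts. This is a hypothetical sketch, not the merged patch. The class name `ScheduledTokenUpdater` and the `timeFromNowToRenewal` function parameter are assumptions, and the retry delay is a parameter (the PR waits one minute) so the example finishes quickly.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch, not Spark's code: retry later on a scheduler instead
// of re-entering the method on the current stack.
class ScheduledTokenUpdater(
    retryDelay: Long,
    retryUnit: TimeUnit,
    timeFromNowToRenewal: () => Long) {

  private val renewer: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // Counts how many times the check has run, for demonstration only.
  val attempts = new AtomicInteger(0)

  private val executorUpdaterRunnable: Runnable = new Runnable {
    override def run(): Unit = updateCredentialsIfRequired()
  }

  def updateCredentialsIfRequired(): Unit = {
    attempts.incrementAndGet()
    val delayMs = timeFromNowToRenewal()
    if (delayMs <= 0) {
      // We just checked for new credentials but none were there: wait and
      // retry. This call returns immediately, so the stack never grows.
      renewer.schedule(executorUpdaterRunnable, retryDelay, retryUnit)
    } else {
      renewer.schedule(executorUpdaterRunnable, delayMs, TimeUnit.MILLISECONDS)
    }
  }

  def stop(): Unit = renewer.shutdownNow()
}

object ScheduledTokenUpdaterDemo {
  def main(args: Array[String]): Unit = {
    // Renewal time is always in the past, as in the shutdown case.
    val updater = new ScheduledTokenUpdater(10, TimeUnit.MILLISECONDS, () => -1L)
    updater.updateCredentialsIfRequired()
    Thread.sleep(200)
    updater.stop()
    println(s"attempts=${updater.attempts.get()} (no StackOverflowError)")
  }
}
```

The retries still happen indefinitely, but they are spread out in time on a single background thread rather than stacked on top of each other, which is why the process can keep shutting down normally.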

Ideally we would have some sort of retry limit in there, so that after a certain period we would just give up and kill the executor, but since the only case we've seen is during shutdown, I think it's OK to just do this for now.
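The bounded-retry idea is not part of this PR, but it could look roughly like the decision function below. This is a hypothetical sketch: `RetryDecision`, `BoundedTokenRetry`, `maxAttempts`, and the one-minute default delay are assumptions for illustration. On `GiveUp`, the caller would be responsible for stopping the executor.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical sketch of bounded retries for the token refresh; not in Spark.
sealed trait RetryDecision
final case class RetryAfter(delay: Long, unit: TimeUnit) extends RetryDecision
case object GiveUp extends RetryDecision

object BoundedTokenRetry {
  // Decide what to do after `failedAttempts` consecutive checks found no new
  // credentials. On GiveUp, the caller would shut the executor down.
  def nextAction(
      failedAttempts: Int,
      maxAttempts: Int,
      retryDelayMinutes: Long = 1L): RetryDecision =
    if (failedAttempts >= maxAttempts) GiveUp
    else RetryAfter(retryDelayMinutes, TimeUnit.MINUTES)

  def main(args: Array[String]): Unit =
    (0 to 3).foreach { n =>
      println(s"$n failed attempts -> ${nextAction(n, maxAttempts = 3)}")
    }
}
```

Keeping the decision a pure function makes the policy easy to test separately from the scheduler that acts on it.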

```scala
    sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials)
if (timeFromNowToRenewal <= 0) {
  executorUpdaterRunnable.run()
  // Wait a minute to avoid cycle calling this method.
```
Contributor


Can you make this comment a bit clearer, something like:

We just checked for new credentials but none were there, wait a minute and retry. This handles the shutdown case where the staging directory may have been removed.

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51455 has finished for PR 10475 at commit af046ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

LGTM. Leaving it open for a bit to see if anyone else has comments.

@tgravescs
Contributor

I was going to merge this into branch-1.6, but the 1.6.1 release is supposed to be starting, so I'll wait and get this in after that.

asfgit pushed a commit that referenced this pull request Feb 25, 2016

Author: huangzhaowei <[email protected]>

Closes #10475 from SaintBacchus/SPARK-12316.

(cherry picked from commit 5fcf4c2)
Signed-off-by: Tom Graves <[email protected]>
asfgit closed this in 5fcf4c2 on Feb 25, 2016

5 participants