[SPARK-12316] Wait a minutes to avoid cycle calling. #10475

SaintBacchus · 2015-12-25T07:23:17Z

When the application ends, the AM cleans up the staging dir.
If the driver then tries to update the delegation token, it cannot find the right token file and ends up calling the method 'updateCredentialsIfRequired' in an endless cycle.
This eventually causes a StackOverflowError in the driver.
https://issues.apache.org/jira/browse/SPARK-12316
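
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from ExecutorDelegationTokenUpdater.scala: the object name, `findNewCredentialsFile`, and `overflowed` are hypothetical, and the lookup is hard-wired to fail, standing in for a staging dir that has already been deleted. It only illustrates why an immediate re-run through the runnable, with no delay, grows the call stack until the JVM throws.

```scala
object RecursiveRetrySketch {
  // Hypothetical stand-in for the lookup of a new token file.
  // Always empty here, simulating a staging dir that was already removed.
  def findNewCredentialsFile(): Option[String] = None

  // Counts how many times the update method was entered.
  var attempts: Long = 0L

  // The runnable calls back into the update method on the same thread.
  val updaterRunnable: Runnable = new Runnable {
    override def run(): Unit = updateCredentialsIfRequired()
  }

  // Simplified shape of the problem: nothing was found, so run again right away.
  // Each retry adds stack frames, because the retry is a direct call.
  def updateCredentialsIfRequired(): Unit = {
    attempts += 1
    if (findNewCredentialsFile().isEmpty) updaterRunnable.run()
  }

  // Returns true if the immediate-retry cycle ended in a StackOverflowError.
  def overflowed(): Boolean =
    try { updateCredentialsIfRequired(); false }
    catch { case _: StackOverflowError => true }

  def main(args: Array[String]): Unit =
    println(s"overflowed=${overflowed()} after $attempts attempts")
}
```

Because the token file never appears, the only exits from this cycle are a stack overflow or a retry that is handed to a scheduler instead of being called directly.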

SparkQA · 2015-12-25T07:50:21Z

Test build #48321 has finished for PR 10475 at commit b1ba56b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2015-12-28T20:58:20Z

CC: @harishreedharan

@SaintBacchus could you add test cases for this change?

SaintBacchus · 2015-12-31T01:27:59Z

This only happens on a cluster, but it is easy to reproduce there:

Start up a yarn-client Spark application.
Remove the staging dir after the AM has finished writing the token to HDFS but before the driver has read it.

andrewor14 · 2016-02-01T22:20:13Z

@tgravescs @harishreedharan

tgravescs · 2016-02-08T14:35:33Z

please see question in jira

tgravescs · 2016-02-09T22:31:51Z

@SaintBacchus posting the question here as well:

You say "endless cycle call". Do you mean the application master hangs? It seems like it should throw, and if the application is done it should just exit anyway, since the AM is just calling stop on it. I just want to clarify what is happening, because I assume that even if you wait a minute you could still hit the same condition once while it's tearing down.

tgravescs · 2016-02-17T14:53:25Z

Since the call to executorUpdaterRunnable.run() comes after we checked whether new credentials were there, I think this is ok to do. We just tried to update and they weren't there, so waiting 1 minute seems reasonable.

Ideally we would probably have some sort of retry limit in there, where after a certain period we would just give up and kill the executor, but since the case we've seen is shutdown I think it's ok to just do this for now.
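
The retry limit mentioned above is not part of this patch. As a rough sketch of what it could look like, assuming a plain `java.util.concurrent.ScheduledExecutorService`: the names `BoundedRetrySketch`, `retryWithDelay`, `maxAttempts`, `delayMs`, and `demo` are all hypothetical, and the delay is in milliseconds only so the example finishes quickly.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object BoundedRetrySketch {
  // Hypothetical helper: runs `attempt` until it returns true or `maxAttempts`
  // tries have been made, waiting `delayMs` between tries. Each retry is handed
  // to the scheduler rather than called directly, so the stack does not grow.
  def retryWithDelay(scheduler: ScheduledExecutorService, maxAttempts: Int, delayMs: Long)
                    (attempt: () => Boolean)(onDone: Boolean => Unit): Unit = {
    val tries = new AtomicInteger(0)
    def task: Runnable = new Runnable {
      override def run(): Unit = {
        val n = tries.incrementAndGet()
        if (attempt()) onDone(true)             // credentials found
        else if (n >= maxAttempts) onDone(false) // give up
        else scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS)
      }
    }
    scheduler.execute(task)
  }

  // Demo with an attempt that always fails, as when the staging dir is gone.
  // Returns (succeeded, numberOfAttempts).
  def demo(): (Boolean, Int) = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val calls = new AtomicInteger(0)
    val succeeded = new AtomicBoolean(true)
    val done = new CountDownLatch(1)
    retryWithDelay(scheduler, 3, 10L)(() => { calls.incrementAndGet(); false })(ok => {
      succeeded.set(ok)
      done.countDown()
    })
    done.await(5, TimeUnit.SECONDS)
    scheduler.shutdownNow()
    (succeeded.get(), calls.get())
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In a real updater the `onDone(false)` branch is where a decision such as killing the executor would go; what that action should be is left open here, as it is in the discussion.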

tgravescs · 2016-02-17T14:58:34Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorDelegationTokenUpdater.scala

          sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials)
      if (timeFromNowToRenewal <= 0) {
-        executorUpdaterRunnable.run()
+        // Wait a minutes to avoid cycle calling this method.


Can you change this comment to be a bit clearer, something like:

We just checked for new credentials but none were there, wait a minute and retry. This handles the shutdown case where the staging directory may have been removed.

SparkQA · 2016-02-18T02:38:53Z

Test build #51455 has finished for PR 10475 at commit af046ba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tgravescs · 2016-02-18T13:59:36Z

lgtm. leaving it open for a bit to see if anyone else has comments.

tgravescs · 2016-02-24T22:55:03Z

I was going to merge into branch 1.6 but they are supposed to be starting 1.6.1 so I'll wait and get this in after that.

When application end, AM will clean the staging dir.
But if the driver trigger to update the delegation token, it will can't find the right token file and then it will endless cycle call the method 'updateCredentialsIfRequired'.
Then it lead driver StackOverflowError.
https://issues.apache.org/jira/browse/SPARK-12316

Author: huangzhaowei <[email protected]>

Closes #10475 from SaintBacchus/SPARK-12316.

(cherry picked from commit 5fcf4c2)
Signed-off-by: Tom Graves <[email protected]>

Wait a minutes to avoid cycle calling.

b1ba56b

tgravescs reviewed Feb 17, 2016

Add more detail comment for this issue.

af046ba

asfgit closed this in 5fcf4c2 Feb 25, 2016


5 participants
