Skip to content

Conversation

@holdenk
Copy link
Contributor

@holdenk holdenk commented Sep 14, 2022

What changes were proposed in this pull request?

Fix the shutdown hook call through to CoarseGrainedSchedulerBackend

Why are the changes needed?

Sometimes if the driver shuts down abnormally resources may be left dangling.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

… since we've got zombie pods hanging around since the resource tie isn't perfect.
@github-actions github-actions bot added the CORE label Sep 14, 2022
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-40428][K8S][CORE][WIP] Add a shutdown hook in the CoarseGrainedSchedulerBackend [SPARK-40428][CORE][WIP] Add a shutdown hook in the CoarseGrainedSchedulerBackend Sep 14, 2022
private var _shutdownHookRef: AnyRef = _

_shutdownHookRef = ShutdownHookManager.addShutdownHook(
ShutdownHookManager.DEFAULT_SHUTDOWN_PRIORITY) { () =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. indentation?

@mridulm
Copy link
Contributor

mridulm commented Sep 15, 2022

SparkContext.stop will take care of this - and that is invoked from a shutdown hook already.

SparkContext.stop -> _dagScheduler.stop -> taskScheduler.stop -> backend.stop

@holdenk
Copy link
Contributor Author

holdenk commented Sep 15, 2022

hmm for whatever reason that doesn't seem to be triggering for me, let me take a look through and see where things might be getting lost in that call path @mridulm

…all stop since we've got zombie pods hanging around since the resource tie isn't perfect."

This reverts commit dfa8fc5.
@holdenk holdenk changed the title [SPARK-40428][CORE][WIP] Add a shutdown hook in the CoarseGrainedSchedulerBackend [SPARK-40428][CORE][WIP] Fix shutdown hook in the CoarseGrainedSchedulerBackend Sep 15, 2022
starvationTimer.cancel()
}
Utils.tryLogNonFatalError {
abortTimer.cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Timer.cancel() also require Utils.tryLogNonFatalError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point yeah we can back that part out it shouldn't throw unless it's null and it's a val so

@mridulm
Copy link
Contributor

mridulm commented Sep 18, 2022

Looks good to me - modulo @dongjoon-hyun's comment.
We might want to also wrap FallbackStorage.cleanUp in SparkContext.stop. It is not strictly required for ensuring this specific codepath goes through, but will help ensure stop goes through completely.

… don't throw anything and add one around fallbackstorage cleanup.
@holdenk holdenk changed the title [SPARK-40428][CORE][WIP] Fix shutdown hook in the CoarseGrainedSchedulerBackend [SPARK-40428][CORE] Fix shutdown hook in the CoarseGrainedSchedulerBackend Sep 30, 2022
@asfgit asfgit closed this in 505a8a0 Oct 5, 2022
@holdenk
Copy link
Contributor Author

holdenk commented Oct 5, 2022

Merged to current dev :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants