Delete Kubernetes resources when the client waits for and sees app completion #520
One other caveat is that we lose some state that might be useful for debugging the driver. For example, we lose the ConfigMap object that is required to set up the init-container, which is useful for knowing what the properties resolved to if the init-container fails. I'm not sure how problematic this will be in practice, but it is something to consider.
The ConfigMap is purely an implementation detail and really shouldn't be exposed to users. For debugging purposes, I think it's better to log the ConfigMap contents than to leave the ConfigMap around after an application has finished.
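A minimal sketch of the "log it, then delete it" idea, assuming the client tracks ConfigMaps in a plain dict of name to data. The dict store, the function name and the `spark-init-config` name are hypothetical stand-ins, not the actual submission-client code or a Kubernetes client API:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("submission-client")


def log_and_delete_config_map(resources, name):
    """Log a ConfigMap's key/value pairs, then remove it from the store.

    `resources` is a hypothetical dict of ConfigMap name -> data, standing in
    for whatever the real client uses to track and delete objects.
    """
    data = resources[name]
    # One key=value pair per line, sorted by key so the log output is stable
    body = "\n".join(f"  {key}={value}" for key, value in sorted(data.items()))
    message = f"Contents of ConfigMap {name} before deletion:\n{body}"
    logger.info(message)
    del resources[name]  # "delete" only after the contents are logged
    return message


resources = {"spark-init-config": {"spark.jars": "local:///opt/app.jar", "spark.files": ""}}
log_and_delete_config_map(resources, "spark-init-config")
```

The resolved properties end up in the client log, so an init-container failure can still be diagnosed after the ConfigMap is gone.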
ConfigMap aside, I think the notion of keeping the pod around for debugging makes sense. I thought we had a config for that (it might be in a different area/code path).
rerun integration tests please
I want to reiterate this issue/PR. If we're concerned about losing objects like the ConfigMap used to set up the init-container, then, as I said above, we could log the information stored in it for debugging purposes. IMO, this is better than keeping the ConfigMap around just for debugging. Thoughts?
Any thoughts on this? It would be good to clean up resources after completion. In a normal scenario, for example, services for completed applications pile up and are never removed.
felixcheung
left a comment
Hey, this sounds useful to have.
Warn if it will be GC'd?
Done.
Thanks @felixcheung for jumping on this :)
Hey, where are we on this?
@felixcheung Yes, I think we should go upstream. I created https://issues.apache.org/jira/browse/SPARK-23571. Also, given that we are in the process of getting rid of the init-container, the ConfigMap for the init-container will be gone too. So it makes more sense to clean up after application completion.
Sorry, didn't see this before. Same comment as in apache#20722 (comment). Why not do this during driver.stop()? That way, 1) if we lose the driver, Kubernetes garbage collection cleans up everything, and 2) if the driver terminates, we clean up the executors as well as auxiliary resources like ConfigMaps.
I agree. We can dump all k8s objects. My hunch is that it's not that useful, given it's a pretty deeply buried implementation detail. |
As discussed in apache#20722, we think the right solution is to move resource management into the driver pod. This way, cleanup of auxiliary resources upon completion is guaranteed regardless of which deployment mode is used and whether or not the client waits for the application to complete.
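The "if we lose the driver, garbage collection cleans up everything" argument relies on Kubernetes owner references: an object whose `metadata.ownerReferences` point at the driver pod is collected once that pod is deleted. The sketch below is a simplified in-memory model of that cascade, assuming every auxiliary resource lists the driver pod's UID as an owner. The `Resource` class, resource names and UIDs are hypothetical; this is not the Kubernetes API:

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    kind: str
    name: str
    uid: str
    # Mirrors metadata.ownerReferences[*].uid on a real Kubernetes object
    owner_uids: list = field(default_factory=list)


def cascade_delete(resources, deleted_uid):
    """Return the names of resources that survive deleting the object with `deleted_uid`.

    Simplified model of Kubernetes garbage collection: a dependent is collected
    once none of its owners exist any more; objects with no owners are never collected.
    """
    alive = {r.uid: r for r in resources if r.uid != deleted_uid}
    changed = True
    while changed:  # repeat until no more dependents become orphaned
        changed = False
        for uid, res in list(alive.items()):
            if res.owner_uids and not any(o in alive for o in res.owner_uids):
                del alive[uid]
                changed = True
    return sorted(res.name for res in alive.values())


resources = [
    Resource("Pod", "spark-pi-driver", "d1"),
    Resource("Pod", "spark-pi-exec-1", "e1", ["d1"]),
    Resource("ConfigMap", "spark-pi-init-config", "c1", ["d1"]),
    Resource("Service", "spark-pi-driver-svc", "s1", ["d1"]),
    Resource("Pod", "unrelated", "u1"),
]
print(cascade_delete(resources, "d1"))
```

Deleting the driver removes everything it owns, directly or transitively, while unowned objects are untouched. Cleanup in driver.stop() covers the normal-termination path; owner references cover the case where the driver is lost.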
…-on-k8s#520) Introduces the new Shuffle Writer API. Ported from #5.
What changes were proposed in this pull request?
This PR fixes #519 for the case where the submission client waits for the submitted application to finish. Upon completion of the application, the submission client deletes all Kubernetes resources created for the application to run.
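The client-side behaviour described above (wait for completion, then delete what was created) can be sketched as follows. This assumes the client records each created object as a `(kind, name)` pair and can read the driver pod's phase. `get_driver_phase` and `delete_resource` are hypothetical callables standing in for real Kubernetes client calls, and the resource names are made up:

```python
import time

# Pod phases after which the driver container will not run again
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_and_cleanup(get_driver_phase, delete_resource, created, poll_seconds=1.0):
    """Block until the driver pod reaches a terminal phase, then delete what was created.

    get_driver_phase: hypothetical callable returning the driver pod's current phase.
    delete_resource:  hypothetical callable(kind, name) that deletes one object.
    created:          (kind, name) pairs in the order the client created them.
    """
    phase = get_driver_phase()
    while phase not in TERMINAL_PHASES:
        time.sleep(poll_seconds)
        phase = get_driver_phase()
    # Illustrative choice: delete in reverse creation order
    for kind, name in reversed(created):
        delete_resource(kind, name)
    return phase


phases = iter(["Pending", "Running", "Succeeded"])
deleted = []
created = [
    ("ConfigMap", "spark-pi-init-config"),
    ("Pod", "spark-pi-driver"),
    ("Service", "spark-pi-driver-svc"),
]
final = wait_and_cleanup(lambda: next(phases), lambda k, n: deleted.append((k, n)), created, poll_seconds=0)
```

This only runs when the client actually waits. If the client exits early, nothing is cleaned up, which is the gap the later suggestion of moving cleanup into the driver is meant to close.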