This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@liyinan926
Member

What changes were proposed in this pull request?

This PR fixes #519 for the case where the submission client waits for the submitted application to finish. Upon completion of the application, the submission client deletes all Kubernetes resources created for the application to run.
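The cleanup described above can be sketched roughly as follows. This is an illustrative Python stand-in, not the PR's actual Scala code: it assumes resources created for an application carry the `spark-app-selector` label (the convention used by Spark on Kubernetes), and the resource dicts are simplified placeholders for real API objects.

```python
# Sketch: select the Kubernetes resources created for a Spark application by
# its app-selector label, so the submission client can delete them once the
# application finishes. Resource dicts are simplified stand-ins.

APP_LABEL = "spark-app-selector"

def resources_to_delete(resources, app_id):
    """Return the names of resources labeled as belonging to app_id."""
    return [
        r["name"]
        for r in resources
        if r.get("labels", {}).get(APP_LABEL) == app_id
    ]

# Example: only the driver service and init ConfigMap for app-123 are selected.
cluster = [
    {"name": "spark-driver-svc", "labels": {APP_LABEL: "app-123"}},
    {"name": "spark-init-configmap", "labels": {APP_LABEL: "app-123"}},
    {"name": "unrelated-pod", "labels": {APP_LABEL: "app-456"}},
]
print(resources_to_delete(cluster, "app-123"))
# -> ['spark-driver-svc', 'spark-init-configmap']
```

In a real client the same selection would be a label-selector delete against the API server rather than client-side filtering.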


One other caveat is that we lose some state that might be useful for debugging the driver. For example, we lose the ConfigMap object that is required to set up the init-container, which is useful for knowing what the properties resolved to if the init-container fails. I'm not sure how problematic this will be in practice, but it is something to consider.

Member Author

The ConfigMap is purely an implementation detail and really shouldn't be exposed to users. For debugging purposes, I think it's better to log the ConfigMap contents than to leave the ConfigMap around after an application finishes.
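The log-instead-of-keep idea could look something like the following. This is a minimal sketch with hypothetical names (`format_configmap_for_log` and the sample keys are illustrative, not from the PR):

```python
# Sketch: instead of keeping the init-container ConfigMap around after the
# application finishes, log its resolved properties so they survive cleanup.

def format_configmap_for_log(name, data):
    """Render ConfigMap key/value data as one log-friendly string."""
    lines = [f"ConfigMap {name} contents:"]
    for key in sorted(data):  # sorted for stable, diffable log output
        lines.append(f"  {key}={data[key]}")
    return "\n".join(lines)

msg = format_configmap_for_log(
    "spark-init-config",
    {"spark.jars": "local:///opt/app.jar", "spark.files": "hdfs:///data/f"},
)
print(msg)
```

The point is that the resolved properties end up in the submission client's log, where they remain inspectable after the ConfigMap itself has been deleted.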


ConfigMap aside, I think the notion of keeping the pod around for debugging makes sense. I thought we had a config for that (it might be a different area/code path).

@liyinan926
Member Author

rerun integration tests please

@liyinan926
Member Author

I want to reiterate on this issue/PR. If we have concerns about losing some objects, like the ConfigMap for setting up the init-container, then as I said above, we could log the information stored in it for debugging purposes. This, IMO, is better than making the ConfigMap stick around just for debugging. Thoughts?

@mccheah @foxish

@dharmeshkakadia

Any thoughts on this? It would be good to clean up resources after completion. In a normal scenario this leaves, for example, a lot of services lingering in a completed state.


@felixcheung felixcheung left a comment


hey this sounds useful to have



Warn if it will be GC'd?

Member Author


Done.

@dharmeshkakadia

Thanks @felixcheung for jumping on this :)

@felixcheung

hey where are we on this?
and how about going upstream?

@liyinan926
Member Author

liyinan926 commented Mar 2, 2018

@felixcheung Yes, I think we should go upstream. I created https://issues.apache.org/jira/browse/SPARK-23571. Also, given that we are in the process of getting rid of the init-container, the ConfigMap for the init-container will also be gone. So it makes more sense to clean up after application completion.

@foxish
Member

foxish commented Mar 2, 2018

Sorry, didn't see this before. Same comment as in apache#20722 (comment). Why not do this during driver.stop()? That way: 1) if we lose the driver, Kubernetes garbage collection cleans up everything; 2) if the driver terminates, we clean up executors as well as auxiliary resources like ConfigMaps.
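The garbage-collection half of this suggestion relies on Kubernetes owner references: auxiliary objects that name the driver pod as their owner are deleted automatically when the pod goes away. A minimal sketch of attaching such a reference (field names follow the Kubernetes OwnerReference schema; the helper and sample values are illustrative, not the PR's code):

```python
# Sketch: make the driver pod the owner of an auxiliary resource (service,
# ConfigMap, ...) so Kubernetes garbage collection deletes the resource
# when the driver pod is deleted. Dicts stand in for real API objects.

def with_driver_owner(resource, driver_pod):
    """Return a copy of `resource` with an ownerReference to `driver_pod`."""
    owner_ref = {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": driver_pod["metadata"]["name"],
        "uid": driver_pod["metadata"]["uid"],
    }
    out = dict(resource)                      # shallow copy; original untouched
    meta = dict(out.get("metadata", {}))
    meta["ownerReferences"] = [owner_ref]
    out["metadata"] = meta
    return out

driver = {"metadata": {"name": "spark-driver", "uid": "abc-123"}}
cm = {"kind": "ConfigMap", "metadata": {"name": "spark-init-config"}}
owned = with_driver_owner(cm, driver)
print(owned["metadata"]["ownerReferences"][0]["name"])  # -> spark-driver
```

Note the `uid` must be the live pod's UID, which is only known after the driver pod is created; that ordering constraint is part of why the resources would be created (or adopted) from within the driver.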

@foxish
Member

foxish commented Mar 2, 2018

> I want to re-iterate on this issue/PR. If we have concern around losing some objects like the ConfigMap for setting up the init-container, as I said above, we could log information stored in it for debugging purpose. This, IMO, is better than making the ConfigMap stick around just for debugging. Thoughts?

I agree. We can dump all k8s objects. My hunch is that it's not that useful, given it's a pretty deeply buried implementation detail.

@liyinan926
Member Author

As discussed in apache#20722, we think the right solution is to move resource management into the driver pod. This way, cleanup of auxiliary resources upon completion is guaranteed regardless of which deployment mode is used and whether or not the client waits for the application to complete.

@liyinan926 liyinan926 closed this May 24, 2018
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Mar 28, 2019
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Apr 4, 2019
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Jul 10, 2019


Development

Successfully merging this pull request may close these issues.

Kubernetes resources created for an application should be deleted when the application finishes
