[SPARK-23571][K8S] Delete auxiliary Kubernetes resources upon application completion #20722
Conversation
Kubernetes integration test starting
Test build #87903 has finished for PR 20722 at commit
Kubernetes integration test status success
logInfo(s"Waiting for application $appName to finish...")
watcher.awaitCompletion()
logInfo(s"Application $appName finished.")
try {
Why are we doing this in client code? Driver shutdown is the right place to perform cleanup, right?
(also because this code path isn't invoked in fire-and-forget mode IIUC)
We talked about this before. The main reason is that this would require giving the driver extra permissions to delete resources, which was not considered a favorable idea. Do you have different thoughts now?
I see! It was the RBAC rules and downscoping them that led us here. I'm concerned not all jobs will actually use this interactive mode of launching. What do you think of just granting more permissions to the driver and allowing cleanup there?
Per the discussion offline, I think the right solution is to move resource management to the driver pod. This way, resource cleanup is guaranteed regardless of the deployment mode and whether the client waits for completion or not.
So where are we on this?
@felixcheung I think we want to move resource management to the driver so that the lifetime of the resources is bound to the lifetime of the driver pod. This works regardless of whether the user chooses to wait for the application to finish or not. One tricky thing is the creation of the headless driver service: it must be created before the executors, as otherwise the executors wouldn't be able to connect to the driver. I'm trying to figure out how we can achieve this order guarantee in the …
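For illustration only (not code from this PR): one way to bind the auxiliary resources to the driver pod's lifetime is to add a Kubernetes owner reference pointing at the driver pod, so the API server garbage-collects them when the pod is deleted. Below is a minimal sketch, assuming the fabric8 client Spark already uses; the object name and the `driverPod`/`resources` parameters are hypothetical.

```scala
import java.util.Collections

import io.fabric8.kubernetes.api.model.{HasMetadata, OwnerReferenceBuilder, Pod}
import io.fabric8.kubernetes.client.KubernetesClient

// Hypothetical sketch: make the driver pod the owner of each auxiliary
// resource so the API server garbage-collects them when the pod is deleted.
object DriverOwnedResources {

  def addDriverOwnerReference(
      client: KubernetesClient,
      driverPod: Pod,
      resources: Seq[HasMetadata]): Unit = {
    val driverOwnerRef = new OwnerReferenceBuilder()
      .withName(driverPod.getMetadata.getName)
      .withApiVersion(driverPod.getApiVersion)
      .withUid(driverPod.getMetadata.getUid)
      .withKind(driverPod.getKind)
      .withController(true)
      .build()
    // Attach the owner reference to each resource's metadata.
    resources.foreach { resource =>
      resource.getMetadata.setOwnerReferences(Collections.singletonList(driverOwnerRef))
    }
    // Persist the updated metadata on the API server.
    client.resourceList(resources: _*).createOrReplace()
  }
}
```

With owner references, cleanup no longer depends on the submission client staying around, which would also cover the fire-and-forget case mentioned above.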
What changes were proposed in this pull request?
Spark on Kubernetes creates auxiliary Kubernetes resources, such as the headless driver service used by the executors to connect to the driver and the ConfigMap that carries configuration for the init-container. Such resources are no longer needed once an application completes, but they are still persisted by the API server and should be deleted upon completion. This PR handles the case where the submission client waits for the application to complete, which is the default behavior since `spark.kubernetes.submission.waitAppCompletion` defaults to `true`. Xref: apache-spark-on-k8s#520.
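Not the PR's actual code, but roughly, the client-side flow described above could look like the following sketch. The class name, the `createdResources` parameter, and the `awaitCompletion` callback are assumptions for illustration; the fabric8 client is assumed, as in the rest of the Kubernetes backend.

```scala
import io.fabric8.kubernetes.api.model.HasMetadata
import io.fabric8.kubernetes.client.KubernetesClient

import org.apache.spark.internal.Logging

// Hypothetical sketch: wait for the application to finish, then delete the
// auxiliary resources (headless driver service, init-container ConfigMap)
// that were created for it.
class AuxiliaryResourceCleanup(
    kubernetesClient: KubernetesClient,
    appName: String,
    createdResources: Seq[HasMetadata]) extends Logging {

  def waitForCompletionAndCleanUp(awaitCompletion: () => Unit): Unit = {
    logInfo(s"Waiting for application $appName to finish...")
    awaitCompletion()
    logInfo(s"Application $appName finished.")
    try {
      // Best-effort deletion: the application has completed, so these
      // resources are no longer needed.
      kubernetesClient.resourceList(createdResources: _*).delete()
    } catch {
      case e: Throwable =>
        logWarning(s"Failed to delete auxiliary resources for $appName.", e)
    }
  }
}
```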
How was this patch tested?
Unit tested.
@felixcheung @foxish