
liyinan926 (Contributor) commented Mar 2, 2018

What changes were proposed in this pull request?

Spark on Kubernetes creates some auxiliary Kubernetes resources, such as the headless driver service used by the executors to connect to the driver and the ConfigMap that carries configuration for the init-container. Such resources are no longer needed once an application completes, but they are still persisted by the API server, so they should be deleted upon completion. This PR handles the case where the submission client waits for the application to complete, which is the default behavior since spark.kubernetes.submission.waitAppCompletion defaults to true.

Xref: apache-spark-on-k8s#520.
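
As a minimal sketch of the client-side approach described above (not the PR's actual code; the client construction, namespace, and resource names below are placeholders), the idea is to block on the status watcher and then delete the auxiliary resources in a finally block:

import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClient}

object CleanupAfterCompletionSketch {

  // Block until the application finishes, then delete the auxiliary resources
  // that the submission client created earlier. Deletion is best-effort.
  def waitAndCleanUp(
      client: KubernetesClient,
      namespace: String,
      driverServiceName: String,
      initConfigMapName: String)(awaitCompletion: () => Unit): Unit = {
    try {
      // Stand-in for LoggingPodStatusWatcher.awaitCompletion(), which returns once
      // the driver pod reaches a terminal state when waitAppCompletion=true.
      awaitCompletion()
    } finally {
      client.services().inNamespace(namespace).withName(driverServiceName).delete()
      client.configMaps().inNamespace(namespace).withName(initConfigMapName).delete()
    }
  }

  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()
    try {
      waitAndCleanUp(client, "default", "example-driver-svc", "example-init-configmap") {
        () => Thread.sleep(5000) // stand-in for waiting on the real application
      }
    } finally {
      client.close()
    }
  }
}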

How was this patch tested?

Unit tested.

@felixcheung @foxish

SparkQA commented Mar 2, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1232/

SparkQA commented Mar 2, 2018

Test build #87903 has finished for PR 20722 at commit 9890ebc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 2, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1232/

logInfo(s"Waiting for application $appName to finish...")
watcher.awaitCompletion()
logInfo(s"Application $appName finished.")
try {

Contributor commented:

Why are we doing this in client code? Driver shutdown is the right place to perform cleanup, right?

Contributor commented:

(also because this code path isn't invoked in fire-and-forget mode IIUC)

liyinan926 (Contributor, Author) commented Mar 2, 2018:

We talked about this before. The main reason is that this would require giving the driver extra permissions to delete resources, which was not considered a favorable idea. Do you have different thoughts now?

Contributor commented:

I see! It was the RBAC rules and downscoping them that led us here. I'm concerned not all jobs will actually use this interactive mode of launching. What do you think of just granting more permissions to the driver and allowing cleanup there?

liyinan926 (Contributor, Author) commented Mar 2, 2018:

Per the discussion offline, I think the right solution is to move resource management to the driver pod. This way, resource cleanup is guaranteed regardless of the deployment mode and whether the client waits for completion or not.
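
As a hedged illustration of this alternative (not code from this PR; the object and method names are hypothetical), the auxiliary resources could carry an ownerReference pointing at the driver pod, so the Kubernetes garbage collector deletes them whenever the driver pod is deleted:

import io.fabric8.kubernetes.api.model.{HasMetadata, OwnerReferenceBuilder, Pod}

object DriverOwnedResourcesSketch {

  // Attach an ownerReference pointing at the driver pod to each auxiliary resource.
  // The Kubernetes garbage collector then removes these resources together with the
  // driver pod, regardless of deployment mode or whether the client waits for completion.
  def addDriverOwnerReference(driverPod: Pod, resources: Seq[HasMetadata]): Unit = {
    val driverOwnerRef = new OwnerReferenceBuilder()
      .withApiVersion(driverPod.getApiVersion)
      .withKind(driverPod.getKind)
      .withName(driverPod.getMetadata.getName)
      .withUid(driverPod.getMetadata.getUid)
      .withController(true)
      .build()
    resources.foreach(_.getMetadata.getOwnerReferences.add(driverOwnerRef))
  }
}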

felixcheung (Member) commented:
so where are we on this?

liyinan926 (Contributor, Author) commented:
@felixcheung I think we want to move resource management to the driver so that the lifetime of the resources is bound to the lifetime of the driver pod. This works regardless of whether the user chooses to wait for the application to finish. One tricky thing is the creation of the headless driver service: it must be created before the executors, because otherwise the executors would not be able to connect to the driver. I'm trying to figure out how to achieve this ordering guarantee in the KubernetesSchedulerBackend.
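
A minimal sketch of that ordering concern, with hypothetical names and not taken from KubernetesSchedulerBackend or this PR: create the headless service first, and only then create the executor pods that need to reach the driver through it:

import io.fabric8.kubernetes.api.model.{Pod, Service, ServiceBuilder}
import io.fabric8.kubernetes.client.KubernetesClient

object DriverServiceFirstSketch {

  // Create the headless driver service before any executor pod, so that the
  // executors can resolve the driver through the service's DNS name.
  def createDriverServiceThenExecutors(
      client: KubernetesClient,
      namespace: String,
      driverPodName: String,
      driverPort: Int,
      executorPods: Seq[Pod]): Unit = {
    val headlessService: Service = new ServiceBuilder()
      .withNewMetadata()
        .withName(s"$driverPodName-svc")
      .endMetadata()
      .withNewSpec()
        .withClusterIP("None") // headless: resolves directly to the driver pod IP
        .addToSelector("spark-driver-selector", driverPodName)
        .addNewPort()
          .withName("driver-rpc-port")
          .withPort(driverPort)
        .endPort()
      .endSpec()
      .build()

    // Step 1: the service must exist first.
    client.services().inNamespace(namespace).create(headlessService)

    // Step 2: only then create the executor pods.
    executorPods.foreach(pod => client.pods().inNamespace(namespace).create(pod))
  }
}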

liyinan926 closed this Jul 13, 2018