@@ -159,6 +159,19 @@ private[spark] class Client(
logInfo(s"Waiting for application $appName to finish...")
watcher.awaitCompletion()
logInfo(s"Application $appName finished.")
try {
Contributor: Why are we doing this in client code? Driver shutdown is the right place to perform cleanup, right?

Contributor: (Also because this code path isn't invoked in fire-and-forget mode, IIUC.)

Author (@liyinan926, Mar 2, 2018): We talked about this before. The main reason is that this requires giving the driver extra permissions to delete things, which was not a favorable idea. Do you have different thoughts now?

Contributor: I see! It was the RBAC rules and downscoping them that led us here. I'm concerned that not all jobs will actually use this interactive mode of launching. What do you think of just granting more permissions to the driver and allowing cleanup there?

Author (@liyinan926, Mar 2, 2018): Per the discussion offline, I think the right solution is to move resource management to the driver pod. This way, resource cleanup is guaranteed regardless of the deployment mode and whether or not the client waits for completion.

// Upon completion of the application, the client deletes all auxiliary Kubernetes
// resources the application depended on, instead of relying on Kubernetes garbage
// collection to kick in, which only happens when the driver pod gets explicitly
// deleted. Auxiliary resources include, for example, the headless service for the
// driver and the ConfigMap for the init-container. Such resources are no longer
// needed once an application completes, but they are still persisted by the API
// server and should be deleted upon completion.
val otherKubernetesResources = currentDriverSpec.otherKubernetesResources
kubernetesClient.resourceList(otherKubernetesResources: _*).delete()
} catch {
case e: Throwable => logWarning("Failed to clean up auxiliary Kubernetes resources", e)
}
} else {
logInfo(s"Deployed Spark application $appName into Kubernetes.")
}
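The cleanup in the hunk above follows a best-effort pattern: the delete call is wrapped in a try/catch so that a failure to remove auxiliary resources only logs a warning and never fails an otherwise successful application run. The pattern can be sketched independently of the fabric8 Kubernetes client; `ResourceDeleter` and `BestEffortCleanup` below are hypothetical stand-ins for illustration, not part of the actual Spark API.

```scala
// Hypothetical stand-in for the small subset of the Kubernetes client used
// by the submission client: something that can delete a list of resources.
trait ResourceDeleter {
  def delete(resources: Seq[String]): Unit
}

object BestEffortCleanup {
  // Attempts to delete the given auxiliary resources. Mirrors the try/catch
  // in Client: any failure is logged and swallowed, because a cleanup error
  // must not turn a completed application into a failed submission.
  // Returns true if the deletion succeeded, false otherwise.
  def cleanUp(deleter: ResourceDeleter, resources: Seq[String]): Boolean =
    try {
      deleter.delete(resources)
      true
    } catch {
      case e: Throwable =>
        Console.err.println(
          s"Failed to clean up auxiliary Kubernetes resources: ${e.getMessage}")
        false
    }
}
```

In the real code the resource list comes from `currentDriverSpec.otherKubernetesResources` and the deletion goes through `kubernetesClient.resourceList(...).delete()`; the sketch only captures the error-handling shape.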
@@ -178,6 +178,17 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
verify(loggingPodStatusWatcher).awaitCompletion()
}

test("Submission client should clean up Kubernetes resources upon application completion") {
val submissionClient = new Client(
submissionSteps,
new SparkConf(false),
kubernetesClient,
true,
"spark",
loggingPodStatusWatcher)
submissionClient.run()
verify(resourceList).delete()
}
}

private object FirstTestConfigurationStep extends DriverConfigurationStep {