Pull request (closed). Showing changes from 1 commit of 29 total.

Commits:
f6fdd6a  Spark on Kubernetes - basic scheduler backend (foxish, Sep 15, 2017)
75e31a9  Adding to modules.py and SparkBuild.scala (foxish, Oct 17, 2017)
cf82b21  Exclude from unidoc, update travis (foxish, Oct 17, 2017)
488c535  Address a bunch of style and other comments (foxish, Oct 17, 2017)
82b79a7  Fix some style concerns (foxish, Oct 18, 2017)
c052212  Clean up YARN constants, unit test updates (foxish, Oct 20, 2017)
c565c9f  Couple of more style comments (foxish, Oct 20, 2017)
2fb596d  Address CR comments. (mccheah, Oct 25, 2017)
992acbe  Extract initial executor count to utils class (mccheah, Oct 25, 2017)
b0a5839  Fix scalastyle (mccheah, Oct 25, 2017)
a4f9797  Fix more scalastyle (mccheah, Oct 25, 2017)
2b5dcac  Pin down app ID in tests. Fix test style. (mccheah, Oct 26, 2017)
018f4d8  Address comments. (mccheah, Nov 1, 2017)
4b32134  Various fixes to the scheduler (mccheah, Nov 1, 2017)
6cf4ed7  Address comments (mccheah, Nov 4, 2017)
1f271be  Update fabric8 client version to 3.0.0 (foxish, Nov 13, 2017)
71a971f  Addressed more comments (liyinan926, Nov 13, 2017)
0ab9ca7  One more round of comments (liyinan926, Nov 14, 2017)
7f14b71  Added a comment regarding how failed executor pods are handled (liyinan926, Nov 15, 2017)
7afce3f  Addressed more comments (liyinan926, Nov 21, 2017)
b75b413  Fixed Scala style error (liyinan926, Nov 21, 2017)
3b587b4  Removed unused parameter in parsePrefixedKeyValuePairs (liyinan926, Nov 22, 2017)
cb12fec  Another round of comments (liyinan926, Nov 22, 2017)
ae396cf  Addressed latest comments (liyinan926, Nov 27, 2017)
f8e3249  Addressed comments around licensing on new dependencies (liyinan926, Nov 27, 2017)
a44c29e  Fixed unit tests and made maximum executor lost reason checks configu… (liyinan926, Nov 27, 2017)
4bed817  Removed default value for executor Docker image (liyinan926, Nov 27, 2017)
c386186  Close the executor pod watcher before deleting the executor pods (liyinan926, Nov 27, 2017)
b85cfc4  Addressed more comments (liyinan926, Nov 28, 2017)
Commit 018f4d8ffbbe33526a8273801169b99add38fc8f: "Address comments." (mccheah committed Nov 1, 2017)
@@ -103,35 +103,11 @@ package object config extends Logging {
.longConf
.createWithDefault(1)

private[spark] val INIT_CONTAINER_JARS_DOWNLOAD_LOCATION =
ConfigBuilder("spark.kubernetes.mountdependencies.jarsDownloadDir")
.doc("Location to download jars to in the driver and executors. When using" +
" spark-submit, this directory must be empty and will be mounted as an empty directory" +
" volume on the driver and executor pod.")
.stringConf
.createWithDefault("/var/spark-data/spark-jars")

private[spark] val KUBERNETES_EXECUTOR_LIMIT_CORES =
ConfigBuilder("spark.kubernetes.executor.limit.cores")
.doc("Specify the hard cpu limit for a single executor pod")
.stringConf
.createOptional

private[spark] val KUBERNETES_NODE_SELECTOR_PREFIX = "spark.kubernetes.node.selector."

private[spark] def resolveK8sMaster(rawMasterString: String): String = {
if (!rawMasterString.startsWith("k8s://")) {
throw new IllegalArgumentException("Master URL should start with k8s:// in Kubernetes mode.")
}
val masterWithoutK8sPrefix = rawMasterString.replaceFirst("k8s://", "")
if (masterWithoutK8sPrefix.startsWith("http://")
|| masterWithoutK8sPrefix.startsWith("https://")) {
masterWithoutK8sPrefix
} else {
val resolvedURL = s"https://$masterWithoutK8sPrefix"
logInfo("No scheme specified for kubernetes master URL, so defaulting to https. Resolved" +
s" URL is $resolvedURL")
resolvedURL
}
}
}
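For quick reference, here is a small self-contained sketch of how the resolveK8sMaster helper shown in the hunk above behaves for its two accepted input shapes. The object wrapper and main method are scaffolding for illustration only; the resolution logic mirrors the code shown.

// Illustration only: mirrors the resolveK8sMaster logic shown above.
object ResolveK8sMasterSketch {
  def resolve(rawMasterString: String): String = {
    if (!rawMasterString.startsWith("k8s://")) {
      throw new IllegalArgumentException("Master URL should start with k8s:// in Kubernetes mode.")
    }
    val masterWithoutK8sPrefix = rawMasterString.replaceFirst("k8s://", "")
    if (masterWithoutK8sPrefix.startsWith("http://")
        || masterWithoutK8sPrefix.startsWith("https://")) {
      masterWithoutK8sPrefix
    } else {
      // No scheme specified, so default to https.
      s"https://$masterWithoutK8sPrefix"
    }
  }

  def main(args: Array[String]): Unit = {
    println(resolve("k8s://https://apiserver.example.com:6443"))  // https://apiserver.example.com:6443
    println(resolve("k8s://apiserver.example.com:6443"))          // https://apiserver.example.com:6443
  }
}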
@@ -47,7 +47,6 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)

private val executorExtraClasspath = sparkConf.get(
Contributor: nit:

private val executorExtraClasspath =
  sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH)

org.apache.spark.internal.config.EXECUTOR_CLASS_PATH)
private val executorJarsDownloadDir = sparkConf.get(INIT_CONTAINER_JARS_DOWNLOAD_LOCATION)

private val executorLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
sparkConf,
@@ -94,7 +93,7 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
MEMORY_OVERHEAD_MIN_MIB))
private val executorMemoryWithOverhead = executorMemoryMiB + memoryOverheadMiB

private val executorCores = sparkConf.getDouble("spark.executor.cores", 1d)
private val executorCores = sparkConf.getDouble("spark.executor.cores", 1)
private val executorLimitCores = sparkConf.getOption(KUBERNETES_EXECUTOR_LIMIT_CORES.key)
Contributor: sparkConf.get(KUBERNETES_EXECUTOR_LIMIT_CORES)

Member: We can use sparkConf.get(KUBERNETES_EXECUTOR_LIMIT_CORES)?

Contributor: Done.


override def createExecutorPod(
@@ -108,7 +107,7 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)

// hostname must be no longer than 63 characters, so take the last 63 characters of the pod
// name as the hostname. This preserves uniqueness since the end of name contains
// executorId and applicationId
// executorId
val hostname = name.substring(Math.max(0, name.length - 63))
Contributor: What is hostname used for here?
Based on previous discussion, there is no support for hostnames and Kubernetes support is IP-based only; given this, why not simply set it to some arbitrary random string and not depend on the name? (The comment about 63 characters is still incorrect, by the way: hostnames go up to 255 characters, labels go up to 63; I think I mentioned this before.)
Or is this to keep the option open for DNS support in the future?

Contributor: I think we might have been relying on it before, but we certainly don't now; this should be removed. We probably want to try it on our fork first and run it through our integration tests to verify this is the case.

Contributor: Agree with @mccheah that we should try it out in our fork. If it turns out we don't really need to set it, we can remove it in a follow-up PR, so this won't block this PR from being merged. @mridulm, what do you think?

Contributor: I am fine with deferring this to a later PR as long as we track it somewhere: it might turn out that this is a requirement anyway and we can't use random names.

val resolvedExecutorLabels = Map(
SPARK_EXECUTOR_ID_LABEL -> executorId,
@@ -139,15 +138,14 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
new EnvVarBuilder().withName(s"$ENV_JAVA_OPT_PREFIX$index").withValue(opt).build()
}
}.getOrElse(Seq.empty[EnvVar])
Contributor: How is this getting used? I see it getting set, but not used anywhere.

Contributor: This is used in the executor Docker file included in #19717.

Contributor: Thanks, somehow it did not show up in my searches.

val executorEnv = (Seq(
val executorEnv = Seq(
(ENV_EXECUTOR_PORT, executorPort.toString),
(ENV_DRIVER_URL, driverUrl),
// Executor backend expects integral value for executor cores, so round it up to an int.
(ENV_EXECUTOR_CORES, math.ceil(executorCores).toInt.toString),
(ENV_EXECUTOR_MEMORY, executorMemoryString),
(ENV_APPLICATION_ID, applicationId),
(ENV_EXECUTOR_ID, executorId),
(ENV_MOUNTED_CLASSPATH, s"$executorJarsDownloadDir/*")) ++ executorEnvs)
(ENV_EXECUTOR_ID, executorId))
.map(env => new EnvVarBuilder()
.withName(env._1)
.withValue(env._2)
@@ -20,6 +20,7 @@ import java.io.Closeable
import java.net.InetAddress
import java.util.concurrent.{ConcurrentHashMap, ExecutorService, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong, AtomicReference}
import javax.annotation.concurrent.GuardedBy

import io.fabric8.kubernetes.api.model._
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher}
@@ -49,9 +50,11 @@ private[spark] class KubernetesClusterSchedulerBackend(

private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
private val RUNNING_EXECUTOR_PODS_LOCK = new Object
// Indexed by executor IDs and guarded by RUNNING_EXECUTOR_PODS_LOCK.
// Indexed by executor IDs
@GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
private val runningExecutorsToPods = new mutable.HashMap[String, Pod]
Contributor: nit: you could use GuardedBy instead of a comment.

// Indexed by executor pod names and guarded by RUNNING_EXECUTOR_PODS_LOCK.
// Indexed by executor pod names
@GuardedBy("RUNNING_EXECUTOR_PODS_LOCK")
private val runningPodsToExecutors = new mutable.HashMap[String, String]
private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()
private val podsWithKnownExitReasons = new ConcurrentHashMap[String, ExecutorExited]()
@@ -105,21 +108,44 @@ private[spark] class KubernetesClusterSchedulerBackend(

override def run(): Unit = {
handleDisconnectedExecutors()
val executorsToAllocate = mutable.Map[String, Pod]()
val currentTotalRegisteredExecutors = totalRegisteredExecutors.get
val currentTotalExpectedExecutors = totalExpectedExecutors.get
val currentNodeToLocalTaskCount = getNodesWithLocalTaskCounts
if (currentTotalRegisteredExecutors < runningExecutorsToPods.size) {
Member: I'm wondering whether we can access runningExecutorsToPods.size without guarding it with RUNNING_EXECUTOR_PODS_LOCK?

Contributor: Yeah, we should guard it with RUNNING_EXECUTOR_PODS_LOCK. Fixed.

logDebug("Waiting for pending executors before scaling")
} else if (currentTotalExpectedExecutors <= runningExecutorsToPods.size) {
logDebug("Maximum allowed executor limit reached. Not scaling up further.")
} else {
val nodeToLocalTaskCount = getNodesWithLocalTaskCounts
for (i <- 0 until math.min(
currentTotalExpectedExecutors - runningExecutorsToPods.size, podAllocationSize)) {
val executorId = EXECUTOR_ID_COUNTER.incrementAndGet().toString
val executorPod = executorPodFactory.createExecutorPod(
executorId,
applicationId(),
driverUrl,
conf.getExecutorEnv,
driverPod,
nodeToLocalTaskCount)
executorsToAllocate(executorId) = executorPod
logInfo(
s"Requesting a new executor, total executors is now ${runningExecutorsToPods.size}")
}
}
val allocatedExecutors = executorsToAllocate.mapValues { pod =>
Utils.tryLog {
kubernetesClient.pods().create(pod)
}
}
RUNNING_EXECUTOR_PODS_LOCK.synchronized {
if (totalRegisteredExecutors.get() < runningExecutorsToPods.size) {
logDebug("Waiting for pending executors before scaling")
} else if (totalExpectedExecutors.get() <= runningExecutorsToPods.size) {
logDebug("Maximum allowed executor limit reached. Not scaling up further.")
} else {
val nodeToLocalTaskCount = getNodesWithLocalTaskCounts
for (i <- 0 until math.min(
totalExpectedExecutors.get - runningExecutorsToPods.size, podAllocationSize)) {
val (executorId, pod) = allocateNewExecutorPod(nodeToLocalTaskCount)
runningExecutorsToPods.put(executorId, pod)
runningPodsToExecutors.put(pod.getMetadata.getName, executorId)
logInfo(
s"Requesting a new executor, total executors is now ${runningExecutorsToPods.size}")
}
allocatedExecutors.map {
case (executorId, attemptedAllocatedExecutor) =>
attemptedAllocatedExecutor.map { successfullyAllocatedExecutor =>
runningExecutorsToPods.put(executorId, successfullyAllocatedExecutor)
runningPodsToExecutors.put(
successfullyAllocatedExecutor.getMetadata.getName, executorId)
}
}
}
}
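The allocation logic in this hunk follows a two-phase pattern: build the candidate pods and issue the create calls without holding RUNNING_EXECUTOR_PODS_LOCK, then record only the successful creations under the lock. A stripped-down sketch of that pattern, with plain strings for pods, a fake createPod standing in for kubernetesClient.pods().create(pod), and scala.util.Try standing in for Spark's Utils.tryLog:

import scala.collection.mutable
import scala.util.Try

object AllocateOutsideLockSketch {
  private val LOCK = new Object
  private val runningExecutorsToPods = mutable.HashMap[String, String]()

  // Stand-in for the Kubernetes API call, which may fail per pod.
  private def createPod(spec: String): String = s"created-$spec"

  def allocate(executorsToAllocate: Map[String, String]): Unit = {
    // Phase 1: issue the slow, possibly failing API calls outside the lock.
    val attempted = executorsToAllocate.map { case (id, spec) => id -> Try(createPod(spec)) }
    // Phase 2: under the lock, record only the pods that were actually created.
    LOCK.synchronized {
      attempted.foreach { case (id, result) =>
        result.foreach(pod => runningExecutorsToPods.put(id, pod))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    allocate(Map("1" -> "exec-1-spec", "2" -> "exec-2-spec"))
    println(runningExecutorsToPods)  // e.g. Map(1 -> created-exec-1-spec, 2 -> created-exec-2-spec)
  }
}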
@@ -128,25 +154,25 @@ private[spark] class KubernetesClusterSchedulerBackend(
// For each disconnected executor, synchronize with the loss reasons that may have been found
// by the executor pod watcher. If the loss reason was discovered by the watcher,
// inform the parent class with removeExecutor.
disconnectedPodsByExecutorIdPendingRemoval.keys().asScala.foreach { case (executorId) =>
val executorPod = disconnectedPodsByExecutorIdPendingRemoval.get(executorId)
val knownExitReason = Option(podsWithKnownExitReasons.remove(
executorPod.getMetadata.getName))
knownExitReason.fold {
removeExecutorOrIncrementLossReasonCheckCount(executorId)
} { executorExited =>
logWarning(s"Removing executor $executorId with loss reason " + executorExited.message)
removeExecutor(executorId, executorExited)
// We keep around executors that have exit conditions caused by the application. This
// allows them to be debugged later on. Otherwise, mark them as to be deleted from the
// API server.
if (!executorExited.exitCausedByApp) {
logInfo(s"Executor $executorId failed because of a framework error.")
deleteExecutorFromClusterAndDataStructures(executorId)
} else {
logInfo(s"Executor $executorId exited because of the application.")
disconnectedPodsByExecutorIdPendingRemoval.asScala.foreach {
case (executorId, executorPod) =>
val knownExitReason = Option(podsWithKnownExitReasons.remove(
executorPod.getMetadata.getName))
knownExitReason.fold {
removeExecutorOrIncrementLossReasonCheckCount(executorId)
} { executorExited =>
logWarning(s"Removing executor $executorId with loss reason " + executorExited.message)
removeExecutor(executorId, executorExited)
// We keep around executors that have exit conditions caused by the application. This
// allows them to be debugged later on. Otherwise, mark them as to be deleted from the
// API server.
if (!executorExited.exitCausedByApp) {
logInfo(s"Executor $executorId failed because of a framework error.")
deleteExecutorFromClusterAndDataStructures(executorId)
} else {
logInfo(s"Executor $executorId exited because of the application.")
}
}
}
}
}

@@ -163,12 +189,17 @@
def deleteExecutorFromClusterAndDataStructures(executorId: String): Unit = {
disconnectedPodsByExecutorIdPendingRemoval.remove(executorId)
executorReasonCheckAttemptCounts -= executorId
Contributor: Remove from podsWithKnownExitReasons?

RUNNING_EXECUTOR_PODS_LOCK.synchronized {
podsWithKnownExitReasons -= executorId
val maybeExecutorPodToDelete = RUNNING_EXECUTOR_PODS_LOCK.synchronized {
runningExecutorsToPods.remove(executorId).map { pod =>
kubernetesClient.pods().delete(pod)
runningPodsToExecutors.remove(pod.getMetadata.getName)
}.getOrElse(logWarning(s"Unable to remove pod for unknown executor $executorId"))
pod
}.orElse {
logWarning(s"Unable to remove pod for unknown executor $executorId")
None
}
}
maybeExecutorPodToDelete.foreach(pod => kubernetesClient.pods().delete(pod))
}
}

@@ -203,25 +234,23 @@ private[spark] class KubernetesClusterSchedulerBackend(
// TODO investigate why Utils.tryLogNonFatalError() doesn't work in this context.
// When using Utils.tryLogNonFatalError some of the code fails but without any logs or
// indication as to why.
try {
RUNNING_EXECUTOR_PODS_LOCK.synchronized {
runningExecutorsToPods.values.foreach(kubernetesClient.pods().delete(_))
Utils.tryLogNonFatalError {
val executorPodsToDelete = RUNNING_EXECUTOR_PODS_LOCK.synchronized {
val runningExecutorPodsCopy = Seq(runningExecutorsToPods.values.toSeq: _*)
runningExecutorsToPods.clear()
runningPodsToExecutors.clear()
runningExecutorPodsCopy
}
kubernetesClient.pods().delete(executorPodsToDelete: _*)
executorPodsByIPs.clear()
val resource = executorWatchResource.getAndSet(null)
if (resource != null) {
resource.close()
}
Contributor: I am not very sure of the semantics of the watcher here: should we close the watcher before the executor deletes here?

Contributor: The watcher will receive a DELETE event for each deleted executor pod, and the event is handled in ExecutorPodsWatcher.eventReceived.

Contributor: Is the actual resource.close() deferred? Will kubernetesClient.pods().delete() wait for the watcher to be invoked before returning?
I was under the impression that close returns immediately after marking the watcher as closed, and that delete fires the watcher event asynchronously; the actual delete event would then probably be delivered after the watcher is closed, causing it to be dropped.
Is this a correct assumption? If yes, the watcher will probably not see the DELETE anyway?

Contributor: close should shut off the communications before returning.

Contributor: What about kubernetesClient.pods().delete()? Is it an async call? Will it invoke the watcher before returning? My impression was that the watcher invocation would be deferred.
If this is the case, then with the current code the watcher will not be notified of the pod deletes; well, some notifications could arrive and some might not, essentially a race condition.
Do we require the watcher to be notified? If yes, then we will need to make this more robust. If we do not depend on it, then moving resource.close() to before the delete will prevent a bunch of unnecessary work.

Contributor: Ah, OK; looking into it a bit, I think we want the order to be reversed here. We should first close the watch to ensure we don't get any deleted events, then delete the pods themselves. We probably want to ensure the pods are deleted even if we fail to close the watch.

Contributor: Yes, it depends on whether we require the watcher to receive and act on the DELETE events in this case. If not, moving close to above the deletes is fine.

Contributor: Reversed the order in c386186.

} catch {
case e: Throwable => logError("Uncaught exception while shutting down controllers.", e)
}
try {
Utils.tryLogNonFatalError {
logInfo("Closing kubernetes client")
kubernetesClient.close()
} catch {
case e: Throwable => logError("Uncaught exception closing Kubernetes client.", e)
}
}
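The outcome of the thread above (applied later in c386186) is to close the watch first, so that the backend's own cleanup does not come back as DELETE events, and to delete the pods even if closing the watch fails. A hedged sketch of that ordering, with simple stand-ins for the fabric8 watch and pod operations:

import java.io.Closeable
import java.util.concurrent.atomic.AtomicReference

object CloseWatchThenDeleteSketch {
  // Stand-in for the executor pod watch resource.
  private val executorWatchResource = new AtomicReference[Closeable](new Closeable {
    override def close(): Unit = println("watch closed")
  })

  // Stand-in for kubernetesClient.pods().delete(pods: _*).
  private def deletePods(pods: Seq[String]): Unit =
    pods.foreach(p => println(s"deleted $p"))

  def stop(runningPods: Seq[String]): Unit = {
    // 1. Close the watcher first so our own deletions are not reported back to us.
    try {
      val watch = executorWatchResource.getAndSet(null)
      if (watch != null) {
        watch.close()
      }
    } catch {
      case e: Throwable => println(s"failed to close watch, deleting pods anyway: $e")
    }
    // 2. Delete the executor pods regardless of whether the close succeeded.
    deletePods(runningPods)
  }

  def main(args: Array[String]): Unit = stop(Seq("exec-pod-1", "exec-pod-2"))
}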

@@ -231,7 +260,7 @@
*/
private def getNodesWithLocalTaskCounts() : Map[String, Int] = {
val nodeToLocalTaskCount = mutable.Map[String, Int]() ++
KubernetesClusterSchedulerBackend.this.synchronized {
synchronized {
Contributor: Don't you need the actual concatenation (and thus iteration over hostToLocalTaskCount) to happen inside the synchronized block?

Contributor: Moved the concatenation inside synchronized.

hostToLocalTaskCount
}
for (pod <- executorPodsByIPs.values().asScala) {
@@ -247,58 +276,31 @@
nodeToLocalTaskCount.toMap[String, Int]
}
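As resolved in the thread above, both the read of hostToLocalTaskCount and the copy into a new map happen inside the synchronized block, so the loop never iterates over a map that another thread is replacing. A minimal sketch of that shape, using a plain lock and map in place of the scheduler backend's state:

import scala.collection.mutable

object SnapshotUnderLockSketch {
  private val lock = new Object
  // Stand-in for the hostToLocalTaskCount field that another thread may replace.
  private var hostToLocalTaskCount: Map[String, Int] = Map("node-a" -> 2)

  def nodesWithLocalTaskCounts(): Map[String, Int] = {
    // Copy (and therefore iterate) inside the lock; mutate only the private copy outside it.
    val nodeToLocalTaskCount = lock.synchronized {
      mutable.Map[String, Int]() ++ hostToLocalTaskCount
    }
    nodeToLocalTaskCount("node-b") = 1  // e.g. counts later gathered from executorPodsByIPs
    nodeToLocalTaskCount.toMap
  }

  def main(args: Array[String]): Unit =
    println(nodesWithLocalTaskCounts())  // e.g. Map(node-a -> 2, node-b -> 1); ordering may vary
}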

/**
* Allocates a new executor pod
*
* @param nodeToLocalTaskCount A map of K8s cluster nodes to the number of tasks that could
* benefit from data locality if an executor launches on the cluster
* node.
* @return A tuple of the new executor name and the Pod data structure.
*/
private def allocateNewExecutorPod(nodeToLocalTaskCount: Map[String, Int]): (String, Pod) = {
val executorId = EXECUTOR_ID_COUNTER.incrementAndGet().toString
val executorPod = executorPodFactory.createExecutorPod(
executorId,
applicationId(),
driverUrl,
conf.getExecutorEnv,
driverPod,
nodeToLocalTaskCount)
try {
(executorId, kubernetesClient.pods.create(executorPod))
} catch {
case throwable: Throwable =>
logError("Failed to allocate executor pod.", throwable)
throw throwable
}
}

override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = Future[Boolean] {
totalExpectedExecutors.set(requestedTotal)
true
}

override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = Future[Boolean] {
val podsToDelete = mutable.Buffer[Pod]()
RUNNING_EXECUTOR_PODS_LOCK.synchronized {
Contributor: Minor, but I'd use a more Scala-ish version:

val podsToDelete = RUNNING_EXECUTOR_PODS_LOCK.synchronized {
  executorIds.flatMap { id =>
    runningExecutorsToPods.remove(executor) match {
      case Some(...) => ...
      case None => ...
    }
  }
}

Contributor: Done.

Contributor: Minor also, but I think matching on Option is discouraged? Should we use .map...getOrElse here?

Contributor: Oh no! This can of worms again! :)
Let's just say that opinions vary, both across the Scala community and within Spark development, as to the best or most proper way to handle Options when you want to do something in both the Some and None cases. Within Spark code, use of fold with an Option is not allowed. As for other alternatives, we have no consistent rules or practice.

for (executor <- executorIds) {
val maybeRemovedExecutor = runningExecutorsToPods.remove(executor)
maybeRemovedExecutor.foreach { executorPod =>
kubernetesClient.pods().delete(executorPod)
disconnectedPodsByExecutorIdPendingRemoval.put(executor, executorPod)
runningPodsToExecutors.remove(executorPod.getMetadata.getName)
podsToDelete += executorPod
}
if (maybeRemovedExecutor.isEmpty) {
logWarning(s"Unable to remove pod for unknown executor $executor")
}
}
}
kubernetesClient.pods().delete(podsToDelete: _*)
true
}
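Putting the review suggestions above together (flatMap over the ids, no fold and no pattern match on Option, and a single delete call issued outside the lock), the removal could look roughly like the sketch below; the map and deletePods are stand-ins for the backend state and the fabric8 client:

import scala.collection.mutable

object KillExecutorsSketch {
  private val LOCK = new Object
  private val runningExecutorsToPods = mutable.HashMap("1" -> "exec-pod-1", "2" -> "exec-pod-2")

  // Stand-in for kubernetesClient.pods().delete(podsToDelete: _*).
  private def deletePods(pods: Seq[String]): Unit =
    println(s"deleting: ${pods.mkString(", ")}")

  def killExecutors(executorIds: Seq[String]): Boolean = {
    // Collect the pods to delete while holding the lock, warning on unknown ids,
    // then issue one delete call after releasing it.
    val podsToDelete = LOCK.synchronized {
      executorIds.flatMap { id =>
        val maybePod = runningExecutorsToPods.remove(id)
        if (maybePod.isEmpty) {
          println(s"Unable to remove pod for unknown executor $id")
        }
        maybePod
      }
    }
    deletePods(podsToDelete)
    true
  }

  def main(args: Array[String]): Unit = {
    killExecutors(Seq("1", "3"))  // deletes exec-pod-1 and warns about executor 3
  }
}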

def getExecutorPodByIP(podIP: String): Option[Pod] = {
// Note: Per https://github.com/databricks/scala-style-guide#concurrency, we don't
// want to be switching to scala.collection.concurrent.Map on
// executorPodsByIPs.
val pod = executorPodsByIPs.get(podIP)
Option(pod)
}
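As a footnote to getExecutorPodByIP above: wrapping the ConcurrentHashMap lookup in Option(...) is what turns Java's null-on-miss result into Some/None. A two-call illustration with made-up contents:

import java.util.concurrent.ConcurrentHashMap

object OptionFromNullableSketch {
  def main(args: Array[String]): Unit = {
    val executorPodsByIPs = new ConcurrentHashMap[String, String]()
    executorPodsByIPs.put("10.0.0.5", "exec-pod-1")
    println(Option(executorPodsByIPs.get("10.0.0.5")))  // Some(exec-pod-1)
    println(Option(executorPodsByIPs.get("10.0.0.9")))  // None, since get returned null
  }
}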