Conversation

@liyinan926 (Contributor) commented Feb 8, 2018

What changes were proposed in this pull request?

As mentioned in SPARK-23285, this PR introduces a new configuration property spark.kubernetes.executor.cores for specifying the physical CPU cores requested for each executor pod. This is to avoid changing the semantics of spark.executor.cores and spark.task.cpus and their role in task scheduling, task parallelism, dynamic resource allocation, etc. The new configuration property only determines the physical CPU cores available to an executor. An executor can still run multiple tasks simultaneously by using appropriate values for spark.executor.cores and spark.task.cpus.
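The intended split can be sketched as follows. This is a hypothetical illustration, not the actual Spark code: the helper name is invented, and the fallback to `spark.executor.cores` (discussed later in the thread) is an assumption here.

```python
def executor_pod_cpu_request(conf: dict) -> str:
    """Physical CPU request for an executor pod, in Kubernetes quantity
    syntax (e.g. "500m"). The k8s-specific property takes effect if set;
    otherwise spark.executor.cores is used as the request."""
    return conf.get("spark.kubernetes.executor.cores",
                    conf.get("spark.executor.cores", "1"))

print(executor_pod_cpu_request({"spark.kubernetes.executor.cores": "500m"}))  # 500m
print(executor_pod_cpu_request({"spark.executor.cores": "2"}))                # 2
```

Task scheduling and parallelism are untouched: only the value handed to the Kubernetes resource request changes.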

How was this patch tested?

Unit tests.

@felixcheung @srowen @jiangxb1987 @jerryshao @mccheah @foxish

@SparkQA commented Feb 8, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/724/

@SparkQA commented Feb 8, 2018

Test build #87234 has finished for PR 20553 at commit 50ebb50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 8, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/724/

@felixcheung (Member) left a comment

LGTM, minor comment.
Sorry, I missed this PR somehow. Anyone else want to review this?

Member

nit: it was lower case "cpu" in other config, and no "core"

Contributor Author

Done.

@SparkQA commented Feb 19, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/950/

@SparkQA commented Feb 19, 2018

Test build #87538 has finished for PR 20553 at commit 8b711ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 19, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/950/

@liyinan926 (Contributor Author)

Any comment or concern on this? Is this good to merge?

Contributor

I think it reads better without "the amount of", i.e. "Specify a hard limit on CPU cores for the driver pod". Same comment for the below section as well.

Contributor Author

Done.

Contributor

The comment above is now outdated and can be removed.

Contributor Author

Done.

Contributor

Should we mention that this value overrides spark.executor.cores?

Contributor Author

Done.

@SparkQA commented Feb 23, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1008/

@SparkQA commented Feb 23, 2018

Test build #87625 has finished for PR 20553 at commit 11f94e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 23, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1008/

@SparkQA commented Feb 23, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1009/

@SparkQA commented Feb 23, 2018

Test build #87626 has finished for PR 20553 at commit 8f457ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 23, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1009/

@jiangxb1987 (Contributor)

How do we plan to support dynamic allocation with k8s? Should we read spark.executor.cores or spark.kubernetes.executor.cores?

@jiangxb1987 (Contributor)

also cc @cloud-fan @jerryshao

@jiangxb1987 (Contributor)

IIUC, spark.kubernetes.executor.cores here is just a special case of spark.executor.cores for the k8s backend; you would still have to handle float values when reading spark.kubernetes.executor.cores, for instance in dynamic allocation. If that is the case, I don't see much benefit in bringing in a new conf here.

@liyinan926 (Contributor Author)

spark.kubernetes.executor.cores has nothing to do with dynamic resource allocation. It's just a way of letting users specify a value for the cpu resource request that conforms to Kubernetes convention and is only read/used when determining the cpu request.

@jerryshao (Contributor) commented Feb 26, 2018

What is the default value if it is not configured, and how does K8s control CPU usage by default?

Also, it seems users may be confused about how to differentiate between k8s executor cores and executor cores.

BTW, do we also need a similar k8s driver cores property for cluster mode?

@liyinan926 (Contributor Author)

The value of spark.executor.cores will be used to set cpu request for the executor pods if spark.kubernetes.executor.cores is not set. spark.driver.cores already allows fractional values so we don't have a Kubernetes specific property for the driver pod.

@cloud-fan (Contributor)

This is to avoid changing the semantics of spark.executor.cores and spark.task.cpus and their role in task scheduling, task parallelism, dynamic resource allocation, etc. The new configuration property only determines the physical CPU cores available to an executor.

Do you mean spark.kubernetes.executor.cores will only be used with k8s for static allocation? It looks to me that if we want k8s to work with Spark dynamic allocation better, we have to change the semantics of spark.executor.cores to support fractions. Or we introduce a new dynamic allocation module for k8s that reads spark.kubernetes.executor.cores.

@liyinan926 (Contributor Author) commented Feb 26, 2018

Do you mean spark.kubernetes.executor.cores will only be used with k8s for static allocation? It looks to me that if we want k8s to work with Spark dynamic allocation better, we have to change the semantics of spark.executor.cores to support fractions. Or we introduce a new dynamic allocation module for k8s that reads spark.kubernetes.executor.cores.

spark.kubernetes.executor.cores is only used to define the physical cpu request for executor pods, so it can be used for both static and dynamic allocation in k8s mode. On the other hand, spark.executor.cores and spark.task.cpus are more for defining cpu resource availability and demands in a virtual sense. An executor with 100m physical cpus allocated can still run 2 tasks if they fit in. This is equivalent to a scenario in which spark.executor.cores=2 and spark.task.cpus=1. Task parallelism per executor and dynamic resource allocation are still based on the semantics of spark.executor.cores and spark.task.cpus. So I don't think we need to change the semantics of these two properties, and we don't need a new version of dynamic resource allocation for k8s. IMO, it's unfortunate that spark.executor.cores is used both for defining the physical cpu request and for defining task parallelism.
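The scenario above can be made concrete with a small sketch (illustrative only; the helper is hypothetical, not Spark's scheduler code):

```python
def concurrent_tasks(executor_cores: int, task_cpus: int) -> int:
    # Task slots come from the virtual core accounting alone; the pod's
    # physical CPU request plays no part in this calculation.
    return executor_cores // task_cpus

# A pod physically requesting only 100m (0.1 CPU) still schedules 2
# tasks at a time when spark.executor.cores=2 and spark.task.cpus=1:
print(concurrent_tasks(executor_cores=2, task_cpus=1))  # 2
```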

@foxish (Contributor) commented Feb 27, 2018

@cloud-fan @jiangxb1987 I assume you're referring to ExecutorAllocationManager.scala#L114-L118 when you refer to spark.executor.cores affecting dynamic allocation.

@liyinan926, while this might be the least obtrusive way to get Kubernetes mode to have fractional executor cores and not change other backends, it seems to me like separating the notion of a pod's CPU request and spark.executor.cores might prove confusing to users. We don't expect Spark users to necessarily understand pods or containers. Elaborating on @srowen's point on the JIRA, in theory, could we relax spark.executor.cores to accept doubles and then push the check down to the individual resource managers - force conversion to int if needed? It might require a similar change to spark.task.cpus also possibly. Thoughts?
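The "push the check down" idea floated above could look roughly like this. This is purely a hypothetical sketch of the suggestion, not anything in Spark; the function and backend names are invented:

```python
def validate_executor_cores(cores: float, backend: str) -> float:
    """Accept fractional executor cores only where the backend supports
    them (k8s); force integers for the other resource managers."""
    if backend == "kubernetes":
        return cores
    if cores != int(cores):
        raise ValueError(
            f"backend {backend!r} requires an integer spark.executor.cores")
    return cores

print(validate_executor_cores(0.5, "kubernetes"))  # 0.5
print(validate_executor_cores(2.0, "yarn"))        # 2.0
```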

@foxish (Contributor) commented Feb 27, 2018

Oops - knew we tried that before. Just paged in the discussion from #20460. Looks like that approach might need revisiting in the 2.4 context, given we're targeting dynamic allocation for the k8s backend as well.

@liyinan926 (Contributor Author) commented Feb 27, 2018

@foxish I think the confusion mostly comes from the property name. I don't think we need to change the semantics of spark.executor.cores nor the way dynamic resource allocation works in k8s mode. As I said above, spark.executor.cores/spark.task.cpus still determine the number of tasks that can run in an executor simultaneously, given the cpus specified in spark.kubernetes.executor.cores, as long as the tasks fit in. The problem is that very often an executor does not really need a full core in k8s to run a task, or even multiple tasks at the same time.

@dvogelbacher (Contributor)

Can we call this spark.kubernetes.executor.request.cores, similar to spark.kubernetes.executor.limit.cores ?
I think this would make the naming a bit less confusing.

@mccheah (Contributor) commented Mar 30, 2018

spark.executor.cores is the standard used by all the other cluster managers, so we have to use that.

@mccheah (Contributor) commented Mar 30, 2018

Ah never mind sorry - thought we were referring to changing thread count with spark.kubernetes.executor.request.cores. Think this config key makes sense to communicate the exact K8s semantics here.

@liyinan926 (Contributor Author)

Agreed that spark.kubernetes.executor.request.cores is a better name. Will change it.

Contributor

Precedence in what sense? This won't override spark.executor.cores when it comes to the number of concurrently running tasks. Think it's better to say that it's distinct from spark.executor.cores.

Contributor Author

Fixed.

Contributor

Can we use fallbackConf here?

Contributor Author

Unfortunately, there's not a ConfigEntry for spark.executor.cores in core.

@SparkQA commented Mar 30, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1845/

@SparkQA commented Mar 30, 2018

Test build #88763 has finished for PR 20553 at commit cfe1eab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 30, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1845/

@liyinan926 (Contributor Author)

Is this OK to merge?

Member

This should make it clear that when spark.kubernetes.executor.request.cores is set, spark.executor.cores is ignored?

Contributor Author

Done.

liyinan926 and others added 4 commits April 2, 2018 11:34
@foxish (Contributor) left a comment

Minor nit. Good to merge after that.

@foxish (Contributor) commented Apr 2, 2018

The two statements appear a bit redundant - can do with just one probably.

@foxish (Contributor) commented Apr 2, 2018

Might be good to also add typical ways (1.5m, 2.0, 5, etc) in which one can specify this and additionally link to https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units as you've done.
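For reference, Kubernetes CPU quantities can be written either as plain (possibly fractional) core counts or with an "m" (millicore) suffix. A minimal sketch of converting such values to core counts (a hypothetical helper, not part of Spark or the Kubernetes client):

```python
def parse_cpu_quantity(q: str) -> float:
    """Convert a Kubernetes CPU quantity string to a core count.
    "100m" means 100 millicores (0.1 CPU); "0.5" and "2" are plain cores."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000.0
    return float(q)

print(parse_cpu_quantity("100m"))  # 0.1
print(parse_cpu_quantity("0.5"))   # 0.5
print(parse_cpu_quantity("2"))     # 2.0
```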

Contributor Author

Done.

@SparkQA commented Apr 2, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1879/

@SparkQA commented Apr 2, 2018

Test build #88826 has finished for PR 20553 at commit f103413.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 2, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1879/

@foxish (Contributor) commented Apr 2, 2018

Merging to master once tests pass. Thanks!

@SparkQA commented Apr 2, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1880/

@SparkQA commented Apr 2, 2018

Test build #88830 has finished for PR 20553 at commit a9db323.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 2, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1880/

@asfgit asfgit closed this in fe2b7a4 Apr 2, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 4, 2018
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018