Conversation

@liyinan926 (Contributor) commented Feb 8, 2018

What changes were proposed in this pull request?

As mentioned in SPARK-23285, this PR introduces a new configuration property spark.kubernetes.executor.cores for specifying the physical CPU cores requested for each executor pod. This is to avoid changing the semantics of spark.executor.cores and spark.task.cpus and their role in task scheduling, task parallelism, dynamic resource allocation, etc. The new configuration property only determines the physical CPU cores available to an executor. An executor can still run multiple tasks simultaneously by using appropriate values for spark.executor.cores and spark.task.cpus.
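The intended split can be sketched as follows. This is a hypothetical illustration, not the actual Spark code: the helper name is invented, and the fallback to `spark.executor.cores` (discussed later in the thread) is an assumption here.

```python
def executor_pod_cpu_request(conf: dict) -> str:
    """Physical CPU request for an executor pod, in Kubernetes quantity
    syntax (e.g. "500m"). The k8s-specific property takes effect if set;
    otherwise spark.executor.cores is used as the request."""
    return conf.get("spark.kubernetes.executor.cores",
                    conf.get("spark.executor.cores", "1"))

print(executor_pod_cpu_request({"spark.kubernetes.executor.cores": "500m"}))  # 500m
print(executor_pod_cpu_request({"spark.executor.cores": "2"}))                # 2
```

Task scheduling and parallelism are untouched: only the value handed to the Kubernetes resource request changes.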

How was this patch tested?

Unit tests.

@felixcheung @srowen @jiangxb1987 @jerryshao @mccheah @foxish

@SparkQA commented Feb 8, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/724/

@SparkQA commented Feb 8, 2018

Test build #87234 has finished for PR 20553 at commit 50ebb50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 8, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/724/

@felixcheung (Member) left a comment

LGTM, minor comment.
Sorry, I missed this PR somehow. Anyone else want to review this?

Member

nit: it was lower case "cpu" in other config, and no "core"

Contributor Author

Done.

@SparkQA commented Feb 19, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/950/

@SparkQA commented Feb 19, 2018

Test build #87538 has finished for PR 20553 at commit 8b711ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 19, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/950/

@liyinan926 (Contributor Author)

Any comment or concern on this? Is this good to merge?

Contributor

I think it reads better without "the amount of", i.e. "Specify a hard limit on CPU cores for the driver pod". Same comment for the below section as well.

Contributor Author

Done.

Contributor

The comment above is now outdated and can be removed.

Contributor Author

Done.

Contributor

Should we mention that this value overrides spark.executor.cores?

Contributor Author

Done.

@SparkQA commented Feb 23, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1008/

@SparkQA commented Feb 23, 2018

Test build #87625 has finished for PR 20553 at commit 11f94e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 23, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1008/

@SparkQA commented Feb 23, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1009/

@SparkQA commented Feb 23, 2018

Test build #87626 has finished for PR 20553 at commit 8f457ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 23, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1009/

@jiangxb1987 (Contributor)

How do we plan to support dynamic allocation with k8s? Should we read spark.executor.cores or spark.kubernetes.executor.cores?

@jiangxb1987 (Contributor)

also cc @cloud-fan @jerryshao

@jiangxb1987 (Contributor)

IIUC, spark.kubernetes.executor.cores here is just a special case of spark.executor.cores for the k8s backend; you would still have to handle float values when reading spark.kubernetes.executor.cores, for instance in dynamic allocation. If that is the case, I don't see much benefit in bringing in a new conf here.

@liyinan926 (Contributor Author)

spark.kubernetes.executor.cores has nothing to do with dynamic resource allocation. It's just a way of letting users specify a value for the cpu resource request that conforms to Kubernetes convention and is only read/used when determining the cpu request.

@jerryshao (Contributor) commented Feb 26, 2018

What is the default value if it is not configured, and how does K8s control CPU usage by default?

Also, it seems users may be confused about how to differentiate between k8s executor cores and executor cores.

BTW, do we also need a similar k8s driver cores property for cluster mode?

@liyinan926 (Contributor Author)

The value of spark.executor.cores will be used to set cpu request for the executor pods if spark.kubernetes.executor.cores is not set. spark.driver.cores already allows fractional values so we don't have a Kubernetes specific property for the driver pod.

@cloud-fan (Contributor)

This is to avoid changing the semantics of spark.executor.cores and spark.task.cpus and their role in task scheduling, task parallelism, dynamic resource allocation, etc. The new configuration property only determines the physical CPU cores available to an executor.

Do you mean spark.kubernetes.executor.cores will only be used with k8s for static allocation? It looks to me that if we want k8s to work with Spark dynamic allocation better, we have to change the semantics of spark.executor.cores to support fractions. Or we introduce a new dynamic allocation module for k8s that reads spark.kubernetes.executor.cores.

@liyinan926 (Contributor Author) commented Feb 26, 2018

Do you mean spark.kubernetes.executor.cores will only be used with k8s for static allocation? It looks to me that if we want k8s to work with Spark dynamic allocation better, we have to change the semantics of spark.executor.cores to support fractions. Or we introduce a new dynamic allocation module for k8s that reads spark.kubernetes.executor.cores.

spark.kubernetes.executor.cores is only used to define the physical cpu request for executor pods, so it can be used for both static and dynamic allocation in k8s mode. On the other hand, spark.executor.cores and spark.task.cpus are more for defining cpu resource availability and demands in a virtual sense. An executor with 100m physical cpus allocated can still run 2 tasks if they fit in. This is equivalent to a scenario in which spark.executor.cores=2 and spark.task.cpus=1. Task parallelism per executor and dynamic resource allocation are still based on the semantics of spark.executor.cores and spark.task.cpus. So I don't think we need to change the semantics of these two properties, and we don't need a new version of dynamic resource allocation for k8s. IMO, it's unfortunate that spark.executor.cores is used both for defining the physical cpu request and for defining task parallelism.
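The scenario above can be made concrete with a small sketch (illustrative only; the helper is hypothetical, not Spark's scheduler code):

```python
def concurrent_tasks(executor_cores: int, task_cpus: int) -> int:
    # Task slots come from the virtual core accounting alone; the pod's
    # physical CPU request plays no part in this calculation.
    return executor_cores // task_cpus

# A pod physically requesting only 100m (0.1 CPU) still schedules 2
# tasks at a time when spark.executor.cores=2 and spark.task.cpus=1:
print(concurrent_tasks(executor_cores=2, task_cpus=1))  # 2
```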

@foxish (Contributor) commented Feb 27, 2018

@cloud-fan @jiangxb1987 I assume you're referring to ExecutorAllocationManager.scala#L114-L118 when you refer to spark.executor.cores affecting dynamic allocation.

@liyinan926, while this might be the least obtrusive way to get Kubernetes mode to have fractional executor cores and not change other backends, it seems to me like separating the notion of a pod's CPU request and spark.executor.cores might prove confusing to users. We don't expect Spark users to necessarily understand pods or containers. Elaborating on @srowen's point on the JIRA, in theory, could we relax spark.executor.cores to accept doubles and then push the check down to the individual resource managers - force conversion to int if needed? It might require a similar change to spark.task.cpus also possibly. Thoughts?
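The "push the check down" idea floated above could look roughly like this. This is purely a hypothetical sketch of the suggestion, not anything in Spark; the function and backend names are invented:

```python
def validate_executor_cores(cores: float, backend: str) -> float:
    """Accept fractional executor cores only where the backend supports
    them (k8s); force integers for the other resource managers."""
    if backend == "kubernetes":
        return cores
    if cores != int(cores):
        raise ValueError(
            f"backend {backend!r} requires an integer spark.executor.cores")
    return cores

print(validate_executor_cores(0.5, "kubernetes"))  # 0.5
print(validate_executor_cores(2.0, "yarn"))        # 2.0
```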

@foxish (Contributor) commented Feb 27, 2018

Oops - knew we tried that before. Just paged in the discussion from #20460. Looks like that approach might need revisiting in the 2.4 context, given we're targeting dynamic allocation for the k8s backend as well.

@liyinan926 (Contributor Author) commented Feb 27, 2018

@foxish I think the confusion mostly comes from the property name. I don't think we need to change the semantics of spark.executor.cores nor the way dynamic resource allocation works in k8s mode. As I said above, spark.executor.cores/spark.task.cpus still determine the number of tasks that can run in an executor simultaneously, given the cpus specified in spark.kubernetes.executor.cores, as long as the tasks fit in. The problem is that very often an executor does not really need a full core in k8s to run a task, or even multiple tasks at the same time.

@dvogelbacher (Contributor)

Can we call this spark.kubernetes.executor.request.cores, similar to spark.kubernetes.executor.limit.cores ?
I think this would make the naming a bit less confusing.

@mccheah (Contributor) commented Mar 30, 2018

spark.executor.cores is the standard used by all the other cluster managers, so we have to use that.

@mccheah (Contributor) commented Mar 30, 2018

Ah never mind sorry - thought we were referring to changing thread count with spark.kubernetes.executor.request.cores. Think this config key makes sense to communicate the exact K8s semantics here.

@liyinan926 (Contributor Author)

Agreed that spark.kubernetes.executor.request.cores is a better name. Will change it.

Contributor

Precedence in what sense? This won't override spark.executor.cores when it comes to the number of concurrently running tasks. Think it's better to say that it's distinct from spark.executor.cores.

Contributor Author

Fixed.

Contributor

Can we use fallbackConf here?

Contributor Author

Unfortunately, there's not a ConfigEntry for spark.executor.cores in core.

@SparkQA commented Mar 30, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1845/

@SparkQA commented Mar 30, 2018

Test build #88763 has finished for PR 20553 at commit cfe1eab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 30, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1845/

@liyinan926 (Contributor Author)

Is this OK to merge?

Member

This should make it clear that when spark.kubernetes.executor.request.cores is set, spark.executor.cores is ignored?

Contributor Author

Done.

liyinan926 and others added 4 commits April 2, 2018 11:34
@foxish (Contributor) left a comment

Minor nit. Good to merge after that.

@foxish (Contributor) commented Apr 2, 2018

The two statements appear a bit redundant - can do with just one probably.

@foxish (Contributor) commented Apr 2, 2018

Might be good to also add typical ways (1.5m, 2.0, 5, etc) in which one can specify this and additionally link to https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units as you've done.
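For reference, Kubernetes CPU quantities can be written either as plain (possibly fractional) core counts or with an "m" (millicore) suffix. A minimal sketch of converting such values to core counts (a hypothetical helper, not part of Spark or the Kubernetes client):

```python
def parse_cpu_quantity(q: str) -> float:
    """Convert a Kubernetes CPU quantity string to a core count.
    "100m" means 100 millicores (0.1 CPU); "0.5" and "2" are plain cores."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000.0
    return float(q)

print(parse_cpu_quantity("100m"))  # 0.1
print(parse_cpu_quantity("0.5"))   # 0.5
print(parse_cpu_quantity("2"))     # 2.0
```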

Contributor Author

Done.

@SparkQA commented Apr 2, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1879/

@SparkQA commented Apr 2, 2018

Test build #88826 has finished for PR 20553 at commit f103413.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 2, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1879/

@foxish (Contributor) commented Apr 2, 2018

Merging to master once tests pass. Thanks!

@SparkQA commented Apr 2, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1880/

@SparkQA commented Apr 2, 2018

Test build #88830 has finished for PR 20553 at commit a9db323.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 2, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1880/

@asfgit asfgit closed this in fe2b7a4 Apr 2, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 4, 2018
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018