

@ScrapCodes
Member

@ScrapCodes ScrapCodes commented Feb 28, 2020

What changes were proposed in this pull request?

This is an improvement: at spark-submit time, we mount all the user-specific configuration files in SPARK_CONF_DIR (except the templates and Spark properties files) into both the driver and executor pods. Currently, only spark.properties is mounted, and only on the driver.
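A minimal Scala sketch of the selection rule described above. It assumes (this is not taken from the patch) that templates are files ending in `.template` and that the Spark properties files are `spark-defaults.conf` and `spark.properties`; `ConfFileFilter`, `shouldPropagate` and `selectFiles` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch: which SPARK_CONF_DIR files would be propagated to pods.
// Assumptions (not from the patch): templates end in ".template", and the Spark
// properties files are spark-defaults.conf and spark.properties.
object ConfFileFilter {
  // Spark properties are handled separately (spark-submit loads spark-defaults.conf
  // into the SparkConf), so these files are not copied as-is.
  private val sparkPropertiesFiles = Set("spark-defaults.conf", "spark.properties")

  /** True if the file should go into the pods' configuration directory. */
  def shouldPropagate(fileName: String): Boolean =
    !fileName.endsWith(".template") && !sparkPropertiesFiles.contains(fileName)

  /** Keeps only propagatable files, sorted by name for a stable order. */
  def selectFiles(fileNames: Seq[String]): Seq[String] =
    fileNames.filter(shouldPropagate).sorted
}
```

For example, `ConfFileFilter.selectFiles(Seq("spark-defaults.conf", "log4j.properties.template", "log4j.properties", "core-site.xml"))` keeps only `core-site.xml` and `log4j.properties`.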

Why are the changes needed?

SPARK_CONF_DIR hosts several configuration files, for example,

  1. spark-defaults.conf - containing all the spark properties.
  2. log4j.properties - Logger configuration.
  3. core-site.xml - Hadoop-related configuration.
  4. fairscheduler.xml - Spark's fair scheduling policy at the job level.
  5. metrics.properties - Spark metrics.
  6. Any user-specific library or framework configuration file.

At the moment, we cannot propagate these files to the driver and executor configuration directories.

There is a design doc with more details (Google Docs link to the doc), and this patch currently provides a reference implementation. Please take a look at the doc and comment on how we can improve it.

Further scope

Support user-defined ConfigMaps.

Does this PR introduce any user-facing change?

Yes. Previously, user configuration files (e.g. hdfs-site.xml, log4j.properties) were not propagated by default; after this patch, they are propagated to the driver and executor pods' SPARK_CONF_DIR.

How was this patch tested?

Added tests.

Also tested manually by deploying to a minikube cluster and observing that the additional configuration files were present and taking effect. For example, changes to log4j.properties were properly applied to the executors.

@SparkQA

SparkQA commented Feb 28, 2020

Test build #119087 has finished for PR 27735 at commit 2b71a77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23831/

@SparkQA

SparkQA commented Feb 28, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23831/

@ScrapCodes
Member Author

The decommission integration test suite passed locally; it seems to be flaky.

retest please

@tgravescs
Contributor

ill let the user specific configuration files be mounted on the driver and executor pods'

Can you expand on this? What does "mount" mean here in a distributed environment? The location $SPARK_CONF_DIR points to would have to be on a distributed filesystem. If it is pointing to local disk, how do you mount that on a remote pod?

@ScrapCodes
Member Author

ScrapCodes commented Mar 2, 2020

ill let the user specific configuration files be mounted on the driver and executor pods'

Can you expand on this? What does "mount" mean here in a distributed environment? The location $SPARK_CONF_DIR points to would have to be on a distributed filesystem. If it is pointing to local disk, how do you mount that on a remote pod?

@tgravescs Hi Thomas Graves, "mount" here refers to the Kubernetes feature that lets us mount ConfigMaps as volumes. More details here: https://kubernetes.io/docs/concepts/storage/volumes/#configmap. SPARK_CONF_DIR is not required to be on a distributed filesystem: its files are read at spark-submit time and stored in a ConfigMap, which the pods then mount.
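To make "mount" concrete: with a ConfigMap volume, each key in the ConfigMap's `data` appears as a file of the same name under the container's mount path (for example /opt/spark/conf in the Spark images). The Scala sketch below only simulates that key-to-file mapping on the local filesystem; it is not how the kubelet is implemented (the kubelet also uses a timestamped directory and `..data` symlinks), and `ConfigMapProjection`/`project` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Local simulation of ConfigMap volume semantics (hypothetical helper):
// every key in the ConfigMap data becomes a file with that name under mountPath.
object ConfigMapProjection {
  def project(data: Map[String, String], mountPath: Path): Seq[Path] = {
    Files.createDirectories(mountPath)
    // Sort keys so the returned paths have a deterministic order.
    data.toSeq.sortBy(_._1).map { case (key, contents) =>
      val target = mountPath.resolve(key)
      Files.write(target, contents.getBytes(StandardCharsets.UTF_8))
      target
    }
  }
}
```

The source directory only needs to be readable by spark-submit; the pods see the data through the ConfigMap volume, not through a shared filesystem.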

I have also updated the description with more details on how the Kubernetes ConfigMap will work here.

Thanks a lot for taking a look. Do you have more comments? Could you please help review this patch?

@SparkQA

SparkQA commented Mar 2, 2020

Test build #119155 has finished for PR 27735 at commit f137617.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23897/

@ScrapCodes ScrapCodes force-pushed the SPARK-30985/spark-conf-k8s-propagate branch from f137617 to fe10c37 on March 2, 2020 08:30
@SparkQA

SparkQA commented Mar 2, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23897/

@SparkQA

SparkQA commented Mar 2, 2020

Test build #119159 has finished for PR 27735 at commit fe10c37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23901/

@SparkQA

SparkQA commented Mar 2, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23901/

@ScrapCodes
Member Author

@holdenk Hi Holden, would you mind taking a look? I am not sure why DecommissionSuite fails on Jenkins; it passed locally. Do you think this patch has broken the suite in some way? Thanks!

@ScrapCodes
Member Author

retest this please

@SparkQA

SparkQA commented Mar 3, 2020

Test build #119222 has finished for PR 27735 at commit fe10c37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 3, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23961/

@SparkQA

SparkQA commented Mar 3, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23961/

@ScrapCodes
Member Author

@skonto @ifilonenko, can you please review this PR?

@ScrapCodes ScrapCodes changed the title [SPARK-30985][SPARK-25065][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods. [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods. Jun 11, 2020
@ScrapCodes ScrapCodes force-pushed the SPARK-30985/spark-conf-k8s-propagate branch from fe10c37 to edea4b3 on June 11, 2020 08:14
@ScrapCodes ScrapCodes force-pushed the SPARK-30985/spark-conf-k8s-propagate branch from edea4b3 to fb31e84 on June 11, 2020 08:16
@SparkQA

SparkQA commented Jun 11, 2020

Test build #123835 has finished for PR 27735 at commit fb31e84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28460/

@SparkQA

SparkQA commented Jun 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28460/

@ScrapCodes ScrapCodes changed the title [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods. [WIP][SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods. Jun 12, 2020
@ScrapCodes ScrapCodes force-pushed the SPARK-30985/spark-conf-k8s-propagate branch from fb31e84 to a596378 on July 1, 2020 11:47
@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35326/

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35326/

@ScrapCodes
Member Author

Do we need to rename SecretVolumeUtils.scala to K8sVolumeTestUtils? Actually, the code is still using SecretVolumeUtils.podHasVolume because we didn't change the class name. If you don't mind, shall we keep the original file name?

Hi @dongjoon-hyun, I have already reverted this change. I made it initially because SecretVolumeUtils.podHasVolume is used in more places than originally intended, so the name "Secret" is no longer relevant. But we can tackle this in a separate patch.

I did another pass to check that all the comments are addressed. Do you think I am missing something?

Thanks!

Member

@dongjoon-hyun dongjoon-hyun left a comment


Hi, @ScrapCodes .
Sorry for the long delay. I believe this PR is almost ready for merge. Could you resolve the conflicts once more?

object Fabric8Aliases {
  type PODS = MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]
  type CONFIG_MAPS = MixedOperation[ConfigMap,
    ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]
Member


If you don't mind, shall we format it like the following?

-  type CONFIG_MAPS = MixedOperation[ConfigMap,
-    ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]
+  type CONFIG_MAPS = MixedOperation[
+    ConfigMap, ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]

  type LABELED_PODS = FilterWatchListDeletable[
    Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]]
  type LABELED_CONFIG_MAPS = FilterWatchListDeletable[ConfigMap, ConfigMapList,
    java.lang.Boolean, Watch, Watcher[ConfigMap]]
Member

@dongjoon-hyun dongjoon-hyun Nov 13, 2020


-  type LABELED_CONFIG_MAPS = FilterWatchListDeletable[ConfigMap, ConfigMapList,
-    java.lang.Boolean, Watch, Watcher[ConfigMap]]
+  type LABELED_CONFIG_MAPS = FilterWatchListDeletable[
+    ConfigMap, ConfigMapList, java.lang.Boolean, Watch, Watcher[ConfigMap]]

mapping
}.toMap
} else {
logInfo(s"Spark configuration directory is not detected, please set env:$ENV_SPARK_CONF_DIR")
Member


For existing users, I guess we don't need to give this additional direction, because they don't need this feature.
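A hypothetical Scala sketch of the behavior this comment asks for: resolve the directory from an optional `SPARK_CONF_DIR` value and, when it is absent, return an empty map without logging anything. `ConfDirLoader` and `loadConfDirFiles` are made-up names, and only `.template` files are skipped here to keep the example short; this is not the merged code.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical loader: file name -> file contents for a configuration directory.
object ConfDirLoader {
  def loadConfDirFiles(confDir: Option[String]): Map[String, String] =
    confDir match {
      case Some(dir) =>
        // listFiles() returns null for a missing or unreadable directory.
        val files = Option(new File(dir).listFiles()).map(_.toSeq).getOrElse(Seq.empty)
        files
          .filter(f => f.isFile && !f.getName.endsWith(".template"))
          .map(f => f.getName -> new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8))
          .toMap
      case None =>
        // No log line: existing users without SPARK_CONF_DIR are unaffected.
        Map.empty
    }
}
```

A caller could pass `sys.env.get("SPARK_CONF_DIR")`, so an unset variable simply yields no extra files.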

@dongjoon-hyun
Member

Hi, @holdenk and @tgravescs . Do you have any concern on this proposal?

@ScrapCodes ScrapCodes force-pushed the SPARK-30985/spark-conf-k8s-propagate branch from 4d66b37 to ba93111 on November 16, 2020 06:49
@SparkQA

SparkQA commented Nov 16, 2020

Test build #131135 has finished for PR 27735 at commit ba93111.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35738/

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM, @ScrapCodes .

I verified this manually too.

$ k get cm
NAME                                   DATA   AGE
spark-drv-b6d61375d00e2207-conf-map    3      9s
spark-exec-6fef2f75cf696182-conf-map   2      2s

$ k get pod
NAME                                                        READY   STATUS    RESTARTS   AGE
org-apache-spark-examples-sparkpi-3aca7e75d00e1f8f-driver   1/1     Running   0          15s
spark-pi-b2de0775cf695c4a-exec-1                            1/1     Running   0          8s

$ k exec -it org-apache-spark-examples-sparkpi-3aca7e75d00e1f8f-driver -- ls -al /opt/spark/conf
total 12
drwxrwxrwx 3 root root 4096 Nov 16 04:56 .
drwxr-xr-x 1 root root 4096 Nov 16 04:56 ..
drwxr-xr-x 2 root root 4096 Nov 16 04:56 ..2020_11_16_04_56_26.581426662
lrwxrwxrwx 1 root root   31 Nov 16 04:56 ..data -> ..2020_11_16_04_56_26.581426662
lrwxrwxrwx 1 root root   23 Nov 16 04:56 log4j.properties -> ..data/log4j.properties
lrwxrwxrwx 1 root root   25 Nov 16 04:56 metrics.properties -> ..data/metrics.properties
lrwxrwxrwx 1 root root   23 Nov 16 04:56 spark.properties -> ..data/spark.properties

$ k exec -it spark-pi-b2de0775cf695c4a-exec-1 -- ls -al /opt/spark/conf
total 16
drwxrwxrwx 3 root root 4096 Nov 16 04:56 .
drwxr-xr-x 1 root root 4096 Nov 16 04:56 ..
drwxr-xr-x 2 root root 4096 Nov 16 04:56 ..2020_11_16_04_56_31.214047175
lrwxrwxrwx 1 root root   31 Nov 16 04:56 ..data -> ..2020_11_16_04_56_31.214047175
lrwxrwxrwx 1 root root   23 Nov 16 04:56 log4j.properties -> ..data/log4j.properties
lrwxrwxrwx 1 root root   25 Nov 16 04:56 metrics.properties -> ..data/metrics.properties

$ k exec -it spark-pi-b2de0775cf695c4a-exec-1 -- cat /opt/spark/conf/log4j.properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
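The `k get cm` output shows 3 entries in the driver ConfigMap and 2 in the executor one, and the `ls` listings show that only the driver has `spark.properties`. A small Scala sketch that reproduces those counts, assuming the driver's `spark.properties` is rendered with `java.util.Properties`; `PodConfigMaps` and its methods are hypothetical.

```scala
import java.io.StringWriter
import java.util.Properties

// Hypothetical model of the ConfigMap data for each pod type.
object PodConfigMaps {
  val SparkPropertiesKey = "spark.properties"

  /** Renders Spark settings in java.util.Properties format (store also writes a timestamp comment). */
  def renderSparkProperties(conf: Map[String, String]): String = {
    val props = new Properties()
    conf.foreach { case (k, v) => props.setProperty(k, v) }
    val writer = new StringWriter()
    props.store(writer, "Spark configuration (illustrative)")
    writer.toString
  }

  /** Executors receive only the propagated SPARK_CONF_DIR files. */
  def executorData(confFiles: Map[String, String]): Map[String, String] = confFiles

  /** The driver additionally receives the generated spark.properties. */
  def driverData(confFiles: Map[String, String], conf: Map[String, String]): Map[String, String] =
    confFiles + (SparkPropertiesKey -> renderSparkProperties(conf))
}
```

With `log4j.properties` and `metrics.properties` as the propagated files, `driverData` has 3 keys and `executorData` has 2, matching the DATA column above.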

@dongjoon-hyun
Member

The currently running K8s IT has one unrelated failure. We can ignore it.

- Test basic decommissioning *** FAILED ***

@dongjoon-hyun
Member

Merged to master for Apache Spark 3.1!

@SparkQA

SparkQA commented Nov 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35738/

@ScrapCodes ScrapCodes deleted the SPARK-30985/spark-conf-k8s-propagate branch November 16, 2020 08:47
@ScrapCodes
Member Author

@dongjoon-hyun Thanks a lot for reviewing and helping move this forward. :)

@dongjoon-hyun
Member

Sorry for the delay, @ScrapCodes. This feature and the related JIRAs are targeting Apache Spark 3.1, right?

@ScrapCodes
Member Author

ScrapCodes commented Nov 17, 2020

@dongjoon-hyun Yes !

dongjoon-hyun added a commit that referenced this pull request Feb 2, 2021
### What changes were proposed in this pull request?

This PR aims to add a new configuration `spark.kubernetes.executor.disableConfigMap`.

### Why are the changes needed?

This can be used to disable the ConfigMap creation for executor pods introduced by #27735.

### Does this PR introduce _any_ user-facing change?

No. By default, this doesn't change the existing behavior.
This is a new feature that adds the ability to disable SPARK-30985.

### How was this patch tested?

Pass the newly added UT.

Closes #31428 from dongjoon-hyun/SPARK-34316.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
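A hypothetical Scala sketch of how a boolean setting such as `spark.kubernetes.executor.disableConfigMap` can gate executor ConfigMap creation. Only the key comes from the commit message above; the default of `false` reflects its statement that the default behavior is unchanged, and `ExecutorConfigMapGate` and its methods are made-up names, not Spark's implementation.

```scala
// Hypothetical gate for executor ConfigMap creation, driven by a plain key/value conf.
object ExecutorConfigMapGate {
  val DisableConfigMapKey = "spark.kubernetes.executor.disableConfigMap"

  /** Unset means enabled; "true"/"false" are parsed case-insensitively, other values throw. */
  def executorConfigMapEnabled(conf: Map[String, String]): Boolean =
    !conf.get(DisableConfigMapKey).exists(_.trim.toBoolean)

  /** Returns the ConfigMap data to create, or None when creation is disabled. */
  def executorConfigMapData(
      conf: Map[String, String],
      confFiles: Map[String, String]): Option[Map[String, String]] =
    if (executorConfigMapEnabled(conf)) Some(confFiles) else None
}
```

Returning an `Option` keeps the "no ConfigMap" case explicit, so a caller can skip both creating the ConfigMap and adding the corresponding volume to the executor pod.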
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
(cherry picked from commit f66e38c)
Signed-off-by: Dongjoon Hyun <[email protected]>