[SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods. #27735
Conversation
Test build #119087 has finished for PR 27735 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Integration test decommission suite passed locally, however it seems to be a flaky one. retest please
Can you expand on this: what does "mount" mean here in a distributed environment? The place $SPARK_CONF_DIR points to has to be on a distributed filesystem. If it's pointing to local disk, how do you mount that on a remote pod?
@tgravescs, Hi Thomas Graves, the mount here refers to the Kubernetes feature that lets us mount ConfigMaps as volumes. More details here: https://kubernetes.io/docs/concepts/storage/volumes/#configmap. I have also updated the description with more details on how the Kubernetes ConfigMap will work here. Thanks a lot for taking a look; do you have more comments? Can you please help review this patch!
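For reference, a minimal fabric8 Kubernetes client sketch of the mechanism being discussed, i.e. exposing a ConfigMap to a pod as a mounted volume. The ConfigMap name, volume name, and mount path below are illustrative, not the exact values used by this patch.

```scala
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder, VolumeMount, VolumeMountBuilder}

// Volume backed by a ConfigMap; Kubernetes materializes each ConfigMap key as a file.
val confVolume: Volume = new VolumeBuilder()
  .withName("spark-conf-volume")        // illustrative volume name
  .withNewConfigMap()
    .withName("spark-drv-conf-map")     // illustrative ConfigMap name
  .endConfigMap()
  .build()

// Mount that volume at the container's SPARK_CONF_DIR.
val confMount: VolumeMount = new VolumeMountBuilder()
  .withName("spark-conf-volume")
  .withMountPath("/opt/spark/conf")     // SPARK_CONF_DIR in the default image
  .build()
```

Because the volume is backed by a ConfigMap object stored in the Kubernetes API server, nothing needs to be on a distributed filesystem; the kubelet on each node materializes the files inside the pod.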
Test build #119155 has finished for PR 27735 at commit
Kubernetes integration test starting
Force-pushed from f137617 to fe10c37
Kubernetes integration test status failure
Test build #119159 has finished for PR 27735 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
@holdenk Hi Holden, would you mind taking a look? I am not sure why
retest this please
Test build #119222 has finished for PR 27735 at commit
Kubernetes integration test starting
Kubernetes integration test status success
@skonto @ifilonenko, can you please review this PR?
Force-pushed from fe10c37 to edea4b3
Force-pushed from edea4b3 to fb31e84
Test build #123835 has finished for PR 27735 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Force-pushed from fb31e84 to a596378
Kubernetes integration test starting
Kubernetes integration test status success
Hi @dongjoon-hyun, I have already reverted this change. Actually, I made this change initially because I did another pass to see if all the comments were addressed. Do you think I am missing something? Thanks!
dongjoon-hyun left a comment
Hi, @ScrapCodes.
Sorry for the long delay. I believe this PR is almost ready for merge. Could you resolve the conflicts once more?
Resolved review threads on ...managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
object Fabric8Aliases {
  type PODS = MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]
  type CONFIG_MAPS = MixedOperation[ConfigMap,
    ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]
If you don't mind, shall we use the following?

- type CONFIG_MAPS = MixedOperation[ConfigMap,
-   ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]
+ type CONFIG_MAPS = MixedOperation[
+   ConfigMap, ConfigMapList, DoneableConfigMap, Resource[ConfigMap, DoneableConfigMap]]

  type LABELED_PODS = FilterWatchListDeletable[
    Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]]
  type LABELED_CONFIG_MAPS = FilterWatchListDeletable[ConfigMap, ConfigMapList,
    java.lang.Boolean, Watch, Watcher[ConfigMap]]
- type LABELED_CONFIG_MAPS = FilterWatchListDeletable[ConfigMap, ConfigMapList,
-   java.lang.Boolean, Watch, Watcher[ConfigMap]]
+ type LABELED_CONFIG_MAPS = FilterWatchListDeletable[
+   ConfigMap, ConfigMapList, java.lang.Boolean, Watch, Watcher[ConfigMap]]

      mapping
    }.toMap
  } else {
    logInfo(s"Spark configuration directory is not detected, please set env:$ENV_SPARK_CONF_DIR")
For existing users, I guess we don't need to give this additional direction because they don't need it.
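To make the snippet under review easier to follow, here is a rough, self-contained sketch of the kind of logic it belongs to: collecting the files under SPARK_CONF_DIR into a name-to-contents map that a ConfigMap can be built from. The method name, the `.template` filter, and the overall shape are illustrative assumptions, not the PR's actual code.

```scala
import java.io.File
import scala.io.Source

// Illustrative sketch only: gather SPARK_CONF_DIR files (skipping *.template)
// into the map that a driver/executor ConfigMap could be built from.
def loadConfDirFiles(confDir: Option[String]): Map[String, String] = confDir match {
  case Some(dir) if new File(dir).isDirectory =>
    new File(dir).listFiles()
      .filter(f => f.isFile && !f.getName.endsWith(".template"))
      .map { f =>
        val src = Source.fromFile(f)
        try f.getName -> src.mkString finally src.close()
      }
      .toMap
  case _ =>
    // Mirrors the quoted else-branch: nothing detected, nothing to propagate.
    Map.empty[String, String]
}
```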
Hi, @holdenk and @tgravescs. Do you have any concerns about this proposal?
Force-pushed from 4d66b37 to ba93111
Test build #131135 has finished for PR 27735 at commit
Kubernetes integration test starting
dongjoon-hyun left a comment
+1, LGTM, @ScrapCodes.
I verified this manually too.
$ k get cm
NAME DATA AGE
spark-drv-b6d61375d00e2207-conf-map 3 9s
spark-exec-6fef2f75cf696182-conf-map 2 2s
$ k get pod
NAME READY STATUS RESTARTS AGE
org-apache-spark-examples-sparkpi-3aca7e75d00e1f8f-driver 1/1 Running 0 15s
spark-pi-b2de0775cf695c4a-exec-1 1/1 Running 0 8s
$ k exec -it org-apache-spark-examples-sparkpi-3aca7e75d00e1f8f-driver -- ls -al /opt/spark/conf
total 12
drwxrwxrwx 3 root root 4096 Nov 16 04:56 .
drwxr-xr-x 1 root root 4096 Nov 16 04:56 ..
drwxr-xr-x 2 root root 4096 Nov 16 04:56 ..2020_11_16_04_56_26.581426662
lrwxrwxrwx 1 root root 31 Nov 16 04:56 ..data -> ..2020_11_16_04_56_26.581426662
lrwxrwxrwx 1 root root 23 Nov 16 04:56 log4j.properties -> ..data/log4j.properties
lrwxrwxrwx 1 root root 25 Nov 16 04:56 metrics.properties -> ..data/metrics.properties
lrwxrwxrwx 1 root root 23 Nov 16 04:56 spark.properties -> ..data/spark.properties
$ k exec -it spark-pi-b2de0775cf695c4a-exec-1 -- ls -al /opt/spark/conf
total 16
drwxrwxrwx 3 root root 4096 Nov 16 04:56 .
drwxr-xr-x 1 root root 4096 Nov 16 04:56 ..
drwxr-xr-x 2 root root 4096 Nov 16 04:56 ..2020_11_16_04_56_31.214047175
lrwxrwxrwx 1 root root 31 Nov 16 04:56 ..data -> ..2020_11_16_04_56_31.214047175
lrwxrwxrwx 1 root root 23 Nov 16 04:56 log4j.properties -> ..data/log4j.properties
lrwxrwxrwx 1 root root 25 Nov 16 04:56 metrics.properties -> ..data/metrics.properties
$ k exec -it spark-pi-b2de0775cf695c4a-exec-1 -- cat /opt/spark/conf/log4j.properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
The currently running K8s IT has one irrelevant failure. We can ignore it.
Merged to master for Apache Spark 3.1!
Kubernetes integration test status failure
@dongjoon-hyun Thanks a lot for reviewing and helping this move forward. :)
Sorry for the delay, @ScrapCodes. This feature and the related JIRAs are targeting Apache Spark 3.1, right?
@dongjoon-hyun Yes!
### What changes were proposed in this pull request?
This PR aims to add a new configuration `spark.kubernetes.executor.disableConfigMap`.

### Why are the changes needed?
This can be used to disable ConfigMap creation for executor pods introduced by #27735.

### Does this PR introduce _any_ user-facing change?
No. By default, this doesn't change the AS-IS behavior. This is a new feature that adds the ability to disable SPARK-30985.

### How was this patch tested?
Pass the newly added UT.

Closes #31428 from dongjoon-hyun/SPARK-34316.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
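As a usage note, a small sketch of how the follow-up flag from SPARK-34316 could be set from application code; the configuration key comes from the commit message above, while the app name and the rest of the configuration are illustrative.

```scala
import org.apache.spark.SparkConf

// Opt out of the per-executor ConfigMap added by this PR (SPARK-30985),
// using the follow-up flag introduced in SPARK-34316.
val conf = new SparkConf()
  .setAppName("spark-pi")  // illustrative
  .set("spark.kubernetes.executor.disableConfigMap", "true")
```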
### What changes were proposed in this pull request?
This is an improvement: we mount all the user-specific configuration files (except the templates and Spark properties files) from `SPARK_CONF_DIR` at the point of spark-submit, to both executor and driver pods. Currently, only `spark.properties` is mounted, and only on the driver.

### Why are the changes needed?
`SPARK_CONF_DIR` hosts several configuration files, for example:

- `spark-defaults.conf` - containing all the Spark properties.
- `log4j.properties` - logger configuration.
- `core-site.xml` - Hadoop-related configuration.
- `fairscheduler.xml` - Spark's fair scheduling policy at the job level.
- `metrics.properties` - Spark metrics.

At the moment, we cannot propagate these files to the driver and executor configuration directories.
There is a design doc with more details, and this patch currently provides a reference implementation. Please take a look at the doc and comment on how we can improve: google docs link to the doc
Further scope:
- Support user-defined ConfigMaps.
### Does this PR introduce _any_ user-facing change?
Yes. Previously, the user configuration files (e.g. hdfs-site.xml, log4j.properties, etc.) were not propagated by default; after this patch, they are propagated to the driver and executor pods' `SPARK_CONF_DIR`.
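As an illustration of the user-facing effect, a small sketch of how an application could list the propagated files from inside a pod; the fallback path is only an assumption based on the default image layout shown in the verification output earlier in this thread.

```scala
import java.io.File

// Inside the driver or an executor, the propagated files appear under SPARK_CONF_DIR.
val confDir = sys.env.getOrElse("SPARK_CONF_DIR", "/opt/spark/conf")
val propagated = Option(new File(confDir).listFiles()).getOrElse(Array.empty[File])
// e.g. log4j.properties, metrics.properties, spark.properties
println(propagated.map(_.getName).sorted.mkString(", "))
```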
### How was this patch tested?
Added tests.
Also manually tested by deploying it to a minikube cluster and observing that the additional configuration files were present and taking effect. For example, changes to log4j.properties were properly applied to executors.