Conversation

@cutiechi

@cutiechi cutiechi commented Jul 8, 2021

What changes were proposed in this pull request?

Fix executor pod hadoop conf mount.

Why are the changes needed?

The --conf spark.kubernetes.hadoop.configMapName setting is not applied to executor pods.
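
For context, the intended usage that fails, sketched with an illustrative ConfigMap name:

# Create a ConfigMap from an existing Hadoop conf dir (name is illustrative).
kubectl create configmap hadoop-conf --from-file="$HADOOP_CONF_DIR"

# Point the pods at it; before this fix, only the driver pod received the mount.
./bin/spark-submit \
  --deploy-mode cluster \
  --conf spark.kubernetes.hadoop.configMapName=hadoop-conf \
  ...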

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT.

@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Hi, @cutiechi. Thank you for making a PR. The Apache Spark community uses the contributor's GitHub Actions resources. Please enable GitHub Actions on your fork and make it up-to-date. The following failure is happening on your fork.

[Screenshot: GitHub Actions failure]

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-36039][KUBERNETES] Fix executor pod hadoop conf mount [SPARK-36039][K8S] Fix executor pod hadoop conf mount Jul 18, 2021
@dongjoon-hyun
Member

Also, please add a test case.

cc @ScrapCodes

@cutiechi
Author

Hi, @cutiechi. Thank you for making a PR. The Apache Spark community uses the contributor's GitHub Actions resources. Please enable GitHub Actions on your fork and make it up-to-date. The following failure is happening on your fork.

[Screenshot: GitHub Actions failure]

Hello, I have already enabled GitHub Actions on my fork, but I don't know why this check still does not pass:

[screenshot]

@cutiechi
Author

Also, please add a test case.

cc @ScrapCodes

Ok

@ScrapCodes
Member

Hi @cutiechi,

Thank you for the PR!

There is already a way to mount arbitrary Hadoop configuration on executors: the Spark conf propagation implemented in [SPARK-30985]. Place all Hadoop configuration files in the SPARK_HOME/conf dir and they will be loaded on both the driver and the executors. Internally this works by creating a ConfigMap, one each for the driver and the executors. At the moment these ConfigMaps are not fully user configurable.
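
For illustration, a minimal sketch of that propagation path (file names are examples; any files placed in SPARK_HOME/conf are picked up):

# Copy the Hadoop client configs into Spark's conf dir before submitting.
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml "$SPARK_HOME/conf/"

# On submit, Spark packs the conf dir into auto-generated ConfigMaps
# (one for the driver, one for the executors) and mounts them in the pods.
./bin/spark-submit --master k8s://https://<api-server> --deploy-mode cluster ...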

IMO, if we make these ConfigMaps user configurable, that solution would apply to all frameworks, not just Hadoop. [SPARK-32223]

In the meantime, we can have this, but we would need a K8s integration test.

@ScrapCodes
Member

Or maybe the integration test is not absolutely necessary and a unit test is enough, e.g. this test.
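
For reference, the K8s core module's unit tests can be run locally; the module path follows the Spark repo layout, though exact flags may vary by branch:

# Run the Kubernetes core module's unit tests.
build/mvn -pl resource-managers/kubernetes/core -Pkubernetes test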

@cutiechi cutiechi requested a review from dongjoon-hyun July 19, 2021 14:52
@cutiechi
Author

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@tbcdns

tbcdns commented Jul 21, 2021

Thanks for the fix, I am also facing this issue.

FYI, I think the same issue is present for the kerberos config: when specifying the property spark.kubernetes.kerberos.krb5.configMapName, it is provisioned only to the driver, not to the executors...
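
For reference, the analogous kerberos usage that shows the same driver-only behavior (ConfigMap name is illustrative):

# Create a ConfigMap holding krb5.conf, then point Spark at it.
kubectl create configmap krb5-conf --from-file=/etc/krb5.conf
./bin/spark-submit \
  --deploy-mode cluster \
  --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \
  ...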

@cutiechi
Author

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@ScrapCodes
Member

While testing your PR, I am getting the same error again.

export HADOOP_CONF_DIR=`pwd`/conf


./bin/spark-submit \
    --master <IP>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=scrapcodes/spark:3.3.0-SNAPSHOT \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

Executors are crash looping with:

  Warning  FailedMount  23s (x7 over 55s)  kubelet, 10.240.128.22  MountVolume.SetUp failed for volume "hadoop-properties" : configmap "spark-pi-4c4e757aeca6de9b-hadoop-config" not found

This happens because the ConfigMap does not get created in the executor step, and the current code is designed that way. It will work if we use a user-provided ConfigMap.
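
For anyone reproducing this, the failure is visible from the pod events and the missing ConfigMap (the names below come from the error above):

# Inspect the executor pod's mount events.
kubectl describe pod <executor-pod-name>

# Confirm the auto-generated Hadoop ConfigMap is absent.
kubectl get configmap spark-pi-4c4e757aeca6de9b-hadoop-config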

@cutiechi
Author

I recompiled the current branch and re-tested with the following command:

@dongjoon-hyun @ScrapCodes

export HADOOP_CONF_DIR=`pwd`/conf

./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode cluster \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.executor.instances=3  \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

But my result is different from yours: all pods are running normally.

Pods:

[screenshot]

One of the executor pods:

[screenshot]

The Hadoop properties ConfigMap of this pod:

[screenshot]

Did you pull the latest code, recompile, and test it? Please review again, thanks.

@cutiechi
Copy link
Author

@dongjoon-hyun @ScrapCodes

This video was recorded during my test:

[Video: CleanShot.2021-07-28.at.22.12.29.mp4]

@ScrapCodes
Copy link
Member

ScrapCodes commented Jul 29, 2021

Interesting!

Can you run with spark.logConf=true and show the output?

EDIT: for this to work:

  1. cp conf/log4j.properties.template conf/log4j.properties
  2. configure DEBUG logging in conf/log4j.properties (see the sketch after the command below)
./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode cluster \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.executor.instances=3  \
  --conf spark.logConf=true \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
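
For step 2 above, a minimal sketch of the log4j change (Spark 3.x at the time used log4j 1.x properties syntax; GNU sed shown):

# Raise the root logger from INFO to DEBUG.
sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=DEBUG/' conf/log4j.properties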

Another question: what happens when you run in client deploy mode, i.e. --deploy-mode client?

@cutiechi
Author

Interesting!

Can you run with spark.logConf=true and show the output?

Ok

@cutiechi
Author

cutiechi commented Jul 29, 2021

@ScrapCodes The log appears in the driver pod; there is no such log in the executor pods:

[screenshot]

@cutiechi
Author

@ScrapCodes Client Mode:

./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode client \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.driver.ip=172.16.102.1 \
  --conf spark.executor.instances=3  \
  --conf spark.logConf=true \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

[Video: CleanShot.2021-07-29.at.21.05.56.mp4]

@cutiechi
Author

cutiechi commented Aug 2, 2021

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@cutiechi
Author

@ScrapCodes @dongjoon-hyun
Please review again, thanks.

@holdenk
Contributor

holdenk commented Sep 13, 2021

Ping @ScrapCodes / @dongjoon-hyun, I can also take a look if y'all are busy.

@ScrapCodes
Member

Hi @holdenk, with the resources I have (i.e. a K8s cluster on IBM Cloud), I was unable to get this patch to work, so I could not make progress. Feel free to take a look! (Your and @dongjoon-hyun's approval will be final.)

@cutiechi
Author

I recompiled and tested it again; I'm sure there is no problem. @ScrapCodes @holdenk @dongjoon-hyun

@cutiechi
Author

@ScrapCodes Did you rebuild the image?
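
For context, rebuilding the container image from a fresh local build typically looks like this (repo and tag are illustrative, matching the image used earlier):

# Rebuild and push the Spark image after recompiling.
./bin/docker-image-tool.sh -r scrapcodes -t 3.3.0-SNAPSHOT build
./bin/docker-image-tool.sh -r scrapcodes -t 3.3.0-SNAPSHOT push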

@cutiechi
Author

@ScrapCodes ping

@github-actions

github-actions bot commented Apr 4, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 4, 2022
@github-actions github-actions bot closed this Apr 5, 2022