Conversation

@cutiechi

@cutiechi cutiechi commented Jul 8, 2021

What changes were proposed in this pull request?

Fix executor pod hadoop conf mount.

Why are the changes needed?

The --conf spark.kubernetes.hadoop.configMapName setting is not applied to executor pods.
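
For context, the intended usage that fails, sketched with an illustrative ConfigMap name:

# Create a ConfigMap from an existing Hadoop conf dir (name is illustrative).
kubectl create configmap hadoop-conf --from-file="$HADOOP_CONF_DIR"

# Point the pods at it; before this fix, only the driver pod received the mount.
./bin/spark-submit \
  --deploy-mode cluster \
  --conf spark.kubernetes.hadoop.configMapName=hadoop-conf \
  ...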

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT.

@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Hi, @cutiechi. Thank you for making a PR. The Apache Spark community uses the contributor's GitHub Actions resources. Please enable GitHub Actions on your fork and make it up-to-date. The following failure is happening on your fork.

[Screenshot: GitHub Actions failure]

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-36039][KUBERNETES] Fix executor pod hadoop conf mount [SPARK-36039][K8S] Fix executor pod hadoop conf mount Jul 18, 2021
@dongjoon-hyun
Member

Also, please add a test case.

cc @ScrapCodes

@cutiechi
Author

Hi, @cutiechi. Thank you for making a PR. The Apache Spark community uses the contributor's GitHub Actions resources. Please enable GitHub Actions on your fork and make it up-to-date. The following failure is happening on your fork.

[Screenshot: GitHub Actions failure]

Hello, I have already enabled GitHub Actions on my fork, but I don't know why this check still does not pass:

[screenshot]

@cutiechi
Author

Also, please add a test case.

cc @ScrapCodes

Ok

@ScrapCodes
Member

Hi @cutiechi,

Thank you for the PR!

There is already a way to mount arbitrary Hadoop configuration on executors: the Spark conf propagation implemented in [SPARK-30985]. Place all Hadoop configuration files in the SPARK_HOME/conf dir and they will be loaded on both the driver and the executors. Internally this works by creating a ConfigMap, one each for the driver and the executors. At the moment these ConfigMaps are not fully user configurable.
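
For illustration, a minimal sketch of that propagation path (file names are examples; any files placed in SPARK_HOME/conf are picked up):

# Copy the Hadoop client configs into Spark's conf dir before submitting.
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml "$SPARK_HOME/conf/"

# On submit, Spark packs the conf dir into auto-generated ConfigMaps
# (one for the driver, one for the executors) and mounts them in the pods.
./bin/spark-submit --master k8s://https://<api-server> --deploy-mode cluster ...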

IMO, if we make these ConfigMaps user configurable, that solution would apply to all frameworks, not just Hadoop. [SPARK-32223]

In the meantime, we can have this, but we would need a K8s integration test.

@ScrapCodes
Member

Or maybe the integration test is not absolutely necessary and a unit test is enough, e.g. this test.
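
For reference, the K8s core module's unit tests can be run locally; the module path follows the Spark repo layout, though exact flags may vary by branch:

# Run the Kubernetes core module's unit tests.
build/mvn -pl resource-managers/kubernetes/core -Pkubernetes test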

@cutiechi cutiechi requested a review from dongjoon-hyun July 19, 2021 14:52
@cutiechi
Author

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@tbcdns

tbcdns commented Jul 21, 2021

Thanks for the fix, I am also facing this issue.

FYI, I think the same issue is present for the kerberos config: when specifying the property spark.kubernetes.kerberos.krb5.configMapName, it is provisioned only to the driver, not to the executors...
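
For reference, the analogous kerberos usage that shows the same driver-only behavior (ConfigMap name is illustrative):

# Create a ConfigMap holding krb5.conf, then point Spark at it.
kubectl create configmap krb5-conf --from-file=/etc/krb5.conf
./bin/spark-submit \
  --deploy-mode cluster \
  --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \
  ...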

@cutiechi
Author

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@ScrapCodes
Member

While testing your PR, I am getting the same error again.

export HADOOP_CONF_DIR=`pwd`/conf


./bin/spark-submit \
    --master <IP>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=scrapcodes/spark:3.3.0-SNAPSHOT \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

Executors are crash looping with:

  Warning  FailedMount  23s (x7 over 55s)  kubelet, 10.240.128.22  MountVolume.SetUp failed for volume "hadoop-properties" : configmap "spark-pi-4c4e757aeca6de9b-hadoop-config" not found

This happens because the ConfigMap does not get created in the executor step, and the current code is designed that way. It will work if we use a user-provided ConfigMap.
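
For anyone reproducing this, the failure is visible from the pod events and the missing ConfigMap (the names below come from the error above):

# Inspect the executor pod's mount events.
kubectl describe pod <executor-pod-name>

# Confirm the auto-generated Hadoop ConfigMap is absent.
kubectl get configmap spark-pi-4c4e757aeca6de9b-hadoop-config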

@cutiechi
Author

I recompiled the current branch and re-tested with the following command:

@dongjoon-hyun @ScrapCodes

export HADOOP_CONF_DIR=`pwd`/conf

./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode cluster \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.executor.instances=3  \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

But my result is different from yours: all pods are running normally.

Pods:

[screenshot]

One of the executor pods:

[screenshot]

The Hadoop properties ConfigMap of this pod:

[screenshot]

Did you pull the latest code, recompile, and test it? Please review again, thanks.

@cutiechi
Copy link
Author

@dongjoon-hyun @ScrapCodes

This video was recorded during my test:

[Video: CleanShot.2021-07-28.at.22.12.29.mp4]

@ScrapCodes
Copy link
Member

ScrapCodes commented Jul 29, 2021

Interesting!

Can you run with spark.logConf=true and show the output?

EDIT: for this to work:

  1. cp conf/log4j.properties.template conf/log4j.properties
  2. configure DEBUG logging in conf/log4j.properties (see the sketch after the command below)
./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode cluster \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.executor.instances=3  \
  --conf spark.logConf=true \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
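
For step 2 above, a minimal sketch of the log4j change (Spark 3.x at the time used log4j 1.x properties syntax; GNU sed shown):

# Raise the root logger from INFO to DEBUG.
sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=DEBUG/' conf/log4j.properties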

Another question: what happens when you run in client deploy mode, i.e. --deploy-mode client?

@cutiechi
Author

Interesting!

Can you run with spark.logConf=true and show the output?

Ok

@cutiechi
Author

cutiechi commented Jul 29, 2021

@ScrapCodes The log appears in the driver pod; there is no such log in the executor pods:

[screenshot]

@cutiechi
Author

@ScrapCodes Client Mode:

./bin/spark-submit  \
  --master k8s://https://172.16.102.10:8443 \
  --deploy-mode client \
  --name java-queue-stream \
  --class org.apache.spark.examples.streaming.JavaQueueStream \
  --conf spark.driver.ip=172.16.102.1 \
  --conf spark.executor.instances=3  \
  --conf spark.logConf=true \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar

[Video: CleanShot.2021-07-29.at.21.05.56.mp4]

@cutiechi
Author

cutiechi commented Aug 2, 2021

@dongjoon-hyun @ScrapCodes

Please review again, thanks.

@cutiechi
Author

@ScrapCodes @dongjoon-hyun
Please review again, thanks.

@holdenk
Contributor

holdenk commented Sep 13, 2021

Ping @ScrapCodes / @dongjoon-hyun, I can also take a look if y'all are busy.

@ScrapCodes
Member

Hi @holdenk, with the resources I have (i.e. a K8s cluster on IBM Cloud), I was unable to get this patch to work, so I could not make progress. Feel free to take a look! (Your and @dongjoon-hyun's approval will be final.)

@cutiechi
Author

I recompiled and tested it again; I'm sure there is no problem. @ScrapCodes @holdenk @dongjoon-hyun

@cutiechi
Author

@ScrapCodes Did you rebuild the image?
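
For context, rebuilding the container image from a fresh local build typically looks like this (repo and tag are illustrative, matching the image used earlier):

# Rebuild and push the Spark image after recompiling.
./bin/docker-image-tool.sh -r scrapcodes -t 3.3.0-SNAPSHOT build
./bin/docker-image-tool.sh -r scrapcodes -t 3.3.0-SNAPSHOT push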

@cutiechi
Author

@ScrapCodes ping

@github-actions

github-actions bot commented Apr 4, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 4, 2022
@github-actions github-actions bot closed this Apr 5, 2022