
Conversation

@dongjoon-hyun (Member) commented Jun 14, 2025

What changes were proposed in this pull request?

Add Spark History Server example.

Why are the changes needed?

Since Apache Spark 4.0, Spark rolls event logs by default and compresses them by default.

However, additional configuration is still needed to let SHS manage the event log directories. This PR aims to provide a Spark History Server example with that configuration.
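For background, the settings involved look roughly like this in spark-defaults.conf form (a sketch, not part of this PR; the property names are from the Spark configuration docs, and the values shown are illustrative):

```properties
# Application side: Spark 4.0 rolls and compresses event logs by default
spark.eventLog.enabled                              true
spark.eventLog.rolling.enabled                      true
spark.eventLog.compress                             true

# History Server side: clean up aged logs and retain only recent rolled files
spark.history.fs.cleaner.enabled                    true
spark.history.fs.eventLog.rolling.maxFilesToRetain  10
```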

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

```yaml
applicationTolerations:
  restartConfig:
    restartPolicy: Always
    maxRestartAttempts: 9223372036854775807
```
@dongjoon-hyun (Member Author):

Since the default value of maxRestartAttempts is 3 (for normal jobs), I set this as Long.MAX_VALUE because SHS is supposed to run forever.

```
$ jshell
|  Welcome to JShell -- Version 21.0.7
|  For an introduction type: /help intro

jshell> Long.MAX_VALUE
$1 ==> 9223372036854775807
```

@dongjoon-hyun (Member Author):

Thank you, @viirya . Merged to main.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-52481 branch June 14, 2025 23:19
@melin commented Jun 15, 2025

The Spark event log cannot be written directly to s3. I get the following error:

```
WARN S3ABlockOutputStream: Application invoked the Syncable API against stream writing to logs/spark/events/eventlog_v2_spark-61bc831c6f774629a1cd1bc95d6a7288/events_1_spark-61bc831c6f774629a1cd1bc95d6a7288. This is unsupported
```

@dongjoon-hyun (Member Author) commented Jun 16, 2025

@melin , it's a warning, not an error.

> The Spark event log cannot be directly written to s3. There are the following errors:

FYI, Apache Spark 3.2+ has the following by default.

@melin commented Jun 17, 2025

> @melin , it's a warning, not an error.
>
> > The Spark event log cannot be directly written to s3. There are the following errors:
>
> FYI, Apache Spark 3.2+ has the following by default.

spark.eventLog.dir cannot be conveniently set directly to an s3 path. There are two workarounds:

  1. Write directly to NFS shared storage.
  2. Write to the pod's local disk first, then upload to s3 once the task completes.

@dongjoon-hyun (Member Author) commented Jun 17, 2025

> > @melin , it's a warning, not an error.
> >
> > > The Spark event log cannot be directly written to s3. There are the following errors:
> >
> > FYI, Apache Spark 3.2+ has the following by default.
>
> spark.eventLog.dir cannot be directly set to s3 path, which is not very convenient. There are two solutions:
>
> 1. Write directly to the nfs shared storage
> 2. First, write to the pod locally. Once the task is completed, upload it to s3.

It seems that the above are not questions.

So, to be clear: no, your personal assessments and proposed solutions sound wrong to me, @melin . They don't make sense for long-running jobs and streaming jobs. In other words, that's not a community recommendation.

Apache Spark community has been using S3 (and S3-compatible object storage) directly as an Apache Spark event log directory for a long time.
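A minimal sketch of such a setup (the bucket name is hypothetical; the s3a option is from Hadoop's S3A documentation and is what downgrades the Syncable warning discussed above):

```properties
spark.eventLog.enabled                              true
spark.eventLog.dir                                  s3a://my-bucket/spark-events
# Hadoop 3.3.1+ / Spark 3.2+ default: treat hflush/hsync as no-ops on S3A
spark.hadoop.fs.s3a.downgrade.syncable.exceptions   true
```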

BTW, an Apache Spark PR is not a good place for this kind of Q&A. I'd recommend sending a message to [email protected] if you have any difficulties or want to discuss this.

@dongjoon-hyun (Member Author):

FYI, @melin . Here is the example.

@Ferdinanddb:
Hi @dongjoon-hyun , thank you for your work here.

I am trying to deploy a spark-history-server pod using your example. I see it running, but I cannot port-forward to it because port 18080 is not being used by the pod (and therefore not by the service).

Could you please tell me what I should do to fix this?

I have the following config:

```yaml
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: spark-history-server
  namespace: spark-operator
spec:
  mainClass: "org.apache.spark.deploy.history.HistoryServer"
  sparkConf:
    # spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.4.1"
    spark.jars: "https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/3.1.1/gcs-connector-3.1.1-shaded.jar"
    spark.jars.ivy: "/tmp/.ivy2.5.2"
    spark.driver.memory: "2g"
    spark.kubernetes.namespace: spark-operator
    spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
    spark.kubernetes.container.image: "apache/spark:4.0.0-java21-scala"
    spark.history.fs.cleaner.enabled: "true"
    spark.history.fs.cleaner.maxAge: "30d"
    spark.history.fs.cleaner.maxNum: "100"
    spark.history.fs.eventLog.rolling.maxFilesToRetain: "10"

    spark.hadoop.fs.gs.impl: "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
    spark.hadoop.fs.AbstractFileSystem.gs.impl: "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
    spark.hadoop.fs.gs.auth.service.account.enable: "true"
    # spark.hadoop.fs.gs.auth.type: "COMPUTE_ENGINE"
    spark.hadoop.fs.gs.project.id: "rcim-prod-data-core-0"
    spark.hadoop.fs.defaultFS: "gs://SOME-BUCKET-GCS-spark-logs"
    spark.history.fs.logDirectory: "gs://SOME-BUCKET-GCS-spark-logs"
  runtimeVersions:
    sparkVersion: "4.0.0"
  applicationTolerations:
    restartConfig:
      restartPolicy: Always
      maxRestartAttempts: 9223372036854775807
```

I changed the value of my GCS bucket name, but again: I don't see any errors in the pod's logs.

When I execute the following port-forward command I get this:

```
kubectl port-forward svc/spark-history-server-0-driver-svc 8080:spark-ui --namespace spark-operator
Forwarding from 127.0.0.1:8080 -> 4040
Forwarding from [::1]:8080 -> 4040
Handling connection for 8080
E0630 21:06:51.716348 2639361 portforward.go:424] "Unhandled Error" err="an error occurred forwarding 8080 -> 4040: error forwarding port 4040 to pod 76c12381718e2542ae6e371d8c28992dc9e541439cd53b832df5aa5aaaeddbac, uid : failed to execute portforward in network namespace \"/var/run/netns/cni-5208164a-41a3-2769-bfaf-6b6f31eb1c80\": failed to connect to localhost:4040 inside namespace \"76c12381718e2542ae6e371d8c28992dc9e541439cd53b832df5aa5aaaeddbac\", IPv4: dial tcp4 127.0.0.1:4040: connect: connection refused IPv6 dial tcp6 [::1]:4040: connect: cannot assign requested address "
```

Thank you very much if you can help!

@dongjoon-hyun (Member Author):

You need to use spark.ui.port. Let me revise the example for you.
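A minimal sketch of that revision, assuming the SparkApplication schema used in the config above (spark.ui.port is a standard Spark property; the History Server's driver process serves its web UI on that port):

```yaml
spec:
  sparkConf:
    # Make the History Server web UI listen on 18080 instead of the
    # default driver UI port (4040), so the Service port-forward works.
    spark.ui.port: "18080"
```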

@dongjoon-hyun (Member Author):

Here is the PR, @Ferdinanddb .

@Ferdinanddb:
@dongjoon-hyun thanks for taking the time on this! I confirm that it now works :).
