Conversation

@dongjoon-hyun dongjoon-hyun commented Nov 2, 2023

What changes were proposed in this pull request?

This PR aims to enable spark.eventLog.rolling.enabled by default for Apache Spark 4.0.0.

Why are the changes needed?

Since Apache Spark 3.0.0, we have been using event log rolling not only for long-running jobs, but also for some failed jobs to archive the partial event logs incrementally.

Does this PR introduce any user-facing change?

  • No, because spark.eventLog.enabled is disabled by default.
  • For users with spark.eventLog.enabled=true, yes: the spark-events directory will have a different layout. However, Spark History Server 3.3+ can read both the old and new event logs. I believe that event log users already use this configuration to avoid losing event logs for long-running jobs and some failed jobs.
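As a sketch, a user who wants the pre-4.0 behavior back (or a 3.x user who wants to opt in early) could set the following in `spark-defaults.conf`. The property names are the ones discussed in this PR and documented in the Spark configuration docs; the values are illustrative:

```
# Event logging must be enabled explicitly; it is off by default.
spark.eventLog.enabled                true
# Default becomes true as of Spark 4.0.0 with this PR;
# set to false to keep the single-file layout.
spark.eventLog.rolling.enabled        true
# Rolling threshold per event log file; 128m is the documented default.
spark.eventLog.rolling.maxFileSize    128m
```

With rolling enabled, each application writes an `eventlog_v2_<appId>` directory under `spark-events` instead of a single file, which is the layout difference mentioned above.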

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Nov 2, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-45771][CORE] Enable spark.eventLog.rolling.enabled by default [SPARK-45771][CORE] Enable spark.eventLog.rolling.enabled by default Nov 2, 2023
@github-actions github-actions bot added the DOCS label Nov 2, 2023
@dongjoon-hyun dongjoon-hyun marked this pull request as draft November 2, 2023 16:29
Member Author


This is consistent with the existing function description.

* Get a SparkConf with event logging enabled. It doesn't enable rolling event logs, so caller

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review November 2, 2023 17:14
@dongjoon-hyun
Member Author

The AppVeyor failure (SparkR) is unrelated to this PR.

Could you review this PR when you have some time, @viirya?

buildWriterAndVerify(conf, classOf[SingleEventLogFileWriter])
buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

conf.set(EVENT_LOG_ENABLE_ROLLING, true)

@viirya viirya Nov 2, 2023


Is it redundant then?

Suggested change
conf.set(EVENT_LOG_ENABLE_ROLLING, true)

Member


Or we want to:

Suggested change
conf.set(EVENT_LOG_ENABLE_ROLLING, true)
conf.set(EVENT_LOG_ENABLE_ROLLING, false)
buildWriterAndVerify(conf, classOf[SingleEventLogFileWriter])

buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

conf.set(EVENT_LOG_ENABLE_ROLLING, true)
buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])
Member


Suggested change
buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

test("SPARK-31764: isBarrier should be logged in event log") {
val conf = new SparkConf()
conf.set(EVENT_LOG_ENABLED, true)
conf.set(EVENT_LOG_ENABLE_ROLLING, false)
Member


Does it fail without setting this to false?

Member Author


Yes, this test case tries to read the event log file.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Nov 2, 2023

Just for your confirmation: I kept the existing test structure.

    buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

    conf.set(EVENT_LOG_ENABLE_ROLLING, true)
    buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

    conf.set(EVENT_LOG_ENABLE_ROLLING, false)
    buildWriterAndVerify(conf, classOf[SingleEventLogFileWriter])

Do you mean to simplify it like the following?

    buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

    conf.set(EVENT_LOG_ENABLE_ROLLING, false)
    buildWriterAndVerify(conf, classOf[SingleEventLogFileWriter])

@viirya
Member

viirya commented Nov 2, 2023

Oh, got it. The existing one looks good. I couldn't see it from the diff, so I thought SingleEventLogFileWriter wasn't tested.

@dongjoon-hyun
Member Author

Thank you for your confirmation! Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45771 branch November 2, 2023 21:07
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
### What changes were proposed in this pull request?

This PR aims to enable `spark.eventLog.rolling.enabled` by default for Apache Spark 4.0.0.

### Why are the changes needed?

Since Apache Spark 3.0.0, we have been using event log rolling not only for **long-running jobs**, but also for **some failed jobs** to archive the partial event logs incrementally.
- apache#25670

### Does this PR introduce _any_ user-facing change?

- No because `spark.eventLog.enabled` is disabled by default.
- For users with `spark.eventLog.enabled=true`, yes: the `spark-events` directory will have a different layout. However, Spark History Server 3.3+ can read both the old and new event logs. I believe that event log users already use this configuration to avoid losing event logs for long-running jobs and some failed jobs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#43638 from dongjoon-hyun/SPARK-45771.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Jun 14, 2025
### What changes were proposed in this pull request?

Add `Spark History Server` example.

### Why are the changes needed?

Since Apache Spark 4.0, Spark rolls event logs by default and compresses them by default.
- apache/spark#43638
- apache/spark#43036

However, we still need more configuration to allow the SHS to manage the event log directories. This PR aims to provide an example of `Spark History Server` with that configuration.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #249 from dongjoon-hyun/SPARK-52481.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jul 23, 2025
…in `History Server`

### What changes were proposed in this pull request?

This PR aims to support `On-Demand Log Loading` in `History Server` by looking up the **rolling event log locations** even if Spark's listing has not yet finished loading the event log files.

```scala
val EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED =
  ConfigBuilder("spark.history.fs.eventLog.rolling.onDemandLoadEnabled")
    .doc("Whether to look up rolling event log locations on demand manner before listing files.")
    .version("4.1.0")
    .booleanConf
    .createWithDefault(true)
```

Previously, the Spark History Server would show an `Application ... Not Found` page if a job was requested before it had been scanned, even if the file existed in the correct location. So, this PR doesn't introduce any regressions; it introduces a kind of fallback logic to improve error handling.

<img width="686" height="359" alt="Screenshot 2025-07-22 at 14 08 21" src="https://github.com/user-attachments/assets/fccb413c-5a57-4918-86c0-28ae81d54873" />

### Why are the changes needed?

Since Apache Spark 3.0, we have been using event log rolling not only for **long-running jobs**, but also for **some failed jobs** to archive the partial event logs incrementally.
- #25670

Since Apache Spark 4.0, event log rolling is enabled by default.
- #43638

On top of that, this PR aims to reduce storage cost in Apache Spark 4.1. By supporting `On-Demand Loading for rolled event logs`, we can use larger values for `spark.history.fs.update.interval` instead of the default `10s`. Although Spark history logs are consumed in various ways, this has a big benefit because most successful Spark jobs' logs are never visited by users.
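As a sketch, a History Server deployment taking advantage of this could use settings like the following in `spark-defaults.conf`. The on-demand property name comes from the `ConfigBuilder` snippet above; the `10m` interval is an illustrative value, not a recommendation from this PR:

```
# Enabled by default in 4.1.0 per the ConfigBuilder above; shown for clarity.
spark.history.fs.eventLog.rolling.onDemandLoadEnabled   true
# Default is 10s; a larger interval reduces listing/storage cost because
# on-demand lookup can still find rolled logs that have not been scanned yet.
spark.history.fs.update.interval                        10m
```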

### Does this PR introduce _any_ user-facing change?

No. This is a new feature.

### How was this patch tested?

Pass the CIs with newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51604 from dongjoon-hyun/SPARK-52914.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>