@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Sep 21, 2023

What changes were proposed in this pull request?

This PR aims to enable spark.eventLog.compress by default for Apache Spark 4.0.0.

Why are the changes needed?

  • To reduce event log storage cost by compressing the logs with the ZStandard codec by default.

Does this PR introduce any user-facing change?

We added a migration guide entry for the new default, although old Spark History Servers are still able to read the compressed logs.
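
For applications that need to keep uncompressed event logs after upgrading, the setting can be overridden per application. The following is a minimal sketch, not part of this PR: `build_submit_args` and `app.py` are hypothetical names used only for illustration, while `spark.eventLog.enabled` and `spark.eventLog.compress` are the existing Spark configuration keys.

```python
# Sketch only: `build_submit_args` and `app.py` are hypothetical, not from the PR.
from typing import Dict, List


def build_submit_args(app: str, conf: Dict[str, str]) -> List[str]:
    """Render a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    # Sort the keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Turn event log compression off explicitly, which matches the pre-4.0 default.
uncompressed = build_submit_args(
    "app.py",
    {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.compress": "false",
    },
)
print(" ".join(uncompressed))
```

Setting `spark.eventLog.compress` explicitly, rather than relying on the default, keeps the behavior the same across Spark versions.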

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the DOCS label Sep 21, 2023
Contributor

@mridulm mridulm left a comment

We should have done this long ago!

@dongjoon-hyun
Member Author

Thank you for the review and approval, @mridulm! 😄

Member

@yaooqinn yaooqinn left a comment

LGTM, just one minor revision.

@dongjoon-hyun
Member Author

The last commit is a doc-only change, and the previous commit passed the core module unit tests.
[Screenshot: 2023-09-21 at 8:08:12 PM]

Thank you all! Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45257 branch September 22, 2023 03:09
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
Closes apache#43036 from dongjoon-hyun/SPARK-45257.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Jun 14, 2025
### What changes were proposed in this pull request?

Add `Spark History Server` example.

### Why are the changes needed?

Since Apache Spark 4.0, Spark rolls event logs and compresses them by default.
- apache/spark#43638
- apache/spark#43036

However, more configuration is still needed to allow the Spark History Server (SHS) to manage the event log directories. This PR aims to provide an example of `Spark History Server` with that configuration.
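
As a rough illustration of the kind of settings involved (this is not the example added by the PR), the sketch below renders a few existing history server options in `spark-defaults.conf` style. The helper `render_properties`, the directory path, and the retention values are placeholder assumptions.

```python
# Sketch only: the path and retention values are placeholders, not from the PR.
from typing import Dict


def render_properties(conf: Dict[str, str]) -> str:
    """Render settings in spark-defaults.conf style: one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


history_server_conf = {
    # Directory the history server scans; applications write here via spark.eventLog.dir.
    "spark.history.fs.logDirectory": "/tmp/spark-events",
    # Let the history server periodically delete old event logs.
    "spark.history.fs.cleaner.enabled": "true",
    "spark.history.fs.cleaner.maxAge": "7d",
    # Number of rolled event log files kept non-compacted; older ones are compacted.
    "spark.history.fs.eventLog.rolling.maxFilesToRetain": "10",
}
print(render_properties(history_server_conf))
```

The cleaner and compaction options matter more once rolling is on by default, because each application then produces a directory of event log files rather than a single file.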

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #249 from dongjoon-hyun/SPARK-52481.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>