-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-52426][CORE] Support redirecting stdout/stderr to logging system #51130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "`org.apache.spark.deploy.ConsoleRedirectPlugin`.") | ||
| .version("4.1.0") | ||
| .booleanConf | ||
| .createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .createWithDefault(false) | |
| .createWithDefault(true) |
| "`org.apache.spark.deploy.ConsoleRedirectPlugin`.") | ||
| .version("4.1.0") | ||
| .booleanConf | ||
| .createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .createWithDefault(false) | |
| .createWithDefault(true) |
| "`org.apache.spark.deploy.ConsoleRedirectPlugin`.") | ||
| .version("4.1.0") | ||
| .booleanConf | ||
| .createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .createWithDefault(false) | |
| .createWithDefault(true) |
| "`org.apache.spark.deploy.ConsoleRedirectPlugin`.") | ||
| .version("4.1.0") | ||
| .booleanConf | ||
| .createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .createWithDefault(false) | |
| .createWithDefault(true) |
|
Shall we simplify the implementation by:
|
|
@yaooqinn thanks for reviewing, made changes based on your suggestion:
|
core/src/main/scala/org/apache/spark/deploy/RedirectConsolePlugin.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/RedirectConsolePlugin.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/RedirectConsolePlugin.scala
Outdated
Show resolved
Hide resolved
|
Can we also have a unit test or manual test with the console progress bar which supports inline updates for stage professes instead of printing a wall of logs |
| // Delay to show up a progress bar, in milliseconds | ||
| private val firstDelayMSec = 500L | ||
| // Get the stderr (which is console for spark-shell) before installing RedirectConsolePlugin | ||
| private val console = System.err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaooqinn then it does not affect shell console progress bar, I also pasted the manual test result in PR description.
| if (sys.env.get("SPARK_CONNECT_MODE").contains("1")) "connect" else "classic") | ||
|
|
||
| private[spark] val DRIVER_REDIRECT_CONSOLE_TO_LOG_ENABLED = | ||
| ConfigBuilder("spark.driver.log.redirectConsole.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, we can still configure separately for stderr/stdout if we have a config intaking a subset of {stderr, stdout}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, refactored the config to accept a list of console output kinds
spark.driver.log.redirectConsoleOutputs
| .transform(_.toLowerCase(Locale.ROOT)) | ||
| .toSequence | ||
| .checkValue(v => v.forall(Set("stdout", "stderr").contains), | ||
| "The value only can be one or more of 'stdout, stderr'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the user-facing error message would be like
org.apache.spark.SparkIllegalArgumentException: [INVALID_CONF_VALUE.REQUIREMENT] The value 'stdin' in the config "spark.driver.log.redirectConsoleOutputs" is invalid. The value only can be one or more of 'stdout, stderr'. SQLSTATE: 22022
|
Merged to master. thank you @pan3793 |
…oleOutputs` ### What changes were proposed in this pull request? This PR aims to rename `spark.executor.log.redirectConsoleOutputs` to `spark.executor.logs.redirectConsoleOutputs` to be consistent with the existing executor log configurations like the following. ``` spark.executor.logs.rolling.strategy spark.executor.logs.rolling.time.interval spark.executor.logs.rolling.maxSize spark.executor.logs.rolling.maxRetainedFiles spark.executor.logs.rolling.enableCompression spark.executor.logs.redirectConsoleOutputs ``` ### Why are the changes needed? Although SPARK-52426 introduced two configurations consistently in the PR, Apache Spark has different namespaces for Driver and Executor which are `spark.driver.log.*` and `spark.executor.logs.*` respectively. Instead of introducing a new config namespace, `spark.executor.log`, for this single new configuration, we had better follow the existing naming rule. - #51130 ``` spark.driver.log.redirectConsoleOutputs spark.executor.log.redirectConsoleOutputs ``` ### Does this PR introduce _any_ user-facing change? No this is not released yet. ### How was this patch tested? Pass the CIs. Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52624 from dongjoon-hyun/SPARK-53923. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation. ### Why are the changes needed? To help the users use new features easily. - #47856 - #51130 - #51163 - #51604 - #51630 - #51708 - #51885 - #52091 - #52382 ### Does this PR introduce _any_ user-facing change? No behavior change because this is a documentation update. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #52626 from dongjoon-hyun/SPARK-53926. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…oleOutputs` ### What changes were proposed in this pull request? This PR aims to rename `spark.executor.log.redirectConsoleOutputs` to `spark.executor.logs.redirectConsoleOutputs` to be consistent with the existing executor log configurations like the following. ``` spark.executor.logs.rolling.strategy spark.executor.logs.rolling.time.interval spark.executor.logs.rolling.maxSize spark.executor.logs.rolling.maxRetainedFiles spark.executor.logs.rolling.enableCompression spark.executor.logs.redirectConsoleOutputs ``` ### Why are the changes needed? Although SPARK-52426 introduced two configurations consistently in the PR, Apache Spark has different namespaces for Driver and Executor which are `spark.driver.log.*` and `spark.executor.logs.*` respectively. Instead of introducing a new config namespace, `spark.executor.log`, for this single new configuration, we had better follow the existing naming rule. - apache#51130 ``` spark.driver.log.redirectConsoleOutputs spark.executor.log.redirectConsoleOutputs ``` ### Does this PR introduce _any_ user-facing change? No this is not released yet. ### How was this patch tested? Pass the CIs. Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52624 from dongjoon-hyun/SPARK-53923. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation. ### Why are the changes needed? To help the users use new features easily. - apache#47856 - apache#51130 - apache#51163 - apache#51604 - apache#51630 - apache#51708 - apache#51885 - apache#52091 - apache#52382 ### Does this PR introduce _any_ user-facing change? No behavior change because this is a documentation update. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52626 from dongjoon-hyun/SPARK-53926. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Though directly printing messages to stdout/stderr is discouraged, but as a framework, Spark does not forbid users from doing that.
This PR adds a RedirectConsolePlugin to allow Spark to redirect stdout/stderr to the logging system (Log4J2 via Slf4J API).
Apache Flink also has the same capability, see FLINK-31234
Why are the changes needed?
This is especially useful for integrating with external logging services, for example, use Kafka Log4J appender to send logs to Kafka, then drain to ElasticSearch.
Does this PR introduce any user-facing change?
Yes, new feature.
How was this patch tested?
I don't write UT as it will affect other cases run in the same JVM, I test it manually.
Console progress bar (without console redirection)
Console progress bar (with console redirection)
Was this patch authored or co-authored using generative AI tooling?
No.