
Conversation

@dcoliversun
Contributor

@dcoliversun dcoliversun commented Oct 6, 2022

What changes were proposed in this pull request?

This PR aims to document the Spark configurations defined in org.apache.spark.internal.config that are missing from configuration.md.

Why are the changes needed?

Helps users look up configurations in the documentation instead of in the code.

Does this PR introduce any user-facing change?

Yes, more configurations appear in the documentation.

How was this patch tested?

Passed GitHub Actions.

@dcoliversun dcoliversun marked this pull request as draft October 6, 2022 13:05
Contributor Author

private[spark] val EVENT_LOG_GC_METRICS_YOUNG_GENERATION_GARBAGE_COLLECTORS =
  ConfigBuilder("spark.eventLog.gcMetrics.youngGenerationGarbageCollectors")
    .doc("Names of supported young generation garbage collectors. A name usually is " +
      "the return of GarbageCollectorMXBean.getName. The built-in young generation garbage " +
      s"collectors are ${GarbageCollectionMetrics.YOUNG_GENERATION_BUILTIN_GARBAGE_COLLECTORS}")
    .version("3.0.0")
    .stringConf
    .toSequence
    .createWithDefault(GarbageCollectionMetrics.YOUNG_GENERATION_BUILTIN_GARBAGE_COLLECTORS)
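
As an aside, not part of the diff: since valid values must match what GarbageCollectorMXBean.getName returns, the collector names on a given JVM can be listed with a quick sketch like the following.

import java.lang.management.ManagementFactory

// Print the collector names this JVM reports; these are the strings the
// gcMetrics configs must match (e.g. "G1 Young Generation" under G1).
ManagementFactory.getGarbageCollectorMXBeans.forEach(b => println(b.getName))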

Contributor Author

private[spark] val EVENT_LOG_GC_METRICS_OLD_GENERATION_GARBAGE_COLLECTORS =
  ConfigBuilder("spark.eventLog.gcMetrics.oldGenerationGarbageCollectors")
    .doc("Names of supported old generation garbage collectors. A name usually is " +
      "the return of GarbageCollectorMXBean.getName. The built-in old generation garbage " +
      s"collectors are ${GarbageCollectionMetrics.OLD_GENERATION_BUILTIN_GARBAGE_COLLECTORS}")
    .version("3.0.0")
    .stringConf
    .toSequence
    .createWithDefault(GarbageCollectionMetrics.OLD_GENERATION_BUILTIN_GARBAGE_COLLECTORS)
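
For illustration only, with values assumed rather than taken from this PR: both configs accept a comma-separated list, which .toSequence parses into a Seq[String]. A minimal sketch for a cluster running G1:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Comma-separated names, matched against GarbageCollectorMXBean.getName.
  .set("spark.eventLog.gcMetrics.youngGenerationGarbageCollectors", "G1 Young Generation")
  .set("spark.eventLog.gcMetrics.oldGenerationGarbageCollectors", "G1 Old Generation")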

Contributor Author

private[spark] val EXECUTOR_ALLOW_SPARK_CONTEXT =
  ConfigBuilder("spark.executor.allowSparkContext")
    .doc("If set to true, SparkContext can be created in executors.")
    .version("3.0.1")
    .booleanConf
    .createWithDefault(false)

Member

BTW, I guess we don't want to expose this, @dcoliversun. It does no good for users.

Contributor Author

OK. Would it be better to mark it as an internal configuration? If so, I will open a new PR for that.
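
For context, "mark as internal" refers to the .internal() flag in the ConfigBuilder DSL, which keeps an entry out of the generated user-facing docs. A sketch of what that change could look like:

private[spark] val EXECUTOR_ALLOW_SPARK_CONTEXT =
  ConfigBuilder("spark.executor.allowSparkContext")
    .internal() // excluded from user-facing documentation
    .doc("If set to true, SparkContext can be created in executors.")
    .version("3.0.1")
    .booleanConf
    .createWithDefault(false)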

Contributor Author

private[spark] val DECOMMISSION_ENABLED =
  ConfigBuilder("spark.decommission.enabled")
    .doc("When decommission is enabled, Spark will try its best to shut down the executor " +
      "gracefully. Spark will try to migrate all the RDD blocks (controlled by " +
      s"${STORAGE_DECOMMISSION_RDD_BLOCKS_ENABLED.key}) and shuffle blocks (controlled by " +
      s"${STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED.key}) from the decommissioning " +
      s"executor to a remote executor when ${STORAGE_DECOMMISSION_ENABLED.key} is enabled. " +
      "With decommission enabled, Spark will also decommission an executor instead of " +
      s"killing it when ${DYN_ALLOCATION_ENABLED.key} is enabled.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
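
An illustrative sketch, not from this PR, of turning on graceful decommissioning together with the block-migration switches the doc string references (literal key names assumed from the constants it interpolates):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  // Migrate cached RDD blocks and shuffle blocks off the decommissioning executor.
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.rddBlocks.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")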

Member

Oh, I didn't realize that this is still undocumented. Thanks.

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_KILL_INTERVAL =
  ConfigBuilder("spark.executor.decommission.killInterval")
    .doc("Duration after which a decommissioned executor will be killed forcefully " +
      "by an *outside* (i.e. non-Spark) service. " +
      "This config is useful for cloud environments where we know in advance when " +
      "an executor is going to go down after the decommissioning signal, e.g. around " +
      "2 mins on AWS spot nodes, 1/2 hr on spot block nodes, etc. This config is " +
      "currently used to decide which tasks running on decommissioned executors to speculate.")
    .version("3.1.0")
    .timeConf(TimeUnit.SECONDS)
    .createOptional
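
A hedged usage sketch, reusing the conf from the sketch above, with an illustrative value matching the doc string's AWS spot example; as a timeConf the setting accepts duration strings:

// Spot instances typically give about 2 minutes' notice after the signal.
conf.set("spark.executor.decommission.killInterval", "120s")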

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_FORCE_KILL_TIMEOUT =
  ConfigBuilder("spark.executor.decommission.forceKillTimeout")
    .doc("Duration after which Spark will force a decommissioning executor to exit. " +
      "This should be set to a high value in most situations, as low values will prevent " +
      "block migrations from having enough time to complete.")
    .version("3.2.0")
    .timeConf(TimeUnit.SECONDS)
    .createOptional
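
To contrast the two timeouts above: killInterval describes when an *external* service will kill the executor, while forceKillTimeout is enforced by Spark itself. A sketch with an assumed value:

// Leave generous time for block migration before Spark forces the exit.
conf.set("spark.executor.decommission.forceKillTimeout", "5m")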

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_SIGNAL =
  ConfigBuilder("spark.executor.decommission.signal")
    .doc("The signal that is used to trigger the executor to start decommissioning.")
    .version("3.2.0")
    .stringConf
    .createWithDefaultString("PWR")
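
For illustration, not from the thread: with the default value, decommissioning can be triggered from the executor's host with something like kill -PWR <executor pid>, since the executor installs a handler for the configured signal.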

Contributor Author

private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_CLEANUP =
  ConfigBuilder("spark.storage.decommission.fallbackStorage.cleanUp")
    .doc("If true, Spark cleans up its fallback storage data during shutdown.")
    .version("3.2.0")
    .booleanConf
    .createWithDefault(false)
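
Illustration only, and the path is hypothetical: cleanUp only matters once a fallback location is configured, e.g.:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Hypothetical object-store location for blocks no surviving executor can take.
  .set("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
  // Remove the fallback data when the application shuts down.
  .set("spark.storage.decommission.fallbackStorage.cleanUp", "true")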

Contributor Author

private[spark] val STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE =
  ConfigBuilder("spark.storage.decommission.shuffleBlocks.maxDiskSize")
    .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
      "shuffle blocks. Rejecting remote shuffle blocks means that an executor will not " +
      "receive any shuffle migrations, and if there are no other executors available " +
      "for migration then shuffle blocks will be lost unless " +
      s"${STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH.key} is configured.")
    .version("3.2.0")
    .bytesConf(ByteUnit.BYTE)
    .createOptional
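
Continuing the conf sketches above with an assumed size; as a bytesConf the setting accepts size strings:

// Stop accepting migrated shuffle blocks once 100 GiB of local disk is used.
conf.set("spark.storage.decommission.shuffleBlocks.maxDiskSize", "100g")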

Contributor Author

private[spark] val STANDALONE_SUBMIT_WAIT_APP_COMPLETION =
  ConfigBuilder("spark.standalone.submit.waitAppCompletion")
    .doc("In standalone cluster mode, controls whether the client waits to exit until the " +
      "application completes. If set to true, the client process will stay alive polling " +
      "the driver's status. Otherwise, the client process will exit after submission.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
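
For illustration, not from this PR: this flag is typically supplied at submission time, e.g. spark-submit --deploy-mode cluster --conf spark.standalone.submit.waitAppCompletion=true, so the launching process blocks and polls the driver's status until the application finishes.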

Contributor Author

private[spark] val SHUFFLE_NUM_PUSH_THREADS =
  ConfigBuilder("spark.shuffle.push.numPushThreads")
    .doc("Specify the number of threads in the block pusher pool. These threads assist " +
      "in creating connections and pushing blocks to remote external shuffle services. " +
      "By default, the thread pool size is equal to the number of Spark executor cores.")
    .version("3.2.0")
    .intConf
    .createOptional
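
An illustrative sketch with assumed values; the pusher pool only exists once push-based shuffle itself is enabled:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.push.enabled", "true")     // push-based shuffle on
  .set("spark.shuffle.push.numPushThreads", "8") // otherwise defaults to executor cores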

Contributor Author

private[spark] val PUSH_BASED_SHUFFLE_MERGE_FINALIZE_THREADS =
  ConfigBuilder("spark.shuffle.push.merge.finalizeThreads")
    .doc("Number of threads used by the driver to finalize shuffle merge. Since it could " +
      "potentially take seconds for a large shuffle to finalize, having multiple threads " +
      "helps the driver to handle concurrent shuffle merge finalize requests when " +
      "push-based shuffle is enabled.")
    .version("3.3.0")
    .intConf
    .createWithDefault(8)
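
Worth noting, though not in the diff: unlike spark.shuffle.push.numPushThreads above, which sizes a pool on the executors, this pool lives on the driver and serves merge-finalization requests, which is why it has a fixed default of 8 rather than a cores-based one.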

@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Add missing spark configuration to configuration.md [SPARK-40675][DOCS] Add missing spark configuration to configuration.md (part 1) Oct 6, 2022
@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Add missing spark configuration to configuration.md (part 1) [SPARK-40675][DOCS] Supplement missing spark configurations in configuration.md (part 1) Oct 6, 2022
@dcoliversun dcoliversun marked this pull request as ready for review October 6, 2022 15:42
@dcoliversun
Contributor Author

cc @HyukjinKwon @dongjoon-hyun
It would be nice if you have time to review this PR :)

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for working on this, @dcoliversun .

@AmplabJenkins

Can one of the admins verify this patch?

Member

@HyukjinKwon HyukjinKwon left a comment

Took a quick look, LGTM. Thanks for working on this.

@github-actions github-actions bot added the DOCS label Oct 7, 2022
@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Supplement missing spark configurations in configuration.md (part 1) [SPARK-40675][DOCS] Supplement undocumented spark configurations in configuration.md Oct 8, 2022
@srowen srowen closed this in cd7ca92 Oct 9, 2022
@dcoliversun
Contributor Author

Thanks for your help @srowen @HyukjinKwon @dongjoon-hyun

@dcoliversun dcoliversun deleted the SPARK-40675 branch October 10, 2022 01:25