
Conversation

@dcoliversun
Contributor

@dcoliversun dcoliversun commented Oct 6, 2022

What changes were proposed in this pull request?

This PR aims to document the Spark configurations defined in org.apache.spark.internal.config that are missing from configuration.md.

Why are the changes needed?

Helps users look up configurations in the documentation instead of in the code.

Does this PR introduce any user-facing change?

Yes, more configurations appear in the documentation.

How was this patch tested?

Passed GitHub Actions.

@dcoliversun dcoliversun marked this pull request as draft October 6, 2022 13:05
Contributor Author

private[spark] val EVENT_LOG_GC_METRICS_YOUNG_GENERATION_GARBAGE_COLLECTORS =
  ConfigBuilder("spark.eventLog.gcMetrics.youngGenerationGarbageCollectors")
    .doc("Names of supported young generation garbage collectors. A name usually is " +
      "the return of GarbageCollectorMXBean.getName. The built-in young generation garbage " +
      s"collectors are ${GarbageCollectionMetrics.YOUNG_GENERATION_BUILTIN_GARBAGE_COLLECTORS}")
    .version("3.0.0")
    .stringConf
    .toSequence
    .createWithDefault(GarbageCollectionMetrics.YOUNG_GENERATION_BUILTIN_GARBAGE_COLLECTORS)
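
As an aside, not part of the diff: since valid values must match what GarbageCollectorMXBean.getName returns, the collector names on a given JVM can be listed with a quick sketch like the following.

import java.lang.management.ManagementFactory

// Print the collector names this JVM reports; these are the strings the
// gcMetrics configs must match (e.g. "G1 Young Generation" under G1).
ManagementFactory.getGarbageCollectorMXBeans.forEach(b => println(b.getName))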

Contributor Author

private[spark] val EVENT_LOG_GC_METRICS_OLD_GENERATION_GARBAGE_COLLECTORS =
  ConfigBuilder("spark.eventLog.gcMetrics.oldGenerationGarbageCollectors")
    .doc("Names of supported old generation garbage collectors. A name usually is " +
      "the return of GarbageCollectorMXBean.getName. The built-in old generation garbage " +
      s"collectors are ${GarbageCollectionMetrics.OLD_GENERATION_BUILTIN_GARBAGE_COLLECTORS}")
    .version("3.0.0")
    .stringConf
    .toSequence
    .createWithDefault(GarbageCollectionMetrics.OLD_GENERATION_BUILTIN_GARBAGE_COLLECTORS)
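
For illustration only, with values assumed rather than taken from this PR: both configs accept a comma-separated list, which .toSequence parses into a Seq[String]. A minimal sketch for a cluster running G1:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Comma-separated names, matched against GarbageCollectorMXBean.getName.
  .set("spark.eventLog.gcMetrics.youngGenerationGarbageCollectors", "G1 Young Generation")
  .set("spark.eventLog.gcMetrics.oldGenerationGarbageCollectors", "G1 Old Generation")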

Contributor Author

private[spark] val EXECUTOR_ALLOW_SPARK_CONTEXT =
  ConfigBuilder("spark.executor.allowSparkContext")
    .doc("If set to true, SparkContext can be created in executors.")
    .version("3.0.1")
    .booleanConf
    .createWithDefault(false)

Member

BTW, I guess we don't want to expose this, @dcoliversun. It does no good for users.

Contributor Author

OK. Would it be better to mark it as an internal configuration? If so, I will open a new PR for that.
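
For context, "mark as internal" refers to the .internal() flag in the ConfigBuilder DSL, which keeps an entry out of the generated user-facing docs. A sketch of what that change could look like:

private[spark] val EXECUTOR_ALLOW_SPARK_CONTEXT =
  ConfigBuilder("spark.executor.allowSparkContext")
    .internal() // excluded from user-facing documentation
    .doc("If set to true, SparkContext can be created in executors.")
    .version("3.0.1")
    .booleanConf
    .createWithDefault(false)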

Contributor Author

private[spark] val DECOMMISSION_ENABLED =
  ConfigBuilder("spark.decommission.enabled")
    .doc("When decommission is enabled, Spark will try its best to shut down the executor " +
      "gracefully. Spark will try to migrate all the RDD blocks (controlled by " +
      s"${STORAGE_DECOMMISSION_RDD_BLOCKS_ENABLED.key}) and shuffle blocks (controlled by " +
      s"${STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED.key}) from the decommissioning " +
      s"executor to a remote executor when ${STORAGE_DECOMMISSION_ENABLED.key} is enabled. " +
      "With decommission enabled, Spark will also decommission an executor instead of " +
      s"killing it when ${DYN_ALLOCATION_ENABLED.key} is enabled.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
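
An illustrative sketch, not from this PR, of turning on graceful decommissioning together with the block-migration switches the doc string references (literal key names assumed from the constants it interpolates):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  // Migrate cached RDD blocks and shuffle blocks off the decommissioning executor.
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.rddBlocks.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")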

Member

Oh, I didn't realize that this is still undocumented. Thanks.

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_KILL_INTERVAL =
  ConfigBuilder("spark.executor.decommission.killInterval")
    .doc("Duration after which a decommissioned executor will be killed forcefully " +
      "by an *outside* (i.e. non-Spark) service. " +
      "This config is useful for cloud environments where we know in advance when " +
      "an executor is going to go down after the decommissioning signal, e.g. around " +
      "2 mins on AWS spot nodes, 1/2 hr on spot block nodes, etc. This config is " +
      "currently used to decide which tasks running on decommissioned executors to speculate.")
    .version("3.1.0")
    .timeConf(TimeUnit.SECONDS)
    .createOptional
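
A hedged usage sketch, reusing the conf from the sketch above, with an illustrative value matching the doc string's AWS spot example; as a timeConf the setting accepts duration strings:

// Spot instances typically give about 2 minutes' notice after the signal.
conf.set("spark.executor.decommission.killInterval", "120s")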

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_FORCE_KILL_TIMEOUT =
  ConfigBuilder("spark.executor.decommission.forceKillTimeout")
    .doc("Duration after which Spark will force a decommissioning executor to exit. " +
      "This should be set to a high value in most situations, as low values will prevent " +
      "block migrations from having enough time to complete.")
    .version("3.2.0")
    .timeConf(TimeUnit.SECONDS)
    .createOptional
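
To contrast the two timeouts above: killInterval describes when an *external* service will kill the executor, while forceKillTimeout is enforced by Spark itself. A sketch with an assumed value:

// Leave generous time for block migration before Spark forces the exit.
conf.set("spark.executor.decommission.forceKillTimeout", "5m")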

Contributor Author

private[spark] val EXECUTOR_DECOMMISSION_SIGNAL =
  ConfigBuilder("spark.executor.decommission.signal")
    .doc("The signal that is used to trigger the executor to start decommissioning.")
    .version("3.2.0")
    .stringConf
    .createWithDefaultString("PWR")
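
For illustration, not from the thread: with the default value, decommissioning can be triggered from the executor's host with something like kill -PWR <executor pid>, since the executor installs a handler for the configured signal.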

Contributor Author

private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_CLEANUP =
  ConfigBuilder("spark.storage.decommission.fallbackStorage.cleanUp")
    .doc("If true, Spark cleans up its fallback storage data during shutdown.")
    .version("3.2.0")
    .booleanConf
    .createWithDefault(false)
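
Illustration only, and the path is hypothetical: cleanUp only matters once a fallback location is configured, e.g.:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Hypothetical object-store location for blocks no surviving executor can take.
  .set("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
  // Remove the fallback data when the application shuts down.
  .set("spark.storage.decommission.fallbackStorage.cleanUp", "true")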

Contributor Author

private[spark] val STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE =
  ConfigBuilder("spark.storage.decommission.shuffleBlocks.maxDiskSize")
    .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
      "shuffle blocks. Rejecting remote shuffle blocks means that an executor will not " +
      "receive any shuffle migrations, and if there are no other executors available " +
      "for migration then shuffle blocks will be lost unless " +
      s"${STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH.key} is configured.")
    .version("3.2.0")
    .bytesConf(ByteUnit.BYTE)
    .createOptional
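
Continuing the conf sketches above with an assumed size; as a bytesConf the setting accepts size strings:

// Stop accepting migrated shuffle blocks once 100 GiB of local disk is used.
conf.set("spark.storage.decommission.shuffleBlocks.maxDiskSize", "100g")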

Contributor Author

private[spark] val STANDALONE_SUBMIT_WAIT_APP_COMPLETION =
  ConfigBuilder("spark.standalone.submit.waitAppCompletion")
    .doc("In standalone cluster mode, controls whether the client waits to exit until the " +
      "application completes. If set to true, the client process will stay alive polling " +
      "the driver's status. Otherwise, the client process will exit after submission.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
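
For illustration, not from this PR: this flag is typically supplied at submission time, e.g. spark-submit --deploy-mode cluster --conf spark.standalone.submit.waitAppCompletion=true, so the launching process blocks and polls the driver's status until the application finishes.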

Contributor Author

private[spark] val SHUFFLE_NUM_PUSH_THREADS =
  ConfigBuilder("spark.shuffle.push.numPushThreads")
    .doc("Specify the number of threads in the block pusher pool. These threads assist " +
      "in creating connections and pushing blocks to remote external shuffle services. " +
      "By default, the thread pool size is equal to the number of Spark executor cores.")
    .version("3.2.0")
    .intConf
    .createOptional
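
An illustrative sketch with assumed values; the pusher pool only exists once push-based shuffle itself is enabled:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.push.enabled", "true")     // push-based shuffle on
  .set("spark.shuffle.push.numPushThreads", "8") // otherwise defaults to executor cores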

Contributor Author

private[spark] val PUSH_BASED_SHUFFLE_MERGE_FINALIZE_THREADS =
  ConfigBuilder("spark.shuffle.push.merge.finalizeThreads")
    .doc("Number of threads used by the driver to finalize shuffle merge. Since it could " +
      "potentially take seconds for a large shuffle to finalize, having multiple threads " +
      "helps the driver to handle concurrent shuffle merge finalize requests when " +
      "push-based shuffle is enabled.")
    .version("3.3.0")
    .intConf
    .createWithDefault(8)
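
Worth noting, though not in the diff: unlike spark.shuffle.push.numPushThreads above, which sizes a pool on the executors, this pool lives on the driver and serves merge-finalization requests, which is why it has a fixed default of 8 rather than a cores-based one.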

@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Add missing spark configuration to configuration.md [SPARK-40675][DOCS] Add missing spark configuration to configuration.md (part 1) Oct 6, 2022
@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Add missing spark configuration to configuration.md (part 1) [SPARK-40675][DOCS] Supplement missing spark configurations in configuration.md (part 1) Oct 6, 2022
@dcoliversun dcoliversun marked this pull request as ready for review October 6, 2022 15:42
@dcoliversun
Contributor Author

cc @HyukjinKwon @dongjoon-hyun
It would be nice if you have time to review this PR :)

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for working on this, @dcoliversun .

@AmplabJenkins

Can one of the admins verify this patch?

Member

@HyukjinKwon HyukjinKwon left a comment

Took a quick look, LGTM. Thanks for working on this.

@github-actions github-actions bot added the DOCS label Oct 7, 2022
@dcoliversun dcoliversun changed the title [SPARK-40675][DOCS] Supplement missing spark configurations in configuration.md (part 1) [SPARK-40675][DOCS] Supplement undocumented spark configurations in configuration.md Oct 8, 2022
@srowen srowen closed this in cd7ca92 Oct 9, 2022
@dcoliversun
Contributor Author

Thanks for your help @srowen @HyukjinKwon @dongjoon-hyun

@dcoliversun dcoliversun deleted the SPARK-40675 branch October 10, 2022 01:25