[SPARK-48861][SQL] Enable shuffle file removal/skipMigration for all SQL executions #47360
+42
−48
This PR follows #45930 and #46302 (which is open as of the creation of this PR) to enable shuffle file cleanup for all SQL executions, not just Spark Connect.
The prior PR #45930 introduces two new configs: spark.sql.shuffleDependency.skipMigration.enabled and spark.sql.shuffleDependency.fileCleanup.enabled. These two configs are not specifically namespaced to Spark Connect, and I'd like to make sure we can use them from all QueryExecutions. Before this PR, only Spark Connect could enable them.

My change is to move the check for shuffleCleanupMode inside of QueryExecution, instead of having it passed to this class in the constructor. I am also explicitly turning on these features in the tests, rather than using Utils.isTesting.

I would love to hear any concerns about why we shouldn't do this, or what testing you want to see. I have run the Standalone tests (note that I needed #46302) and can run other tests if required, or write new ones.
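
For reviewers who want a quick picture of the idea, here is a minimal, Spark-free sketch of resolving the cleanup mode from the two configs rather than taking it as a constructor argument. It is not the code in this diff: the object name ShuffleCleanupSketch, the helper names, the plain Map standing in for the SQL conf, the mode names, the precedence of file cleanup over skip-migration, and the default of "off" when a key is absent are all assumptions made for illustration. Only the two config keys are taken from the description above.

```scala
// Hypothetical model of the behaviour described above; not the actual Spark code.
sealed trait ShuffleCleanupMode
case object DoNotCleanup extends ShuffleCleanupMode
case object SkipMigration extends ShuffleCleanupMode
case object RemoveShuffleFiles extends ShuffleCleanupMode

object ShuffleCleanupSketch {
  // The two config keys named in the PR description.
  val FileCleanupKey = "spark.sql.shuffleDependency.fileCleanup.enabled"
  val SkipMigrationKey = "spark.sql.shuffleDependency.skipMigration.enabled"

  // Assumption: a missing key means the feature is disabled.
  private def flag(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.trim.equalsIgnoreCase("true"))

  // Assumption: file cleanup takes precedence when both flags are set.
  def resolveMode(conf: Map[String, String]): ShuffleCleanupMode =
    if (flag(conf, FileCleanupKey)) RemoveShuffleFiles
    else if (flag(conf, SkipMigrationKey)) SkipMigration
    else DoNotCleanup
}
```

In this sketch, ShuffleCleanupSketch.resolveMode(Map(ShuffleCleanupSketch.SkipMigrationKey -> "true")) yields SkipMigration, and an empty map yields DoNotCleanup. Because the mode is derived from the conf at the point of use, any caller that sets the flags gets the behaviour, which is the intent of moving the check inside QueryExecution.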