
Commit 7182f8c

AngersZhuuuu authored and Max Gekk committed
[SPARK-35360][SQL] RepairTableCommand respects spark.sql.addPartitionInBatch.size too
### What changes were proposed in this pull request?
RepairTableCommand respects `spark.sql.addPartitionInBatch.size` too.

### Why are the changes needed?
Make the batch size that RepairTableCommand uses when adding partitions configurable.

### Does this PR introduce _any_ user-facing change?
Users can set `spark.sql.addPartitionInBatch.size` to change the batch size when repairing a table.

### How was this patch tested?
Not needed.

Closes #32489 from AngersZhuuuu/SPARK-35360.

Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
1 parent d808956 commit 7182f8c
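A minimal usage sketch of the user-facing behavior described above. The session setup and table name (`my_partitioned_table`) are illustrative placeholders; only the config key `spark.sql.addPartitionInBatch.size` and the repair-table path come from this commit.

```scala
import org.apache.spark.sql.SparkSession

object RepairWithSmallBatches {
  def main(args: Array[String]): Unit = {
    // Hypothetical session setup; Hive support is needed so recovered partitions
    // are registered in the Hive Metastore.
    val spark = SparkSession.builder()
      .appName("repair-table-batch-size")
      .enableHiveSupport()
      .getOrCreate()

    // After this commit, RepairTableCommand reads this value instead of the
    // hard-coded batch size of 100. Smaller batches mean smaller metastore RPCs.
    spark.conf.set("spark.sql.addPartitionInBatch.size", "50")

    // MSCK REPAIR TABLE is executed via RepairTableCommand; the table name is a
    // placeholder for illustration.
    spark.sql("MSCK REPAIR TABLE my_partitioned_table")

    spark.stop()
  }
}
```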

File tree

2 files changed (+4 -3 lines)

  • sql
    • catalyst/src/main/scala/org/apache/spark/sql/internal
    • core/src/main/scala/org/apache/spark/sql/execution/command

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 3 additions & 2 deletions
@@ -2894,8 +2894,9 @@ object SQLConf {
     buildConf("spark.sql.addPartitionInBatch.size")
       .internal()
       .doc("The number of partitions to be handled in one turn when use " +
-        "`AlterTableAddPartitionCommand` to add partitions into table. The smaller " +
-        "batch size is, the less memory is required for the real handler, e.g. Hive Metastore.")
+        "`AlterTableAddPartitionCommand` or `RepairTableCommand` to add partitions into table. " +
+        "The smaller batch size is, the less memory is required for the real handler, e.g. " +
+        "Hive Metastore.")
       .version("3.0.0")
       .intConf
       .checkValue(_ > 0, "The value of spark.sql.addPartitionInBatch.size must be positive")
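For context, a sketch of reading this entry back from a running session; it is illustrative, not part of the commit. Since the entry is marked `.internal()`, it is read here by its string key with an explicit fallback ("100" mirrors the value that ddl.scala hard-coded before this change and is an assumption about the entry's default).

```scala
import org.apache.spark.sql.SparkSession

object ReadAddPartitionBatchSize {
  def main(args: Array[String]): Unit = {
    // local[*] master and app name are placeholders for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conf-check")
      .getOrCreate()

    // checkValue above guarantees any explicitly set value is positive; the
    // fallback here is an assumption, not something this diff defines.
    val batchSize = spark.conf.get("spark.sql.addPartitionInBatch.size", "100").toInt
    println(s"Partitions added per metastore call: $batchSize")

    spark.stop()
  }
}
```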

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

Lines changed: 1 addition & 1 deletion
@@ -771,7 +771,7 @@ case class RepairTableCommand(
     // Hive metastore may not have enough memory to handle millions of partitions in single RPC,
     // we should split them into smaller batches. Since Hive client is not thread safe, we cannot
     // do this in parallel.
-    val batchSize = 100
+    val batchSize = spark.conf.get(SQLConf.ADD_PARTITION_BATCH_SIZE)
     partitionSpecsAndLocs.toIterator.grouped(batchSize).foreach { batch =>
       val now = MILLISECONDS.toSeconds(System.currentTimeMillis())
       val parts = batch.map { case (spec, location) =>
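A self-contained sketch of the batching pattern the changed line feeds into: an iterator of (partition spec, location) pairs is grouped into chunks of at most `batchSize` elements so each metastore call carries a bounded number of partitions. The generated data and the println stand in for the real catalog call and are purely illustrative.

```scala
object PartitionBatchingSketch {
  def main(args: Array[String]): Unit = {
    // In Spark this value now comes from spark.sql.addPartitionInBatch.size.
    val batchSize = 100

    // Illustrative stand-in for partitionSpecsAndLocs: spec -> location pairs.
    val partitionSpecsAndLocs: Seq[(Map[String, String], String)] =
      (1 to 250).map(i => (Map("part" -> i.toString), s"/warehouse/tbl/part=$i"))

    // grouped() yields batches of at most batchSize elements; RepairTableCommand
    // turns each batch into catalog partitions and registers them in one call.
    partitionSpecsAndLocs.iterator.grouped(batchSize).foreach { batch =>
      println(s"Registering ${batch.size} partitions in a single metastore RPC")
    }
  }
}
```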
