
Commit 7182f8c

AngersZhuuuu authored and Max Gekk committed
[SPARK-35360][SQL] RepairTableCommand respects spark.sql.addPartitionInBatch.size too
### What changes were proposed in this pull request?
RepairTableCommand respects `spark.sql.addPartitionInBatch.size` too.

### Why are the changes needed?
Make the batch size that RepairTableCommand uses when adding partitions configurable.

### Does this PR introduce _any_ user-facing change?
Users can set `spark.sql.addPartitionInBatch.size` to change the batch size when repairing a table.

### How was this patch tested?
Not needed.

Closes #32489 from AngersZhuuuu/SPARK-35360.

Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
1 parent d808956 commit 7182f8c
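A minimal usage sketch of the user-facing behavior described above. The session setup and table name (`my_partitioned_table`) are illustrative placeholders; only the config key `spark.sql.addPartitionInBatch.size` and the repair-table path come from this commit.

```scala
import org.apache.spark.sql.SparkSession

object RepairWithSmallBatches {
  def main(args: Array[String]): Unit = {
    // Hypothetical session setup; Hive support is needed so recovered partitions
    // are registered in the Hive Metastore.
    val spark = SparkSession.builder()
      .appName("repair-table-batch-size")
      .enableHiveSupport()
      .getOrCreate()

    // After this commit, RepairTableCommand reads this value instead of the
    // hard-coded batch size of 100. Smaller batches mean smaller metastore RPCs.
    spark.conf.set("spark.sql.addPartitionInBatch.size", "50")

    // MSCK REPAIR TABLE is executed via RepairTableCommand; the table name is a
    // placeholder for illustration.
    spark.sql("MSCK REPAIR TABLE my_partitioned_table")

    spark.stop()
  }
}
```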

File tree

2 files changed (+4 -3 lines)

  • sql
    • catalyst/src/main/scala/org/apache/spark/sql/internal
    • core/src/main/scala/org/apache/spark/sql/execution/command

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 3 additions & 2 deletions
@@ -2894,8 +2894,9 @@ object SQLConf {
     buildConf("spark.sql.addPartitionInBatch.size")
       .internal()
       .doc("The number of partitions to be handled in one turn when use " +
-        "`AlterTableAddPartitionCommand` to add partitions into table. The smaller " +
-        "batch size is, the less memory is required for the real handler, e.g. Hive Metastore.")
+        "`AlterTableAddPartitionCommand` or `RepairTableCommand` to add partitions into table. " +
+        "The smaller batch size is, the less memory is required for the real handler, e.g. " +
+        "Hive Metastore.")
       .version("3.0.0")
       .intConf
       .checkValue(_ > 0, "The value of spark.sql.addPartitionInBatch.size must be positive")
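For context, a sketch of reading this entry back from a running session; it is illustrative, not part of the commit. Since the entry is marked `.internal()`, it is read here by its string key with an explicit fallback ("100" mirrors the value that ddl.scala hard-coded before this change and is an assumption about the entry's default).

```scala
import org.apache.spark.sql.SparkSession

object ReadAddPartitionBatchSize {
  def main(args: Array[String]): Unit = {
    // local[*] master and app name are placeholders for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conf-check")
      .getOrCreate()

    // checkValue above guarantees any explicitly set value is positive; the
    // fallback here is an assumption, not something this diff defines.
    val batchSize = spark.conf.get("spark.sql.addPartitionInBatch.size", "100").toInt
    println(s"Partitions added per metastore call: $batchSize")

    spark.stop()
  }
}
```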

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

Lines changed: 1 addition & 1 deletion
@@ -771,7 +771,7 @@ case class RepairTableCommand(
     // Hive metastore may not have enough memory to handle millions of partitions in single RPC,
     // we should split them into smaller batches. Since Hive client is not thread safe, we cannot
     // do this in parallel.
-    val batchSize = 100
+    val batchSize = spark.conf.get(SQLConf.ADD_PARTITION_BATCH_SIZE)
     partitionSpecsAndLocs.toIterator.grouped(batchSize).foreach { batch =>
       val now = MILLISECONDS.toSeconds(System.currentTimeMillis())
       val parts = batch.map { case (spec, location) =>
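A self-contained sketch of the batching pattern the changed line feeds into: an iterator of (partition spec, location) pairs is grouped into chunks of at most `batchSize` elements so each metastore call carries a bounded number of partitions. The generated data and the println stand in for the real catalog call and are purely illustrative.

```scala
object PartitionBatchingSketch {
  def main(args: Array[String]): Unit = {
    // In Spark this value now comes from spark.sql.addPartitionInBatch.size.
    val batchSize = 100

    // Illustrative stand-in for partitionSpecsAndLocs: spec -> location pairs.
    val partitionSpecsAndLocs: Seq[(Map[String, String], String)] =
      (1 to 250).map(i => (Map("part" -> i.toString), s"/warehouse/tbl/part=$i"))

    // grouped() yields batches of at most batchSize elements; RepairTableCommand
    // turns each batch into catalog partitions and registers them in one call.
    partitionSpecsAndLocs.iterator.grouped(batchSize).foreach { batch =>
      println(s"Registering ${batch.size} partitions in a single metastore RPC")
    }
  }
}
```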
