[SPARK-12546][SQL] Change default number of open parquet files
A common problem that users encounter with Spark 1.6.0 is that writing to a partitioned Parquet table runs out of memory (OOMs). The root cause is that Parquet allocates a significant amount of memory that is not tracked by Spark's own memory management. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more.

Author: Michael Armbrust <[email protected]>

Closes apache#11308 from marmbrus/parquetWriteOOM.

(cherry picked from commit 173aa94)
Signed-off-by: Michael Armbrust <[email protected]>
marmbrus committed Feb 22, 2016
commit 699644c692472e5b78baa56a1a6c44d8d174e70e
2 changes: 1 addition & 1 deletion sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -396,7 +396,7 @@ private[spark] object SQLConf {

   val PARTITION_MAX_FILES =
     intConf("spark.sql.sources.maxConcurrentWrites",
-      defaultValue = Some(5),
+      defaultValue = Some(1),
       doc = "The maximum number of concurrent files to open before falling back on sorting when " +
         "writing out files using dynamic partitioning.")

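Only the default changes here; users whose executors have headroom for Parquet's per-writer row-group buffers can still opt back into multiple concurrent writers. A minimal sketch of how that might look in a Spark 1.6 application (the DataFrame contents, partition column, and output path below are illustrative assumptions, not part of this patch):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object RestoreConcurrentWrites {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-partitioned-write"))
        val sqlContext = new SQLContext(sc)

        // Explicitly allow up to 5 open Parquet writers per task again (the old
        // default); each open writer buffers a row group in memory, so only do
        // this when executor memory can absorb it.
        sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "5")

        // Hypothetical example data: the partition column determines how many
        // files a task may have open before falling back to a sort-based write.
        import sqlContext.implicits._
        val df = sc.parallelize(Seq((1, "2016-02-22"), (2, "2016-02-23"))).toDF("id", "date")

        df.write
          .partitionBy("date")
          .parquet("/tmp/events_by_date") // illustrative output path
      }
    }

With the new default of 1, the same write simply sorts records by partition within each task instead of keeping several Parquet writers open at once.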