[SPARK-16006][SQL] Attempting to write empty DataFrame with no fields throws non-intuitive exception
dongjoon-hyun committed Jun 28, 2016
commit c4458d46fa9a2859f9ef5111ce0b53234c19d7b1
@@ -339,6 +339,9 @@ private[sql] object PartitioningUtils {
   private val upCastingOrder: Seq[DataType] =
     Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)

+  /**
+   * Validate partition columns for writing executions.
+   */
Comment from Member Author (dongjoon-hyun):
Since validatePartitionColumn is used only by writing-related classes, I added this description for clarification.
We can update the function description if the usage pattern changes in the future.

   def validatePartitionColumn(
       schema: StructType,
       partitionColumns: Seq[String],
@@ -351,8 +354,10 @@ private[sql] object PartitioningUtils {
       }
     }

-    if (partitionColumns.size == schema.fields.size) {
-      throw new AnalysisException(s"Cannot use all columns for partition columns")
+    if (schema.fields.isEmpty) {
Comment from Contributor (tdas):
It's weird that a check like this is in PartitioningUtils. This check seems to have nothing to do with partitioning; it's basically that certain file formats do not support writing a DataFrame with no columns. Is there somewhere earlier where you can check this?

Reply from Member Author (dongjoon-hyun):
Thank you for the review, @tdas.
Yes, indeed. This is beyond the scope of PartitioningUtils.
Actually, this logic is used by 3 classes: PreWriteCheck, DataSource, and FileStreamSinkWriter.
I'll try to move this.

+      throw new AnalysisException("Cannot write dataset with no fields")
+    } else if (partitionColumns.size == schema.fields.length) {
+      throw new AnalysisException("Cannot use all columns for partition columns")
     }
   }
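The control flow of the patched check can be sketched in plain Python (a hypothetical stand-in, not Spark code): the empty-schema case is tested before the all-columns-partitioned case, so a zero-field DataFrame now gets the more specific error instead of the misleading "all columns" one.

```python
# Hypothetical sketch of the check order above, in plain Python (not Spark code).
# `schema_fields` stands in for StructType.fields and `partition_columns`
# for the Seq[String] of partition column names.

def validate_partition_column(schema_fields, partition_columns):
    if len(schema_fields) == 0:
        # Checked first: a DataFrame with no fields cannot be written at all.
        raise ValueError("Cannot write dataset with no fields")
    elif len(partition_columns) == len(schema_fields):
        # Partitioning by every column would leave no data columns to write.
        raise ValueError("Cannot use all columns for partition columns")
```

Before this patch, `spark.emptyDataFrame` fell into the second branch (0 == 0) and produced the non-intuitive "Cannot use all columns for partition columns" message.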

@@ -214,12 +214,18 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
   test("prevent all column partitioning") {
     withTempDir { dir =>
       val path = dir.getCanonicalPath
-      intercept[AnalysisException] {
+      var e = intercept[AnalysisException] {
         spark.emptyDataFrame.write.format("text").mode("overwrite").save(path)
       }
-      intercept[AnalysisException] {
+      assert(e.getMessage.contains("Cannot write dataset with no fields"))
+      e = intercept[AnalysisException] {
         spark.range(10).write.format("parquet").mode("overwrite").partitionBy("id").save(path)
       }
-      intercept[AnalysisException] {
-        spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save(path)
-      }
+      assert(e.getMessage.contains("Cannot use all columns for partition columns"))
+      e = intercept[AnalysisException] {
+        spark.range(10).write.format("csv").mode("overwrite").partitionBy("id").save(path)
+      }
+      assert(e.getMessage.contains("Cannot use all columns for partition columns"))
     }
   }
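The test pattern above — capture the thrown exception, then assert on its message — can be sketched in plain Python; `intercept` here is a hypothetical helper mirroring ScalaTest's `intercept`, and `failing_write` is an illustrative stand-in for a write call that Spark rejects.

```python
# Hypothetical Python analogue of the ScalaTest intercept pattern used above.

def intercept(exc_type, thunk):
    """Run thunk and return the raised exception; fail if nothing was raised."""
    try:
        thunk()
    except exc_type as e:
        return e
    raise AssertionError(f"expected {exc_type.__name__} to be raised")

def failing_write():
    # Stand-in for a write of a zero-field DataFrame that Spark would reject.
    raise ValueError("Cannot write dataset with no fields")

e = intercept(ValueError, failing_write)
assert "Cannot write dataset with no fields" in str(e)
```

Asserting on the captured message (rather than only intercepting the exception type, as the old test did) pins down *which* error was raised, which is exactly what this patch changes.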
