[SPARK-16006][SQL] Attempting to write empty DataFrame with no fields throws non-intuitive exception #13730
Conversation
Since `validatePartitionColumn` is used only by writing-related classes, I added this description for clarification.
We can update the function description if the usage pattern changes in the future.
Hi, @tdas.
Test build #60686 has finished for PR 13730 at commit
It's weird that a check like this is in `PartitioningUtils`. This check seems to have nothing to do with partitioning; it's basically that certain file formats do not support writing a DataFrame with no columns. Is there somewhere earlier where you can check this?
Thank you for the review, @tdas.
Yes, indeed. This is beyond the scope of `PartitioningUtils`.
Actually, this logic is used by three classes: `PreWriteCheck`, `DataSource`, and `FileStreamSinkWriter`.
I'll try to move this.
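The review point above can be sketched outside Spark. Below is a minimal, hypothetical model of an earlier, format-level check, as the reviewer suggests; the function and message names are invented, and the real logic lived in `PartitioningUtils.validatePartitionColumn` and was shared by `PreWriteCheck`, `DataSource`, and `FileStreamSinkWriter`:

```python
def validate_schema_for_write(field_names, format_name="parquet"):
    """Hypothetical early validation near the write entry point:
    some file formats cannot write a DataFrame that has no columns,
    so that case could be rejected here rather than inside the
    partitioning-specific validation."""
    if not field_names:
        raise ValueError(
            f"Cannot write a DataFrame with no columns using {format_name}"
        )

# A schema with at least one field passes this sketch's check.
validate_schema_for_write(["id", "value"])
```

The design point being debated is only *where* such a check belongs, not whether the condition itself is correct.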
Hi, @tdas. (Anyway, I will update this PR for further discussion.)
Test build #60704 has finished for PR 13730 at commit
Hi, @tdas.
Hi, @rxin.
Oh, sorry. The master branch has changed.
I will recheck this PR.
Yep. The case still exists for
Hi, @tdas.
Test build #61014 has finished for PR 13730 at commit
Hi, @tdas.
Rebased.
Test build #61183 has finished for PR 13730 at commit
Test build #61254 has finished for PR 13730 at commit
Ping @tdas |
…throw non-intuitive exception
Test build #61391 has started for PR 13730 at commit
Retest this please.
Test build #61397 has finished for PR 13730 at commit
Hi, @tdas.
Thanks - merging in master/2.0.
…throws non-intuitive exception
## What changes were proposed in this pull request?
This PR allows `emptyDataFrame.write` to succeed, since the user did not specify any partition columns.
**Before**
```scala
scala> spark.emptyDataFrame.write.parquet("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
scala> spark.emptyDataFrame.write.csv("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
```
After this PR, no exception occurs and the created directory contains only one file, `_SUCCESS`, as expected.
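The before/after behavior can be modeled with plain guard conditions. This is a minimal sketch with hypothetical function names, not Spark's actual code, showing how the pre-fix check tripped on an empty schema and how the fixed check exempts it:

```python
def check_before_fix(num_fields, num_partition_cols):
    # Modeled pre-fix guard: fails whenever the partition columns cover
    # every column -- which also fires for an empty DataFrame, since 0 == 0.
    if num_partition_cols == num_fields:
        raise ValueError("Cannot use all columns for partition columns")

def check_after_fix(num_fields, num_partition_cols):
    # Modeled post-fix guard: a completely empty schema is exempted, so
    # writing spark.emptyDataFrame produces only a _SUCCESS file instead
    # of failing, while the real error case (all columns used as
    # partition columns) is still rejected.
    if num_fields > 0 and num_partition_cols == num_fields:
        raise ValueError("Cannot use all columns for partition columns")
```

Under this model, `check_before_fix(0, 0)` raises while `check_after_fix(0, 0)` passes, and both still reject a non-empty schema whose columns are all partition columns.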
## How was this patch tested?
Pass the Jenkins tests including updated test cases.
Author: Dongjoon Hyun <[email protected]>
Closes #13730 from dongjoon-hyun/SPARK-16006.
(cherry picked from commit 9b1b3ae)
Signed-off-by: Reynold Xin <[email protected]>
Thank you for merging, @rxin.