-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing #29316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing #29316
Changes from 8 commits
224b979
221564b
e012e7a
ab4e04c
232a835
a7b9a17
65f781a
965ed5a
b969c73
c8ac369
76acc07
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -866,6 +866,28 @@ class InsertSuite extends DataSourceTest with SharedSparkSession { | |
| }.getMessage | ||
| assert(message.contains("LOCAL is supported only with file: scheme")) | ||
| } | ||
|
|
||
| test("SPARK-32508 " + | ||
| "Disallow empty part col values in partition spec before static partition writing") { | ||
| withTable("insertTable") { | ||
| sql( | ||
| """ | ||
| |CREATE TABLE insertTable(i int, part1 string, part2 string) USING PARQUET | ||
| |PARTITIONED BY (part1, part2) | ||
| """.stripMargin) | ||
| val msg = "Partition spec is invalid" | ||
| assert(intercept[AnalysisException] { | ||
| sql("INSERT INTO TABLE insertTable PARTITION(part1=1, part2='') SELECT 1") | ||
| }.getMessage.contains(msg)) | ||
| assert(intercept[AnalysisException] { | ||
| sql("INSERT INTO TABLE insertTable PARTITION(part1='', part2) SELECT 1 ,'' AS part2") | ||
| }.getMessage.contains(msg)) | ||
|
|
||
| sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2='2') SELECT 1") | ||
| sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'2' AS part2") | ||
| sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'' AS part2") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the partition column can be empty string if it's dynamic. Shall we convert the empty string/null in partition spec to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Generally speaking, it is meaningless for the partition value to be empty, so the static partition value is not allowed to be empty.
spark-sql> show partitions inserttable ;
part1=1/part2=__HIVE_DEFAULT_PARTITION__
Time taken: 0.2 seconds, Fetched 1 row(s)
spark-sql> desc formatted inserttable partition(part1='1',part2='');
Error in query: Partition spec is invalid. The spec ([part1=1, part2=]) contains an empty partition column value;
spark-sql> desc formatted inserttable partition(part1='1',part2='__HIVE_DEFAULT_PARTITION__');
col_name data_type comment
...
Time taken: 0.348 seconds, Fetched 27 row(s)The partition value the user sees is
|
||
| } | ||
| } | ||
| } | ||
|
|
||
| class FileExistingTestFileSystem extends RawLocalFileSystem { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -847,4 +847,26 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter | |
| } | ||
| } | ||
| } | ||
|
|
||
| test("SPARK-32508 " + | ||
| "Disallow empty part col values in partition spec before static partition writing") { | ||
| withTable("t1") { | ||
| spark.sql( | ||
| """ | ||
| |CREATE TABLE t1 (c1 int) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this a problem only for hive tables?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In the case that Perhaps this check can only be performed when
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hive calls public Partition getPartition(...){
|| (val != null && val.length() == 0)) {
throw new HiveException("get partition: Value for key "
+ field.getName() + " is null or empty");
}
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
what's the behavior then?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When writing static partition or dynamic partition, the When
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the behavior of hive? Does hive treat
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hive> INSERT OVERWRITE TABLE t1 PARTITION(d='') select 1; hive> INSERT OVERWRITE TABLE t1 PARTITION(d) select 1,'' as d; |
||
| |PARTITIONED BY (d string) | ||
| """.stripMargin) | ||
|
|
||
| val e = intercept[AnalysisException] { | ||
| spark.sql( | ||
| """ | ||
| |INSERT OVERWRITE TABLE t1 PARTITION(d='') | ||
| |SELECT 1 | ||
| """.stripMargin) | ||
| }.getMessage | ||
|
|
||
| assert(!e.contains("get partition: Value for key d is null or empty")) | ||
| assert(e.contains("Partition spec is invalid")) | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the if condition can be put in one line