[SPARK-26435][SQL] Support creating partitioned table using Hive CTAS by specifying partition column names #23376
Conversation
Test build #100423 has finished for PR 23376 at commit

cc @cloud-fan
```diff
  if (tableDesc.partitionColumnNames.nonEmpty) {
    val errorMessage = "A Create Table As Select (CTAS) statement is not allowed to " +
-     "create a partitioned table using Hive's file formats. " +
+     "create a partitioned table using Hive's file formats by specifying table schema. " +
```
What does Hive report for this case?
Hive 3.2.0:

```
hive> CREATE TABLE t PARTITIONED BY (part string) AS SELECT id, part FROM src;
FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
```
how about
Create Partitioned Table As Select cannot specify data type for the partition columns of the target table.
(Resolved review thread on sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala, now outdated.)
Test build #100435 has finished for PR 23376 at commit

retest this please.

Test build #100438 has finished for PR 23376 at commit
```scala
if (schema.nonEmpty) {
  operationNotAllowed(
    "Schema may not be specified in a Create Table As Select (CTAS) statement",
    ctx)
```
I think this check should go first.
Oh, because `val schema = StructType(dataCols ++ partitionCols)` is defined, if this check goes first, it will shadow the next check, `if (tableDesc.partitionColumnNames.nonEmpty)`.
then can we check dataCols directly?
Ok. That's good.
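For reference, here is a minimal sketch of what checking `dataCols` directly could look like; it reuses the names from the diffs above (`dataCols`, `operationNotAllowed`, `ctx`) and is only an illustration of the suggestion, not the merged code:

```scala
// Sketch only: reject an explicit data schema in a CTAS, while leaving bare
// partition column names (from PARTITIONED BY) untouched so the dedicated
// partition-column check below can still fire.
if (dataCols.nonEmpty) {
  operationNotAllowed(
    "Schema may not be specified in a Create Table As Select (CTAS) statement",
    ctx)
}
```

Checking only `dataCols` avoids the shadowing problem, because the combined `schema` no longer hides whether the user wrote typed partition columns.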
| """.stripMargin) | ||
| checkAnswer(spark.table("t"), Row(1, "a")) | ||
|
|
||
| assert(sql("DESC t").collect().containsSlice( |
A better way to test it: use `spark.sessionState.getTable` and check if the partition columns exist in the table metadata.
Ok. Changed.
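As a rough sketch of that kind of assertion (assuming the test table is named `t` and partitioned by `part`, and reading the metadata through the session catalog):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// Look up the table's metadata in the session catalog and assert that the
// partition column recorded there is exactly the one given in PARTITIONED BY.
val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
assert(table.partitionColumnNames == Seq("part"))
```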
```scala
// When creating partitioned table with CTAS statement, we can't specify data type for the
// partition columns.
if (partitionCols.nonEmpty) {
```
I changed the check from `tableDesc.partitionColumnNames` to `partitionCols`. They have the same effect here, but `partitionCols` is more accurate and less confusing.
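Putting the thread together, the final check might read roughly like the sketch below; `partitionCols`, `operationNotAllowed`, and `ctx` come from the diffs above, and the message uses the wording suggested earlier in this conversation, not necessarily the exact merged text:

```scala
// Sketch only: `partitionCols` holds partition columns that were written with an
// explicit data type, which is precisely what a partitioned Hive CTAS must not do.
if (partitionCols.nonEmpty) {
  operationNotAllowed(
    "Create Partitioned Table As Select cannot specify data type for the partition " +
      "columns of the target table",
    ctx)
}
```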
Test build #100449 has finished for PR 23376 at commit

Test build #100447 has finished for PR 23376 at commit

retest this please.

Test build #100451 has finished for PR 23376 at commit

retest this please.

Test build #100453 has finished for PR 23376 at commit

retest this please

Test build #100455 has finished for PR 23376 at commit

Test build #100456 has finished for PR 23376 at commit

retest this please...

Test build #100462 has finished for PR 23376 at commit

retest this please

Test build #100464 has finished for PR 23376 at commit

It finally passes. :)

thanks, merging to master!
[SPARK-26435][SQL] Support creating partitioned table using Hive CTAS by specifying partition column names
## What changes were proposed in this pull request?
Spark SQL doesn't support creating a partitioned table using Hive CTAS in SQL syntax. However, it is supported through the DataFrameWriter API:
```scala
val df = Seq(("a", 1)).toDF("part", "id")
df.write.format("hive").partitionBy("part").saveAsTable("t")
```
Newer Hive versions support this syntax (https://issues.apache.org/jira/browse/HIVE-20241):
```
CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part
```
This patch adds the same support to Spark's SQL syntax.
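For illustration only, a minimal usage sketch in a Hive-enabled SparkSession where a plain CREATE TABLE resolves to a Hive table (the table and column names are just examples):

```scala
// Create a partitioned Hive table through the new CTAS syntax.
spark.sql("""CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part""")

// The target is a genuine partitioned table, so its partitions can be listed;
// this is expected to show a single partition, part=a.
spark.sql("SHOW PARTITIONS t").show()
```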
## How was this patch tested?
Added tests.
Closes apache#23376 from viirya/hive-ctas-partitioned-table.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>