
Conversation

@viirya
Member

@viirya viirya commented Dec 24, 2018

What changes were proposed in this pull request?

Spark SQL doesn't support creating a partitioned table with Hive CTAS in SQL syntax. However, it is supported through the DataFrameWriter API:

```scala
val df = Seq(("a", 1)).toDF("part", "id")
df.write.format("hive").partitionBy("part").saveAsTable("t")
```

Hive supports this syntax in newer versions (https://issues.apache.org/jira/browse/HIVE-20241):

```
CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part
```

This patch adds support for this syntax to Spark SQL.
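
For illustration, a minimal sketch of exercising the new SQL path from Spark (this is an assumed usage example, not part of the patch; `spark` is taken to be a Hive-enabled SparkSession, and the table/column names follow the example above):

```scala
// Sketch only; assumes `spark` is a Hive-enabled SparkSession.
spark.sql(
  """CREATE TABLE t PARTITIONED BY (part)
    |AS SELECT 1 AS id, "a" AS part""".stripMargin)

// The partition column should show up in the table description.
spark.sql("DESC EXTENDED t").show(truncate = false)
```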

How was this patch tested?

Added tests.

@SparkQA

SparkQA commented Dec 24, 2018

Test build #100423 has finished for PR 23376 at commit 2ea2a4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 24, 2018

cc @cloud-fan

```diff
 if (tableDesc.partitionColumnNames.nonEmpty) {
   val errorMessage = "A Create Table As Select (CTAS) statement is not allowed to " +
-    "create a partitioned table using Hive's file formats. " +
+    "create a partitioned table using Hive's file formats by specifying table schema. " +
```
Contributor

What does Hive report for this case?

Member Author

Hive 3.2.0:

```
hive> CREATE TABLE t PARTITIONED BY (part string) AS SELECT id, part FROM src;
FAILED: SemanticException [Error 10068]: CREATE-TABLE-AS-SELECT does not support partitioning in the target table
```

Contributor

how about

Create Partitioned Table As Select cannot specify data type for the partition columns of the target table.

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100435 has finished for PR 23376 at commit 934d6f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 25, 2018

retest this please.

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100438 has finished for PR 23376 at commit 934d6f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
if (schema.nonEmpty) {
  operationNotAllowed(
    "Schema may not be specified in a Create Table As Select (CTAS) statement",
    ctx)
```
Contributor

I think this check should go first.

Member Author

Oh, because `val schema = StructType(dataCols ++ partitionCols)` is defined, if this check goes first it will shadow the next check `if (tableDesc.partitionColumnNames.nonEmpty)`.

Contributor

then can we check `dataCols` directly?

Member Author

Ok. That's good.
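
To make the effect of these checks concrete, here is a hedged sketch of statements that should be rejected on the Hive CTAS path once the checks above are in place (`expectRejected` is a hypothetical helper written only for this illustration; `spark` is assumed to be a Hive-enabled SparkSession, and the exact error texts are the PR's):

```scala
import org.apache.spark.sql.catalyst.parser.ParseException

// Hypothetical helper: run a statement and report whether the parser rejects it.
def expectRejected(sqlText: String): Unit =
  try {
    spark.sql(sqlText)
    println(s"Unexpectedly succeeded: $sqlText")
  } catch {
    case e: ParseException => println(s"Rejected as expected: ${e.getMessage}")
  }

// Schema may not be specified in a CTAS statement (checked via dataCols).
expectRejected("CREATE TABLE t1 (id INT) AS SELECT 1 AS id")

// Partition columns in a CTAS statement may only be named, not typed (checked via partitionCols).
expectRejected("""CREATE TABLE t2 PARTITIONED BY (part STRING) AS SELECT 1 AS id, "a" AS part""")
```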

""".stripMargin)
checkAnswer(spark.table("t"), Row(1, "a"))

assert(sql("DESC t").collect().containsSlice(
Contributor

A better way to test it: `spark.sessionState.getTable` and check if the partition columns exist in the table metadata.

Member Author

Ok. Changed.
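
A hedged sketch of what that metadata-based assertion could look like inside a Spark test suite (where `spark.sessionState` is accessible); it assumes a table `t` created with `PARTITIONED BY (part)` as in the earlier examples:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// Look up the table in the session catalog and assert on its partition columns,
// rather than parsing DESC output.
val tableMeta = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
assert(tableMeta.partitionColumnNames == Seq("part"))
```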


```scala
// When creating partitioned table with CTAS statement, we can't specify data type for the
// partition columns.
if (partitionCols.nonEmpty) {
```
Member Author
@viirya viirya Dec 26, 2018

I changed the check from `tableDesc.partitionColumnNames` to `partitionCols`. They have the same effect here, but `partitionCols` is more accurate and less confusing.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100449 has finished for PR 23376 at commit 6cd9c2f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100447 has finished for PR 23376 at commit 1a3c63c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 26, 2018

retest this please.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100451 has finished for PR 23376 at commit 6cd9c2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 26, 2018

retest this please.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100453 has finished for PR 23376 at commit 6cd9c2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100455 has finished for PR 23376 at commit 6cd9c2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100456 has finished for PR 23376 at commit d56a82a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 27, 2018

retest this please...

@SparkQA

SparkQA commented Dec 27, 2018

Test build #100462 has finished for PR 23376 at commit d56a82a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Dec 27, 2018

Test build #100464 has finished for PR 23376 at commit d56a82a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Dec 27, 2018

It finally passes. :)

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in f89cdec Dec 27, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
… by specifying partition column names

## What changes were proposed in this pull request?

Spark SQL doesn't support creating a partitioned table with Hive CTAS in SQL syntax. However, it is supported through the DataFrameWriter API:

```scala
val df = Seq(("a", 1)).toDF("part", "id")
df.write.format("hive").partitionBy("part").saveAsTable("t")
```
Hive supports this syntax in newer versions (https://issues.apache.org/jira/browse/HIVE-20241):

```
CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part
```

This patch adds support for this syntax to Spark SQL.

## How was this patch tested?

Added tests.

Closes apache#23376 from viirya/hive-ctas-partitioned-table.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… by specifying partition column names

IceMimosa pushed a commit to growingio/spark that referenced this pull request Jun 9, 2019
… by specifying partition column names

@viirya viirya deleted the hive-ctas-partitioned-table branch December 27, 2023 18:36