Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

In Spark 2.1, we introduced a new internal provider `hive` to distinguish Hive serde tables from data source tables. This PR blocks users from specifying it in the `DataFrameWriter` and SQL APIs.

How was this patch tested?

Added a test case
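For context, a hedged sketch of the two call paths this PR rejects. The table names and the local SparkSession setup are illustrative, not from the patch itself; after this change, both calls should fail with an AnalysisException rather than silently creating a broken table.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative setup only; the PR itself just adds the provider check.
val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()

// DataFrameWriter API: the internal `hive` provider is rejected.
spark.range(10).write.format("hive").saveAsTable("t1")

// SQL API: CREATE TABLE ... USING hive is likewise rejected.
spark.sql("CREATE TABLE t2(i INT) USING hive")
```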

@SparkQA

SparkQA commented Sep 13, 2016

Test build #65304 has finished for PR 15073 at commit ef0fe2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 14, 2016

Test build #65348 has finished for PR 15073 at commit 9711edb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

}
}

test("save API - format hive") {
Contributor
For this API, we previously failed with the message `Failed to find data source: hive`, right? Should we change it?

Member Author
Sure, I will change all of them to the message `Failed to find data source: hive`.

@cloud-fan
Contributor

cloud-fan commented Sep 16, 2016

Oh sorry, I missed this one. What I was asking is that we should only check the provider in saveAsTable, so that the save API is left completely untouched.

Member Author

uh... I see.

 * @since 1.4.0
 */
def format(source: String): DataFrameWriter[T] = {
  if (source.toLowerCase == "hive") {
Contributor

In CatalogImpl.createExternalTable, we check the hive provider without lowercasing it first. Can you fix that too? Thanks!
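As a minimal, self-contained sketch of the case-insensitive comparison being asked for (the helper name `isHiveProvider` and the use of `Locale.ROOT` are illustrative, not the PR's actual code):

```scala
import java.util.Locale

// Compare the provider name case-insensitively; Locale.ROOT avoids
// locale-dependent lowercasing surprises (e.g. the Turkish dotless i).
def isHiveProvider(source: String): Boolean =
  source.toLowerCase(Locale.ROOT) == "hive"
```

With a check like this, `HIVE`, `Hive`, and `hive` are all treated the same wherever the provider is validated.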

Member Author

Done. Thanks!

@rxin
Contributor

rxin commented Sep 16, 2016

Curious - why do we want to block it?

@gatorsmile
Member Author

gatorsmile commented Sep 16, 2016

@rxin Let me answer based on my understanding. We are currently consolidating the write path, including providing a unified CREATE TABLE interface for both Hive serde tables and data source tables. That work is not finished yet, and more changes are needed before hive can be used as a data source format. If we do not block it, many bugs will surface, because the interfaces (e.g., SQL, the DataFrameWriter APIs, and the createExternalTable APIs), DDL execution, and metastore formats still differ between Hive serde tables and data source tables.

Thus, blocking the hive format is needed until we can officially support it. Let me know if my understanding is not right. cc @cloud-fan @yhuai

@rxin
Contributor

rxin commented Sep 16, 2016

OK got it. Thanks.

@cloud-fan
Contributor

LGTM, pending jenkins

@SparkQA

SparkQA commented Sep 16, 2016

Test build #65472 has finished for PR 15073 at commit 44f335b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val e = intercept[AnalysisException] {
  spark.range(10).write.format("hive").mode(SaveMode.Overwrite).saveAsTable(tableName)
}.getMessage
assert(e.contains("Failed to find data source: hive"))
Contributor
Member Author

Done.

@SparkQA

SparkQA commented Sep 17, 2016

Test build #65520 has finished for PR 15073 at commit da5bec6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val options = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty)
val provider = ctx.tableProvider.qualifiedName.getText
if (provider.toLowerCase == "hive") {
  throw new AnalysisException(s"Failed to find data source: $provider")
Contributor

We should follow the error message used in other places: `Cannot create hive serde table with CREATE TABLE USING`.

Member Author

Done. : )
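Putting the review feedback together, the parser-side check presumably ends up along these lines (a sketch based on the excerpt above and the suggested message; the final code in the PR may differ):

```scala
// Reject the internal `hive` provider in CREATE TABLE ... USING,
// using the error message agreed on in the review.
if (provider.toLowerCase == "hive") {
  throw new AnalysisException("Cannot create hive serde table with CREATE TABLE USING")
}
```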

@SparkQA

SparkQA commented Sep 17, 2016

Test build #65538 has finished for PR 15073 at commit ef174c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 3a3c9ff Sep 18, 2016
@cloud-fan
Contributor

thanks, merging to master!

wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…ovider Hive

### What changes were proposed in this pull request?
In Spark 2.1, we introduced a new internal provider `hive` for telling Hive serde tables from data source tables. This PR is to block users to specify this in `DataFrameWriter` and SQL APIs.

### How was this patch tested?
Added a test case

Author: gatorsmile <[email protected]>

Closes apache#15073 from gatorsmile/formatHive.