Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

In Spark 2.1, we introduced a new internal provider `hive` to distinguish Hive serde tables from data source tables. This PR blocks users from specifying it in the `DataFrameWriter` and SQL APIs.

How was this patch tested?

Added a test case
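For context, a hedged sketch of the two call paths this PR rejects. The table names and the local SparkSession setup are illustrative, not from the patch itself; after this change, both calls should fail with an AnalysisException rather than silently creating a broken table.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative setup only; the PR itself just adds the provider check.
val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()

// DataFrameWriter API: the internal `hive` provider is rejected.
spark.range(10).write.format("hive").saveAsTable("t1")

// SQL API: CREATE TABLE ... USING hive is likewise rejected.
spark.sql("CREATE TABLE t2(i INT) USING hive")
```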

@SparkQA

SparkQA commented Sep 13, 2016

Test build #65304 has finished for PR 15073 at commit ef0fe2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 14, 2016

Test build #65348 has finished for PR 15073 at commit 9711edb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

}
}

test("save API - format hive") {
Contributor
For this API, we previously failed with the message `Failed to find data source: hive`, right? Should we change it?

Member Author
Sure, I will change all of them to the message `Failed to find data source: hive`.

@cloud-fan
Contributor

cloud-fan commented Sep 16, 2016

Oh sorry, I missed this one. What I was asking is that we should only check the provider in saveAsTable, so that the save API is left completely untouched.

Member Author

uh... I see.

 * @since 1.4.0
 */
def format(source: String): DataFrameWriter[T] = {
  if (source.toLowerCase == "hive") {
Contributor

In CatalogImpl.createExternalTable, we check the hive provider without lowercasing it first. Can you fix that too? Thanks!
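As a minimal, self-contained sketch of the case-insensitive comparison being asked for (the helper name `isHiveProvider` and the use of `Locale.ROOT` are illustrative, not the PR's actual code):

```scala
import java.util.Locale

// Compare the provider name case-insensitively; Locale.ROOT avoids
// locale-dependent lowercasing surprises (e.g. the Turkish dotless i).
def isHiveProvider(source: String): Boolean =
  source.toLowerCase(Locale.ROOT) == "hive"
```

With a check like this, `HIVE`, `Hive`, and `hive` are all treated the same wherever the provider is validated.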

Member Author

Done. Thanks!

@rxin
Contributor

rxin commented Sep 16, 2016

Curious - why do we want to block it?

@gatorsmile
Member Author

gatorsmile commented Sep 16, 2016

@rxin Let me answer based on my understanding. We are currently consolidating the write path, including providing a unified CREATE TABLE interface for both Hive serde tables and data source tables. That work is not finished yet, and more changes are needed before hive can be used as a data source format. If we do not block it, many bugs will surface, because the interfaces (e.g., SQL, the DataFrameWriter APIs, and the createExternalTable APIs), DDL execution, and metastore formats still differ between Hive serde tables and data source tables.

Thus, blocking the hive format is needed until we can officially support it. Let me know if my understanding is not right. cc @cloud-fan @yhuai

@rxin
Contributor

rxin commented Sep 16, 2016

OK got it. Thanks.

@cloud-fan
Contributor

LGTM, pending jenkins

@SparkQA

SparkQA commented Sep 16, 2016

Test build #65472 has finished for PR 15073 at commit 44f335b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val e = intercept[AnalysisException] {
  spark.range(10).write.format("hive").mode(SaveMode.Overwrite).saveAsTable(tableName)
}.getMessage
assert(e.contains("Failed to find data source: hive"))
Contributor
Member Author

Done.

@SparkQA

SparkQA commented Sep 17, 2016

Test build #65520 has finished for PR 15073 at commit da5bec6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val options = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty)
val provider = ctx.tableProvider.qualifiedName.getText
if (provider.toLowerCase == "hive") {
  throw new AnalysisException(s"Failed to find data source: $provider")
Contributor

We should follow the error message used in other places: `Cannot create hive serde table with CREATE TABLE USING`.

Member Author

Done. : )
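Putting the review feedback together, the parser-side check presumably ends up along these lines (a sketch based on the excerpt above and the suggested message; the final code in the PR may differ):

```scala
// Reject the internal `hive` provider in CREATE TABLE ... USING,
// using the error message agreed on in the review.
if (provider.toLowerCase == "hive") {
  throw new AnalysisException("Cannot create hive serde table with CREATE TABLE USING")
}
```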

@SparkQA

SparkQA commented Sep 17, 2016

Test build #65538 has finished for PR 15073 at commit ef174c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 3a3c9ff Sep 18, 2016
@cloud-fan
Contributor

thanks, merging to master!

wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…ovider Hive

### What changes were proposed in this pull request?
In Spark 2.1, we introduced a new internal provider `hive` for telling Hive serde tables from data source tables. This PR is to block users to specify this in `DataFrameWriter` and SQL APIs.

### How was this patch tested?
Added a test case

Author: gatorsmile <[email protected]>

Closes apache#15073 from gatorsmile/formatHive.