-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17518] [SQL] Block Users to Specify the Internal Data Source Provider Hive #15073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #65304 has finished for PR 15073 at commit
|
|
Test build #65348 has finished for PR 15073 at commit
|
|
cc @cloud-fan |
| } | ||
| } | ||
|
|
||
| test("save API - format hive") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this API, previously we will fail with message Failed to find data source: hive right? Should we change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, change all of them to the message Failed to find data source: hive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I missed this one, what I was asking is, we should only check the provider in saveAsTable, so that the save API is totally untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh... I see.
| * @since 1.4.0 | ||
| */ | ||
| def format(source: String): DataFrameWriter[T] = { | ||
| if (source.toLowerCase == "hive") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CatalogImpl.createExternalTable, we check the hive provider without lowercase, can you fix that? thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
|
Curious - why do we want to block it? |
|
@rxin Let me answer it based on my understanding. Now, we are consolidating the write path, including providing a unified CREATE TABLE interface for both Hive serde tables and data source tables. So far, this feature is not ready. More ongoing works are needed before we can use Thus, blocking the |
|
OK got it. Thanks. |
|
LGTM, pending jenkins |
|
Test build #65472 has finished for PR 15073 at commit
|
| val e = intercept[AnalysisException] { | ||
| spark.range(10).write.format("hive").mode(SaveMode.Overwrite).saveAsTable(tableName) | ||
| }.getMessage | ||
| assert(e.contains("Failed to find data source: hive")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after we address https://github.com/apache/spark/pull/15073/files#r79122288, we should follow https://github.com/apache/spark/pull/15073/files#diff-463cb1b0f60d87ada075a820f18e1104R262 to generate error message for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Test build #65520 has finished for PR 15073 at commit
|
| val options = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty) | ||
| val provider = ctx.tableProvider.qualifiedName.getText | ||
| if (provider.toLowerCase == "hive") { | ||
| throw new AnalysisException(s"Failed to find data source: $provider") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should follow error message in other places: Cannot create hive serde table with CREATE TABLE USING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. : )
|
Test build #65538 has finished for PR 15073 at commit
|
|
thanks, merging to master! |
…ovider Hive ### What changes were proposed in this pull request? In Spark 2.1, we introduced a new internal provider `hive` for telling Hive serde tables from data source tables. This PR is to block users to specify this in `DataFrameWriter` and SQL APIs. ### How was this patch tested? Added a test case Author: gatorsmile <[email protected]> Closes apache#15073 from gatorsmile/formatHive.
What changes were proposed in this pull request?
In Spark 2.1, we introduced a new internal provider
hivefor telling Hive serde tables from data source tables. This PR is to block users to specify this inDataFrameWriterand SQL APIs.How was this patch tested?
Added a test case