[SPARK-15269][SQL] Removes unexpected empty table directories created while creating external Spark SQL data source tables. #13270
@@ -377,7 +377,7 @@ private[hive] class HiveClientImpl(
        // allows directory paths as location URIs while Spark SQL data source tables also
        // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
        // data source tables.
-       DDLUtils.isDatasourceTable(properties)
+       DDLUtils.isDatasourceTable(properties) && h.getTableType == HiveTableType.EXTERNAL_TABLE
      },
Contributor
Why do we need this check?
Contributor (Author)
Because we have to store the placeholder location URI in the metastore for external data source tables, and I'd like to avoid exposing it to user space.
Contributor (Author)
Technically, it does no harm to keep this field since Spark SQL doesn't use it. But this placeholder location URI doesn't make sense anywhere, and it can be error prone to keep it in the …
        inputFormat = Option(h.getInputFormatClass).map(_.getName),
        outputFormat = Option(h.getOutputFormatClass).map(_.getName),
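The explanation above can be made concrete with a small, self-contained sketch. This is not the actual HiveClientImpl code: the case class, object, and the property-key check below are hypothetical stand-ins that only illustrate how the changed condition filters the placeholder location out of what users see.

```scala
// Minimal sketch of the filtering idea; the types and helpers below are
// hypothetical stand-ins, not the actual Spark or Hive client API.
case class RawHiveTable(
    dataLocation: Option[String],   // location reported by the metastore
    tableType: String,              // e.g. "EXTERNAL_TABLE"
    properties: Map[String, String])

object PlaceholderLocationSketch {
  // Stand-in for DDLUtils.isDatasourceTable: data source tables are assumed to
  // be marked with a provider entry in the table properties.
  def isDatasourceTable(props: Map[String, String]): Boolean =
    props.contains("spark.sql.sources.provider")

  // Drop the placeholder location URI for external data source tables so it is
  // never exposed to users; keep the location for all other tables.
  def exposedLocation(t: RawHiveTable): Option[String] =
    t.dataLocation.filterNot { _ =>
      isDatasourceTable(t.properties) && t.tableType == "EXTERNAL_TABLE"
    }
}
```

With this shape, an external data source table whose metastore entry carries only the placeholder path surfaces `None` as its location, while ordinary Hive tables keep theirs.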
I see. So the Hive-compatible data source table will not be resolved as a data source relation anymore. Correct?
Correct.
Unfortunately this doesn't work, according to the last Jenkins build failure, because we still want to recognize this table as a data source table when saving data into it using CreateDataSourceTableAsSelectCommand. My last commit tries to fix this by checking `.storage.locationUri.isEmpty`. This should work because data source tables never set this field.
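For illustration, here is a rough sketch of the alternative check described in the comment above, assuming a simplified catalog-table shape. The types and the property name are stand-ins, and exactly where the real commit applies this condition is not shown here.

```scala
// Hedged sketch of the locationUri-based check; simplified stand-ins for
// Spark's CatalogTable / CatalogStorageFormat, not the actual classes.
case class StorageFormat(locationUri: Option[String])
case class SimpleCatalogTable(
    properties: Map[String, String],
    storage: StorageFormat)

object LocationUriCheckSketch {
  // Assumed marker property for data source tables.
  def isDatasourceTable(props: Map[String, String]): Boolean =
    props.contains("spark.sql.sources.provider")

  // Hide the placeholder location only when the table is a data source table
  // AND its catalog entry carries no real location URI, relying on the fact
  // that data source tables never set this field.
  def hasPlaceholderLocation(table: SimpleCatalogTable): Boolean =
    isDatasourceTable(table.properties) && table.storage.locationUri.isEmpty
}
```

Because the condition no longer depends on the Hive table type, a Hive-compatible data source table can still be recognized as a data source table when CreateDataSourceTableAsSelectCommand writes into it.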