[SPARK-21617][SQL] Store correct metadata in Hive for altered DS table. #18824
HiveExternalCatalog.scala:

```diff
@@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     // Add table metadata such as table schema, partition columns, etc. to table properties.
     val updatedTable = withNewSchema.copy(
       properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))

+    // If it's a data source table, make sure the original schema is left unchanged; the
+    // actual schema is recorded as a table property.
+    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
+      updatedTable.copy(schema = rawTable.schema)
+    } else {
+      updatedTable
+    }
+
     try {
-      client.alterTable(updatedTable)
+      client.alterTable(tableToStore)
     } catch {
       case NonFatal(e) =>
         val warningMessage =
           s"Could not alter schema of table ${rawTable.identifier.quotedString} in a Hive " +
             "compatible way. Updating Hive metastore in Spark SQL specific format."
         logWarning(warningMessage, e)
-        client.alterTable(updatedTable.copy(schema = updatedTable.partitionSchema))
+        client.alterTable(updatedTable.copy(schema = tableToStore.partitionSchema))
```
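For context on why the Hive-side schema can be left untouched: the real schema of a data source table is serialized into table properties (by `tableMetaToTableProps` above). Below is a minimal sketch of that idea; the property key names and the 4000-character split size are assumptions based on `HiveExternalCatalog`'s conventions, not the exact implementation.

```scala
// Rough sketch only: how an altered data source schema can live in table properties while
// the schema recorded in the Hive metastore stays unchanged.
import org.apache.spark.sql.types._

val alteredSchema = new StructType()
  .add("id", LongType)
  .add("extra", StringType)  // column added by ALTER TABLE ... ADD COLUMNS

val json = alteredSchema.json
val parts = json.grouped(4000).toSeq  // Hive caps the length of a single property value

val schemaProps =
  Map("spark.sql.sources.schema.numParts" -> parts.size.toString) ++
    parts.zipWithIndex.map { case (part, i) => s"spark.sql.sources.schema.part.$i" -> part }

// client.alterTable(tableToStore) then writes rawTable.schema to Hive, while Spark reads the
// altered schema back from properties like these when restoring the table metadata.
```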
newSparkSQLSpecificMetastoreTable (quoted for reference):

```scala
def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
  table.copy(
    // Hive only allows directory paths as location URIs while Spark SQL data source tables
    // also allow file paths. For non-hive-compatible format, we should not set location URI
    // to avoid hive metastore to throw exception.
    storage = table.storage.copy(
      locationUri = None,
      properties = storagePropsWithLocation),
    schema = table.partitionSchema,
    bucketSpec = None,
    properties = table.properties ++ tableProperties)
}
```
HiveClientImpl.scala:

```diff
@@ -49,6 +49,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
 import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}
 import org.apache.spark.sql.execution.QueryExecutionException
 import org.apache.spark.sql.execution.command.DDLUtils
+import org.apache.spark.sql.hive.HiveExternalCatalog
 import org.apache.spark.sql.hive.client.HiveClientImpl._
 import org.apache.spark.sql.types._
 import org.apache.spark.util.{CircularBuffer, Utils}
@@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
       unsupportedFeatures += "partitioned view"
     }

-    val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
+    val properties = Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
+
+    val provider = properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
+      .orElse(Some(DDLUtils.HIVE_PROVIDER))
```
```scala
case None if table.tableType == VIEW =>
```

This sets the provider to HIVE_PROVIDER for views too.
Maybe this is redundant.
This was definitely not redundant in my testing. The metadata loaded from the metastore in HiveExternalCatalog.alterTableSchema definitely did not have the provider set when I debugged this. In fact, the test I wrote fails if I remove this code (or comment out the line that sets "provider" a few lines below).
Perhaps some other part of the code sets it in a different code path, but this would make that part of the code redundant, not the other way around.
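To illustrate why the provider matters here (a sketch only, assuming `DDLUtils.isDatasourceTable` keys off `CatalogTable.provider`; the helper below is hypothetical, not the actual implementation):

```scala
// Illustrative stand-in for the provider check; not the real DDLUtils code.
def looksLikeDatasourceTable(provider: Option[String]): Boolean =
  provider.exists(_.toLowerCase != "hive")

// Without the HiveClientImpl change: the raw metastore table comes back with provider = None,
// so the new isDatasourceTable branch in alterTableSchema is skipped.
assert(!looksLikeDatasourceTable(None))

// With the change: the provider is restored from the DATASOURCE_PROVIDER table property
// (e.g. "parquet"), so the original Hive-side schema is preserved as intended.
assert(looksLikeDatasourceTable(Some("parquet")))
```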
The restoring you mention is done in HiveExternalCatalog.restoreTableMetadata. Let me see if I can use that instead of making this change.
We do support ALTER TABLE ADD COLUMN, which relies on `alterTableSchema`. The data source tables can be read by Hive if possible. Thus, I think we should not leave the schema unchanged.
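For reference, this is the kind of scenario being discussed; the table and column names are made up, and a Hive-enabled local session is assumed.

```scala
// Hypothetical reproduction of the ALTER TABLE ADD COLUMNS path on a data source table.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("alter-ds-table-repro")
  .master("local[1]")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE t (c1 INT) USING parquet")
spark.sql("ALTER TABLE t ADD COLUMNS (c2 INT)")

// Spark must see the new column; the open question in this thread is what schema the Hive
// metastore itself should record, so that Hive-compatible readers are not broken either.
spark.table("t").printSchema()
```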
I just checked the JIRA description. This sounds like a bug we need to resolve. It is a little bit complex to fix. We need to follow what we did for `create table`. cc @xwu0226 Please help @vanzin address this issue.
Hmm, I see that this will break DS tables created with `newHiveCompatibleMetastoreTable` instead of `newSparkSQLSpecificMetastoreTable`.

For the former, the only thing I can see that could be used to identify the case is the presence of serde properties in the table metadata. That could replace the `DDLUtils.isDatasourceTable(updatedTable)` check to decide whether the schema needs to be updated.

For the latter case, I see that `newSparkSQLSpecificMetastoreTable` stores the partition schema as the table's schema (which sort of explains the weird exception handling I saw). So this code is only correct if the partition schema cannot change. Where is the partition schema for a DS table defined? Is it under the control of the user (or the data source implementation)? Because if it can change, you can run into pretty much the same issue.
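A rough sketch of the serde-based check floated above, assuming the standard `CatalogStorageFormat` fields; the helper name is made up, and whether this is a reliable signal is exactly the open question.

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Sketch only: treat a data source table as "stored Hive-compatibly" when the raw metastore
// entry carries a serde; only revert the schema for the Spark-SQL-specific format, whose
// Hive-side schema is really the partition schema.
def storedHiveCompatibly(rawTable: CatalogTable): Boolean =
  rawTable.storage.serde.isDefined

// Hypothetical use in alterTableSchema:
//   val tableToStore =
//     if (DDLUtils.isDatasourceTable(updatedTable) && !storedHiveCompatibly(rawTable)) {
//       updatedTable.copy(schema = rawTable.schema)  // Spark-specific format: keep stored schema
//     } else {
//       updatedTable                                 // Hive table or Hive-compatible DS table
//     }
```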