[SPARK-19667][SQL]create table with hiveenabled in default database use warehouse path instead of the location of default database #17001
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -17,6 +17,9 @@ |
| | | package org.apache.spark.sql.catalyst.catalog |
| | | + import org.apache.hadoop.conf.Configuration |
| | | + import org.apache.spark.SparkConf |
| | | import org.apache.spark.sql.catalyst.analysis.{FunctionAlreadyExistsException, NoSuchDatabaseException, NoSuchFunctionException, NoSuchTableException} |
| | | import org.apache.spark.sql.catalyst.expressions.Expression |
| | | @@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression |
| | | * |
| | | * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist. |
| | | */ |
| | | - abstract class ExternalCatalog { |
| | | + abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) { |
| | | import CatalogTypes.TablePartitionSpec |
| | | protected def requireDbExists(db: String): Unit = { |
| | | @@ -74,7 +77,19 @@ abstract class ExternalCatalog { |
| | | */ |
| | | def alterDatabase(dbDefinition: CatalogDatabase): Unit |
| | | - def getDatabase(db: String): CatalogDatabase |
| | | + def getDatabase(db: String): CatalogDatabase = { |
| | | + val database = getDatabaseInternal(db) |
| | | + // The default database's location always uses the warehouse path. |
| | | + // Since the location of database stored in metastore is qualified, |
| | | + // we also make the warehouse location qualified. |
| | | + if (db == SessionCatalog.DEFAULT_DATABASE) { |

Contributor: can we make the default database a totally virtual concept? i.e. we never create it or ask the metastore to retrieve it, just keep an instance of

Contributor (Author): I think if we create a default instance of

Contributor: makes sense

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | + val qualifiedWarehousePath = SessionCatalog |
| | | + .makeQualifiedPath(warehousePath, hadoopConf).toString |
| | | + database.copy(locationUri = qualifiedWarehousePath) |
| | | + } else { |
| | | + database |
| | | + } |
| | | + } |
| | | def databaseExists(db: String): Boolean |
| | | @@ -269,4 +284,7 @@ abstract class ExternalCatalog { |
| | | def listFunctions(db: String, pattern: String): Seq[String] |
| | | + protected def getDatabaseInternal(db: String): CatalogDatabase |
| | | + protected def warehousePath: String |
| | | } |
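
The `getDatabase` change in this diff can be tried out in isolation with a minimal sketch. It is not the Spark code: `CatalogDatabase` here is a simplified stand-in, `CatalogSketch` and `MapBackedCatalog` are hypothetical names, and path qualification (`SessionCatalog.makeQualifiedPath` with `hadoopConf`) is left out so the block runs without Spark or Hadoop on the classpath.

```scala
// Simplified stand-in for Spark's CatalogDatabase (assumption: four fields,
// in the order used by the CatalogDatabase(...) call quoted in this review).
final case class CatalogDatabase(
    name: String,
    description: String,
    locationUri: String,
    properties: Map[String, String])

object CatalogSketch {
  // Assumed to match SessionCatalog.DEFAULT_DATABASE.
  val DefaultDatabase = "default"
}

// Hypothetical, trimmed-down version of the abstract catalog in the diff.
abstract class CatalogSketch {
  // Subclasses fetch the stored record from their backing store.
  protected def getDatabaseInternal(db: String): CatalogDatabase

  // Subclasses supply the configured warehouse path.
  protected def warehousePath: String

  // The default database always reports the warehouse path as its location;
  // every other database is returned as stored.
  final def getDatabase(db: String): CatalogDatabase = {
    val database = getDatabaseInternal(db)
    if (db == CatalogSketch.DefaultDatabase) {
      database.copy(locationUri = warehousePath)
    } else {
      database
    }
  }
}

// Map-backed catalog used only to exercise the sketch.
class MapBackedCatalog(root: String, stored: Map[String, CatalogDatabase])
    extends CatalogSketch {
  override protected def warehousePath: String = root

  override protected def getDatabaseInternal(db: String): CatalogDatabase =
    stored.getOrElse(db, throw new NoSuchElementException(s"Database '$db' not found"))
}

object GetDatabaseDemo {
  def main(args: Array[String]): Unit = {
    val catalog = new MapBackedCatalog(
      "/user/hive/warehouse",
      Map(
        "default" -> CatalogDatabase("default", "", "/stale/location", Map.empty),
        "sales" -> CatalogDatabase("sales", "", "/data/sales", Map.empty)))
    println(catalog.getDatabase("default").locationUri)
    println(catalog.getDatabase("sales").locationUri)
  }
}
```

Note that this shape still calls `getDatabaseInternal` for the default database and then discards the stored location.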
how about we just pass in a `defaultDB: CatalogDatabase`? then we don't need to add the `protected def warehousePath: String`

I think conf/hadoopConf is more useful, since later logic can use it, and its subclasses also have these two confs.

we still have conf/hadoopConf in `InMemoryCatalog` and `HiveExternalCatalog`, we can just add one more parameter.

if we pass a defaultDB, it seems like we introduce an instance of defaultDB as we discussed above

but it will be only used in `getDatabase`, and we can save a metastore call to get the default database.

ok~ let me fix it~
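
A rough sketch of the constructor-parameter idea discussed above, assuming simplified stand-in types. `CatalogWithDefaultDb` and `CountingCatalog` are hypothetical names, and the lookup counter exists only to show the metastore call that is saved.

```scala
// Simplified stand-in for Spark's CatalogDatabase (assumption, not the real class).
final case class CatalogDatabase(
    name: String,
    description: String,
    locationUri: String,
    properties: Map[String, String])

// Hypothetical catalog that receives the default database up front.
abstract class CatalogWithDefaultDb(defaultDB: CatalogDatabase) {
  protected def getDatabaseInternal(db: String): CatalogDatabase

  // The default database is answered from the instance passed in,
  // so the backing store is never asked for it.
  def getDatabase(db: String): CatalogDatabase =
    if (db == defaultDB.name) defaultDB else getDatabaseInternal(db)
}

// Test double that counts how often the backing store is consulted.
class CountingCatalog(defaultDB: CatalogDatabase, stored: Map[String, CatalogDatabase])
    extends CatalogWithDefaultDb(defaultDB) {
  var lookups = 0

  override protected def getDatabaseInternal(db: String): CatalogDatabase = {
    lookups += 1
    stored(db)
  }
}

object DefaultDbParameterDemo {
  def main(args: Array[String]): Unit = {
    val defaultDb = CatalogDatabase("default", "", "/user/hive/warehouse", Map.empty)
    val catalog = new CountingCatalog(
      defaultDb,
      Map("sales" -> CatalogDatabase("sales", "", "/data/sales", Map.empty)))

    catalog.getDatabase("default")
    catalog.getDatabase("sales")
    println(s"backing-store lookups: ${catalog.lookups}")
  }
}
```

The cost of this shape is that every place constructing the catalog has to build a valid `CatalogDatabase` first.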

@cloud-fan I found that if we add a parameter `defaultDB` to `ExternalCatalog` and its subclasses `InMemoryCatalog` and `HiveExternalCatalog`, a lot of related code has to be modified, such as test cases and other logic that creates `InMemoryCatalog` and `HiveExternalCatalog`.

For example: currently all the parameters of `InMemoryCatalog` have their own default values, so we can create it without any parameters. But if we add a `defaultDB`, we have to construct a defaultDB in the parameter list, and we cannot create a legal defaultDB there because we cannot get the warehouse path for the defaultDB like this:

If we don't provide a default value for defaultDB in the parameter, this will cause even more code changes, which I don't think is appropriate.

What about we keep the `protected def warehousePath` in `ExternalCatalog`, and add a `lazy val defaultDB = { val qualifiedWarehousePath = SessionCatalog.makeQualifiedPath(warehousePath, hadoopConf).toString; CatalogDatabase("default", "", qualifiedWarehousePath, Map.empty) }`? This can also avoid calling `getDatabase`.

I have modified the code by adding
in `ExternalCatalog`. If it is not ok, I will revert it, thanks~
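
The `lazy val defaultDB` proposal can be sketched as follows, under the same assumptions as before: stand-in types, hypothetical class names (`LazyDefaultCatalog`, `InMemoryLikeCatalog`), a made-up default path, and no path qualification. It aims to show that the subclass keeps an all-defaults constructor while the default database is still served without a backing-store lookup.

```scala
// Simplified stand-in for Spark's CatalogDatabase (assumption, not the real class).
final case class CatalogDatabase(
    name: String,
    description: String,
    locationUri: String,
    properties: Map[String, String])

// Hypothetical catalog base class following the lazy val proposal.
abstract class LazyDefaultCatalog {
  protected def warehousePath: String
  protected def getDatabaseInternal(db: String): CatalogDatabase

  // Built on first access, so warehousePath is only read after the
  // subclass has been fully constructed.
  lazy val defaultDB: CatalogDatabase =
    CatalogDatabase("default", "", warehousePath, Map.empty)

  def getDatabase(db: String): CatalogDatabase =
    if (db == defaultDB.name) defaultDB else getDatabaseInternal(db)
}

// Every constructor parameter keeps a default value, so existing
// call sites such as `new InMemoryLikeCatalog()` would not need to change.
class InMemoryLikeCatalog(
    root: String = "/tmp/warehouse", // hypothetical default path
    stored: Map[String, CatalogDatabase] = Map.empty) extends LazyDefaultCatalog {
  var lookups = 0

  override protected def warehousePath: String = root

  override protected def getDatabaseInternal(db: String): CatalogDatabase = {
    lookups += 1
    stored(db)
  }
}

object LazyDefaultDbDemo {
  def main(args: Array[String]): Unit = {
    val catalog = new InMemoryLikeCatalog()
    println(catalog.getDatabase("default").locationUri)
    println(s"backing-store lookups: ${catalog.lookups}")
  }
}
```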