
Conversation

@wangyum (Member) commented Jul 8, 2019

What changes were proposed in this pull request?

The table types are currently obtained from Hive, which causes some issues. For example, we don't support index_table, and different Hive versions support different table types:
Build with Hive 1.2.1:
[screenshot]
Build with Hive 2.3.5:
[screenshot]

This PR implements Spark's own GetTableTypesOperation.

How was this patch tested?

Unit tests and manual tests:
[screenshot]

@wangyum (Member, Author) commented Jul 8, 2019

@juliuszsompolski What do you think of this change?

listener.onStatementError(statementId, e.getMessage, SparkUtils.exceptionString(e))
throw e
}
listener.onStatementFinish(statementId)
Contributor commented:

add onStatementClosed once #25062 is merged.

Member Author replied:
Sure
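For illustration, a rough sketch of what the suggestion might look like, assuming #25062 adds an onStatementClosed(statementId) hook to the HiveThriftServer2 listener (the hook name is taken from the comment above, not verified against the merged API):

// Hypothetical close() override in the new operation; onStatementClosed is the
// hook proposed in #25062, not a confirmed method name.
override def close(): Unit = {
  super.close()
  HiveThriftServer2.listener.onStatementClosed(statementId)
}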

try {
CatalogTableType.tableTypes.foreach { tableType =>
if (tableType == EXTERNAL || tableType == EXTERNAL) {
rowSet.addRow(Array[AnyRef]("TABLE"))
Contributor commented:

I think you meant (tableType == EXTERNAL || tableType == MANAGED), but then you want to add "TABLE" to the results only once.
I think you want something like
tableTypes.map(tableTypeString).toSet.foreach { t => rowSet.addRow(Array[AnyRef](t)) }

tableTypeString can be shared with SparkGetTablesOperation.
To share such functions between the operations, how about a mixin utils trait

trait SparkOperationUtils { this: Operation =>
  def tableTypeString(tableType: CatalogTableType) = ...
}

e.g. at the bottom of SparkSQLOperationManager.scala?

Member Author replied:
+1
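A minimal sketch of what such a shared mixin and the deduplicated loop might look like (the trait name, imports, and error handling here are assumptions, not the merged implementation):

import org.apache.spark.sql.catalyst.catalog.CatalogTableType
import org.apache.spark.sql.catalyst.catalog.CatalogTableType.{EXTERNAL, MANAGED, VIEW}

// Hypothetical mixin shared by the Spark metadata operations.
trait SparkOperationUtils {
  // Map Spark's catalog table types to the strings returned over the thrift protocol.
  def tableTypeString(tableType: CatalogTableType): String = tableType match {
    case EXTERNAL | MANAGED => "TABLE"
    case VIEW => "VIEW"
    case t => throw new IllegalArgumentException(s"Unknown table type: $t")
  }
}

// Usage inside SparkGetTableTypesOperation.runInternal() (fragment): map to the
// strings first, then deduplicate with toSet so "TABLE" is added only once for
// EXTERNAL and MANAGED.
//   CatalogTableType.tableTypes.map(tableTypeString).toSet.foreach { t: String =>
//     rowSet.addRow(Array[AnyRef](t))
//   }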

parentSession: HiveSession): GetTableTypesOperation = synchronized {
val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
s" initialized or had already closed.")
Member commented:
nit: no string interpolation here.

Member Author replied:
OK. Thank you.
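For reference, the nit presumably refers to the second string literal in the require message, which contains no interpolation, so its s prefix can be dropped:

require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
  " initialized or had already closed.")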

@SparkQA commented Jul 8, 2019

Test build #107334 has finished for PR 25073 at commit 96c79e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) left a comment

LGTM pending merge with #25062 (whichever comes first).
cc @gatorsmile

@wangyum (Member, Author) commented Jul 8, 2019

@juliuszsompolski Do we need to implement GetFunctionsOperation and GetTypeInfoOperation?
[screenshot]

@juliuszsompolski (Contributor) commented:

Hi @wangyum,
I would have to research a bit, but

  • SparkGetFunctionsOperations seems like a useful addition, modeled after SHOW FUNCTIONS / DESCRIBE FUNCTION
  • SparkGetTypeInfoOperation... I'm not sure. I think Spark types should be compatible with Hive types; otherwise we would be in trouble because of using Hive result set serialization?

As for others:

  • I think we don't need GetCatalogs, am I right to assume it will just return null right now?
  • I think we could override the GetPrimaryKeys and GetCrossReference operations to just return empty results: even if Spark is connected to an external catalog that contains such reference info, Spark does not use it, so it's better not to return it to a tool that may assume Spark enforces these constraints (see the sketch below).
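For the primary-key/foreign-key point above, a conceptual sketch of an always-empty operation (the method and state names here follow Hive's Operation framework as an assumption, not a verified Spark implementation):

// Hypothetical Spark override of the primary-keys metadata operation: it reports
// success without adding any rows, since Spark neither uses nor enforces primary
// key / foreign key constraints from the external catalog.
override def runInternal(): Unit = {
  setState(OperationState.RUNNING)
  // Intentionally leave the result row set empty.
  setState(OperationState.FINISHED)
}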

@SparkQA commented Jul 8, 2019

Test build #107353 has finished for PR 25073 at commit 9d6e73c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


override def runInternal(): Unit = {
val statementId = UUID.randomUUID().toString
val logMsg = s"Listing table types"
Member commented:
nit: unnecessary s interpolator.

Member Author replied:
OK. Thank you.
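That is, the interpolator can simply be dropped, since the message contains no substitutions:

val logMsg = "Listing table types"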

@gatorsmile (Member) commented:

The PR #25062 has been merged. Could you update this PR?

@SparkQA commented Jul 13, 2019

Test build #107622 has finished for PR 25073 at commit a008c81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) commented:

LGTM. Thanks!
(sorry for the delay, missed that it's been updated)

@juliuszsompolski (Contributor) commented:

@gatorsmile WDYT about #25073 (comment)?

@gatorsmile (Member) commented:

retest this please

@SparkQA commented Jul 23, 2019

Test build #108060 has finished for PR 25073 at commit a008c81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:

SparkGetFunctionsOperations is useful, but I am also not sure about SparkGetTypeInfoOperation. Do it only when somebody is asking for it?

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master.

@wangyum deleted the SPARK-28293 branch on July 24, 2019 at 20:29
@juliuszsompolski (Contributor) commented:

I would have to research a bit, but

(...)

  • SparkGetTypeInfoOperation... I'm not sure. I think Spark types should be compatible with Hive types; otherwise we would be in trouble because of using Hive result set serialization?

As commented in #25277: maybe we should override GetTypeInfo to filter out the types that the Spark thriftserver does not actually support: INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, ARRAY, MAP, STRUCT, UNIONTYPE and USER_DEFINED, all of which Spark turns into string.
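A conceptual sketch of that filtering, using plain type-name strings for brevity (a real operation would iterate Hive's Type enum and emit the full TYPE_INFO schema; the full type list below is only illustrative):

// Types the Spark thriftserver renders as STRING, per the comment above, and
// which therefore should not be advertised by GetTypeInfo.
val renderedAsString = Set(
  "INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")

// Hypothetical list of all type names Hive would normally advertise.
val allHiveTypeNames: Seq[String] = Seq(
  "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
  "STRING", "BINARY", "DECIMAL", "DATE", "TIMESTAMP",
  "INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")

// Only these would be returned by a Spark-specific GetTypeInfo operation.
val advertised = allHiveTypeNames.filterNot(renderedAsString)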
