
Conversation

@wangyum (Member) commented Jul 8, 2019

What changes were proposed in this pull request?

The table types are currently obtained from Hive, which causes some issues. For example, we don't support index_table, and different Hive versions support different table types:
Build with Hive 1.2.1:
[screenshot]
Build with Hive 2.3.5:
[screenshot]

This PR implements Spark's own GetTableTypesOperation.

How was this patch tested?

Unit tests and manual tests:
[screenshot]

@wangyum (Member, Author) commented Jul 8, 2019

@juliuszsompolski What do you think of this change?

listener.onStatementError(statementId, e.getMessage, SparkUtils.exceptionString(e))
throw e
}
listener.onStatementFinish(statementId)
Contributor commented:

add onStatementClosed once #25062 is merged.

Member Author replied:
Sure
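For illustration, a rough sketch of what the suggestion might look like, assuming #25062 adds an onStatementClosed(statementId) hook to the HiveThriftServer2 listener (the hook name is taken from the comment above, not verified against the merged API):

// Hypothetical close() override in the new operation; onStatementClosed is the
// hook proposed in #25062, not a confirmed method name.
override def close(): Unit = {
  super.close()
  HiveThriftServer2.listener.onStatementClosed(statementId)
}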

try {
CatalogTableType.tableTypes.foreach { tableType =>
if (tableType == EXTERNAL || tableType == EXTERNAL) {
rowSet.addRow(Array[AnyRef]("TABLE"))
Contributor commented:

I think you meant (tableType == EXTERNAL || tableType == MANAGED), but then you want to add "TABLE" to the results only once.
I think you want something like
tableTypes.map(tableTypeString).toSet.foreach { t => rowSet.addRow(Array[AnyRef](t)) }

tableTypeString can be shared with SparkGetTablesOperation.
To share such functions between the operations, how about a mixin utils trait

trait SparkOperationUtils { this: Operation =>
  def tableTypeString(tableType: CatalogTableType) = ...
}

e.g. at the bottom of SparkSQLOperationManager.scala?

Member Author replied:
+1
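A minimal sketch of what such a shared mixin and the deduplicated loop might look like (the trait name, imports, and error handling here are assumptions, not the merged implementation):

import org.apache.spark.sql.catalyst.catalog.CatalogTableType
import org.apache.spark.sql.catalyst.catalog.CatalogTableType.{EXTERNAL, MANAGED, VIEW}

// Hypothetical mixin shared by the Spark metadata operations.
trait SparkOperationUtils {
  // Map Spark's catalog table types to the strings returned over the thrift protocol.
  def tableTypeString(tableType: CatalogTableType): String = tableType match {
    case EXTERNAL | MANAGED => "TABLE"
    case VIEW => "VIEW"
    case t => throw new IllegalArgumentException(s"Unknown table type: $t")
  }
}

// Usage inside SparkGetTableTypesOperation.runInternal() (fragment): map to the
// strings first, then deduplicate with toSet so "TABLE" is added only once for
// EXTERNAL and MANAGED.
//   CatalogTableType.tableTypes.map(tableTypeString).toSet.foreach { t: String =>
//     rowSet.addRow(Array[AnyRef](t))
//   }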

parentSession: HiveSession): GetTableTypesOperation = synchronized {
val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
s" initialized or had already closed.")
Member commented:
nit: no string interpolation here.

Member Author replied:
OK. Thank you.
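For reference, the nit presumably refers to the second string literal in the require message, which contains no interpolation, so its s prefix can be dropped:

require(sqlContext != null, s"Session handle: ${parentSession.getSessionHandle} has not been" +
  " initialized or had already closed.")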

@SparkQA commented Jul 8, 2019

Test build #107334 has finished for PR 25073 at commit 96c79e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) left a comment

LGTM pending merge with #25062 (whichever comes first).
cc @gatorsmile

@wangyum (Member, Author) commented Jul 8, 2019

@juliuszsompolski Do we need to implement GetFunctionsOperation and GetTypeInfoOperation?
[screenshot]

@juliuszsompolski (Contributor) commented:

Hi @wangyum,
I would have to research a bit, but

  • SparkGetFunctionsOperations seems like a useful addition, modeled after SHOW FUNCTIONS / DESCRIBE FUNCTION
  • SparkGetTypeInfoOperation... I'm not sure. I think Spark types should be compatible with Hive types; otherwise we would be in trouble because of using Hive result set serialization?

As for others:

  • I think we don't need GetCatalogs, am I right to assume it will just return null right now?
  • I think we could override the GetPrimaryKeys and GetCrossReference operations to just return empty results: even if Spark is connected to an external catalog that contains such reference info, Spark does not use it, so it's better not to return it to a tool that may assume Spark enforces these constraints (see the sketch below).
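For the primary-key/foreign-key point above, a conceptual sketch of an always-empty operation (the method and state names here follow Hive's Operation framework as an assumption, not a verified Spark implementation):

// Hypothetical Spark override of the primary-keys metadata operation: it reports
// success without adding any rows, since Spark neither uses nor enforces primary
// key / foreign key constraints from the external catalog.
override def runInternal(): Unit = {
  setState(OperationState.RUNNING)
  // Intentionally leave the result row set empty.
  setState(OperationState.FINISHED)
}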

@SparkQA commented Jul 8, 2019

Test build #107353 has finished for PR 25073 at commit 9d6e73c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


override def runInternal(): Unit = {
val statementId = UUID.randomUUID().toString
val logMsg = s"Listing table types"
Member commented:
nit: unnecessary s interpolator.

Member Author replied:
OK. Thank you.
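That is, the interpolator can simply be dropped, since the message contains no substitutions:

val logMsg = "Listing table types"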

@gatorsmile (Member) commented:

The PR #25062 has been merged. Could you update this PR?

@SparkQA commented Jul 13, 2019

Test build #107622 has finished for PR 25073 at commit a008c81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) commented:

LGTM. Thanks!
(sorry for the delay, missed that it's been updated)

@juliuszsompolski (Contributor) commented:

@gatorsmile WDYT about #25073 (comment)?

@gatorsmile (Member) commented:

retest this please

@SparkQA commented Jul 23, 2019

Test build #108060 has finished for PR 25073 at commit a008c81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:

SparkGetFunctionsOperations is useful, but I am also not sure about SparkGetTypeInfoOperation. Do it only when somebody is asking for it?

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master.

@wangyum deleted the SPARK-28293 branch on July 24, 2019 at 20:29
@juliuszsompolski (Contributor) commented:

I would have to research a bit, but

(...)

  • SparkGetTypeInfoOperation... I'm not sure. I think Spark types should be compatible with Hive types; otherwise we would be in trouble because of using Hive result set serialization?

As commented in #25277: maybe we should override GetTypeInfo to filter out the types that the Spark thriftserver does not actually support: INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, ARRAY, MAP, STRUCT, UNIONTYPE and USER_DEFINED, all of which Spark turns into string.
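A conceptual sketch of that filtering, using plain type-name strings for brevity (a real operation would iterate Hive's Type enum and emit the full TYPE_INFO schema; the full type list below is only illustrative):

// Types the Spark thriftserver renders as STRING, per the comment above, and
// which therefore should not be advertised by GetTypeInfo.
val renderedAsString = Set(
  "INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")

// Hypothetical list of all type names Hive would normally advertise.
val allHiveTypeNames: Seq[String] = Seq(
  "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
  "STRING", "BINARY", "DECIMAL", "DATE", "TIMESTAMP",
  "INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")

// Only these would be returned by a Spark-specific GetTypeInfo operation.
val advertised = allHiveTypeNames.filterNot(renderedAsString)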
