[SPARK-28982][SQL] Implementation Spark's own GetTypeInfoOperation #25694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Gently ping @wangyum @juliuszsompolski. This PR implements the operation.
```scala
 * limitations under the License.
 */

package org.apache.spark.sql.hive.thriftserver.cli
```
This is essentially copied from Hive's Type.java, removing the types that we don't support, right?
Maybe instead of cloning it like that, still use the Hive types that Hive's GetTypeInfoOperation uses, and just have a filter like
```scala
val unsupportedTypes = Set("INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")
...
if (!unsupportedTypes.contains(typeInfo.getName)) {
  val rowData = ...
}
```
There may be value in cloning it in the future, if our types diverge more from Hive's, but for now I think we can just reuse it instead of cloning these several hundred lines of code.
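A minimal sketch of how that filter could be applied over Hive's Type enum, assuming Hive 1.2.1's `org.apache.hive.service.cli.Type`; the iteration is illustrative, not the PR's code:
```scala
import org.apache.hive.service.cli.Type

val unsupportedTypes = Set("INTERVAL_YEAR_MONTH", "INTERVAL_DAY_TIME",
  "ARRAY", "MAP", "STRUCT", "UNIONTYPE", "USER_DEFINED")

// Keep only the Hive types that Spark can actually return.
val supported: Seq[Type] =
  Type.values().toSeq.filterNot(t => unsupportedTypes.contains(t.getName))

supported.foreach(t => println(t.getName))
```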
Hmm, it depends on the Hive version.
In 1.2.1 the Type class is org.apache.hive.service.cli.Type.
In 2.3.x the Type class is org.apache.hadoop.hive.serde2.thrift.Type.
Spark only returns its own DataType, so I added a new Type class for Spark.
Otherwise we may need to add more methods to ThriftServerShimUtils.
I am working on making Spark's own ThriftServer that doesn't rely on the Hive version.
Instinctively, I don't want to add methods to ThriftserverShimUtils.
If this only clones the Hive types, skipping the ones we don't support, I'd prefer adding it to the shim and not cloning the code.
I don't think Spark would further diverge from Hive types. If it happens, we could clone it at that point.
@wangyum what do you think?
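A hedged sketch of that shim approach; the method name matches the `supportedType()` that appears in the test snippet later in this thread, and the type list is the one from the diff below:
```scala
// v1.2.1 shim; the v2.3.x shim would import
// org.apache.hadoop.hive.serde2.thrift.Type instead.
import org.apache.hive.service.cli.Type

private[thriftserver] def supportedType(): Array[Type] = {
  Array(Type.NULL_TYPE, Type.BOOLEAN_TYPE, Type.TINYINT_TYPE, Type.SMALLINT_TYPE,
    Type.INT_TYPE, Type.BIGINT_TYPE, Type.FLOAT_TYPE, Type.DOUBLE_TYPE,
    Type.STRING_TYPE, Type.DATE_TYPE, Type.TIMESTAMP_TYPE, Type.DECIMAL_TYPE,
    Type.BINARY_TYPE)
}
```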
They are not all the same. For example INTEGER: Spark IntegerType's simple string name is INTEGER, while in Hive's Type there is only INT. And for the null type, I changed the name to NULL, while in Hive it is void.
Ok. That seems like a good enough argument to just clone it.
Please list these differences in the PR description.
OK, I will add them to the description.
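For illustration, the naming differences discussed above as a hypothetical Spark-to-Hive mapping (the map itself is not in the PR):
```scala
// Hypothetical mapping, for illustration only.
val sparkToHiveName = Map(
  "INTEGER" -> "INT",  // Spark exposes INTEGER; Hive's Type only has INT
  "NULL"    -> "VOID"  // Spark names the null type NULL; Hive calls it void
)
```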
```scala
if (isPrimitiveType) {
  DatabaseMetaData.typeSearchable.toShort
} else {
  DatabaseMetaData.typePredNone.toShort
```
out of curiosity (I know it's copied from Hive): shouldn't a type like e.g. String be searchable?
String is searchable. In my implementation it returns 3, which means searchable.
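For reference, these are the standard JDBC SEARCHABLE constants (fixed values from java.sql.DatabaseMetaData, not PR code):
```scala
import java.sql.DatabaseMetaData

assert(DatabaseMetaData.typePredNone == 0)   // cannot be used in a WHERE clause
assert(DatabaseMetaData.typePredChar == 1)   // only supported with WHERE ... LIKE
assert(DatabaseMetaData.typePredBasic == 2)  // supported except for WHERE ... LIKE
assert(DatabaseMetaData.typeSearchable == 3) // usable anywhere in a WHERE clause
```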
Sorry, I misread "isPrimitiveType" and somehow read "isNumericType".
Can this return "null" now instead of "void"?
No, my PR only covers the type info; other things can't change, since FieldSchema is Hive's class. Building Spark's ThriftServer on its own API is what needs to be done.
```scala
}

case object INTEGER extends Type {
  override def getName: String = "INTEGER"
```
Out of curiosity: where is this actually defined?
IntegerType.simpleString is "int", but the parser definitely accepts INTEGER and I can't seem to find where it's defined.
> Out of curiosity: where is this actually defined? IntegerType.simpleString is "int", but the parser definitely accepts INTEGER and I can't seem to find where it's defined.

When I checked simpleString I missed this place, and thought IntegerType.simpleString was INTEGER.
So adding a method that filters out the unsupported types is better.
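An easy way to verify this in spark-shell (illustrative session, not from the thread):
```scala
scala> org.apache.spark.sql.types.IntegerType.simpleString
res0: String = int

scala> spark.sql("SELECT CAST(1 AS INTEGER)").schema.head.dataType
res1: org.apache.spark.sql.types.DataType = IntegerType
```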
@juliuszsompolski changed to use it.
juliuszsompolski
left a comment
LGTM.
I think it's good now.
One could debate between "NULL" and "VOID", but I think it's fine as is: NULL is a weird type, and its name cannot be used directly as a type name in a CREATE TABLE statement anyway (CREATE TABLE foo (col NULL) would fail), so I think it doesn't matter whether the name is NULL or VOID.
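A quick illustration of that point (hypothetical spark-shell session, not from the thread):
```scala
// The null type's name is not accepted in DDL...
spark.sql("CREATE TABLE foo (col NULL)")  // throws a parse error, as noted above

// ...but null values still carry a type in query results.
spark.sql("SELECT null AS col").schema.head.dataType  // NullType
```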
```scala
    Type.BIGINT_TYPE, Type.FLOAT_TYPE, Type.DOUBLE_TYPE, Type.STRING_TYPE, Type.DATE_TYPE,
    Type.TIMESTAMP_TYPE, Type.DECIMAL_TYPE, Type.BINARY_TYPE)
}

private[thriftserver] def addToClassPath(
```
nit: empty line between functions.
Added it.
Yes, and in …
```scala
Array(Type.NULL_TYPE, Type.BOOLEAN_TYPE, Type.TINYINT_TYPE, Type.SMALLINT_TYPE, Type.INT_TYPE,
  Type.BIGINT_TYPE, Type.FLOAT_TYPE, Type.DOUBLE_TYPE, Type.STRING_TYPE, Type.DATE_TYPE,
  Type.TIMESTAMP_TYPE, Type.DECIMAL_TYPE, Type.BINARY_TYPE)
}
```
Why do we skip ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE and USER_DEFINED_TYPE?
> Why do we skip ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE and USER_DEFINED_TYPE?

Supporting these types just converts them to strings for display. I should add them.
I think we should add these types. Hive 3.1.2 also converts these types to strings.
@juliuszsompolski What do you think?
How does the client handle it?
If you do
```scala
val stmt = conn.prepareStatement("SELECT array, map, struct, interval FROM table")
val rs = stmt.executeQuery()
val md = rs.getMetaData()
```
then what does md.getColumnType(i) return for each of these columns?
What rs.getXXX call should the user use for each of these columns? For the array column, should it be rs.getArray(i) or rs.getString(i)?
What is the mapping from the types returned by md.getColumnType(i) to the getters that should be used for them in rs.getXXX(i)?
For the metadata result, it really does return ARRAY, MAP, STRUCT.
The returned content is formatted by the HiveResult.toHiveString() method according to each column's DataType.
OK. We cannot fully support these types. Please remove them @AngersZhuuuu
Thanks @juliuszsompolski for your example.
Actually, thanks for explaining it @AngersZhuuuu; you convinced me that ARRAY, MAP and STRUCT must be included.
```scala
scala> md.getColumnType(1)
res11: Int = 2003  // == java.sql.Types.ARRAY
```
but then
```scala
scala> md.getColumnClassName(1)
res10: String = java.lang.String
```
so that tells the client that it is actually returned as a String, and should be retrieved as such, either with rs.getObject(1).asInstanceOf[String] or, as a convenient shorthand, with rs.getString(1).
It would actually be incorrect to not include ARRAY, MAP and STRUCT, because we do return them in the ResultSet schema (through SparkExecuteStatement.getTableSchema), so the client can get these types returned, and for any type that can be returned to the client there should be an entry in GetTypeInfo.
We therefore should not include INTERVAL (because we explicitly turn it into a String return type after #25277), and should not include UNIONTYPE or USER_DEFINED because they don't have any Spark equivalent, but ARRAY, MAP and STRUCT should be there.
Thank you for the explanation 👍
```java
case Types.OTHER:
case Types.JAVA_OBJECT: {
  switch (hiveType) {
    case INTERVAL_YEAR_MONTH_TYPE:
      return HiveIntervalYearMonth.class.getName();
    case INTERVAL_DAY_TIME_TYPE:
      return HiveIntervalDayTime.class.getName();
    default:
      return String.class.getName();
  }
}
```
USER_DEFINED in java.sql.Types is OTHER; in the conversion it is also converted to String, the same as ARRAY, MAP, STRUCT.
Maybe we should add USER_DEFINED.
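For reference, the relevant java.sql.Types constants (standard JDBC values, not PR code):
```scala
import java.sql.Types

assert(Types.OTHER == 1111)       // what a USER_DEFINED type would map to
assert(Types.JAVA_OBJECT == 2000)
assert(Types.STRUCT == 2002)
assert(Types.ARRAY == 2003)       // the value md.getColumnType(1) returned above
```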
But Spark will never return a USER_DEFINED type.
The current implementation of org.apache.spark.sql.types.UserDefinedType returns the underlying sqlType.simpleString as its catalogString, so Thriftserver queries will return the underlying type in the schema.
Hence for USER_DEFINED (and UNIONTYPE) the argument is not that they wouldn't potentially work, but that Spark does not use them.
> But Spark will never return a USER_DEFINED type. The current implementation of org.apache.spark.sql.types.UserDefinedType returns the underlying sqlType.simpleString as its catalogString, so Thriftserver queries will return the underlying type in the schema. Hence for USER_DEFINED (and UNIONTYPE) the argument is not that they wouldn't potentially work, but that Spark does not use them.

Removed it and resolved conflicts.
wangyum
left a comment
Could we add a test for this change, e.g.
```scala
def checkResult(rs: ResultSet, typeNames: Seq[String]): Unit = {
  for (i <- typeNames.indices) {
    assert(rs.next())
    assert(rs.getString("TYPE_NAME") === typeNames(i))
  }
  // Make sure there are no more elements
  assert(!rs.next())
}

withJdbcStatement() { statement =>
  val metaData = statement.getConnection.getMetaData
  checkResult(metaData.getTypeInfo, ThriftserverShimUtils.supportedType().map(_.getName))
}
```
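As a hedged illustration, roughly what such a JDBC client sees; the connection URL is hypothetical, while TYPE_NAME and SEARCHABLE are standard getTypeInfo result columns:
```scala
import java.sql.DriverManager

// Assumes the Hive JDBC driver is on the classpath and a Thrift server
// is listening on the default port.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000")
val rs = conn.getMetaData.getTypeInfo
while (rs.next()) {
  // Each row describes one type the server claims to support.
  println(s"${rs.getString("TYPE_NAME")} searchable=${rs.getShort("SEARCHABLE")}")
}
conn.close()
```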
ok to test

Test build #110306 has finished for PR 25694 at commit
juliuszsompolski
left a comment
LGTM
Thank you @AngersZhuuuu and @wangyum for a great discussion and examples that helped me learn the java.sql.ResultSetMetaData interfaces of Java JDBC and understand which types GetTypeInfo should return!

Thank you all the same; your deep questions prompted me to dive into these topics. I also learned a lot about the JDBC process.
Test build #110413 has finished for PR 25694 at commit

retest this please

Test build #110414 has finished for PR 25694 at commit

Result:

Thank you @AngersZhuuuu and @juliuszsompolski

Merged to master
Closes apache#25694 from AngersZhuuuu/SPARK-28982.
Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>


### What changes were proposed in this pull request?
Currently the Spark Thrift Server returns TypeInfo that includes:
1. INTERVAL_YEAR_MONTH
2. INTERVAL_DAY_TIME
3. UNION
4. USER_DEFINED

Spark doesn't support INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME or UNION, and won't return a USER_DEFINED type.
This PR overrides GetTypeInfoOperation with SparkGetTypeInfoOperation to exclude the types we don't need.
In hive-1.2.1 the Type class is `org.apache.hive.service.cli.Type`; in hive-2.3.x it is `org.apache.hadoop.hive.serde2.thrift.Type`. ThriftserverShimUtils is used to bridge the version difference and exclude the types we don't need.

### Why are the changes needed?
We should return the type info of Spark's own types.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manual test & added UT