Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Jul 28, 2019

What changes were proposed in this pull request?

bin/spark-shell support query interval value:

scala> spark.sql("SELECT interval 3 months 1 hours AS i").show(false)
+-------------------------+
|i                        |
+-------------------------+
|interval 3 months 1 hours|
+-------------------------+

But sbin/start-thriftserver.sh can't support query interval value:

0: jdbc:hive2://localhost:10000/default> SELECT interval 3 months 1 hours AS i;
Error: java.lang.IllegalArgumentException: Unrecognized type name: interval (state=,code=0)

This PR maps CalendarIntervalType to StringType for TableSchema to make Thriftserver support query interval value because we do not support INTERVAL_YEAR_MONTH type and INTERVAL_DAY_TIME:

INTERVAL_YEAR_MONTH_TYPE("INTERVAL_YEAR_MONTH",
java.sql.Types.OTHER,
TTypeId.INTERVAL_YEAR_MONTH_TYPE),
INTERVAL_DAY_TIME_TYPE("INTERVAL_DAY_TIME",
java.sql.Types.OTHER,
TTypeId.INTERVAL_DAY_TIME_TYPE),

SPARK-27791: Support SQL year-month INTERVAL type
SPARK-27793: Support SQL day-time INTERVAL type

How was this patch tested?

unit tests

@wangyum
Copy link
Member Author

wangyum commented Jul 28, 2019

Will file a JIRA later.

@wangyum
Copy link
Member Author

wangyum commented Jul 28, 2019

cc @juliuszsompolski

@SparkQA
Copy link

SparkQA commented Jul 28, 2019

Test build #108266 has finished for PR 25277 at commit a2a35f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski
Copy link
Contributor

cc @bogdanghit

@bogdanghit
Copy link
Contributor

Can you please add more details to the description of the PR?

@juliuszsompolski
Copy link
Contributor

This has been filed as SPARK-28637. Could you update the PR description?
If we want to return intervals as string, since Spark does not support actual Interval Year Month and Interval Day Second, then I think this PR is good to go.

But then, I think maybe we should actually override GetTypeInfoOperation.
Currently !typeinfo returns INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, ARRAY, MAP, STRUCT, UNIONTYPE and USER_DEFINED, all of which Spark turns into string.
Maybe we should make SparkGetTypeInfoOperation, to exclude types which we don't support?

Then we would be missing only SparkGetCatalogsOperation, which we may add for completeness - e.g. for it to use the HiveThriftServer2.listener and show up in the UI.

WDYT @wangyum ?

@wangyum wangyum changed the title [WIP] Thriftserver support interval type [WIP][SPARK-28637][SQL] Thriftserver support interval type Aug 22, 2019
@wangyum
Copy link
Member Author

wangyum commented Aug 22, 2019

Thank you @juliuszsompolski I will try to override GetTypeInfoOperation.

@wangyum wangyum changed the title [WIP][SPARK-28637][SQL] Thriftserver support interval type [SPARK-28637][SQL] Thriftserver support interval type Aug 25, 2019
@SparkQA
Copy link

SparkQA commented Aug 25, 2019

Test build #109703 has finished for PR 25277 at commit d50f73c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 25, 2019

Test build #109704 has finished for PR 25277 at commit 28f226b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val attrTypeString = field.dataType match {
case NullType => "void"
case CalendarIntervalType => "string"
case other => other.catalogString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to Interval itself, but shouldn't it also return "string" for case ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because Type does not have a type mapping to our CalendarIntervalType:

public enum Type {
NULL_TYPE("VOID",
java.sql.Types.NULL,
TTypeId.NULL_TYPE),
BOOLEAN_TYPE("BOOLEAN",
java.sql.Types.BOOLEAN,
TTypeId.BOOLEAN_TYPE),
TINYINT_TYPE("TINYINT",
java.sql.Types.TINYINT,
TTypeId.TINYINT_TYPE),
SMALLINT_TYPE("SMALLINT",
java.sql.Types.SMALLINT,
TTypeId.SMALLINT_TYPE),
INT_TYPE("INT",
java.sql.Types.INTEGER,
TTypeId.INT_TYPE),
BIGINT_TYPE("BIGINT",
java.sql.Types.BIGINT,
TTypeId.BIGINT_TYPE),
FLOAT_TYPE("FLOAT",
java.sql.Types.FLOAT,
TTypeId.FLOAT_TYPE),
DOUBLE_TYPE("DOUBLE",
java.sql.Types.DOUBLE,
TTypeId.DOUBLE_TYPE),
STRING_TYPE("STRING",
java.sql.Types.VARCHAR,
TTypeId.STRING_TYPE),
CHAR_TYPE("CHAR",
java.sql.Types.CHAR,
TTypeId.CHAR_TYPE,
true, false, false),
VARCHAR_TYPE("VARCHAR",
java.sql.Types.VARCHAR,
TTypeId.VARCHAR_TYPE,
true, false, false),
DATE_TYPE("DATE",
java.sql.Types.DATE,
TTypeId.DATE_TYPE),
TIMESTAMP_TYPE("TIMESTAMP",
java.sql.Types.TIMESTAMP,
TTypeId.TIMESTAMP_TYPE),
INTERVAL_YEAR_MONTH_TYPE("INTERVAL_YEAR_MONTH",
java.sql.Types.OTHER,
TTypeId.INTERVAL_YEAR_MONTH_TYPE),
INTERVAL_DAY_TIME_TYPE("INTERVAL_DAY_TIME",
java.sql.Types.OTHER,
TTypeId.INTERVAL_DAY_TIME_TYPE),
BINARY_TYPE("BINARY",
java.sql.Types.BINARY,
TTypeId.BINARY_TYPE),
DECIMAL_TYPE("DECIMAL",
java.sql.Types.DECIMAL,
TTypeId.DECIMAL_TYPE,
true, false, false),
ARRAY_TYPE("ARRAY",
java.sql.Types.ARRAY,
TTypeId.ARRAY_TYPE,
true, true),
MAP_TYPE("MAP",
java.sql.Types.JAVA_OBJECT,
TTypeId.MAP_TYPE,
true, true),
STRUCT_TYPE("STRUCT",
java.sql.Types.STRUCT,
TTypeId.STRUCT_TYPE,
true, false),
UNION_TYPE("UNIONTYPE",
java.sql.Types.OTHER,
TTypeId.UNION_TYPE,
true, false),
USER_DEFINED_TYPE("USER_DEFINED",
java.sql.Types.OTHER,
TTypeId.USER_DEFINED_TYPE,
true, false);

Otherwise:

Caused by: java.lang.IllegalArgumentException: Unrecognized type name: interval
	at org.apache.hive.service.cli.Type.getType(Type.java:169)
	at org.apache.hive.service.cli.TypeDescriptor.<init>(TypeDescriptor.java:53)
	at org.apache.hive.service.cli.ColumnDescriptor.<init>(ColumnDescriptor.java:53)
	at org.apache.hive.service.cli.TableSchema.<init>(TableSchema.java:52)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$.getTableSchema(SparkExecuteStatementOperation.scala:321)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema$lzycompute(SparkExecuteStatementOperation.scala:70)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema(SparkExecuteStatementOperation.scala:65)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getResultSetSchema(SparkExecuteStatementOperation.scala:161)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:209)
	at org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:773)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the query actually returns a string for Arrays, Maps, Structs etc. Should it tell the client that it returns an ARRAY_TYPE or MAP_TYPE or STRUCT_TYPE if it in reality returns a string?

Copy link
Member Author

@wangyum wangyum Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not need to return string for ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] because this will not change the actual return type. Just a workaround for the interval type:

0: jdbc:hive2://localhost:10000> SELECT interval '1' year '2' day AS i;
+--------------------------+--+
|            i             |
+--------------------------+--+
| interval 1 years 2 days  |
+--------------------------+--+
1 row selected (0.032 seconds)
0: jdbc:hive2://localhost:10000> DESC SELECT interval '1' year '2' day AS i;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| i         | interval   | NULL     |
+-----------+------------+----------+--+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I understand now.
I was wondering how does it work that if you here create a TableSchema with a complex type like Array, and then this TableSchema is used through getResultSetSchema to create the resultRowSet, how does the RowSet know that we are in fact going to return String, and not an Array.
But it goes all the way down to Column in the constructed ColumnBasedSet in https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Column.java#L123 that assumes String for any non-primitive type. (or to conversions in ColumnValue for RowBasedSet in older protocol versions https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/ColumnValue.java#L198).

Now I understand. Thank you 👍

Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@wangyum
Copy link
Member Author

wangyum commented Sep 4, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110069 has finished for PR 25277 at commit 28f226b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [SPARK-28637][SQL] Thriftserver support interval type [SPARK-28637][SQL][test-hadoop3.2] Thriftserver support interval type Sep 4, 2019
@wangyum
Copy link
Member Author

wangyum commented Sep 4, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110072 has finished for PR 25277 at commit 28f226b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110074 has finished for PR 25277 at commit 59b3598.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [SPARK-28637][SQL][test-hadoop3.2] Thriftserver support interval type [SPARK-28637][SQL] Thriftserver support interval type Sep 4, 2019
@wangyum
Copy link
Member Author

wangyum commented Sep 4, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110077 has finished for PR 25277 at commit 59b3598.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile closed this in 4a3a6b6 Sep 9, 2019
@wangyum wangyum deleted the Thriftserver-support-interval-type branch September 9, 2019 06:39
@gatorsmile
Copy link
Member

Thanks! Merged to master.

PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request Sep 15, 2019
## What changes were proposed in this pull request?

`bin/spark-shell` support query interval value:
```scala
scala> spark.sql("SELECT interval 3 months 1 hours AS i").show(false)
+-------------------------+
|i                        |
+-------------------------+
|interval 3 months 1 hours|
+-------------------------+
```

But `sbin/start-thriftserver.sh` can't support query interval value:
```sql
0: jdbc:hive2://localhost:10000/default> SELECT interval 3 months 1 hours AS i;
Error: java.lang.IllegalArgumentException: Unrecognized type name: interval (state=,code=0)
```

This PR maps `CalendarIntervalType` to `StringType` for `TableSchema` to make Thriftserver support  query interval value because we do not support `INTERVAL_YEAR_MONTH` type and `INTERVAL_DAY_TIME`:
https://github.com/apache/spark/blob/02c33694c8254f69cb36c71c0876194dccdbc014/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Type.java#L73-L78
[SPARK-27791](https://issues.apache.org/jira/browse/SPARK-27791): Support SQL year-month INTERVAL type
[SPARK-27793](https://issues.apache.org/jira/browse/SPARK-27793): Support SQL day-time INTERVAL type

## How was this patch tested?

unit tests

Closes apache#25277 from wangyum/Thriftserver-support-interval-type.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Xiao Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants