-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28637][SQL] Thriftserver support interval type #25277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Will file a JIRA later. |
|
Test build #108266 has finished for PR 25277 at commit
|
...r/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
Outdated
Show resolved
Hide resolved
|
cc @bogdanghit |
|
Can you please add more details to the description of the PR? |
|
This has been filed as SPARK-28637. Could you update the PR description? But then, I think maybe we should actually override GetTypeInfoOperation. Then we would be missing only SparkGetCatalogsOperation, which we may add for completeness - e.g. for it to use the WDYT @wangyum ? |
|
Thank you @juliuszsompolski I will try to override |
…ort-interval-type
|
Test build #109703 has finished for PR 25277 at commit
|
|
Test build #109704 has finished for PR 25277 at commit
|
| val attrTypeString = field.dataType match { | ||
| case NullType => "void" | ||
| case CalendarIntervalType => "string" | ||
| case other => other.catalogString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to Interval itself, but shouldn't it also return "string" for case ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because Type does not have a type mapping to our CalendarIntervalType:
spark/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Type.java
Lines 31 to 105 in 02c3369
| public enum Type { | |
| NULL_TYPE("VOID", | |
| java.sql.Types.NULL, | |
| TTypeId.NULL_TYPE), | |
| BOOLEAN_TYPE("BOOLEAN", | |
| java.sql.Types.BOOLEAN, | |
| TTypeId.BOOLEAN_TYPE), | |
| TINYINT_TYPE("TINYINT", | |
| java.sql.Types.TINYINT, | |
| TTypeId.TINYINT_TYPE), | |
| SMALLINT_TYPE("SMALLINT", | |
| java.sql.Types.SMALLINT, | |
| TTypeId.SMALLINT_TYPE), | |
| INT_TYPE("INT", | |
| java.sql.Types.INTEGER, | |
| TTypeId.INT_TYPE), | |
| BIGINT_TYPE("BIGINT", | |
| java.sql.Types.BIGINT, | |
| TTypeId.BIGINT_TYPE), | |
| FLOAT_TYPE("FLOAT", | |
| java.sql.Types.FLOAT, | |
| TTypeId.FLOAT_TYPE), | |
| DOUBLE_TYPE("DOUBLE", | |
| java.sql.Types.DOUBLE, | |
| TTypeId.DOUBLE_TYPE), | |
| STRING_TYPE("STRING", | |
| java.sql.Types.VARCHAR, | |
| TTypeId.STRING_TYPE), | |
| CHAR_TYPE("CHAR", | |
| java.sql.Types.CHAR, | |
| TTypeId.CHAR_TYPE, | |
| true, false, false), | |
| VARCHAR_TYPE("VARCHAR", | |
| java.sql.Types.VARCHAR, | |
| TTypeId.VARCHAR_TYPE, | |
| true, false, false), | |
| DATE_TYPE("DATE", | |
| java.sql.Types.DATE, | |
| TTypeId.DATE_TYPE), | |
| TIMESTAMP_TYPE("TIMESTAMP", | |
| java.sql.Types.TIMESTAMP, | |
| TTypeId.TIMESTAMP_TYPE), | |
| INTERVAL_YEAR_MONTH_TYPE("INTERVAL_YEAR_MONTH", | |
| java.sql.Types.OTHER, | |
| TTypeId.INTERVAL_YEAR_MONTH_TYPE), | |
| INTERVAL_DAY_TIME_TYPE("INTERVAL_DAY_TIME", | |
| java.sql.Types.OTHER, | |
| TTypeId.INTERVAL_DAY_TIME_TYPE), | |
| BINARY_TYPE("BINARY", | |
| java.sql.Types.BINARY, | |
| TTypeId.BINARY_TYPE), | |
| DECIMAL_TYPE("DECIMAL", | |
| java.sql.Types.DECIMAL, | |
| TTypeId.DECIMAL_TYPE, | |
| true, false, false), | |
| ARRAY_TYPE("ARRAY", | |
| java.sql.Types.ARRAY, | |
| TTypeId.ARRAY_TYPE, | |
| true, true), | |
| MAP_TYPE("MAP", | |
| java.sql.Types.JAVA_OBJECT, | |
| TTypeId.MAP_TYPE, | |
| true, true), | |
| STRUCT_TYPE("STRUCT", | |
| java.sql.Types.STRUCT, | |
| TTypeId.STRUCT_TYPE, | |
| true, false), | |
| UNION_TYPE("UNIONTYPE", | |
| java.sql.Types.OTHER, | |
| TTypeId.UNION_TYPE, | |
| true, false), | |
| USER_DEFINED_TYPE("USER_DEFINED", | |
| java.sql.Types.OTHER, | |
| TTypeId.USER_DEFINED_TYPE, | |
| true, false); |
Otherwise:
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: interval
at org.apache.hive.service.cli.Type.getType(Type.java:169)
at org.apache.hive.service.cli.TypeDescriptor.<init>(TypeDescriptor.java:53)
at org.apache.hive.service.cli.ColumnDescriptor.<init>(ColumnDescriptor.java:53)
at org.apache.hive.service.cli.TableSchema.<init>(TableSchema.java:52)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$.getTableSchema(SparkExecuteStatementOperation.scala:321)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema$lzycompute(SparkExecuteStatementOperation.scala:70)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema(SparkExecuteStatementOperation.scala:65)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getResultSetSchema(SparkExecuteStatementOperation.scala:161)
at org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:209)
at org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the query actually returns a string for Arrays, Maps, Structs etc. Should it tell the client that it returns an ARRAY_TYPE or MAP_TYPE or STRUCT_TYPE if it in reality returns a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not need to return string for ArrayType | _: StructType | _: MapType | _: UserDefinedType[_] because this will not change the actual return type. Just a workaround for the interval type:
0: jdbc:hive2://localhost:10000> SELECT interval '1' year '2' day AS i;
+--------------------------+--+
| i |
+--------------------------+--+
| interval 1 years 2 days |
+--------------------------+--+
1 row selected (0.032 seconds)
0: jdbc:hive2://localhost:10000> DESC SELECT interval '1' year '2' day AS i;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| i | interval | NULL |
+-----------+------------+----------+--+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I understand now.
I was wondering how does it work that if you here create a TableSchema with a complex type like Array, and then this TableSchema is used through getResultSetSchema to create the resultRowSet, how does the RowSet know that we are in fact going to return String, and not an Array.
But it goes all the way down to Column in the constructed ColumnBasedSet in https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Column.java#L123 that assumes String for any non-primitive type. (or to conversions in ColumnValue for RowBasedSet in older protocol versions https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/ColumnValue.java#L198).
Now I understand. Thank you 👍
…ort-interval-type
juliuszsompolski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
|
retest this please |
|
Test build #110069 has finished for PR 25277 at commit
|
|
retest this please |
|
Test build #110072 has finished for PR 25277 at commit
|
|
Test build #110074 has finished for PR 25277 at commit
|
|
retest this please |
|
Test build #110077 has finished for PR 25277 at commit
|
|
Thanks! Merged to master. |
## What changes were proposed in this pull request?
`bin/spark-shell` support query interval value:
```scala
scala> spark.sql("SELECT interval 3 months 1 hours AS i").show(false)
+-------------------------+
|i |
+-------------------------+
|interval 3 months 1 hours|
+-------------------------+
```
But `sbin/start-thriftserver.sh` can't support query interval value:
```sql
0: jdbc:hive2://localhost:10000/default> SELECT interval 3 months 1 hours AS i;
Error: java.lang.IllegalArgumentException: Unrecognized type name: interval (state=,code=0)
```
This PR maps `CalendarIntervalType` to `StringType` for `TableSchema` to make Thriftserver support query interval value because we do not support `INTERVAL_YEAR_MONTH` type and `INTERVAL_DAY_TIME`:
https://github.com/apache/spark/blob/02c33694c8254f69cb36c71c0876194dccdbc014/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Type.java#L73-L78
[SPARK-27791](https://issues.apache.org/jira/browse/SPARK-27791): Support SQL year-month INTERVAL type
[SPARK-27793](https://issues.apache.org/jira/browse/SPARK-27793): Support SQL day-time INTERVAL type
## How was this patch tested?
unit tests
Closes apache#25277 from wangyum/Thriftserver-support-interval-type.
Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Xiao Li <[email protected]>
What changes were proposed in this pull request?
bin/spark-shellsupport query interval value:But
sbin/start-thriftserver.shcan't support query interval value:This PR maps
CalendarIntervalTypetoStringTypeforTableSchemato make Thriftserver support query interval value because we do not supportINTERVAL_YEAR_MONTHtype andINTERVAL_DAY_TIME:spark/sql/hive-thriftserver/v1.2.1/src/main/java/org/apache/hive/service/cli/Type.java
Lines 73 to 78 in 02c3369
SPARK-27791: Support SQL year-month INTERVAL type
SPARK-27793: Support SQL day-time INTERVAL type
How was this patch tested?
unit tests