Add empty line between function
AngersZhuuuu committed Sep 7, 2019
commit 9ee38ebe394240be371511fc75551800d5d4e5da
@@ -61,6 +61,7 @@ private[thriftserver] object ThriftserverShimUtils {
Type.BIGINT_TYPE, Type.FLOAT_TYPE, Type.DOUBLE_TYPE, Type.STRING_TYPE, Type.DATE_TYPE,
Type.TIMESTAMP_TYPE, Type.DECIMAL_TYPE, Type.BINARY_TYPE)
}
Member:

Why do we skip ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE and USER_DEFINED_TYPE?

Contributor Author (@AngersZhuuuu), Sep 8, 2019:

> Why do we skip ARRAY_TYPE, MAP_TYPE, STRUCT_TYPE and USER_DEFINED_TYPE?

These types are supported by converting them to strings for display; they should be added.

Member:

I think we should add these types. Hive 3.1.2 also converts these types to strings.
@juliuszsompolski What do you think?

Contributor:

How does the client handle it?
If you do

```scala
val stmt = conn.prepareStatement("SELECT array, map, struct, interval FROM table")
val rs = stmt.executeQuery()
val md = rs.getMetaData()
```

Then what does md.getColumnType(i) return for each of these columns?
What type of rs.getXXX call should the user use for each of these columns? For the array column, should it be rs.getArray(i) or rs.getString(i)?
What is the mapping between the types returned by md.getColumnType(i) and the getters that should be used for them in rs.getXXX(i)?
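For reference, the integer codes behind the java.sql.Types constants in question are fixed by the JDBC specification, so a quick standalone check (the class name here is made up for illustration; the values are standard JDK constants) shows what md.getColumnType(i) can return for such columns:

```java
// Standalone check of the standard JDBC type codes discussed in this thread.
// All constants come from java.sql.Types in the JDK.
public class JdbcTypeCodes {
    public static void main(String[] args) {
        System.out.println("ARRAY       = " + java.sql.Types.ARRAY);       // 2003
        System.out.println("STRUCT      = " + java.sql.Types.STRUCT);      // 2002
        System.out.println("JAVA_OBJECT = " + java.sql.Types.JAVA_OBJECT); // 2000
        System.out.println("OTHER       = " + java.sql.Types.OTHER);       // 1111
        System.out.println("VARCHAR     = " + java.sql.Types.VARCHAR);     // 12
    }
}
```

Note there is no java.sql.Types constant for MAP, which is why drivers report map-like columns through one of the generic codes above.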

Contributor Author:

For the metadata result, it does return ARRAY, MAP, STRUCT.
The returned content is formatted by the HiveResult.toHiveString() method according to each column's DataType.

Member:

OK. We cannot fully support these types. Please remove them @AngersZhuuuu.
Thanks @juliuszsompolski for your example.

Contributor:

Actually, thanks for explaining it @AngersZhuuuu, and you convinced me that ARRAY, MAP and STRUCT must be included.

```scala
scala> md.getColumnType(1)
res11: Int = 2003  // == java.sql.Types.ARRAY
```

but then

```scala
scala> md.getColumnClassName(1)
res10: String = java.lang.String
```

so that tells the client that the value is actually returned as a String, and it should be retrieved as such, either with rs.getObject(1).asInstanceOf[String] or, as a convenient shorthand, with rs.getString(1).
It would actually be incorrect to not include ARRAY, MAP and STRUCT, because we do return them in the ResultSet schema (through SparkExecuteStatement.getTableSchema), so the client can get these types returned, and for any type that can be returned to the client there should be an entry in GetTypeInfo.
We therefore should not include INTERVAL (because we explicitly turn it into a String return type after #25277), and should not include UNIONTYPE or USER_DEFINED because they have no Spark equivalent, but ARRAY, MAP and STRUCT should be there.
Thank you for the explanation 👍

Contributor Author:

@juliuszsompolski

```java
case Types.OTHER:
case Types.JAVA_OBJECT: {
  switch (hiveType) {
    case INTERVAL_YEAR_MONTH_TYPE:
      return HiveIntervalYearMonth.class.getName();
    case INTERVAL_DAY_TIME_TYPE:
      return HiveIntervalDayTime.class.getName();
    default:
      return String.class.getName();
  }
}
```

USER_DEFINED in java.sql.Types is OTHER; in the conversion process it is also converted to String, the same as ARRAY, MAP and STRUCT.
Maybe we should add USER_DEFINED.
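The switch above can be distilled into a self-contained sketch (the helper and enum names here are hypothetical, and the interval class names are hard-coded strings rather than the real Hive classes) showing why a USER_DEFINED column, like ARRAY/MAP/STRUCT, ends up reported to the client as java.lang.String:

```java
import java.sql.Types;

public class ColumnClassNameSketch {
    // Hypothetical stand-in for Hive's Type values used in the quoted switch
    enum HiveType { INTERVAL_YEAR_MONTH_TYPE, INTERVAL_DAY_TIME_TYPE, USER_DEFINED_TYPE, ARRAY_TYPE }

    // Mirrors the quoted Hive JDBC logic: columns reported as OTHER or
    // JAVA_OBJECT get String as their column class name unless they are
    // one of the two interval types.
    static String columnClassName(int sqlType, HiveType hiveType) {
        switch (sqlType) {
            case Types.OTHER:
            case Types.JAVA_OBJECT:
                switch (hiveType) {
                    case INTERVAL_YEAR_MONTH_TYPE:
                        return "org.apache.hadoop.hive.common.type.HiveIntervalYearMonth";
                    case INTERVAL_DAY_TIME_TYPE:
                        return "org.apache.hadoop.hive.common.type.HiveIntervalDayTime";
                    default:
                        return String.class.getName();
                }
            default:
                return Object.class.getName(); // placeholder for the other branches
        }
    }

    public static void main(String[] args) {
        // USER_DEFINED maps to java.sql.Types.OTHER, so it hits the default branch
        System.out.println(columnClassName(Types.OTHER, HiveType.USER_DEFINED_TYPE));
    }
}
```

Under this reading, USER_DEFINED would behave identically to ARRAY/MAP/STRUCT on the wire, which is the argument for adding it.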

Contributor:

But Spark will never return a USER_DEFINED type.
The current implementation of org.apache.spark.sql.types.UserDefinedType will return the underlying sqlType.simpleString as its catalogString, so Thriftserver queries will return the underlying type in the schema.
Hence for USER_DEFINED (and UNIONTYPE) the argument is not that they wouldn't potentially work, but that Spark does not use them.

Contributor Author:

> But Spark will never return a USER_DEFINED type.
> The current implementation of org.apache.spark.sql.types.UserDefinedType will return the underlying sqlType.simpleString as its catalogString, so Thriftserver queries will return the underlying type in the schema.
> Hence for USER_DEFINED (and UNIONTYPE) the argument is not that they wouldn't potentially work, but that Spark does not use them.

Removed it and resolved the conflicts.


private[thriftserver] def addToClassPath(
Contributor:

nit: empty line between functions.

Contributor Author:

Added it.

loader: ClassLoader,
auxJars: Array[String]): ClassLoader = {