[SPARK-19311][SQL] fix UDT hierarchy issue #16660
Conversation
cc @cloud-fan @rxin Although UDT is now a private API, some developers still use it by defining their code inside a spark package. I am not sure whether you think we need to fix this or not.
Is it possible to add a unit test? The change LGTM.
LGTM too. @gmoehler Can you add a unit test?
ok to test |
Test build #71755 has finished for PR 16660 at commit
…ewhere? Test case failure is:

- SPARK-19311: UDFs disregard UDT type hierarchy *** FAILED ***
  org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Max iterations (100) reached for batch Resolution, tree:
  Project [UDF(cast(cast(cast( … many repeated `cast(… as exampleBaseType)` wrappers around UDF(41) … ))) AS UDF(UDF(41))apache#166]
  +- SubqueryAlias tmp_table
     +- Project [_1#157 AS id#160, _2#158 AS saying#161]
        +- LocalRelation [_1#157, _2#158]
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:105)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  at org.apache.spark.sql.test.SQLTestUtils$$anonfun$sql$1.apply(SQLTestUtils.scala:61)
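The "Max iterations (100) reached" failure above is the analyzer's fixed-point rule executor giving up: apparently, because the sub-UDT was not accepted where the base UDT was expected, each Resolution pass wrapped the expression in yet another cast and the tree never stabilized. A toy model of that mechanism (all names here are hypothetical, not Spark's API):

```scala
// Toy fixed-point rule executor with a max-iteration cap: if a rule keeps
// changing the tree (here, endlessly wrapping a value in Cast nodes), the
// executor aborts after maxIterations, much like the trace above.
sealed trait Expr
case class Leaf(v: Int) extends Expr
case class Cast(child: Expr) extends Expr

def execute(rule: Expr => Expr, start: Expr, maxIterations: Int): Either[String, Expr] = {
  var cur = start
  var i = 0
  while (i < maxIterations) {
    val next = rule(cur)
    if (next == cur) return Right(cur) // fixed point reached, done
    cur = next
    i += 1
  }
  Left(s"Max iterations ($maxIterations) reached")
}
```

An identity rule converges immediately, while a rule that always adds a cast hits the cap.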
Test build #71841 has finished for PR 16660 at commit
Test build #71846 has finished for PR 16660 at commit
`// object and classes to test SPARK-19311`
`// Trait/Interface for base type`
Please fix the indentation.
Corrected in the latest commit ... thanks for pointing it out.
Test build #71849 has finished for PR 16660 at commit
`// a base class`
`class ExampleBaseClass(override val field: Int) extends IExampleBaseType {`
`  override def toString: String = field.toString`
Why override `toString` here?
Please simplify it to `class ExampleBaseClass(override val field: Int) extends IExampleBaseType`.
@gmoehler I think we don't need `toString`?
ok
Test build #71852 has finished for PR 16660 at commit
`import org.apache.spark.sql.test.SharedSQLContext`
`import org.apache.spark.sql.types._`
Nit: Please remove this empty line to avoid unnecessary changes.
ok
`  extends ExampleBaseClass(field) with IExampleSubType`
`// UDT for base class`
`private[spark] class ExampleBaseTypeUDT extends UserDefinedType[IExampleBaseType] {`
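For context, the type hierarchy exercised by this test can be reconstructed from the names appearing in the review (a sketch only; the final committed code may differ):

```scala
// Sketch of the SPARK-19311 test hierarchy: a base interface, a sub
// interface, and a concrete class for each, with the sub class extending
// the base class and mixing in the sub interface.
trait IExampleBaseType { def field: Int }
trait IExampleSubType extends IExampleBaseType

class ExampleBaseClass(override val field: Int) extends IExampleBaseType
class ExampleSubClass(field: Int)
  extends ExampleBaseClass(field) with IExampleSubType
```

An `ExampleSubClass` instance is thus also an `IExampleBaseType`, which is exactly the subtyping the UDF resolution needs to respect.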
`private[spark]` is not needed here.
ok
`override def typeName: String = "exampleBaseType"`
`private[spark] override def asNullable: ExampleBaseTypeUDT = this`
Nit: to simplify the test cases, please remove `asNullable`, `typeName`, `equals` and `hashCode`.
ok
`}`
`override def serialize(obj: IExampleSubType): InternalRow = {`
Nit: Please remove this line. Thanks!
Not sure which line you mean... we need to override `serialize` in any case, I guess - or is it about a blank line?
`datum match {`
`  case row: InternalRow =>`
`    require(row.numFields == 1,`
`      s"VectorUDT.deserialize given row with length " +`
Please remove this unneeded `s` string interpolator. Thanks!
ok - replaced it with a fixed string
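For reference, the nit is that the `s` interpolator only does work when the string actually embeds a `$`-expression; on a plain literal it is dead weight. A tiny illustration (values here are made up for the example):

```scala
// The s interpolator is only needed when the string embeds expressions.
val n = 1
val plain  = "VectorUDT.deserialize given row with length "  // no $-expression: a plain literal suffices
val interp = s"given row with length $n"                     // interpolation actually used
```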
`override def typeName: String = "exampleSubType"`
`private[spark] override def asNullable: ExampleSubTypeUDT = this`
Nit: to simplify the test cases, please remove `asNullable`, `typeName`, `equals` and `hashCode` too.
done
`class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetTest {`
Nit: Please remove this empty line.
ok
It looks great! My comments are just to simplify the unit test cases. Normally, we want to make unit test cases as simple as possible. Thanks! LGTM except a few minor comments.
`}`
`test("SPARK-19311: UDFs disregard UDT type hierarchy") {`
`  UDTRegistration.register(classOf[IExampleBaseType].getName,`
With `SQLUserDefinedType`, there is no need to use `UDTRegistration`. We can remove these two lines.
I tend to leave them but remove the `@SQLUserDefinedType` annotation, so we have a test that uses `UDTRegistration`.
Oh, if you worry about that: we actually have `UDTRegistrationSuite` for test cases of `UDTRegistration`. I am fine with either `SQLUserDefinedType` or `UDTRegistration`.
ok
`override def hashCode(): Int = classOf[ExampleSubTypeUDT].getName.hashCode()`
`override def equals(other: Any): Boolean = other.isInstanceOf[ExampleSubTypeUDT]`
`equals` in `UserDefinedType` has a default implementation that calls `acceptsType`.
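A simplified stand-in (not the real Spark class) illustrating the point: with a default `equals` that delegates to `acceptsType`, a UDT subclass usually needs no `equals` override of its own. The class names below are made up for the sketch:

```scala
// Stand-in for UserDefinedType's equals/acceptsType relationship.
abstract class UdtLike {
  // Default acceptance: same runtime class.
  def acceptsType(other: UdtLike): Boolean = this.getClass == other.getClass
  // The default equals delegates to acceptsType, so subclasses inherit
  // sensible equality for free.
  override def equals(that: Any): Boolean = that match {
    case u: UdtLike => this.acceptsType(u)
    case _          => false
  }
  override def hashCode(): Int = getClass.getName.hashCode
}

class FooUDT extends UdtLike
class BarUDT extends UdtLike
```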
Right - I have removed `equals` anyway due to a previous comment of @gatorsmile.
Thanks for the valuable (and fast!) comments - I have worked them in.
Test build #71929 has finished for PR 16660 at commit
LGTM except one minor comment.
@viirya Which comment are you referring to? I thought I had included all of them ;-)
The remaining comment is: #16660 (comment)
LGTM except one comment. Thanks for working on this!
@gmoehler |
Thanks for pointing that out. I had overlooked this comment.
Test build #71980 has finished for PR 16660 at commit
LGTM |
Thanks! Merging to master/2.1 |
## What changes were proposed in this pull request?

acceptsType() in a UDT will not only accept the same type but also all base types.

## How was this patch tested?

Manual test using a set of generated UDTs, fixing acceptsType() in my user defined types.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: gmoehler <[email protected]>

Closes apache#16660 from gmoehler/master.
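The heart of the fix can be sketched with a simplified stand-in for `UserDefinedType` (the real change lives in `org.apache.spark.sql.types.UserDefinedType`; the stand-in below only illustrates the accept-base-types idea and is not Spark's actual class):

```scala
// Simplified stand-in for UserDefinedType with the fixed acceptsType:
// a UDT accepts another UDT when its user class is the same as, or a
// superclass/superinterface of, the other UDT's user class.
abstract class SimpleUDT[T] {
  def userClass: Class[T]
  def acceptsType(other: SimpleUDT[_]): Boolean =
    this.getClass == other.getClass ||
      this.userClass.isAssignableFrom(other.userClass)
}

trait IExampleBaseType { def field: Int }
trait IExampleSubType extends IExampleBaseType

class BaseUDT extends SimpleUDT[IExampleBaseType] {
  override val userClass: Class[IExampleBaseType] = classOf[IExampleBaseType]
}
class SubUDT extends SimpleUDT[IExampleSubType] {
  override val userClass: Class[IExampleSubType] = classOf[IExampleSubType]
}
```

With this, the base UDT accepts the sub UDT (so a UDF declared over the base type works on sub-type values), while the reverse direction is still rejected.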
…e rows in ScalaUDF as well

### What changes were proposed in this pull request?

This PR tries to address the comment: #28645 (comment)

It changes `canUpCast`/`canCast` to allow casting from a sub UDT to a base UDT, in order to allow UserDefinedType to use `ExpressionEncoder` to deserialize rows in ScalaUDF as well.

One thing worth mentioning: even though we allow casting from a sub UDT to a base UDT, `Cast` doesn't really perform the cast, because a sub UDT and its base UDT are still considered the same type (because of #16660), see:

https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala#L81-L86

https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala#L92-L95

Therefore, the optimizer rule `SimplifyCasts` will eliminate the cast in the end.

### Why are the changes needed?

Reduce the special-casing caused by `UserDefinedType` in `ResolveEncodersInUDF` and `ScalaUDF`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It should be covered by the test of `SPARK-19311`, which is also updated a little in this PR.

Closes #28920 from Ngone51/fix-udf-udt.

Authored-by: yi.wu <[email protected]>

Signed-off-by: Wenchen Fan <[email protected]>
*Update: I added a test case to https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala