Skip to content

Conversation

@gmoehler
Copy link

@gmoehler gmoehler commented Jan 20, 2017

What changes were proposed in this pull request?

acceptType() in UDT will no only accept the same type but also all base types

How was this patch tested?

Manual test using a set of generated UDTs fixing acceptType() in my user defined types
*Update: I added a test case to https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala

Please review http://spark.apache.org/contributing.html before opening a pull request.

@viirya
Copy link
Member

viirya commented Jan 20, 2017

cc @cloud-fan @rxin Although UDT is now private API, some developers still use it by defining their codes as spark package. I am not sure if you think we need to fix this or not.

@cloud-fan
Copy link
Contributor

is it possible to add a unit test? the change LGTM

@viirya
Copy link
Member

viirya commented Jan 21, 2017

LGTM too. @gmoehler Can you add an unit test?

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

@SparkQA
Copy link

SparkQA commented Jan 21, 2017

Test build #71755 has finished for PR 16660 at commit 7ea9aa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ewhere?

Test case failure is:
- SPARK-19311: UDFs disregard UDT type hierarchy *** FAILED ***
  org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Max iterations (100) reached for batch Resolution, tree:
Project [UDF(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(cast(UDF(41) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType) as exampleBaseType)) AS UDF(UDF(41))apache#166]
+- SubqueryAlias tmp_table
   +- Project [_1#157 AS id#160, _2#158 AS saying#161]
      +- LocalRelation [_1#157, _2#158]
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:105)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  at org.apache.spark.sql.test.SQLTestUtils$$anonfun$sql$1.apply(SQLTestUtils.scala:61)
@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71841 has finished for PR 16660 at commit fad9f0e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait IExampleBaseType extends Serializable
  • class ExampleBaseClass(override val field: Int) extends IExampleBaseType
  • class ExampleSubClass(override val field: Int)

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71846 has finished for PR 16660 at commit fb261d7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// object and classes to test SPARK-19311

// Trait/Interface for base type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the indentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected in latest commit ... thanks for pointing out

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71849 has finished for PR 16660 at commit 6b6b773.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// a base class
class ExampleBaseClass(override val field: Int) extends IExampleBaseType {
override def toString: String = field.toString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why override toString here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify it to

class ExampleBaseClass(override val field: Int) extends IExampleBaseType

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmoehler I think we don't need toString?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71852 has finished for PR 16660 at commit 4ae0555.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait IExampleBaseType extends Serializable
  • class ExampleBaseClass(override val field: Int) extends IExampleBaseType
  • class ExampleSubClass(override val field: Int)

import org.apache.spark.sql.test.SharedSQLContext
import org.apache.spark.sql.types._


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please remove this empty line for avoiding unnecessary changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

extends ExampleBaseClass(field) with IExampleSubType

// UDT for base class
private[spark] class ExampleBaseTypeUDT extends UserDefinedType[IExampleBaseType] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private[spark] is not needed here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


override def typeName: String = "exampleBaseType"

private[spark] override def asNullable: ExampleBaseTypeUDT = this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to simplify the test cases, please remove asNullable , typeName , equals and hashCode

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

override def serialize(obj: IExampleSubType): InternalRow = {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please remove this line. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure which line you mean... we need to overwrite serialize in any case I guess - or is it about a blank line?

datum match {
case row: InternalRow =>
require(row.numFields == 1,
s"VectorUDT.deserialize given row with length " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this unneeded s String Interpolator. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - replaced it with a fixed string


override def typeName: String = "exampleSubType"

private[spark] override def asNullable: ExampleSubTypeUDT = this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to simplify the test cases, please remove asNullable, typeName, equals and hashCode too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetTest {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please remove this empty line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@gatorsmile
Copy link
Member

It looks great! My comment is just to simplify the unit test cases. Normally, we want to make the unit test cases as simple as possible. Thanks!

LGTM except a few minor comments.

}

test("SPARK-19311: UDFs disregard UDT type hierarchy") {
UDTRegistration.register(classOf[IExampleBaseType].getName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With SQLUserDefinedType, no need to use UDTRegistration. We can remove this two lines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tend to leave them, but remove the @SQLUserDefinedType, so we have a test that uses UDTRegistration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. if you worry about that, actually we have UDTRegistrationSuite for test case of UDTRegistration. i am fine to either SQLUserDefinedType or UDTRegistration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


override def hashCode(): Int = classOf[ExampleSubTypeUDT].getName.hashCode()

override def equals(other: Any): Boolean = other.isInstanceOf[ExampleSubTypeUDT]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equals in UserDefinedType has default impl. to call acceptsType.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right - i have removed equals anyway due to a prev comment of @gatorsmile.

@gmoehler
Copy link
Author

Thanks for the valuable (and fast!) comments - i have worked them in.

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71929 has finished for PR 16660 at commit 7aed9a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ExampleBaseTypeUDT extends UserDefinedType[IExampleBaseType]

@viirya
Copy link
Member

viirya commented Jan 24, 2017

LGTM except one minor comment.

@gmoehler
Copy link
Author

@viirya Which comment are you referring to? I thought i had included all of them ;-)

@gatorsmile
Copy link
Member

The remaining comment is: #16660 (comment)

@gatorsmile
Copy link
Member

LGTM except one comment. Thanks for working on this!

@viirya
Copy link
Member

viirya commented Jan 25, 2017

@gmoehler ExampleBaseClass overrides toString impl. That is redundant.

@gmoehler
Copy link
Author

Thanks for pointing out. I had overseen this comment.
Really appreciate you commitment to clean code!
Change is pushed and ready for a test build.

@SparkQA
Copy link

SparkQA commented Jan 25, 2017

Test build #71980 has finished for PR 16660 at commit 6c16760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ExampleBaseClass(override val field: Int) extends IExampleBaseType

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master/2.1

@asfgit asfgit closed this in f6480b1 Jan 25, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
acceptType() in UDT will no only accept the same type but also all base types

## How was this patch tested?
Manual test using a set of generated UDTs fixing acceptType() in my user defined types

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: gmoehler <[email protected]>

Closes apache#16660 from gmoehler/master.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
## What changes were proposed in this pull request?
acceptType() in UDT will no only accept the same type but also all base types

## How was this patch tested?
Manual test using a set of generated UDTs fixing acceptType() in my user defined types

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: gmoehler <[email protected]>

Closes apache#16660 from gmoehler/master.
cloud-fan pushed a commit that referenced this pull request Jun 24, 2020
…e rows in ScalaUDF as well

### What changes were proposed in this pull request?

This PR tries to address the comment: #28645 (comment)
It changes `canUpCast/canCast` to allow cast from sub UDT to base UDT, in order to achieve the goal to allow UserDefinedType to use `ExpressionEncoder` to deserialize rows in ScalaUDF as well.

One thing that needs to mention is, even we allow cast from sub UDT to base UDT, it doesn't really do the cast in `Cast`. Because, yet, sub UDT and base UDT are considered as the same type(because of #16660), see:

https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala#L81-L86

https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala#L92-L95

Therefore, the optimize rule `SimplifyCast` will eliminate the cast at the end.

### Why are the changes needed?

Reduce the special case caused by `UserDefinedType` in `ResolveEncodersInUDF` and `ScalaUDF`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It should be covered by the test of `SPARK-19311`, which is also updated a little in this PR.

Closes #28920 from Ngone51/fix-udf-udt.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants