
Conversation

@drewrobb (Contributor) commented Jul 29, 2017

## What changes were proposed in this pull request?

This change fixes a particular scenario where Spark SQL's default encoders can't encode (thrift) types generated by Twitter scrooge. These types follow the pattern of a trait with a constructor defined only in a companion object, rather than a case class; the situation is described in more detail in the original issue. Such a type has no corresponding constructor symbol, which causes an exception. This change catches the case where the type has no constructor and instead looks for an `apply` method on the type's companion object. There might be situations in which this approach fails in some new way, but it does work for the specific scrooge example.
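The problematic shape can be sketched as follows. This is a minimal hand-written approximation with illustrative names, not actual scrooge codegen output (the linked gist shows a real one): a trait extending `ProductN` whose only way to construct a value is the companion's `apply`, which returns a private implementation class.

```scala
// Hypothetical scrooge-like type: a trait, not a case class, so it has
// no constructor symbol of its own.
trait Person extends Product2[String, Int] with Serializable {
  def name: String
  def age: Int
  def _1: String = name
  def _2: Int = age
  def canEqual(that: Any): Boolean = that.isInstanceOf[Person]
}

object Person {
  // The only "constructor" is this companion apply method.
  def apply(name: String, age: Int): Person = new Immutable(name, age)

  // The concrete class is private and almost never referred to directly.
  private class Immutable(val name: String, val age: Int) extends Person
}
```

Because `Person` behaves like a two-field product, it is reasonable to expect an encoder to treat it like a case class, but reflection on the trait itself finds no constructor.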

Note: this fix does not enable using scrooge thrift enums; additional work is necessary for that.

## How was this patch tested?

I've added a single new test that failed prior to my fix. The failure is shown below. The test includes an example type that resembles what scrooge generates. For a full scrooge example, see https://gist.github.com/anonymous/ba13d4b612396ca72725eaa989900314

[info] - SPARK-8288 *** FAILED *** (7 milliseconds)
[info]   scala.ScalaReflectionException: <none> is not a term
[info]   at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:199)
[info]   at scala.reflect.internal.Symbols$SymbolContextApiImpl.asTerm(Symbols.scala:84)
[info]   at org.apache.spark.sql.catalyst.ScalaReflection$class.constructParams(ScalaReflection.scala:872)

@SparkQA commented Jul 29, 2017

Test build #80032 has finished for PR 18766 at commit f1cda4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@drewrobb (Contributor, Author)

FYI @liancheng, this was entirely your suggested fix.

@gatorsmile (Member)

cc @liancheng Should we still support this?

@srowen (Member) commented May 18, 2018

Is the outcome here that this is an incomplete fix? Should it be closed?

@srowen srowen mentioned this pull request Jul 3, 2018
@asfgit asfgit closed this in 5bf95f2 Jul 4, 2018
asfgit pushed a commit that referenced this pull request Nov 21, 2018
## What changes were proposed in this pull request?

This change fixes a particular scenario where Spark SQL's default encoders can't encode (thrift) types generated by Twitter scrooge. These types are traits that extend `scala.ProductX` with a constructor defined only in a companion object, rather than actual case classes. The actual case class used is a child class, but that type is almost never referred to in code. The type has no corresponding constructor symbol, which causes an exception. For all other purposes, these classes act just like case classes, so it is unfortunate that Spark SQL can't serialize them as nicely as it can actual case classes. For a full example of a scrooge codegen class, see https://gist.github.com/anonymous/ba13d4b612396ca72725eaa989900314.

This change catches the case where the type has no constructor but does have an `apply` method on the type's companion object. This allows for thrift types to be serialized/deserialized with implicit encoders the same way as normal case classes. This fix had to be done in three places where the constructor is assumed to be an actual constructor:

1) In serializing, determining the schema for the DataFrame relies on inspecting the type's constructor (`ScalaReflection.constructParams`). Here we fall back to using the companion `apply` method's arguments.
2) In deserializing or evaluating with Java codegen (`NewInstance.doGenCode`), the type can't be constructed with the `new` keyword. If there is no constructor, the generated constructor call is changed to try the companion's `apply`.
3) In deserializing or evaluating without codegen, the constructor is directly invoked (`NewInstance.constructor`). This was fixed by using Scala reflection to get the actual companion `apply` method.
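The fallback in case 1) can be sketched with Scala runtime reflection. This is an illustrative helper, not Spark's actual code (the real `ScalaReflection.constructParams` differs in detail): if the type has no constructor symbol, read the parameters of the companion object's `apply` instead.

```scala
import scala.reflect.runtime.{universe => ru}

// Hedged sketch: prefer the type's own constructor parameters; if there is
// no constructor symbol (the scrooge-trait case), fall back to the
// companion object's apply method.
def constructorOrApplyParams(tpe: ru.Type): Seq[ru.Symbol] = {
  val ctor = tpe.decl(ru.termNames.CONSTRUCTOR)
  if (ctor != ru.NoSymbol) {
    // Ordinary case class: take the (first) constructor's parameters.
    ctor.alternatives.head.asMethod.paramLists.flatten
  } else {
    // No constructor at all: look up apply on the companion instead.
    val apply = tpe.companion.decl(ru.TermName("apply"))
    apply.alternatives.head.asMethod.paramLists.flatten
  }
}
```

For a scrooge-style trait, `tpe.decl(ru.termNames.CONSTRUCTOR)` is `NoSymbol`, which is exactly why the original code failed with `<none> is not a term`.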

The return type of `findConstructor` was changed because the companion `apply` method can't be represented as a `java.lang.reflect.Constructor`.
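One way to picture that signature change (a hedged sketch with a hypothetical signature, not Spark's actual one): return the "constructor" as a plain function, so that a real reflective constructor and a companion `apply` fit the same shape.

```scala
import scala.util.Try

// Hypothetical sketch: a real constructor and a companion apply are both
// exposed as Seq[AnyRef] => T, since only the former can be a
// java.lang.reflect.Constructor.
def findConstructor[T](cls: Class[T], paramTypes: Seq[Class[_]]): Option[Seq[AnyRef] => T] = {
  val direct = Try(cls.getConstructor(paramTypes: _*)).toOption.map { ctor =>
    (args: Seq[AnyRef]) => ctor.newInstance(args: _*)
  }
  direct.orElse {
    // No matching constructor: try the companion object's apply method.
    Try {
      val companion = Class.forName(cls.getName + "$", true, cls.getClassLoader)
      val module = companion.getField("MODULE$").get(null)
      val apply = companion.getMethod("apply", paramTypes: _*)
      (args: Seq[AnyRef]) => apply.invoke(module, args: _*).asInstanceOf[T]
    }.toOption
  }
}
```

Callers can then invoke the returned function uniformly without caring which construction path was found.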

There might be situations in which this approach would also fail in some new way, but at a minimum it works for the specific scrooge example and will not impact cases that were already succeeding prior to this change.

Note: this fix does not enable using scrooge thrift enums; additional work is necessary for that. With this patch, it seems you could patch `com.twitter.scrooge.ThriftEnum` to extend `_root_.scala.Product1[Int]` with `def _1 = value` to get Spark's implicit encoders to handle enums, but I've yet to use this method myself.
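The enum idea above is explicitly untested by the author; a hypothetical sketch of what such a patched trait could look like (illustrative name, not the real `com.twitter.scrooge.ThriftEnum`):

```scala
// Hypothetical: an enum-like trait made to look like a one-field product
// of its integer value, so implicit encoders could treat it as a Product1.
trait ThriftEnumLike extends Product1[Int] with Serializable {
  def value: Int
  def _1: Int = value
  def canEqual(that: Any): Boolean = that.isInstanceOf[ThriftEnumLike]
}
```

Deserializing back to the correct enum case would still need extra work, which is presumably part of why enums remain unsupported by this fix.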

Note: I previously opened a PR for this issue, but only was able to fix case 1) there: #18766

## How was this patch tested?

I've fixed all 3 cases and added two tests that use a case class similar to a scrooge-generated one. The test in ScalaReflectionSuite checks 1), and the additional assertions in ObjectExpressionsSuite check 2) and 3).

Closes #23062 from drewrobb/SPARK-8288.

Authored-by: Drew Robb <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#20932
Closes apache#17843
Closes apache#13477
Closes apache#14291
Closes apache#20919
Closes apache#17907
Closes apache#18766
Closes apache#20809
Closes apache#8849
Closes apache#21076
Closes apache#21507
Closes apache#21336
Closes apache#21681
Closes apache#21691

Author: Sean Owen <[email protected]>

Closes apache#21708 from srowen/CloseStalePRs.
