Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Mar 19, 2021

What changes were proposed in this pull request?

Use Utils.getSimpleName to avoid hitting Malformed class name error in NewInstance.doGenCode.

NOTE: branch-2.4 does not have the interpreted implementation of SafeProjection, so it does not fall back into the interpreted mode if the compilation fails. Therefore, the test in this PR just checks that the compilation error happens instead of checking that the interpreted mode works well.

This is the backport PR of #31709 and the credit should be @rednaxelafx .

Why are the changes needed?

On older JDK versions (e.g. JDK8u), nested Scala classes may trigger java.lang.Class.getSimpleName to throw an java.lang.InternalError: Malformed class name error.
In this particular case, creating an ExpressionEncoder on such a nested Scala class would create a NewInstance expression under the hood, which will trigger the problem during codegen.

Similar to #29050, we should use Spark's Utils.getSimpleName utility function in place of Class.getSimpleName to avoid hitting the issue.

There are two other occurrences of java.lang.Class.getSimpleName in the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.:

    // Make a copy of the data if it's unsafe-backed
    def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) =
      s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value"
    val genFunctionValue: String = lambdaFunction.dataType match {
      case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value)
      case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value)
      case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value)
      case _ => genFunction.value
    }

The Unsafe-* family of types are all Java types, so they're okay.

Does this PR introduce any user-facing change?

Fixes a bug that throws an error when using ExpressionEncoder on some nested Scala types, otherwise no changes.

How was this patch tested?

Added a test case to org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite. It'll fail on JDK8u before the fix, and pass after the fix.

rednaxelafx and others added 2 commits March 19, 2021 09:00
… class name in NewInstance.doGenCode

Use `Utils.getSimpleName` to avoid hitting `Malformed class name` error in `NewInstance.doGenCode`.

On older JDK versions (e.g. JDK8u), nested Scala classes may trigger `java.lang.Class.getSimpleName` to throw an `java.lang.InternalError: Malformed class name` error.
In this particular case, creating an `ExpressionEncoder` on such a nested Scala class would create a `NewInstance` expression under the hood, which will trigger the problem during codegen.

Similar to apache#29050, we should use  Spark's `Utils.getSimpleName` utility function in place of `Class.getSimpleName` to avoid hitting the issue.

There are two other occurrences of `java.lang.Class.getSimpleName` in the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.:
```scala
    // Make a copy of the data if it's unsafe-backed
    def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) =
      s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value"
    val genFunctionValue: String = lambdaFunction.dataType match {
      case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value)
      case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value)
      case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value)
      case _ => genFunction.value
    }
```
The Unsafe-* family of types are all Java types, so they're okay.

Fixes a bug that throws an error when using `ExpressionEncoder` on some nested Scala types, otherwise no changes.

Added a test case to `org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite`. It'll fail on JDK8u before the fix, and pass after the fix.

Closes apache#31709 from rednaxelafx/spark-34596-master.

Authored-by: Kris Mok <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit ecf4811)
Signed-off-by: HyukjinKwon <[email protected]>
@maropu
Copy link
Member Author

maropu commented Mar 19, 2021

cc: @viirya @rednaxelafx

NOTE: This PR comes from the discussion with @viirya #31747 (comment)

@viirya
Copy link
Member

viirya commented Mar 19, 2021

@maropu Can you add few words to the PR description about why you change the test case? Thanks.

@viirya
Copy link
Member

viirya commented Mar 19, 2021

lgtm if we clearly state why we change the test case.

@maropu
Copy link
Member Author

maropu commented Mar 20, 2021

sure, I'll update it.

@maropu
Copy link
Member Author

maropu commented Mar 20, 2021

NOTE: branch-2.4 does not have the interpreted implementation of SafeProjection, so it does not fall back into 
the interpreted mode if the compilation fails. Therefore, the test in this PR just checks that the compilation 
error happens instead of checking that the interpreted mode works well.

@viirya okay, I updated it and please check it.

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40857/

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40857/

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Test build #136275 has finished for PR 31888 at commit 5152fff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val errMsg = intercept[RuntimeException] {
boundEncoder.fromRow(row)
}.getCause.getMessage
assert(errMsg.contains("failed to compile: "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the compilation error?

[info]   Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: "org.apache.
spark.sql.catalyst.encoders.ExpressionEncoderSuite$MalformedClassObject$" has no member type "ExpressionEncoderSuite$MalformedClassObject$MalformedNameExample"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But by moving MalformedClassObject out of ExpressionEncoderSuite, the generated code can be compiled.

I'm not sure what this test wants to test actually. Am I confused or do I misunderstand it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the compilation error?

Yea, that's the expected error. Actually, the master has same error when useFallback = false:

// master
[info]   Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: "org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$MalformedClassObject$" has no member type "ExpressionEncoderSuite$MalformedClassObject$MalformedNameExample"
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.compilerError(QueryExecutionErrors.scala:328)

But by moving MalformedClassObject out of ExpressionEncoderSuite, the generated code can be compiled.

IIUC the condition where the error happens is that MalformedClassObject is placed in ExpressionEncoderSuite as Kris said in the previous PR: #31709 (comment)

Actually, what's more interesting is the different rules of Java's nested classes and Scala's. Since there is no such
 notion of a "companion object" or built-in "singleton object" notion on the Java language level, at least not in 
Java 8, the Java language syntax for creating a new instance of a inner class object doesn't work well for Scala's
 class Foo { object Bar { class Baz {...} } } kind of nesting, i.e. even when we do use the correct "simple name" 
(e.g. MalformedNameExample), there would still be no valid Java syntax to create it (i.e. outerObj.new 
InnerClass(...) doesn't work when outerObj is a Scala object)

cc: @rednaxelafx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the reference.

@viirya
Copy link
Member

viirya commented Mar 24, 2021

cc @cloud-fan @dongjoon-hyun too

@viirya
Copy link
Member

viirya commented Mar 24, 2021

I will merge this tomorrow if no more comment.

@viirya
Copy link
Member

viirya commented Mar 24, 2021

Thanks all. Merging to 2.4.

viirya pushed a commit that referenced this pull request Mar 24, 2021
…ormed class name in NewInstance.doGenCode

### What changes were proposed in this pull request?

Use `Utils.getSimpleName` to avoid hitting `Malformed class name` error in `NewInstance.doGenCode`.

NOTE: branch-2.4 does not have the interpreted implementation of `SafeProjection`, so it does not fall back into the interpreted mode if the compilation fails. Therefore, the test in this PR just checks that the compilation error happens instead of checking that the interpreted mode works well.

This is the backport PR of #31709 and the credit should be rednaxelafx .

### Why are the changes needed?

On older JDK versions (e.g. JDK8u), nested Scala classes may trigger `java.lang.Class.getSimpleName` to throw an `java.lang.InternalError: Malformed class name` error.
In this particular case, creating an `ExpressionEncoder` on such a nested Scala class would create a `NewInstance` expression under the hood, which will trigger the problem during codegen.

Similar to #29050, we should use  Spark's `Utils.getSimpleName` utility function in place of `Class.getSimpleName` to avoid hitting the issue.

There are two other occurrences of `java.lang.Class.getSimpleName` in the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.:
```scala
    // Make a copy of the data if it's unsafe-backed
    def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) =
      s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value"
    val genFunctionValue: String = lambdaFunction.dataType match {
      case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value)
      case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value)
      case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value)
      case _ => genFunction.value
    }
```
The Unsafe-* family of types are all Java types, so they're okay.

### Does this PR introduce _any_ user-facing change?

Fixes a bug that throws an error when using `ExpressionEncoder` on some nested Scala types, otherwise no changes.

### How was this patch tested?

Added a test case to `org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite`. It'll fail on JDK8u before the fix, and pass after the fix.

Closes #31888 from maropu/SPARK-34596-BRANCH2.4.

Lead-authored-by: Takeshi Yamamuro <[email protected]>
Co-authored-by: Kris Mok <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
@viirya viirya closed this Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants