-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34596][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in NewInstance.doGenCode #31888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… class name in NewInstance.doGenCode Use `Utils.getSimpleName` to avoid hitting `Malformed class name` error in `NewInstance.doGenCode`. On older JDK versions (e.g. JDK8u), nested Scala classes may trigger `java.lang.Class.getSimpleName` to throw an `java.lang.InternalError: Malformed class name` error. In this particular case, creating an `ExpressionEncoder` on such a nested Scala class would create a `NewInstance` expression under the hood, which will trigger the problem during codegen. Similar to apache#29050, we should use Spark's `Utils.getSimpleName` utility function in place of `Class.getSimpleName` to avoid hitting the issue. There are two other occurrences of `java.lang.Class.getSimpleName` in the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.: ```scala // Make a copy of the data if it's unsafe-backed def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) = s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value" val genFunctionValue: String = lambdaFunction.dataType match { case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value) case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value) case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value) case _ => genFunction.value } ``` The Unsafe-* family of types are all Java types, so they're okay. Fixes a bug that throws an error when using `ExpressionEncoder` on some nested Scala types, otherwise no changes. Added a test case to `org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite`. It'll fail on JDK8u before the fix, and pass after the fix. Closes apache#31709 from rednaxelafx/spark-34596-master. Authored-by: Kris Mok <[email protected]> Signed-off-by: HyukjinKwon <[email protected]> (cherry picked from commit ecf4811) Signed-off-by: HyukjinKwon <[email protected]>
|
cc: @viirya @rednaxelafx NOTE: This PR comes from the discussion with @viirya #31747 (comment) |
|
@maropu Can you add few words to the PR description about why you change the test case? Thanks. |
|
lgtm if we clearly state why we change the test case. |
|
sure, I'll update it. |
@viirya okay, I updated it and please check it. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136275 has finished for PR 31888 at commit
|
| val errMsg = intercept[RuntimeException] { | ||
| boundEncoder.fromRow(row) | ||
| }.getCause.getMessage | ||
| assert(errMsg.contains("failed to compile: ")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the compilation error?
[info] Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: "org.apache.
spark.sql.catalyst.encoders.ExpressionEncoderSuite$MalformedClassObject$" has no member type "ExpressionEncoderSuite$MalformedClassObject$MalformedNameExample"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But by moving MalformedClassObject out of ExpressionEncoderSuite, the generated code can be compiled.
I'm not sure what this test wants to test actually. Am I confused or do I misunderstand it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the compilation error?
Yea, that's the expected error. Actually, the master has same error when useFallback = false:
Line 219 in d32bb4e
| useFallback = true) |
// master
[info] Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 32, Column 124: "org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$MalformedClassObject$" has no member type "ExpressionEncoderSuite$MalformedClassObject$MalformedNameExample"
[info] at org.apache.spark.sql.errors.QueryExecutionErrors$.compilerError(QueryExecutionErrors.scala:328)
But by moving MalformedClassObject out of ExpressionEncoderSuite, the generated code can be compiled.
IIUC the condition where the error happens is that MalformedClassObject is placed in ExpressionEncoderSuite as Kris said in the previous PR: #31709 (comment)
Actually, what's more interesting is the different rules of Java's nested classes and Scala's. Since there is no such
notion of a "companion object" or built-in "singleton object" notion on the Java language level, at least not in
Java 8, the Java language syntax for creating a new instance of a inner class object doesn't work well for Scala's
class Foo { object Bar { class Baz {...} } } kind of nesting, i.e. even when we do use the correct "simple name"
(e.g. MalformedNameExample), there would still be no valid Java syntax to create it (i.e. outerObj.new
InnerClass(...) doesn't work when outerObj is a Scala object)
cc: @rednaxelafx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the reference.
|
cc @cloud-fan @dongjoon-hyun too |
|
I will merge this tomorrow if no more comment. |
|
Thanks all. Merging to 2.4. |
…ormed class name in NewInstance.doGenCode ### What changes were proposed in this pull request? Use `Utils.getSimpleName` to avoid hitting `Malformed class name` error in `NewInstance.doGenCode`. NOTE: branch-2.4 does not have the interpreted implementation of `SafeProjection`, so it does not fall back into the interpreted mode if the compilation fails. Therefore, the test in this PR just checks that the compilation error happens instead of checking that the interpreted mode works well. This is the backport PR of #31709 and the credit should be rednaxelafx . ### Why are the changes needed? On older JDK versions (e.g. JDK8u), nested Scala classes may trigger `java.lang.Class.getSimpleName` to throw an `java.lang.InternalError: Malformed class name` error. In this particular case, creating an `ExpressionEncoder` on such a nested Scala class would create a `NewInstance` expression under the hood, which will trigger the problem during codegen. Similar to #29050, we should use Spark's `Utils.getSimpleName` utility function in place of `Class.getSimpleName` to avoid hitting the issue. There are two other occurrences of `java.lang.Class.getSimpleName` in the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.: ```scala // Make a copy of the data if it's unsafe-backed def makeCopyIfInstanceOf(clazz: Class[_ <: Any], value: String) = s"$value instanceof ${clazz.getSimpleName}? ${value}.copy() : $value" val genFunctionValue: String = lambdaFunction.dataType match { case StructType(_) => makeCopyIfInstanceOf(classOf[UnsafeRow], genFunction.value) case ArrayType(_, _) => makeCopyIfInstanceOf(classOf[UnsafeArrayData], genFunction.value) case MapType(_, _, _) => makeCopyIfInstanceOf(classOf[UnsafeMapData], genFunction.value) case _ => genFunction.value } ``` The Unsafe-* family of types are all Java types, so they're okay. ### Does this PR introduce _any_ user-facing change? Fixes a bug that throws an error when using `ExpressionEncoder` on some nested Scala types, otherwise no changes. ### How was this patch tested? Added a test case to `org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite`. It'll fail on JDK8u before the fix, and pass after the fix. Closes #31888 from maropu/SPARK-34596-BRANCH2.4. Lead-authored-by: Takeshi Yamamuro <[email protected]> Co-authored-by: Kris Mok <[email protected]> Signed-off-by: Liang-Chi Hsieh <[email protected]>
What changes were proposed in this pull request?
Use
Utils.getSimpleNameto avoid hittingMalformed class nameerror inNewInstance.doGenCode.NOTE: branch-2.4 does not have the interpreted implementation of
SafeProjection, so it does not fall back into the interpreted mode if the compilation fails. Therefore, the test in this PR just checks that the compilation error happens instead of checking that the interpreted mode works well.This is the backport PR of #31709 and the credit should be @rednaxelafx .
Why are the changes needed?
On older JDK versions (e.g. JDK8u), nested Scala classes may trigger
java.lang.Class.getSimpleNameto throw anjava.lang.InternalError: Malformed class nameerror.In this particular case, creating an
ExpressionEncoderon such a nested Scala class would create aNewInstanceexpression under the hood, which will trigger the problem during codegen.Similar to #29050, we should use Spark's
Utils.getSimpleNameutility function in place ofClass.getSimpleNameto avoid hitting the issue.There are two other occurrences of
java.lang.Class.getSimpleNamein the same file, but they're safe because they're only guaranteed to be only used on Java classes, which don't have this problem, e.g.:The Unsafe-* family of types are all Java types, so they're okay.
Does this PR introduce any user-facing change?
Fixes a bug that throws an error when using
ExpressionEncoderon some nested Scala types, otherwise no changes.How was this patch tested?
Added a test case to
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite. It'll fail on JDK8u before the fix, and pass after the fix.