Skip to content

Commit dbce74d

Browse files
committed
[SPARK-34607][SQL] Add Utils.isMemberClass to fix a malformed class name error on jdk8u
### What changes were proposed in this pull request? This PR intends to fix a bug of `objects.NewInstance` if a user runs Spark on jdk8u and a given `cls` in `NewInstance` is a deeply-nested inner class, e.g.,. ``` object OuterLevelWithVeryVeryVeryLongClassName1 { object OuterLevelWithVeryVeryVeryLongClassName2 { object OuterLevelWithVeryVeryVeryLongClassName3 { object OuterLevelWithVeryVeryVeryLongClassName4 { object OuterLevelWithVeryVeryVeryLongClassName5 { object OuterLevelWithVeryVeryVeryLongClassName6 { object OuterLevelWithVeryVeryVeryLongClassName7 { object OuterLevelWithVeryVeryVeryLongClassName8 { object OuterLevelWithVeryVeryVeryLongClassName9 { object OuterLevelWithVeryVeryVeryLongClassName10 { object OuterLevelWithVeryVeryVeryLongClassName11 { object OuterLevelWithVeryVeryVeryLongClassName12 { object OuterLevelWithVeryVeryVeryLongClassName13 { object OuterLevelWithVeryVeryVeryLongClassName14 { object OuterLevelWithVeryVeryVeryLongClassName15 { object OuterLevelWithVeryVeryVeryLongClassName16 { object OuterLevelWithVeryVeryVeryLongClassName17 { object OuterLevelWithVeryVeryVeryLongClassName18 { object OuterLevelWithVeryVeryVeryLongClassName19 { object OuterLevelWithVeryVeryVeryLongClassName20 { case class MalformedNameExample2(x: Int) }}}}}}}}}}}}}}}}}}}} ``` The root cause that Kris (rednaxelafx) investigated is as follows (Kudos to Kris); The reason why the test case above is so convoluted is in the way Scala generates the class name for nested classes. In general, Scala generates a class name for a nested class by inserting the dollar-sign ( `$` ) in between each level of class nesting. The problem is that this format can concatenate into a very long string that goes beyond certain limits, so Scala will change the class name format beyond certain length threshold. For the example above, we can see that the first two levels of class nesting have class names that look like this: ``` org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$ ``` If we leave out the fact that Scala uses a dollar-sign ( `$` ) suffix for the class name of the companion object, `OuterLevelWithVeryVeryVeryLongClassName1`'s full name is a prefix (substring) of `OuterLevelWithVeryVeryVeryLongClassName2`. But if we keep going deeper into the levels of nesting, you'll find names that look like: ``` org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$OuterLevelWithVeryVeryVeryLongClassName11$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$OuterLevelWithVeryVeryVeryLongClassName14$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$OuterLevelWithVeryVeryVeryLongClassName17$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$ org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$OuterLevelWithVeryVeryVeryLongClassName20$ ``` with a hash code in the middle and various levels of nesting omitted. The `java.lang.Class.isMemberClass` method is implemented in JDK8u as: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/tip/src/share/classes/java/lang/Class.java#l1425 ``` /** * Returns {code true} if and only if the underlying class * is a member class. * * return {code true} if and only if this class is a member class. * since 1.5 */ public boolean isMemberClass() { return getSimpleBinaryName() != null && !isLocalOrAnonymousClass(); } /** * Returns the "simple binary name" of the underlying class, i.e., * the binary name without the leading enclosing class name. * Returns {code null} if the underlying class is a top level * class. */ private String getSimpleBinaryName() { Class<?> enclosingClass = getEnclosingClass(); if (enclosingClass == null) // top level class return null; // Otherwise, strip the enclosing class' name try { return getName().substring(enclosingClass.getName().length()); } catch (IndexOutOfBoundsException ex) { throw new InternalError("Malformed class name", ex); } } ``` and the problematic code is `getName().substring(enclosingClass.getName().length())` -- if a class's enclosing class's full name is *longer* than the nested class's full name, this logic would end up going out of bounds. The bug has been fixed in JDK9 by https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8057919 , but still exists in the latest JDK8u release. So from the Spark side we'd need to do something to avoid hitting this problem. ### Why are the changes needed? Bugfix on jdk8u. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes apache#31733 from maropu/SPARK-34607. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
1 parent 9ac5ee2 commit dbce74d

File tree

4 files changed

+100
-2
lines changed

4 files changed

+100
-2
lines changed

core/src/main/scala/org/apache/spark/util/Utils.scala

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2858,6 +2858,34 @@ private[spark] object Utils extends Logging {
28582858
Hex.encodeHexString(secretBytes)
28592859
}
28602860

2861+
/**
2862+
* Returns true if and only if the underlying class is a member class.
2863+
*
2864+
* Note: jdk8u throws a "Malformed class name" error if a given class is a deeply-nested
2865+
* inner class (See SPARK-34607 for details). This issue has already been fixed in jdk9+, so
2866+
* we can remove this helper method safely if we drop the support of jdk8u.
2867+
*/
2868+
def isMemberClass(cls: Class[_]): Boolean = {
2869+
try {
2870+
cls.isMemberClass
2871+
} catch {
2872+
case _: InternalError =>
2873+
// We emulate jdk8u `Class.isMemberClass` below:
2874+
// public boolean isMemberClass() {
2875+
// return getSimpleBinaryName() != null && !isLocalOrAnonymousClass();
2876+
// }
2877+
// `getSimpleBinaryName()` returns null if a given class is a top-level class,
2878+
// so we replace it with `cls.getEnclosingClass != null`. The second condition checks
2879+
// if a given class is not a local or an anonymous class, so we replace it with
2880+
// `cls.getEnclosingMethod == null` because `cls.getEnclosingMethod()` return a value
2881+
// only in either case (JVM Spec 4.8.6).
2882+
//
2883+
// Note: The newer jdk evaluates `!isLocalOrAnonymousClass()` first,
2884+
// we reorder the conditions to follow it.
2885+
cls.getEnclosingMethod == null && cls.getEnclosingClass != null
2886+
}
2887+
}
2888+
28612889
/**
28622890
* Safer than Class obj's getSimpleName which may throw Malformed class name error in scala.
28632891
* This method mimics scalatest's getSimpleNameOfAnObjectsClass.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ object OuterScopes {
4848
* useful for inner class defined in REPL.
4949
*/
5050
def getOuterScope(innerCls: Class[_]): () => AnyRef = {
51-
assert(innerCls.isMemberClass)
51+
assert(Utils.isMemberClass(innerCls))
5252
val outerClassName = innerCls.getDeclaringClass.getName
5353
val outer = outerScopes.get(outerClassName)
5454
if (outer == null) {

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -444,7 +444,7 @@ case class NewInstance(
444444
// Note that static inner classes (e.g., inner classes within Scala objects) don't need
445445
// outer pointer registration.
446446
val needOuterPointer =
447-
outerPointer.isEmpty && cls.isMemberClass && !Modifier.isStatic(cls.getModifiers)
447+
outerPointer.isEmpty && Utils.isMemberClass(cls) && !Modifier.isStatic(cls.getModifiers)
448448
childrenResolved && !needOuterPointer
449449
}
450450

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -217,6 +217,76 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
217217
"nested Scala class should work")
218218
}
219219

220+
object OuterLevelWithVeryVeryVeryLongClassName1 {
221+
object OuterLevelWithVeryVeryVeryLongClassName2 {
222+
object OuterLevelWithVeryVeryVeryLongClassName3 {
223+
object OuterLevelWithVeryVeryVeryLongClassName4 {
224+
object OuterLevelWithVeryVeryVeryLongClassName5 {
225+
object OuterLevelWithVeryVeryVeryLongClassName6 {
226+
object OuterLevelWithVeryVeryVeryLongClassName7 {
227+
object OuterLevelWithVeryVeryVeryLongClassName8 {
228+
object OuterLevelWithVeryVeryVeryLongClassName9 {
229+
object OuterLevelWithVeryVeryVeryLongClassName10 {
230+
object OuterLevelWithVeryVeryVeryLongClassName11 {
231+
object OuterLevelWithVeryVeryVeryLongClassName12 {
232+
object OuterLevelWithVeryVeryVeryLongClassName13 {
233+
object OuterLevelWithVeryVeryVeryLongClassName14 {
234+
object OuterLevelWithVeryVeryVeryLongClassName15 {
235+
object OuterLevelWithVeryVeryVeryLongClassName16 {
236+
object OuterLevelWithVeryVeryVeryLongClassName17 {
237+
object OuterLevelWithVeryVeryVeryLongClassName18 {
238+
object OuterLevelWithVeryVeryVeryLongClassName19 {
239+
object OuterLevelWithVeryVeryVeryLongClassName20 {
240+
case class MalformedNameExample(x: Int)
241+
}}}}}}}}}}}}}}}}}}}}
242+
243+
{
244+
OuterScopes.addOuterScope(
245+
OuterLevelWithVeryVeryVeryLongClassName1
246+
.OuterLevelWithVeryVeryVeryLongClassName2
247+
.OuterLevelWithVeryVeryVeryLongClassName3
248+
.OuterLevelWithVeryVeryVeryLongClassName4
249+
.OuterLevelWithVeryVeryVeryLongClassName5
250+
.OuterLevelWithVeryVeryVeryLongClassName6
251+
.OuterLevelWithVeryVeryVeryLongClassName7
252+
.OuterLevelWithVeryVeryVeryLongClassName8
253+
.OuterLevelWithVeryVeryVeryLongClassName9
254+
.OuterLevelWithVeryVeryVeryLongClassName10
255+
.OuterLevelWithVeryVeryVeryLongClassName11
256+
.OuterLevelWithVeryVeryVeryLongClassName12
257+
.OuterLevelWithVeryVeryVeryLongClassName13
258+
.OuterLevelWithVeryVeryVeryLongClassName14
259+
.OuterLevelWithVeryVeryVeryLongClassName15
260+
.OuterLevelWithVeryVeryVeryLongClassName16
261+
.OuterLevelWithVeryVeryVeryLongClassName17
262+
.OuterLevelWithVeryVeryVeryLongClassName18
263+
.OuterLevelWithVeryVeryVeryLongClassName19
264+
.OuterLevelWithVeryVeryVeryLongClassName20)
265+
encodeDecodeTest(
266+
OuterLevelWithVeryVeryVeryLongClassName1
267+
.OuterLevelWithVeryVeryVeryLongClassName2
268+
.OuterLevelWithVeryVeryVeryLongClassName3
269+
.OuterLevelWithVeryVeryVeryLongClassName4
270+
.OuterLevelWithVeryVeryVeryLongClassName5
271+
.OuterLevelWithVeryVeryVeryLongClassName6
272+
.OuterLevelWithVeryVeryVeryLongClassName7
273+
.OuterLevelWithVeryVeryVeryLongClassName8
274+
.OuterLevelWithVeryVeryVeryLongClassName9
275+
.OuterLevelWithVeryVeryVeryLongClassName10
276+
.OuterLevelWithVeryVeryVeryLongClassName11
277+
.OuterLevelWithVeryVeryVeryLongClassName12
278+
.OuterLevelWithVeryVeryVeryLongClassName13
279+
.OuterLevelWithVeryVeryVeryLongClassName14
280+
.OuterLevelWithVeryVeryVeryLongClassName15
281+
.OuterLevelWithVeryVeryVeryLongClassName16
282+
.OuterLevelWithVeryVeryVeryLongClassName17
283+
.OuterLevelWithVeryVeryVeryLongClassName18
284+
.OuterLevelWithVeryVeryVeryLongClassName19
285+
.OuterLevelWithVeryVeryVeryLongClassName20
286+
.MalformedNameExample(42),
287+
"deeply nested Scala class should work")
288+
}
289+
220290
productTest(PrimitiveData(1, 1, 1, 1, 1, 1, true))
221291

222292
productTest(

0 commit comments

Comments
 (0)