Conversation

@LuciferYang (Contributor) commented Oct 10, 2022

What changes were proposed in this pull request?

This PR adds `-Djdk.reflect.useDirectMethodHandle=false` to `JavaModuleOptions` and to the Maven/SBT `extraJavaTestArgs` so that Spark uses `UnsafeFieldAccessor` by default on Java 18/19, avoiding the bad case described in SPARK-40729.

Why are the changes needed?

After JEP 416 (Reimplement Core Reflection with Method Handles), `MethodHandleAccessor` is Java's default reflection implementation, but in Spark it triggers the bad case described in SPARK-40729, so `-Djdk.reflect.useDirectMethodHandle=false` is added as a workaround for Java 18/19.
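
For context, the failure can be reduced to a few lines. The sketch below is hypothetical (it is not part of this PR) and assumes a Scala 2.12 lambda whose captured value lands in a synthetic final field named `arg$1`, matching the stack traces under "How was this patch tested?":

// Run on Java 18/19: Field.set throws java.lang.InternalError caused by
// IllegalAccessException; with -Djdk.reflect.useDirectMethodHandle=false
// the old UnsafeFieldAccessor is used instead and the write succeeds.
object FinalFieldWriteRepro {
  def main(args: Array[String]): Unit = {
    val captured = new Object
    // The lambda compiles to a hidden class holding `captured` in a
    // synthetic final field (assumed to be arg$1, as in the traces below).
    val func: Runnable = () => println(captured)
    val f = func.getClass.getDeclaredField("arg$1")
    f.setAccessible(true)
    f.set(func, new Object) // MethodHandleAccessor refuses final-field writes on hidden classes
    println("write succeeded")
  }
}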

Does this PR introduce any user-facing change?

No. The new option has no effect on Java versions below 18.

How was this patch tested?

  • Pass GitHub Actions
  • Manual test:
  1. run repl module test with Java 18/19

Before

- broadcast vars *** FAILED ***
  isContain was true Interpreter output contained 'Exception':
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
        /_/
           
  Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 19)
  Type in expressions to have them evaluated.
  Type :help for more information.
  
  scala> 
  scala> array: Array[Int] = Array(0, 0, 0, 0, 0)
  
  scala> broadcastArray: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
  
  scala> java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$3029/0x0000000801d80a30.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
    at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
    at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
    at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
    at java.base/java.lang.reflect.Field.set(Field.java:820)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
    at org.apache.spark.rdd.RDD.map(RDD.scala:412)
    ... 93 elided
  Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$3029/0x0000000801d80a30.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
    at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
    ... 105 more
  
  scala> 
  scala> java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$3061/0x0000000801e01000.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
    at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
    at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
    at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
    at java.base/java.lang.reflect.Field.set(Field.java:820)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
    at org.apache.spark.rdd.RDD.map(RDD.scala:412)
    ... 93 elided
  Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$3061/0x0000000801e01000.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
    at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
    ... 105 more
  
  scala>      | 
  scala> :quit (ReplSuite.scala:83)

After

Run completed in 1 minute, 12 seconds.
Total number of tests run: 44
Suites: completed 7, aborted 0
Tests: succeeded 44, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
  2. test spark-shell with Java 18/19:

Before

bin/spark-shell --master local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/11 19:13:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/11 19:13:08 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665486788733).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
      /_/
         
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 18.0.2.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()

// Exiting paste mode, now interpreting.

java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$2396/0x00000008015b2e70.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
  at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:176)
  at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
  at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
  at java.base/java.lang.reflect.Field.set(Field.java:820)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
  at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
  at org.apache.spark.rdd.RDD.map(RDD.scala:412)
  ... 43 elided
Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$2396/0x00000008015b2e70.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3494)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3485)
  at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1637)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.jav

After

bin/spark-shell --master local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/11 19:11:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/11 19:11:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665486681920).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
      /_/
         
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 18.0.2.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()

// Exiting paste mode, now interpreting.

array: Array[Int] = Array(5, 0, 0, 0, 0)
broadcastArray: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
res0: Array[Int] = Array(5, 0, 0, 0, 0)

@LuciferYang changed the title [SPARK-40729][BUILD][SHELL] Add -Djdk.reflect.useDirectMethodHandle=false to spark-shell and maven/sbt test options [SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor Oct 10, 2022
@LuciferYang marked this pull request as draft October 10, 2022 12:57
@LuciferYang changed the title [SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor Oct 10, 2022
@LuciferYang reopened this Oct 11, 2022
@LuciferYang (Contributor Author)

val outerField = func.getClass.getDeclaredField("arg$1")
// SPARK-37072: When Java 17 is used and `outerField` is read-only,
// the content of `outerField` cannot be set by reflect api directly.
// But we can remove the `final` modifier of `outerField` before set value
// and reset the modifier after set value.
val modifiersField = getFinalModifiersFieldForJava17(outerField)
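// Note: getFinalModifiersFieldForJava17 is assumed (from the .foreach calls
// below) to return an Option[Field] wrapping Field.modifiers on Java 17+,
// and None where the modifier hack is not needed.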
modifiersField
  .foreach(m => m.setInt(outerField, outerField.getModifiers & ~Modifier.FINAL))
outerField.setAccessible(true)
outerField.set(func, clonedOuterThis)
modifiersField
  .foreach(m => m.setInt(outerField, outerField.getModifiers | Modifier.FINAL))

`outerField.set(func, clonedOuterThis)` threw `java.lang.IllegalAccessException: final field has no write access: $Lambda$3061/0x0000000801e01000.arg$1/putField, from class java.lang.Object (module java.base)`, even though we had removed the FINAL modifier.

Any suggestions for a code-level fix? `-Djdk.reflect.useDirectMethodHandle=false` will also be deleted in the future.
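
For reference, one tempting code-level direction is to bypass `Field.set` entirely and write through `sun.misc.Unsafe`; a minimal, purely hypothetical sketch follows (the names are illustrative, and this is not what the PR does):

import java.lang.reflect.Field

// Hypothetical sketch: write an instance field via sun.misc.Unsafe so that the
// JEP 416 MethodHandleAccessor path is never involved.
object UnsafeFieldWriter {
  private val unsafe: sun.misc.Unsafe = {
    val f = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    f.setAccessible(true)
    f.get(null).asInstanceOf[sun.misc.Unsafe]
  }

  def set(target: AnyRef, field: Field, value: AnyRef): Unit = {
    // Caveat: on JDK 15+, objectFieldOffset throws UnsupportedOperationException
    // for fields of hidden classes, and lambda classes are hidden, so this
    // likely dead-ends for the arg$1 case as well.
    unsafe.putObject(target, unsafe.objectFieldOffset(field), value)
  }
}

The hidden-class restriction noted in the comment is presumably part of why a startup flag, rather than a code change, was chosen here.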

@LuciferYang changed the title [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor when use Spark-Shell Oct 11, 2022
@LuciferYang changed the title [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor instead of MethodHandleAccessor when use Spark-Shell [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor in Spark-Shell Oct 11, 2022
@LuciferYang (Contributor Author)

cc @rednaxelafx

pom.xml Outdated
  <filereports>SparkTestSuite.txt</filereports>
- <argLine>-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true</argLine>
+ <argLine>-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true
+   -Djdk.reflect.useDirectMethodHandle=false</argLine>
Member

What about the vanilla Scala REPL? Is this only an Apache Spark REPL issue?

Contributor

The issue seems to be that the ClosureCleaner is trying to set a final field, which shouldn't be allowed. This is Spark-specific (and affects all other projects that copied Spark's ClosureCleaner); it doesn't exist in the vanilla Scala REPL.

Contributor Author

Yes, this is only an Apache Spark REPL issue.

Member

Got it. Thank you, @rednaxelafx and @LuciferYang .

@dongjoon-hyun (Member) left a comment

If this is required, why don't we put this into JavaModuleOptions like the other Java options?

@LuciferYang (Contributor Author)

> If this is required, why don't we put this into JavaModuleOptions like the other Java options?

Yes, this is a feasible way; let me try it.

@LuciferYang changed the title [WIP][SPARK-40729][BUILD][SHELL] Use -Djdk.reflect.useDirectMethodHandle=false to make Java18/19 use UnsafeFieldAccessor in Spark-Shell [WIP][SPARK-40729][CORE] Make Spark use UnsafeFieldAccessor as default with Java18/19 Oct 12, 2022
"--add-opens=java.base/sun.util.calendar=ALL-UNNAMED",
"--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"};
"--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED",
"-Djdk.reflect.useDirectMethodHandle=false"};
@LuciferYang (Contributor Author) commented Oct 12, 2022

I think it would be better to rename `JavaModuleOptions` to `JavaExtraOptions`. Do you suggest completing the corresponding refactoring in this PR? @dongjoon-hyun

Contributor Author

My idea is that this PR solves the issue first, if needed, and then the rename is done in a separate PR.

Member

Ya, I agree with you. Let's postpone the renaming.


@LuciferYang marked this pull request as ready for review October 12, 2022 05:26
@LuciferYang changed the title [WIP][SPARK-40729][CORE] Make Spark use UnsafeFieldAccessor as default with Java18/19 [SPARK-40729][CORE] Make Spark use UnsafeFieldAccessor as default with Java18/19 Oct 12, 2022
@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @LuciferYang and @rednaxelafx .
Merged to master for Apache Spark 3.4.0.

@dongjoon-hyun changed the title [SPARK-40729][CORE] Make Spark use UnsafeFieldAccessor as default with Java18/19 [SPARK-40729][CORE] Make Spark use UnsafeFieldAccessor as default for Java 18+ Oct 12, 2022
@LuciferYang (Contributor Author)

Thanks @dongjoon-hyun @rednaxelafx
