[SPARK-52942][YARN][BUILD] YARN External Shuffle Service jar should include scala-library
#51650
Conversation
Do you know why it has become so big?
@LuciferYang the jar can be produced by […]
This is unfortunate, thanks for working on fixing it.
Instead of bundling `scala-library`, can we avoid depending on it?
@mridulm This is what I initially expected, but it seems we can't, because the structured logging framework API for Java depends on Scala classes. See `spark/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java`, lines 159 to 165 at 5a9929c.
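For context, here is a rough, hypothetical Java sketch of the call pattern under discussion. The message and key are taken from the diff quoted later in this thread; the surrounding class and the `SparkLoggerFactory` factory are assumptions, not the exact contents of the referenced lines. The `LogKeys.FOO$.MODULE$` idiom is how Java references a Scala `object`, and these keys are defined as Scala (case) objects that implement `scala.Product`, which matches the `NoClassDefFoundError: scala/Product` reported when `scala-library` is missing.

```java
// Hypothetical sketch of a Java call site into the Structured Logging API;
// the exact code at the referenced lines is assumed, not copied.
import org.apache.spark.internal.LogKeys;
import org.apache.spark.internal.MDC;
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

class ShuffleServiceLoggingSketch {
  private static final SparkLogger logger =
      SparkLoggerFactory.getLogger(ShuffleServiceLoggingSketch.class);

  void logMetricsNamespace(String metricsNamespace) {
    // LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$ resolves a Scala object
    // from Java, so loading this class requires scala-library on the classpath.
    logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
        MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
  }
}
```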
Also cc @gengliangwang and @panbingkun.
Thank you so much for reporting and working on this, @pan3793. If so, we may want to remove […]
cc @peter-toth
@dongjoon-hyun […]
Got it.
Wait, […]
@dongjoon-hyun In `spark/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java` (lines 159 to 165 at 5a9929c), the Java logging API references Scala classes, and it is called from `spark/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java` (lines 330 to 331 at 5a9929c).
Thank you. Let me rephrase in my own words. So, the problem is not 'Structured Logging' itself; instead, the root cause is the newly added MDC usage, which depends on Scala. Did I understand correctly?
I think they are the same thing, because […]
😄 What I was thinking is something like the following, by removing the MDC usage:

  logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
-     MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
+     metricsNamespace);
Anyway, it was just my idea. For this […]
@dongjoon-hyun, thanks for your advice, but I'm afraid this violates the design principle of the Structured Logging Framework. I remember that during the development of this feature, @gengliangwang strongly recommended replacing all variables with `MDC`:

  logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
-     MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
+     metricsNamespace);
To @pan3793, it seems that I wasn't clear enough in the previous comment. I already walked away from this PR. That was my idea, but I'm not insisting on anything here since the last comment. Literally, you can ignore me (and all my previous comments) on this PR, because I don't have any better suggestion as of now. 😄
Actually, I'm no longer using the External Shuffle Service either. But I'm wondering if it's feasible to rewrite these classes in Java.
@LuciferYang, thanks for your advice. I have made some progress in rewriting these classes in Java.
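For illustration, a hypothetical sketch of what such a Java rewrite could look like; it relies only on the `LogKey` interface with a `name()` method shown in the commit message quoted below, and the class and key names here are made up, not the actual SPARK-53064 code.

```java
// Hypothetical sketch only: log keys defined in plain Java, so Java callers no
// longer need the Scala `$.MODULE$` idiom (compare the patch quoted below).
import org.apache.spark.internal.LogKey;

final class JavaLogKeysSketch {
  private JavaLogKeysSketch() {}

  // One plain Java singleton per key, instead of a Scala case object.
  public static final LogKey SHUFFLE_SERVICE_METRICS_NAMESPACE = new LogKey() {
    @Override public String name() { return "shuffle_service_metrics_namespace"; }
  };
}
```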
Fine to me.
### What changes were proposed in this pull request?

This PR proposes to rewrite a few classes used by the Structured Logging API from Scala to Java.

### Why are the changes needed?

Previously (before 3.5), modules under `common` were pure Java and were easy to embed into other services; for example, the YARN External Shuffle Service runs as a plugin of the YARN NodeManager daemon process. With recent years' changes, some previously pure Java modules also require `scala-library` to be present at runtime, e.g., SPARK-52942 reports that YARN ESS causes the YARN NodeManager to fail to start due to a missing `scala-library` in the classpath.

Instead of bundling `scala-library` into the YARN ESS jar, #51650 (comment) suggests making it scala-free again.

This also makes Java's invocation of the Structured Logging API much cleaner: it can now be called without the ugly `$.MODULE$`.

```patch
- MDC.of(LogKeys.HOST_PORT$.MODULE$, address);
+ MDC.of(LogKeys.HOST_PORT, address);
```

### Does this PR introduce _any_ user-facing change?

No, these are internal APIs. For plugin developers who want to provide custom `LogKey`s, it is still possible to stay compatible with both Spark 4.0 and the new API proposed by this PR, see https://github.com/pan3793/SPARK-53064

```java
import org.apache.spark.internal.LogKey;

// CUSTOM_LOG_KEY is compatible with both Spark 4.0 and SPARK-53064
public class JavaCustomLogKeys {

  // A custom `LogKey` must `implements LogKey`
  public static class CUSTOM_LOG_KEY implements LogKey {
    @Override public String name() { return "custom_log_key"; }
  }

  // Singleton
  public static final CUSTOM_LOG_KEY CUSTOM_LOG_KEY = new CUSTOM_LOG_KEY();
}
```

### How was this patch tested?

Pass GHA, and verified that YARN ESS works without `scala-library`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51775 from pan3793/SPARK-53064.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
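As a usage note, the following illustrative-only sketch shows how a custom key defined as in the example above would be passed to the logger. It assumes the `MDC.of(LogKey, value)` shape shown in the patch earlier in this message and a `SparkLoggerFactory` factory for `SparkLogger`; it is not copied from the actual SPARK-53064 code.

```java
// Illustrative only: logging with the custom key defined in the example above.
// Assumes MDC.of(LogKey, value) as shown in the patch, and SparkLoggerFactory
// as the factory for SparkLogger; names may differ in the actual code base.
import org.apache.spark.internal.MDC;
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

class CustomKeyUsageSketch {
  private static final SparkLogger logger =
      SparkLoggerFactory.getLogger(CustomKeyUsageSketch.class);

  void report(String value) {
    // The custom key is passed exactly like a built-in LogKeys entry.
    logger.info("Custom log key value: {}", MDC.of(JavaCustomLogKeys.CUSTOM_LOG_KEY, value));
  }
}
```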
After the merge of #51775, master should no longer require this patch, but branch-4.0 should still need this fix, right?
After SPARK-53064, I tested that YARN ESS can successfully bootstrap and serve simple Spark queries w/o `scala-library`.

To achieve the goal, we might need to split the `common/utils` module.

For safety, we can merge this PR to both master and branch-4.0.
[SPARK-52942][YARN][BUILD] YARN External Shuffle Service jar should include `scala-library`

### What changes were proposed in this pull request?

Since SPARK-41400, the `common/network-yarn` module has started to hard depend on Scala; now it causes the YARN NodeManager to fail to start due to a missing `scala-library`.

```
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: classpath: [file:/opt/spark/yarn/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar]
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: system classes: [java., javax.accessibility., -javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
2025-07-24 09:55:38,538 INFO yarn.YarnShuffleService: Initializing YARN shuffle service for Spark
2025-07-24 09:55:38,539 WARN containermanager.AuxServices: The Auxiliary Service named 'spark_shuffle' in the configuration is for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader which has a name of 'org.apache.spark.network.yarn.YarnShuffleService with custom class loader'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2025-07-24 09:55:38,808 ERROR nodemanager.NodeManager: Error starting NodeManager
java.lang.NoClassDefFoundError: scala/Product
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:176)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
    at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:330)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.serviceInit(AuxiliaryServiceWithCustomClassLoader.java:64)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:475)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:758)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:336)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:501)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:969)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1058)
Caused by: java.lang.ClassNotFoundException: scala.Product
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
    at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
    ... 25 more
2025-07-24 09:55:38,815 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hadoop-worker1.orb.local/192.168.97.6
************************************************************/
```

Note: now `spark-<version>-yarn-shuffle.jar` is ~100 MB, while previously it was ~10 MB. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into one Java utils module and one Scala utils module; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` scala-free as before.

### Why are the changes needed?

Bug fix, recover a broken feature.

### Does this PR introduce _any_ user-facing change?

Yes, recover a broken feature.

### How was this patch tested?

Tested on a YARN cluster; the NodeManager starts successfully after patching.

Note: Spark 4 requires JDK 17 or later, but JDK 17 is not officially supported as of Hadoop 3.4.1. The Hadoop community has been actively working on supporting JDK 17 in recent months, and it almost works fine in 3.4.2. For reviewers who want to verify this locally, consider using the 3.4.2 RC1 [1].

[1] https://lists.apache.org/thread/f66vj3rj6cpk37gb1jfl2ombq3hltsml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51650 from pan3793/SPARK-52942.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit e964086)
Signed-off-by: yangjie01 <[email protected]>
Merged into master/branch-4.0. Thanks @pan3793 @dongjoon-hyun and @mridulm |
Thanks for working on this @pan3793 ! |
### What changes were proposed in this pull request?

This PR splits the Java code of `common/utils` into a new module, `common/utils-java`, except for:

- `common/utils/src/main/java/org/apache/spark/storage/StorageLevelMapper.java`
- `common/utils/src/main/java/org/apache/spark/SparkThrowable.java`

A few utility methods are rewritten in Java to avoid depending on `scala-library`.

### Why are the changes needed?

To make YARN ESS (`common/network-yarn`) scala-free again. Read the discussion of SPARK-52942 (#51650) for more details, and the following PR:

- #51775

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ build/mvn -pl common/network-yarn -am clean package -DskipTests -Pyarn
...
[INFO] Including org.apache.spark:spark-network-shuffle_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.spark:spark-network-common_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http2:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-socks:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-unix-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler-proxy:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-epoll:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-kqueue:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-riscv64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-classes:jar:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:windows-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including org.fusesource.leveldbjni:leveldbjni-all:jar:1.8 in the shaded jar.
[INFO] Including org.rocksdb:rocksdbjni:jar:9.8.4 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.19.2 in the shaded jar.
[INFO] Including org.apache.commons:commons-crypto:jar:1.1.0 in the shaded jar.
[INFO] Including com.google.crypto.tink:tink:jar:1.16.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.11.0 in the shaded jar.
[INFO] Including org.apache.spark:spark-common-utils-java_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.slf4j:jul-to-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.slf4j:jcl-over-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-core:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-1.2-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-layout-template-json:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.18.0 in the shaded jar.
[INFO] Including io.dropwizard.metrics:metrics-core:jar:4.2.32 in the shaded jar.
[INFO] Including org.roaringbitmap:RoaringBitmap:jar:1.3.0 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:3.0.0 in the shaded jar.
[INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar.
...
```

Now the YARN ESS jar is scala-free; rocksdbjni and netty contribute most of the size.

```
$ ll common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar
-rw-r--r--  1 chengpan  staff    92M Aug  6 16:54 common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar

$ jar tf common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar | grep scala
<no-output>

$ ll ~/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
-rw-r--r--  1 chengpan  staff    68M Jan 12  2025 /Users/chengpan/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51868 from pan3793/SPARK-53138.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>


What changes were proposed in this pull request?
Since SPARK-41400, the `common/network-yarn` module has started to hard depend on Scala; now it causes the YARN NodeManager to fail to start due to a missing `scala-library`.

Note: now `spark-<version>-yarn-shuffle.jar` is ~100 MB, while previously it was ~10 MB. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into one Java utils module and one Scala utils module; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` scala-free as before.

Why are the changes needed?
Bug fix, recover a broken feature.
Does this PR introduce any user-facing change?
Yes, recover a broken feature.
How was this patch tested?
Tested on a YARN cluster; the NodeManager starts successfully after patching.
Note: Spark 4 requires JDK 17 or later, but JDK 17 is not officially supported as of Hadoop 3.4.1. The Hadoop community has been actively working on supporting JDK 17 in recent months, and it almost works fine in 3.4.2.
For reviewers who want to verify this locally, consider using the 3.4.2 RC1 [1]
[1] https://lists.apache.org/thread/f66vj3rj6cpk37gb1jfl2ombq3hltsml
Was this patch authored or co-authored using generative AI tooling?
No.