
Conversation

@pan3793 pan3793 commented Jul 24, 2025

What changes were proposed in this pull request?

Since SPARK-41400, the `common/network-yarn` module has had a hard dependency on Scala, which now causes the YARN NodeManager to fail to start due to a missing `scala-library`.

```
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: classpath: [file:/opt/spark/yarn/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar]
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: system classes: [java., javax.accessibility., -javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
2025-07-24 09:55:38,538 INFO yarn.YarnShuffleService: Initializing YARN shuffle service for Spark
2025-07-24 09:55:38,539 WARN containermanager.AuxServices: The Auxiliary Service named 'spark_shuffle' in the configuration is for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader which has a name of 'org.apache.spark.network.yarn.YarnShuffleService with custom class loader'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2025-07-24 09:55:38,808 ERROR nodemanager.NodeManager: Error starting NodeManager
java.lang.NoClassDefFoundError: scala/Product
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
	at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
	at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
	at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:176)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
	at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:330)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.serviceInit(AuxiliaryServiceWithCustomClassLoader.java:64)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:475)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:758)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:336)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:501)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:969)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1058)
Caused by: java.lang.ClassNotFoundException: scala.Product
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
	... 25 more
2025-07-24 09:55:38,815 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hadoop-worker1.orb.local/192.168.97.6
************************************************************/
```

Note: `spark-<version>-yarn-shuffle.jar` is now ~100M, while previously it was ~10M. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into one Java utils module and one Scala utils module; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` scala-free as before.

Why are the changes needed?

Bug fix; recovers a broken feature.

Does this PR introduce any user-facing change?

Yes, it recovers a broken feature.

How was this patch tested?

Tested on a YARN cluster; the NodeManager starts successfully after patching.

Note: Spark 4 requires JDK 17 or later, but JDK 17 is not officially supported as of Hadoop 3.4.1. The Hadoop community has been actively working on JDK 17 support in recent months, and it mostly works in 3.4.2.

For reviewers who want to verify this locally, consider using Hadoop 3.4.2 RC1 [1].

[1] https://lists.apache.org/thread/f66vj3rj6cpk37gb1jfl2ombq3hltsml

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Jul 24, 2025
pan3793 commented Jul 24, 2025

cc @hvanhovell @LuciferYang

@LuciferYang (Contributor)

Note: `spark-<version>-yarn-shuffle.jar` is now ~100M, while previously it was ~10M. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into one Java utils module and one Scala utils module; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` scala-free as before.

Do you know why it has become so big?

pan3793 commented Jul 24, 2025

Do you know why it has become so big?

@LuciferYang the jar can be produced by

```
build/mvn -pl common/network-yarn -am clean package -DskipTests -Pyarn
```

- spark-3.3.4-yarn-shuffle.jar - 11M
- spark-3.4.4-yarn-shuffle.jar - 68M -- rocksdbjni contributes most of the size
- spark-3.5.6-yarn-shuffle.jar - 74M -- starts consuming `spark-common-utils_2.12:jar` and its transitive deps
- spark-4.0.0-yarn-shuffle.jar - 96M -- starts consuming `netty-tcnative-boringssl-static:jar` -- broken
- spark-4.1.0-SNAPSHOT-yarn-shuffle.jar - 103M -- this PR

mridulm commented Jul 24, 2025

This is unfortunate, thanks for working on fixing it.
I am also concerned about the size escalation, and wondering if we can break the dependency on scala for ess.

@pan3793 pan3793 changed the title [SPARK-5294][YARN][BUILD] YARN External Shuffle Service jar should include scala-library [SPARK-52942][YARN][BUILD] YARN External Shuffle Service jar should include scala-library Jul 24, 2025
pan3793 commented Jul 24, 2025

... if we can break the dependency on scala for ess.

@mridulm This is what I initially expected, but it seems we can't, because the structured logging framework API for Java depends on Scala classes.

```java
public void info(String msg, Throwable throwable, MDC... mdcs) {
  if (mdcs == null || mdcs.length == 0) {
    slf4jLogger.info(msg, throwable);
  } else if (slf4jLogger.isInfoEnabled()) {
    withLogContext(msg, mdcs, throwable, mt -> slf4jLogger.info(mt.message, mt.throwable));
  }
}
```

LuciferYang commented Jul 24, 2025

... if we can break the dependency on scala for ess.

@mridulm This is what I initially expected, but it seems we can't, because the structured logging framework API for Java depends on Scala classes.

```java
public void info(String msg, Throwable throwable, MDC... mdcs) {
  if (mdcs == null || mdcs.length == 0) {
    slf4jLogger.info(msg, throwable);
  } else if (slf4jLogger.isInfoEnabled()) {
    withLogContext(msg, mdcs, throwable, mt -> slf4jLogger.info(mt.message, mt.throwable));
  }
}
```

also cc @gengliangwang and @panbingkun

@dongjoon-hyun (Member)

Thank you so much for reporting and working on this, @pan3793.

If so, we may want to remove Structured Logging Framework usage from the YARN External Shuffle Service module? Technically, it's useless in YARN, isn't it?

it seems we can't, because the structured logging framework API for Java depends on Scala classes.

cc @peter-toth

pan3793 commented Jul 26, 2025

@dongjoon-hyun The `common/network-yarn` module depends on `common/utils`, `common/network-common`, and `common/network-shuffle`. If we want to make `common/network-yarn` scala-free, we would need to purge Structured Logging Framework usage from all of those modules.

@dongjoon-hyun (Member)

Got it.

dongjoon-hyun commented Jul 26, 2025

Wait, the Structured Logging Framework is controlled by configuration. Do you mean that the Structured Logging Framework-disabled code path also requires Scala at runtime, @pan3793? In the case of `spark.log.structuredLogging.enabled=false`, we might be free of `scala-library` as before.

pan3793 commented Jul 28, 2025

@dongjoon-hyun `spark.log.structuredLogging.enabled=false` does not work either. The stacktrace pasted in the PR description was produced with structured logging disabled.

In the Structured Logging Framework API, `MDC` is a Scala case class, which inherits from `scala.Product` and thus always requires `scala-library` whenever this API is called.

```java
public void info(String msg, Throwable throwable, MDC... mdcs) {
  if (mdcs == null || mdcs.length == 0) {
    slf4jLogger.info(msg, throwable);
  } else if (slf4jLogger.isInfoEnabled()) {
    withLogContext(msg, mdcs, throwable, mt -> slf4jLogger.info(mt.message, mt.throwable));
  }
}
```

```scala
case class MDC(key: LogKey, value: Any)
```

```java
logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
    MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
```
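To see why disabling structured logging does not avoid the problem, here is a minimal, self-contained Java sketch (with simplified stand-ins, not Spark's real classes): a varargs call site always references the parameter's array type, so the `MDC` class, and in Spark its `scala.Product` supertype, must be loadable even for calls that pass no MDCs at all.

```java
// Minimal sketch (simplified stand-ins, not Spark's real classes) of why a varargs
// logging API forces its parameter class to load even when callers pass no MDCs:
// a zero-arg call compiles to `info(msg, new MDC[0])`, and creating that array
// requires the JVM to resolve (and load) the MDC class together with its supertypes.
public class VarargsDemo {
  interface LogKey { String name(); }

  // Stand-in for Spark's Scala `case class MDC`, which extends scala.Product;
  // loading the real class fails with NoClassDefFoundError without scala-library.
  record MDC(LogKey key, Object value) {}

  static String info(String msg, MDC... mdcs) {
    return (mdcs == null || mdcs.length == 0) ? msg : msg + " (" + mdcs.length + " MDCs)";
  }

  public static void main(String[] args) {
    // Even this zero-MDC call references the MDC[] array type at the call site.
    System.out.println(info("plain message"));
  }
}
```

This matches the stack trace above: the `NoClassDefFoundError` is raised while defining the class, before any structured-logging configuration is consulted.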

@dongjoon-hyun (Member)

Thank you. Let me rephrase in my own words. So, the problem is not 'Structured Logging' per se; instead, the root cause is the newly added MDC usage, which depends on Scala. Did I understand correctly?

pan3793 commented Jul 28, 2025

I think they are the same thing because MDC is part of the structured logging API.

dongjoon-hyun commented Jul 28, 2025

I think they are the same thing because MDC is part of the structured logging API.

😄 What I was thinking of is something like the following: removing the MDC feature in those key modules (which the YARN external shuffle service depends on), @pan3793. Although it means we would lose the MDC feature for those modules, I thought it's only a few places compared to the whole of Spark. However, the log message itself is still there, isn't it?

```diff
  logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
-   MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
+   metricsNamespace);
```

Structured Logging can be considered as two parts. The main part is definitely writing into JSON files; MDC is a secondary part that only enriches the log information.

@dongjoon-hyun (Member)

Anyway, it was just my idea. For this YARN External Shuffle Service, I'll leave it to you guys because I'm not using it, technically. So, feel free to proceed in the AS-IS direction, @pan3793 and all.

pan3793 commented Jul 29, 2025

@dongjoon-hyun, thanks for your advice, but I'm afraid this violates the design principle of the Structured Logging Framework. I remember that during the development of this feature, @gengliangwang strongly recommended replacing all variables with MDC for INFO and higher severity levels. And SparkLogger has no API that would allow us to do:

```diff
  logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
-   MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
+   metricsNamespace);
```

dongjoon-hyun commented Jul 29, 2025

To @pan3793, it seems that I wasn't clear enough in the previous comment. I already walked away from this PR. That was just my idea, and I haven't been insisting on anything here since the last comment. Literally, you can ignore me (and all my previous comments) on this PR because I don't have any better suggestion as of now. 😄

@dongjoon-hyun, thanks for your advice, but I'm afraid this violates the design principle of the Structured Logging Framework. I remember during the development of this feature, @gengliangwang strongly recommended replacing all variables with MDC for INFO and higher severity levels. And in the SparkLogger, there is no such API to allow us to do:

```diff
  logger.info("Registered metrics with Hadoop's DefaultMetricsSystem using namespace '{}'",
-   MDC.of(LogKeys.SHUFFLE_SERVICE_METRICS_NAMESPACE$.MODULE$, metricsNamespace));
+   metricsNamespace);
```

@LuciferYang (Contributor)

Actually, I'm no longer using the External Shuffle Service either. But I'm wondering if it's feasible to rewrite `MDC` and `LogKeys` in Java. That should enable the decoupling, right? They are both in the `org.apache.spark.internal` package, so they probably don't belong to the public API either.
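A rough sketch of what such a Java rewrite might look like (hypothetical names and shapes, not the actual code that landed in SPARK-53064): `LogKey` becomes a plain Java interface and `MDC` a plain Java class, with no Scala supertypes, so loading them never touches `scala-library`.

```java
// Hypothetical sketch of a pure-Java MDC/LogKey, decoupled from scala-library.
// The names mirror the Scala originals, but these shapes are illustrative only.
public final class JavaMdcSketch {
  public interface LogKey {
    // Corresponds to the per-key name carried by each Scala LogKey object.
    String name();
  }

  public static final class MDC {
    private final LogKey key;
    private final Object value;

    private MDC(LogKey key, Object value) {
      this.key = key;
      this.value = value;
    }

    // Factory mirroring the existing Java-facing `MDC.of(...)` entry point.
    public static MDC of(LogKey key, Object value) {
      return new MDC(key, value);
    }

    public LogKey key() { return key; }
    public Object value() { return value; }
  }

  // Example key, analogous to one entry in LogKeys.
  public static final LogKey HOST_PORT = () -> "host_port";

  public static void main(String[] args) {
    MDC mdc = MDC.of(HOST_PORT, "localhost:7337");
    System.out.println(mdc.key().name() + "=" + mdc.value());
  }
}
```

A side benefit of a plain Java class is that Java call sites no longer need the `$.MODULE$` idiom required to reach Scala singleton objects.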

pan3793 commented Aug 1, 2025

@LuciferYang, thanks for your advice. I have made some progress rewriting `MDC` and `LogKeys` from Scala to Java locally; although it's not done yet, I'm optimistic about its feasibility. But the change is non-trivial, and at least not fit for branch-4.0. I suggest adopting this PR as a simple fix first, and I will open a PR that follows your idea for later discussion. WDYT?

LuciferYang commented Aug 1, 2025

@LuciferYang, thanks for your advice. I have made some progress rewriting `MDC` and `LogKeys` from Scala to Java locally; although it's not done yet, I'm optimistic about its feasibility. But the change is non-trivial, and at least not fit for branch-4.0. I suggest adopting this PR as a simple fix first, and I will open a PR that follows your idea for later discussion. WDYT?

fine to me

LuciferYang pushed a commit that referenced this pull request Aug 5, 2025
### What changes were proposed in this pull request?

This PR proposes to rewrite a few classes used by the Structured Logging API from Scala to Java.

### Why are the changes needed?

Previously (before 3.5), the modules under `common` were pure Java and were easy to embed into other services; for example, the YARN External Shuffle Service runs as a plugin of the YARN NodeManager daemon process. With recent years' changes, some pure Java modules also require `scala-library` to be present at runtime, e.g., SPARK-52942 reports that YARN ESS causes the YARN NodeManager to fail to start due to a missing `scala-library` in the classpath.

Instead of bundling `scala-library` into the YARN ESS jar, #51650 (comment) suggests making it scala-free again.

This also makes Java invocations of the Structured Logging API much cleaner: they no longer need the ugly `$.MODULE$`.
```patch
- MDC.of(LogKeys.HOST_PORT$.MODULE$, address);
+ MDC.of(LogKeys.HOST_PORT, address);
```

### Does this PR introduce _any_ user-facing change?

No, they are internal APIs. For plugin developers who want to provide custom `LogKey`s, it is still possible to stay compatible with both Spark 4.0 and the new API proposed by this PR; see https://github.com/pan3793/SPARK-53064

```java
import org.apache.spark.internal.LogKey;

// CUSTOM_LOG_KEY is compatible with both Spark 4.0 and SPARK-53064
public class JavaCustomLogKeys {
  // Custom `LogKey` must be `implements LogKey`
  public static class CUSTOM_LOG_KEY implements LogKey {
    @Override
    public String name() {
      return "custom_log_key";
    }
  }

  // Singleton
  public static final CUSTOM_LOG_KEY CUSTOM_LOG_KEY = new CUSTOM_LOG_KEY();
}
```

### How was this patch tested?

Pass GHA, and verified YARN ESS works without `scala-library`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51775 from pan3793/SPARK-53064.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>

LuciferYang commented Aug 6, 2025

After the merge of #51775, master should no longer require this patch, but branch-4.0 should still need this fix, right?

pan3793 commented Aug 6, 2025

After SPARK-53064, I tested that YARN ESS can successfully bootstrap and serve simple Spark queries without `scala-library`, but I can NOT say that YARN ESS is scala-free, because I cannot guarantee that none of the code in YARN ESS calls Scala utility classes.

To achieve that goal, we might need to split `common/utils` into two modules (one Java, one Scala), and let the other common modules depend only on the pure Java utils.

For safety, we can merge this PR to both master and branch-4.0.

LuciferYang pushed a commit that referenced this pull request Aug 6, 2025
…nclude `scala-library`

### What changes were proposed in this pull request?

Since SPARK-41400, the `common/network-yarn` module has had a hard dependency on Scala, which now causes the YARN NodeManager to fail to start due to a missing `scala-library`.

```
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: classpath: [file:/opt/spark/yarn/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar]
2025-07-24 09:55:38,369 INFO util.ApplicationClassLoader: system classes: [java., javax.accessibility., -javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
2025-07-24 09:55:38,538 INFO yarn.YarnShuffleService: Initializing YARN shuffle service for Spark
2025-07-24 09:55:38,539 WARN containermanager.AuxServices: The Auxiliary Service named 'spark_shuffle' in the configuration is for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader which has a name of 'org.apache.spark.network.yarn.YarnShuffleService with custom class loader'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2025-07-24 09:55:38,808 ERROR nodemanager.NodeManager: Error starting NodeManager
java.lang.NoClassDefFoundError: scala/Product
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
	at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
	at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
	at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:176)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
	at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:330)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.serviceInit(AuxiliaryServiceWithCustomClassLoader.java:64)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:475)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:758)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:336)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:501)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:969)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1058)
Caused by: java.lang.ClassNotFoundException: scala.Product
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
	... 25 more
2025-07-24 09:55:38,815 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hadoop-worker1.orb.local/192.168.97.6
************************************************************/
```

Note: `spark-<version>-yarn-shuffle.jar` is now ~100M, while previously it was ~10M. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into one Java utils module and one Scala utils module; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` scala-free as before.

### Why are the changes needed?

Bug fix; recovers a broken feature.

### Does this PR introduce _any_ user-facing change?

Yes, it recovers a broken feature.

### How was this patch tested?

Tested on a YARN cluster; the NodeManager starts successfully after patching.

Note: Spark 4 requires JDK 17 or later, but JDK 17 is not officially supported as of Hadoop 3.4.1. The Hadoop community has been actively working on JDK 17 support in recent months, and it mostly works in 3.4.2.

For reviewers who want to verify this locally, consider using Hadoop 3.4.2 RC1 [1].

[1] https://lists.apache.org/thread/f66vj3rj6cpk37gb1jfl2ombq3hltsml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51650 from pan3793/SPARK-52942.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit e964086)
Signed-off-by: yangjie01 <[email protected]>
@LuciferYang (Contributor)

Merged into master/branch-4.0. Thanks @pan3793 @dongjoon-hyun and @mridulm

mridulm commented Aug 7, 2025

Thanks for working on this @pan3793 !
I missed reviewing the rewrite for MDC/LogKeys - but it is great to see this dependency getting removed.

LuciferYang pushed a commit that referenced this pull request Aug 12, 2025
…dule

### What changes were proposed in this pull request?

This PR splits the Java code of `common/utils` into a new module `common/utils-java`, except for:

- `common/utils/src/main/java/org/apache/spark/storage/StorageLevelMapper.java`
- `common/utils/src/main/java/org/apache/spark/SparkThrowable.java`

A few utility methods are rewritten in Java to avoid depending on `scala-library`.

### Why are the changes needed?

To make YARN ESS (`common/network-yarn`) scala-free again.

Read the discussion of SPARK-52942 (#51650) for more details and the following PR.

- #51775

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ build/mvn -pl common/network-yarn -am clean package -DskipTests -Pyarn
...
[INFO] Including org.apache.spark:spark-network-shuffle_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.spark:spark-network-common_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http2:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-socks:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-unix-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler-proxy:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-epoll:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-kqueue:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-riscv64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-classes:jar:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:windows-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including org.fusesource.leveldbjni:leveldbjni-all:jar:1.8 in the shaded jar.
[INFO] Including org.rocksdb:rocksdbjni:jar:9.8.4 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.19.2 in the shaded jar.
[INFO] Including org.apache.commons:commons-crypto:jar:1.1.0 in the shaded jar.
[INFO] Including com.google.crypto.tink:tink:jar:1.16.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.11.0 in the shaded jar.
[INFO] Including org.apache.spark:spark-common-utils-java_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.slf4j:jul-to-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.slf4j:jcl-over-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-core:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-1.2-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-layout-template-json:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.18.0 in the shaded jar.
[INFO] Including io.dropwizard.metrics:metrics-core:jar:4.2.32 in the shaded jar.
[INFO] Including org.roaringbitmap:RoaringBitmap:jar:1.3.0 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:3.0.0 in the shaded jar.
[INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar.
...
```

Now the YARN ESS jar is scala-free; rocksdbjni and netty contribute most of its size.
```
$ ll common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar
-rw-r--r-- 1 chengpan  staff    92M Aug  6 16:54 common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar
$ jar tf common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar | grep scala
<no-output>
$ ll ~/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
-rw-r--r-- 1 chengpan  staff    68M Jan 12  2025 /Users/chengpan/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51868 from pan3793/SPARK-53138.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
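The `jar tf ... | grep scala` check from the commit message above can also be done programmatically. A small JDK-only sketch (class name and the `scala/` prefix heuristic are illustrative choices, not Spark tooling) that scans a jar for Scala class entries:

```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: programmatic equivalent of `jar tf <jar> | grep scala` -- returns true
// if the given jar contains any entry under a scala/ package.
public class ScalaFreeCheck {
  public static boolean containsScala(String jarPath) throws IOException {
    try (ZipFile jar = new ZipFile(jarPath)) {
      return jar.stream()
          .map(ZipEntry::getName)
          .anyMatch(name -> name.startsWith("scala/"));
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length > 0) {
      System.out.println(containsScala(args[0]) ? "contains Scala classes" : "scala-free");
    }
  }
}
```

This could be useful as a CI guard to keep the shuffle-service jar scala-free going forward.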
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…nclude `scala-library`

	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:501)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:969)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1058)
Caused by: java.lang.ClassNotFoundException: scala.Product
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
	... 25 more
2025-07-24 09:55:38,815 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hadoop-worker1.orb.local/192.168.97.6
************************************************************/
```
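The failure above is a classloader visibility problem: Hadoop loads the aux-service jar in an isolated `ApplicationClassLoader` whose "system classes" list does not include `scala.`, so `scala.Product` must come either from the shuffle jar itself or from the NodeManager's own classpath, and neither had it. The sketch below is NOT Hadoop's actual `ApplicationClassLoader`, just a minimal illustration of the same rule: a loader whose classpath lacks `scala-library`, with a parent that cannot supply it either, has no way to resolve `scala.Product`.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        // Empty classpath, bootstrap loader as parent: neither side can
        // provide scala.Product, mirroring a shuffle jar built without
        // scala-library on a NodeManager that does not ship Scala.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            isolated.loadClass("scala.Product");
        } catch (ClassNotFoundException e) {
            System.out.println("scala.Product not visible: " + e.getMessage());
        }
    }
}
```

In the real stack trace the lookup first misses in the jar's URLs and then falls back to the NodeManager's application classloader, which also lacks Scala, producing the `ClassNotFoundException` that surfaces as `NoClassDefFoundError`.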

Note: `spark-<version>-yarn-shuffle.jar` is now ~100 MB, while previously it was ~10 MB. In `common/utils`, the Java and Scala code cross-reference each other, so we cannot simply split it into separate Java and Scala utils modules; thus it is not easy to make `spark-<version>-yarn-shuffle.jar` Scala-free as before.
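One quick way to confirm the fix is to check that the rebuilt shuffle jar actually bundles the Scala runtime classes. The sketch below fabricates a tiny jar so it runs standalone; in practice you would open `spark-<version>-yarn-shuffle.jar` directly instead of the temp file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarCheck {
    public static void main(String[] args) throws Exception {
        // Fabricate a minimal jar containing the marker class entry;
        // substitute the real spark-<version>-yarn-shuffle.jar path here.
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry("scala/Product.class"));
            out.closeEntry();
        }
        try (JarFile jf = new JarFile(jar)) {
            // scala/Product.class present => scala-library was shaded in
            boolean bundled = jf.getJarEntry("scala/Product.class") != null;
            System.out.println("scala-library bundled: " + bundled);
        }
    }
}
```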

### Why are the changes needed?

Bug fix: it recovers a broken feature.

### Does this PR introduce _any_ user-facing change?

Yes, it recovers a broken feature.

### How was this patch tested?

Tested on a YARN cluster; the NodeManager starts successfully after patching.
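For reviewers reproducing this, the shuffle service is registered on each NodeManager via `yarn-site.xml` as in the standard Spark-on-YARN setup (this is the stock configuration from the Spark docs, not something introduced by this PR):

```xml
<!-- yarn-site.xml: register Spark's external shuffle service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

With this in place, the NodeManager attempts to load `YarnShuffleService` from the shuffle jar at startup, which is exactly where the missing `scala-library` manifested.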

Note: Spark 4 requires JDK 17 or later, but as of Hadoop 3.4.1, JDK 17 is not officially supported. The Hadoop community has been actively working on JDK 17 support in recent months, and it largely works in 3.4.2.

For reviewers who want to verify this locally, consider using Hadoop 3.4.2 RC1 [1].

[1] https://lists.apache.org/thread/f66vj3rj6cpk37gb1jfl2ombq3hltsml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51650 from pan3793/SPARK-52942.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 6f6b8f4)
Signed-off-by: yangjie01 <[email protected]>