
@pan3793
Member

pan3793 commented Aug 6, 2025

What changes were proposed in this pull request?

This PR moves the Java code of common/utils into a new module, common/utils-java, except for:

  • common/utils/src/main/java/org/apache/spark/storage/StorageLevelMapper.java
  • common/utils/src/main/java/org/apache/spark/SparkThrowable.java

A few utility methods are rewritten in Java to avoid depending on scala-library.
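The PR does not list which utility methods were ported. As a hedged illustration of the general pattern only, the sketch below shows how a helper that might be written in Scala as `str.split(",").map(_.trim).filter(_.nonEmpty)` could be expressed in plain Java, so that callers need only `java.util` types and no scala-library. The class `ScalaFreeUtilsSketch` and the method `stringToList` are hypothetical and do not correspond to a specific method in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical example: a Scala-free Java port of a small string helper.
 * The class and method names are illustrative only, not the ones in the PR.
 */
public final class ScalaFreeUtilsSketch {

  private ScalaFreeUtilsSketch() {}

  /**
   * Splits a comma-separated string, trims each element and drops empty ones.
   * Only java.util types are used, so no scala-library is needed at runtime.
   */
  public static List<String> stringToList(String str) {
    if (str == null || str.isEmpty()) {
      return Collections.emptyList();
    }
    List<String> result = new ArrayList<>();
    for (String part : str.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return Collections.unmodifiableList(result);
  }

  public static void main(String[] args) {
    System.out.println(stringToList(" a, b,,c ")); // prints [a, b, c]
  }
}
```

Returning `java.util.List` rather than a Scala `Seq` is what keeps the signature usable from modules that must not pull in scala-library.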

Why are the changes needed?

To make the YARN External Shuffle Service (ESS, common/network-yarn) Scala-free again.

See the discussion in SPARK-52942 (#51650) and the follow-up PR for more details.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

$ build/mvn -pl common/network-yarn -am clean package -DskipTests -Pyarn
...
[INFO] Including org.apache.spark:spark-network-shuffle_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.spark:spark-network-common_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http2:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-socks:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-unix-common:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-handler-proxy:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver-dns:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-epoll:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-classes-kqueue:jar:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-riscv64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-epoll:jar:linux-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-aarch_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-transport-native-kqueue:jar:osx-x86_64:4.1.123.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-classes:jar:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:windows-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:linux-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-aarch_64:2.0.72.Final in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:osx-x86_64:2.0.72.Final in the shaded jar.
[INFO] Including org.fusesource.leveldbjni:leveldbjni-all:jar:1.8 in the shaded jar.
[INFO] Including org.rocksdb:rocksdbjni:jar:9.8.4 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.19.2 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.19.2 in the shaded jar.
[INFO] Including org.apache.commons:commons-crypto:jar:1.1.0 in the shaded jar.
[INFO] Including com.google.crypto.tink:tink:jar:1.16.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.11.0 in the shaded jar.
[INFO] Including org.apache.spark:spark-common-utils-java_2.13:jar:4.1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.slf4j:jul-to-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.slf4j:jcl-over-slf4j:jar:2.0.17 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-core:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-1.2-api:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.logging.log4j:log4j-layout-template-json:jar:2.24.3 in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.18.0 in the shaded jar.
[INFO] Including io.dropwizard.metrics:metrics-core:jar:4.2.32 in the shaded jar.
[INFO] Including org.roaringbitmap:RoaringBitmap:jar:1.3.0 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:3.0.0 in the shaded jar.
[INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar.
...

The YARN ESS jar is now Scala-free; rocksdbjni and Netty account for most of its size.

$ ll common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar
-rw-r--r--@ 1 chengpan  staff    92M Aug  6 16:54 common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar
$ jar tf common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar | grep scala
<no-output>
$ ll ~/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
-rw-r--r--@ 1 chengpan  staff    68M Jan 12  2025 /Users/chengpan/.m2/repository/org/rocksdb/rocksdbjni/9.8.4/rocksdbjni-9.8.4.jar
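The `jar tf ... | grep scala` check above can also be sketched in Java with `java.util.jar.JarFile`, for example to run it from a build step or test. This is an illustrative sketch, not part of Spark's build; the class name `ScalaFreeJarCheck` is hypothetical. Like `grep scala`, it performs a case-sensitive substring match on entry names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: lists jar entries whose path contains "scala",
 * mirroring `jar tf <jar> | grep scala`. An empty result means no match.
 */
public final class ScalaFreeJarCheck {

  private ScalaFreeJarCheck() {}

  public static List<String> findScalaEntries(String jarPath) throws IOException {
    List<String> matches = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        String name = entries.nextElement().getName();
        // Same criterion as `grep scala`: case-sensitive substring match.
        if (name.contains("scala")) {
          matches.add(name);
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: ScalaFreeJarCheck <jar>");
      System.exit(2);
    }
    List<String> matches = findScalaEntries(args[0]);
    matches.forEach(System.out::println);
    // Non-zero exit status if any Scala-looking entry is found.
    System.exit(matches.isEmpty() ? 0 : 1);
  }
}
```

With JDK 11 or later it can be run straight from source, e.g. `java ScalaFreeJarCheck.java common/network-yarn/target/scala-2.13/spark-4.1.0-SNAPSHOT-yarn-shuffle.jar`; it prints any matching entries and exits with status 1 if there are some.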

Was this patch authored or co-authored using generative AI tooling?

No.

Member Author

Not sure if the style is right here.

Member Author

common/variant still depends on common/utils (Scala), because this is a Scala class.

@pan3793 pan3793 changed the title [SPARK-53138] Split common-utils Java code into a new module [SPARK-53138][CORE][BUILD] Split common-utils Java code into a new module Aug 6, 2025
Member Author

This does not change the output artifacts, but it is required to fix scaladoc packaging:

[INFO] --- scala:4.9.5:doc-jar (attach-scaladocs) @ spark-network-yarn_2.13 ---
scaladoc error: fatal error: object scala in compiler mirror not found.

@github-actions github-actions bot added the INFRA label Aug 6, 2025
@pan3793
Member Author

pan3793 commented Aug 6, 2025

This is ready for review, cc @LuciferYang @mridulm @dongjoon-hyun

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for this approach, @pan3793.

Contributor

Similar changes should also be made in maven_test.yml. Note that both of these files are also used by the daily tests of branch-4.0/3.5, so we need to handle compatibility with branches where some of these modules do not exist.

Member Author

It seems branch-3.5 does not have a Maven daily test. I changed maven_test.yml in 21c892f99bcac44b28038ab1a0fd1034d09d3bc4; please help check.

Contributor

@LuciferYang LuciferYang left a comment

The cost of rewriting the Scala code in common-utils in Java would be significant. If we want to address this issue and have no better alternative, then I accept this solution.

But let's wait for others' opinions.

@pan3793
Member Author

pan3793 commented Aug 8, 2025

Rebased to resolve conflicts

@pan3793
Member Author

pan3793 commented Aug 11, 2025

Rebased to resolve conflicts

@LuciferYang
Contributor

If there are no objections within 24 hours, I will merge this one

@LuciferYang
Contributor

Merged into master for Apache Spark 4.1.0. Thanks @pan3793 and @dongjoon-hyun
