[SPARK-53064][CORE] Rewrite MDC LogKey in Java #51775
Conversation
Force-pushed from 9bd987c to f07cd20.
Since it's not possible to write a Java class that Scala can construct exactly like a Scala case class, I created this method instead.
So, for logging in Scala:
logInfo(log"Application ID is ${MDC(APP_ID, appId)}")
Previously, MDC(APP_ID, appId) called the case class's constructor; now it calls this method.
For reviewers, you can check only the first commit of this PR; the second one only modifies
Force-pushed from 60b6ddf to 9c832d5.
This seems to have been an exception previously.
@pan3793 Great work! LGTM overall. The changes are non-trivial, so I’ll take a closer look over the weekend.
This is a great change, and overall it looks OK. I'll continue to review the details.
So the custom log key from Spark 4.0 won't work anymore?
cc @panbingkun since you are the author of this one.
I think so.
If we override name(), then both 4.0 and this PR work. I updated the PR description to provide a demo project to help reviewers verify that.
```java
public class JavaCustomLogKeys {
    // Custom `LogKey` must be `implements LogKey`
    public static class CUSTOM_LOG_KEY implements LogKey {
        @Override
        public String name() {
            return "custom_log_key";
        }
    }

    // Singleton
    public static final CUSTOM_LOG_KEY CUSTOM_LOG_KEY = new CUSTOM_LOG_KEY();
}
```
BTW, the previous example had some issues with compilation:
```java
class JavaCustomLogKeys {
    // Custom `LogKey` must be `implements LogKey`
    public static class CUSTOM_LOG_KEY implements LogKey { }

    // Singleton
    public static final CUSTOM_LOG_KEY CUSTOM_LOG_KEY = new CUSTOM_LOG_KEY();
}
```
I can use javac to compile it, but scalac raises an error:
```
[INFO] --- scala:4.9.5:compile (scala-compile-first) @ SPARK-53064 ---
[INFO] Compiler bridge file: /Users/chengpan/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.10.8-bin_2.13.16__61.0-1.10.8_20250303T164041.jar
[INFO] compiling 2 Scala sources and 1 Java source to /Users/chengpan/Projects/SPARK-53064/target/classes ...
[ERROR] /Users/chengpan/Projects/SPARK-53064/src/main/java/io/github/pan3793/JavaCustomLogKeys.java:10: CUSTOM_LOG_KEY is already defined as class CUSTOM_LOG_KEY
[ERROR] one error found
```
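One illustrative way to sidestep that scalac error is to give the nested class and the singleton field different names; this is only a sketch, and the linked demo project may resolve it differently.

```java
// Illustrative workaround: class and field names differ, so scalac's joint
// compilation no longer sees CUSTOM_LOG_KEY defined twice.
public class JavaCustomLogKeys {
    public static class CustomLogKey implements LogKey {
        @Override
        public String name() {
            return "custom_log_key";
        }
    }

    // Singleton instance of the custom key.
    public static final CustomLogKey CUSTOM_LOG_KEY = new CustomLogKey();
}
```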
gengliangwang left a comment:
Thanks for the work!
Rebased to resolve conflicts.
```java
 * Various keys used for mapped diagnostic contexts (MDC) in logging. All structured logging keys
 * should be defined here for standardization.
 */
public enum LogKeys implements LogKey {
```
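For context on why the enum constants need no per-key code: `java.lang.Enum` already provides a final `name()` method, which satisfies an interface method with the same signature. A self-contained sketch (stand-in types, not the Spark source):

```java
// Stand-in for org.apache.spark.internal.LogKey.
interface LogKey {
    String name();
}

// Every constant automatically implements LogKey: Enum#name() is final and
// returns the constant's identifier, so no method body is required.
enum LogKeys implements LogKey {
    APP_ID,
    EXECUTOR_ID
}

class EnumLogKeyDemo {
    public static void main(String[] args) {
        LogKey key = LogKeys.APP_ID;
        System.out.println(key.name()); // prints "APP_ID"
    }
}
```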
Here, we employed a centralized Java enum to define all MDC keys. As far as I know, this approach results in the one-time initialization of all instances during class loading, thereby prolonging the class loading time and increasing memory usage.
Previously, because Scala's case objects are lazily loaded, unused keys did not consume memory or require initialization time.
Therefore, we may need to carry out some follow-up work later by splitting up this enum to mitigate the issues I've mentioned above.
If I'm mistaken, please correct me.
I see your concern, but I am quite optimistic about this. The current approach avoids creating a class for each LogKey; creating hundreds of simple enum instances should not introduce much overhead for the JVM. I wrote a simple test to show that.
master branch:
```
$ build/mvn -pl common/utils -am clean install -DskipTests
$ cat Test.java
import org.apache.spark.internal.LogKeys;

public class Main {
    public static void main(String[] args) {
        LogKeys.APP_ID$.MODULE$.name();
    }
}
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.73s user 0.04s system 215% cpu 0.360 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.74s user 0.04s system 217% cpu 0.362 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.74s user 0.04s system 217% cpu 0.359 total
```
this PR:
```
$ build/mvn -pl common/utils -am clean install -DskipTests
$ cat Test.java
import org.apache.spark.internal.LogKeys;

public class Main {
    public static void main(String[] args) {
        LogKeys.APP_ID.name();
    }
}
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.60s user 0.05s system 165% cpu 0.396 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.61s user 0.04s system 201% cpu 0.321 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar:scala-library-2.13.16.jar Test.java
java -cp Test.java 0.60s user 0.04s system 199% cpu 0.316 total
```
It's even faster if I remove scala-library from the classpath:
```
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar Test.java
java -cp Test.java 0.55s user 0.03s system 207% cpu 0.278 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar Test.java
java -cp Test.java 0.55s user 0.03s system 200% cpu 0.289 total
$ time java -cp common/utils/target/spark-common-utils_2.13-4.1.0-SNAPSHOT.jar Test.java
java -cp Test.java 0.54s user 0.04s system 195% cpu 0.294 total
```
And yes, we can split LogKeys into multiple enums if we observe a real performance bottleneck in the future :)
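If that ever becomes necessary, such a split could look roughly like the sketch below (group and constant names are made up): several smaller enums sharing the `LogKey` interface, each initialized only when first referenced.

```java
// Hypothetical grouping; enum and constant names are illustrative only.
// Assumes a LogKey interface with `String name()` is in scope (see the sketch above).
// A Java enum class is initialized (and its constants instantiated) the first
// time it is touched, so groups that are never referenced cost nothing.
enum CoreLogKeys implements LogKey { APP_ID, EXECUTOR_ID }

enum StreamingLogKeys implements LogKey { BATCH_ID, OFFSET }
```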
```scala
 */
private[spark] def enableStructuredLogging(): Unit = {
  structuredLoggingEnabled = true
  SparkLoggerFactory.enableStructuredLogging()
```
Are the modifications in these lines strongly related to the intent of this pull request (PR)?
It's necessary to make Java's Structured Logging API Scala-free; otherwise, the Java SparkLogger still needs to access Scala's Logging at runtime.
Merged into master. Thank you @pan3793 and @gengliangwang
What changes were proposed in this pull request?
This PR proposes to rewrite a few classes used by the Structured Logging API from Scala to Java.
Why are the changes needed?
Previously (before 3.5), modules under `common` were pure Java and were easy to embed into other services; for example, the YARN External Shuffle Service runs as a plugin of the YARN ResourceManager daemon process. With recent years' changes, some pure-Java modules also require `scala-library` to be present at runtime, i.e., SPARK-52942 reports that YARN ESS causes the YARN RM to fail to start due to a missing `scala-library` in the classpath.

Instead of bundling `scala-library` into the YARN ESS jar, #51650 (comment) suggests making it Scala-free again.

This also makes Java's invocation of the Structured Logging API much cleaner: it can now be called without the ugly `$.MODULE$`.
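For illustration only, a hedged before/after of a Java call site; the class, package, and method names below (`SparkLogger`, `SparkLoggerFactory.getLogger`, `MDC.of`) follow the existing Java structured-logging style but are assumptions, not verified signatures.

```java
// Sketch only: import paths and method signatures below are assumptions about
// Spark's Java structured-logging API, not verified against the source.
import org.apache.spark.internal.LogKeys;
import org.apache.spark.internal.MDC;
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

public class StructuredLoggingExample {
    private static final SparkLogger LOGGER =
        SparkLoggerFactory.getLogger(StructuredLoggingExample.class);

    public static void main(String[] args) {
        String appId = "app-20250801120000-0001";
        // Spark 4.0: the Scala case object had to be reached through $.MODULE$:
        //   LOGGER.info("Application {} started.", MDC.of(LogKeys.APP_ID$.MODULE$, appId));
        // With LogKeys as a Java enum, the key is referenced directly:
        LOGGER.info("Application {} started.", MDC.of(LogKeys.APP_ID, appId));
    }
}
```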
Does this PR introduce any user-facing change?
No, they are internal APIs. For plugin developers who want to provide custom `LogKeys`, it is still possible to stay compatible with both Spark 4.0 and the new API proposed by this PR; see https://github.com/pan3793/SPARK-53064

How was this patch tested?
Pass GHA, and verified that YARN ESS works without `scala-library`.

Was this patch authored or co-authored using generative AI tooling?
No.