[SPARK-26507][CORE] Fix core tests for Java 11 #23419
```diff
       "my-accumulator-2" -> acc2)
     LongAccumulatorSource.register(mockContext, accs)
-    val captor = new ArgumentCaptor[AccumulatorSource]()
+    val captor = ArgumentCaptor.forClass(classOf[AccumulatorSource])
```
This is actually just cleaning up a deprecation warning along the way; not strictly required for Java 11
```diff
     // This prints something useful if the JSON strings don't match
-    println("=== EXPECTED ===\n" + expectedJson + "\n")
-    println("=== ACTUAL ===\n" + actualJson + "\n")
+    println(s"=== EXPECTED ===\n${pretty(expectedJson)}\n")
```
The implementation of Properties, based on Hashtable, returns elements in a different order in Java 11. Comparing the actual JSON content rather than its string representation shows it still produces semantically correct output, so I changed the test accordingly.
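The point about comparing content rather than string form can be illustrated with plain JDK collections. This is a minimal sketch (not the Spark test code): two maps holding the same entries in different iteration orders have different string representations, but compare equal as maps, which is why asserting on parsed structures is robust against the Java 11 Hashtable ordering change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticCompare {
    public static void main(String[] args) {
        // Two maps with the same entries, inserted in different orders.
        // LinkedHashMap preserves insertion order, so the orders differ.
        Map<String, String> a = new LinkedHashMap<>();
        a.put("spark.master", "local");
        a.put("spark.app.name", "test");

        Map<String, String> b = new LinkedHashMap<>();
        b.put("spark.app.name", "test");
        b.put("spark.master", "local");

        // String representations depend on iteration order...
        System.out.println(a.toString().equals(b.toString())); // false
        // ...but the maps are semantically equal regardless of order
        System.out.println(a.equals(b)); // true
    }
}
```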
```diff
     Map<String, String> env = new HashMap<>();
     List<String> cmd = buildCommand(sparkSubmitArgs, env);
-    assertEquals("python", cmd.get(cmd.size() - 1));
+    assertTrue(Arrays.asList("python", "python2", "python3").contains(cmd.get(cmd.size() - 1)));
```
Also not strictly related, but something I caught while debugging. The PySpark Python interpreter might legitimately be set to a few other values.
pom.xml (outdated):

```diff
       </includes>
       <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
-      <argLine>-ea -Xmx3g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize}</argLine>
+      <argLine>-ea -Xmx6g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize}</argLine>
```
Required now to make TimSort tests pass in Java 11, probably because of how direct buffer handling has changed.
It would be good to know what changed. Do you have an idea?
What tests failed?
The TimSort test failed, where it allocates a very very large array.
I'm pretty sure it's because we can't get around the off-heap allocation limits in Java 9+ (see https://github.com/apache/spark/pull/22993/files). I think this would also pass if we manually set the off-heap limit very high, or allowed access to the Cleaner class on the command line. Upping the memory seemed like a simpler workaround but those should be fine too.
EDIT: I don't think this is right. See #23419 (comment)
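For background on the two limits being discussed here, a small sketch using only JDK APIs (not Spark code): direct buffers allocated through the standard API are accounted against `-XX:MaxDirectMemorySize`, while a very large `int[]` like the one in the TimSort test lives on the heap and is bounded separately by `-Xmx`, which is why raising the test heap size affects it.

```java
import java.nio.ByteBuffer;

public class DirectLimits {
    public static void main(String[] args) {
        // ByteBuffer.allocateDirect goes through the JDK's own accounting, so
        // the allocation is checked against -XX:MaxDirectMemorySize (which on
        // HotSpot defaults to roughly the max heap size).
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024 * 1024); // 8 MiB
        System.out.println(buf.isDirect());  // true
        System.out.println(buf.capacity());  // 8388608

        // On-heap allocations (like a huge int[]) are bounded by -Xmx instead,
        // which Runtime exposes as maxMemory().
        System.out.println(Runtime.getRuntime().maxMemory() > 0); // true
    }
}
```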
the % changed is quite a bit though
+1 for increasing this. I also saw multiple TimSort OOM failures even in the Apache Spark Jenkins environment (JDK 8).
@felixcheung Let me try 4G or 5G. I think I just doubled it and it worked so I left it, but may not be necessary to make it that big.
```diff
         </executions>
         <configuration>
           <scalaVersion>${scala.version}</scalaVersion>
+          <checkMultipleScalaVersions>true</checkMultipleScalaVersions>
```
Also not strictly related, but helpful when I was debugging another Java 11 issue.
Test build #100584 has finished for PR 23419 at commit

Test build #4490 has finished for PR 23419 at commit
dongjoon-hyun left a comment:

+1, LGTM. I also tested JDK 11 manually (and +1 for the removal of the `UtilsSuite` "Safe getSimpleName" test case).
@dongjoon-hyun by the way, how did you test manually? I find we still can't build Spark with Java 11 (Scala itself and some build tools don't seem to be ready yet), but we can run Spark tests with Java 11. (We will need to continue building with Java 8, I think, to ensure it runs on Java 8, so that's not something we have to 'fix'.) I stopped at getting all tests up to, but not including, SQL working, as I was getting some possibly-flaky test failures there. Did you see the SQL or catalyst tests run successfully?
Test build #100625 has finished for PR 23419 at commit
@srowen, I did run `sbt test` on:

```shell
$ export JAVA_HOME=$APACHE/jdk-release/jdk-11.0.1.jdk/Contents/Home
$ java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)
$ build/sbt "project core" test
```
OK, that's good news; maybe the tests really all do pass now :) @felixcheung Looks like
@srowen. Could you update
For
Following my comment at #23419 (comment) -- I don't think that's the explanation. The test itself fails on allocating a big int array, which is not related to any Java 11 ByteBuffer changes. It may be that this test is just where a huge allocation takes place and where the higher memory usage causes a problem. I am worried that being unable to set a

This change itself ought to be safe and fine even if we later change the memory limit back down, but I'm first going to try some other changes locally to confirm/deny.
Sure. What I asked is to keep the JVM options consistent in both
Oh right, I understand now. Well, I think I'm going to revert that change now anyway. When I made a change to avoid DirectByteBuffer without a Cleaner, it passed again with 3g on Java 11. I will merge this when it passes again, and follow up with my fix separately.
That's great! Yep, +1 for handling them separately.
OK, so you are suggesting increasing the heap size there just because it currently fails sometimes? That's fine too; I can also make that change separately.
dongjoon-hyun left a comment:

+1 again. I also tested this PR once more with the last commit. This time, I tested inside the openjdk:11 Docker image to be sure. The following is the result.
```shell
$ docker run -it --rm openjdk:11 /bin/bash
...
[info] ScalaTest
[info] Run completed in 23 minutes, 3 seconds.
[info] Total number of tests run: 2222
[info] Suites: completed 219, aborted 0
[info] Tests: succeeded 2222, failed 0, canceled 12, ignored 7, pending 0
[info] All tests passed.
[info] Passed: Total 2471, Failed 0, Errors 0, Passed 2471, Ignored 7, Canceled 12
[success] Total time: 1528 s, completed Jan 2, 2019, 3:53:53 AM
spark@310d9c4dbae0:~/spark$ java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Debian-2bpo91)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Debian-2bpo91, mixed mode, sharing)
spark@310d9c4dbae0:~/spark$ git log --oneline -n3
e4551bd63b Revert mem change to 3g
729a3c9a8a See if TimSort passes with 4g heap
d4b0652270 Additional Java 11 test fixes
spark@310d9c4dbae0:~/spark$
```
Test build #100629 has finished for PR 23419 at commit

Merged to master

Thank you for fixing this, @srowen!

+1 as well.
… if Cleaner can't be set

## What changes were proposed in this pull request?

In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was largely handled in #22993. However, I think the change there left a significant problem. If a DirectByteBuffer is allocated using the reflective hack in Platform, we now by default can't set a Cleaner. But I believe this means the memory isn't freed promptly, or possibly at all.

If a Cleaner can't be set, I think we need to use normal APIs to allocate the direct ByteBuffer. According to comments in the code, the downside is simply that the normal APIs will check and impose limits on how much off-heap memory can be allocated. Per the original review on #22993, this much seems fine, as either way in this case the user would have to add a JVM setting (increase the max, or allow the reflective access).

## How was this patch tested?

Existing tests. This resolved an OutOfMemoryError in Java 11 from TimSort tests without increasing the test heap size (see #23419 (comment)). This suggests there is a problem and that this resolves it.

Closes #23424 from srowen/SPARK-24421.2.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
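The fallback described in this commit message can be sketched with plain JDK APIs. This is a hedged illustration, not Spark's actual `Platform` code: probe whether the Cleaner class is accessible, and if it is not, allocate through the standard API, which registers its own cleaner and enforces `-XX:MaxDirectMemorySize`.

```java
import java.nio.ByteBuffer;

public class CleanerFallback {
    // Probe for sun.misc.Cleaner; on Java 9+ this class was removed from the
    // public sun.misc namespace, so the lookup typically fails.
    static boolean cleanerAvailable() {
        try {
            Class.forName("sun.misc.Cleaner");
            return true;
        } catch (ClassNotFoundException | RuntimeException e) {
            return false;
        }
    }

    // Sketch of the fallback: only a hand-rolled allocation would attach an
    // explicit Cleaner; without one, use the standard, limit-checked API so
    // the memory is still reliably freed.
    static ByteBuffer allocateDirectBuffer(int size) {
        if (!cleanerAvailable()) {
            return ByteBuffer.allocateDirect(size);
        }
        // ... a reflective DirectByteBuffer constructor plus an explicit
        // Cleaner would go here; omitted in this sketch ...
        return ByteBuffer.allocateDirect(size);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateDirectBuffer(4096);
        System.out.println(buf.isDirect() && buf.capacity() == 4096); // true
    }
}
```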
## What changes were proposed in this pull request?

This should make tests in core modules pass for Java 11.

## How was this patch tested?

Existing tests, with modifications.

Closes apache#23419 from srowen/Java11.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request?

This should make tests in core modules pass for Java 11.

## How was this patch tested?

Existing tests, with modifications.