Conversation

@srowen (Member) commented Dec 31, 2018:

What changes were proposed in this pull request?

This should make tests in core modules pass for Java 11.

How was this patch tested?

Existing tests, with modifications.

"my-accumulator-2" -> acc2)
LongAccumulatorSource.register(mockContext, accs)
val captor = new ArgumentCaptor[AccumulatorSource]()
val captor = ArgumentCaptor.forClass(classOf[AccumulatorSource])
@srowen (Member Author) commented:

This is actually just cleaning up a deprecation warning along the way; it's not strictly required for Java 11.
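
For anyone hitting the same warning, a minimal, self-contained sketch of the non-deprecated Mockito pattern; the Registry trait here is illustrative, not Spark's actual metrics API:

```scala
import org.mockito.ArgumentCaptor
import org.mockito.Mockito.{mock, verify}

// Illustrative stand-in for the mocked collaborator in the real test.
trait Registry { def register(source: AnyRef): Unit }

val registry = mock(classOf[Registry])
registry.register("my-source")

// ArgumentCaptor.forClass replaces the deprecated `new ArgumentCaptor`.
val captor: ArgumentCaptor[AnyRef] = ArgumentCaptor.forClass(classOf[AnyRef])
verify(registry).register(captor.capture())
assert(captor.getValue == "my-source")
```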

```diff
 // This prints something useful if the JSON strings don't match
-println("=== EXPECTED ===\n" + expectedJson + "\n")
-println("=== ACTUAL ===\n" + actualJson + "\n")
+println(s"=== EXPECTED ===\n${pretty(expectedJson)}\n")
```
@srowen (Member Author) commented:

The implementation of Properties, based on Hashtable, returns elements in a different order in Java 11. Comparing the parsed JSON content rather than its string representation shows the output is still semantically correct, so I changed the test accordingly.
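
To make the idea concrete, a minimal sketch assuming json4s (which Spark's core code uses for JSON): JValue equality compares object fields as a set, so it ignores the field ordering that Hashtable changed in Java 11, while the rendered strings do not.

```scala
import org.json4s.jackson.JsonMethods._

val expected = parse("""{"user": "a", "system": "b"}""")
val actual   = parse("""{"system": "b", "user": "a"}""")

assert(compact(render(expected)) != compact(render(actual))) // strings differ
assert(expected == actual)                                   // parsed values match
```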

```diff
 Map<String, String> env = new HashMap<>();
 List<String> cmd = buildCommand(sparkSubmitArgs, env);
-assertEquals("python", cmd.get(cmd.size() - 1));
+assertTrue(Arrays.asList("python", "python2", "python3").contains(cmd.get(cmd.size() - 1)));
```
@srowen (Member Author) commented:

Also not strictly related, but something I caught while debugging: the PySpark python interpreter might legitimately be set to a few other values.

pom.xml (outdated):

```diff
 </includes>
 <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
-<argLine>-ea -Xmx3g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize}</argLine>
+<argLine>-ea -Xmx6g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize}</argLine>
```
@srowen (Member Author) commented:

Required now to make the TimSort tests pass on Java 11, probably because of how direct buffer handling has changed.

A contributor commented:

It would be good to know what changed. Do you have an idea?

A member commented:

What tests failed?

@srowen (Member Author) commented Dec 31, 2018:

The TimSort test failed, where it allocates a very large array.
I'm pretty sure it's because we can't get around the off-heap allocation limits in Java 9+ (see https://github.com/apache/spark/pull/22993/files). I think this would also pass if we manually set the off-heap limit very high, or allowed access to the Cleaner class on the command line. Upping the memory seemed like a simpler workaround, but those should be fine too.
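
For reference, an illustrative example of the two command-line alternatives mentioned; the values and whether Spark's build would use them are assumptions, not what this PR does:

```
# Raise the direct (off-heap) memory cap that the public ByteBuffer API checks:
-XX:MaxDirectMemorySize=8g

# Or open jdk.internal.ref (where Cleaner lives in Java 9+) to reflective access:
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
```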

EDIT: I don't think this is right. See #23419 (comment)

A member commented:

The % change is quite a bit, though.

A member commented:

+1 for increasing this. I also saw multiple TimSort OOM failures even in the Apache Spark Jenkins environment (JDK 8).

@srowen (Member Author) commented:

@felixcheung Let me try 4G or 5G. I think I just doubled it and it worked, so I left it; it may not be necessary to make it that big.

```diff
 </executions>
 <configuration>
   <scalaVersion>${scala.version}</scalaVersion>
+  <checkMultipleScalaVersions>true</checkMultipleScalaVersions>
```
@srowen (Member Author) commented:

Also not strictly related, but helpful when I was debugging another Java 11 issue.

@SparkQA commented Dec 31, 2018:

Test build #100584 has finished for PR 23419 at commit d4b0652.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 31, 2018:

Test build #4490 has finished for PR 23419 at commit d4b0652.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment:

+1, LGTM. I also tested JDK 11 manually (and +1 for the removal of the UtilsSuite "Safe getSimpleName" test case).

@srowen (Member Author) commented Jan 1, 2019:

@dongjoon-hyun by the way, how did you test manually? I find we still can't build Spark with Java 11 (Scala itself and some build tools don't seem to be ready yet), but we can run Spark tests with Java 11. (We will need to continue building with Java 8, I think, to ensure it still runs on Java 8, so that's not something we have to 'fix'.)

I stopped after getting all tests up to, but not including, SQL working, as I was getting some possibly-flaky test failures there. Did you see the SQL or catalyst tests run successfully?

@SparkQA commented Jan 1, 2019:

Test build #100625 has finished for PR 23419 at commit 729a3c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) commented:

@srowen, I ran sbt test on the core module on a Mac. I think that is the intended test procedure for this PR.

```
$ export JAVA_HOME=$APACHE/jdk-release/jdk-11.0.1.jdk/Contents/Home
$ java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)
$ build/sbt "project core" test
```

@srowen (Member Author) commented Jan 2, 2019:

OK, that's good news; maybe the tests really all do pass now :)
We'll know better when JDK 11 is available on Jenkins.

@felixcheung Looks like -Xmx4g works locally for the TimSort test, so I pushed that here (obviously it was going to pass here either way). For reference, this was the failure it fixes:

```
SorterSuite:
- equivalent to Arrays.sort
- KVArraySorter
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at org.apache.spark.util.collection.TestTimSort.createArray(TestTimSort.java:56)
  at org.apache.spark.util.collection.TestTimSort.getTimSortBugTestSet(TestTimSort.java:43)
  at org.apache.spark.util.collection.SorterSuite.$anonfun$new$8(SorterSuite.scala:70)
  at org.apache.spark.util.collection.SorterSuite$$Lambda$2204/0x0000000800a5f840.apply$mcV$sp(Unknown Source)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
```
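
For a sense of scale, an illustrative sketch of the allocation shape involved; the real TestTimSort.createArray builds a specially crafted array that triggers the historical TimSort bug, and the length here is an assumption:

```scala
// One very large int array allocation is what hits the heap ceiling.
val length = 64 * 1024 * 1024        // assumption; the real length may differ
val array = new Array[Int](length)   // roughly 256 MB of heap in one shot
```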

@dongjoon-hyun (Member) commented:

@srowen, could you update SparkBuild.scala as well, for consistency?
https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L891-L893
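
For reference, a hedged sketch of the kind of sbt-side setting being pointed at; the actual contents of SparkBuild.scala at those lines may differ:

```scala
// Illustrative only: the Test javaOptions would need to track the Maven <argLine>.
javaOptions in Test ++= Seq("-ea", "-Xmx4g", "-Xss4m", "-XX:ReservedCodeCacheSize=128m")
```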

@dongjoon-hyun (Member) commented:

For SorterSuite and OpenHashSetSuite, I also hit some OOM issues (especially when running all tests), while they pass when run individually.

@srowen (Member Author) commented Jan 2, 2019:

Following my comment at #23419 (comment) -- I don't think that's the explanation.

The test itself fails allocating a big int array, which is not related to any Java 11 ByteBuffer changes. It may just be that this test is where a huge allocation takes place and where higher overall memory usage causes a problem.

I am worried that being unable to set a Cleaner on DirectByteBuffer is potentially causing a memory leak, and I need to revisit the change I mentioned above.

This change itself ought to be safe and fine even if we later change the memory limit back down, but I'm first going to try some other changes locally to confirm or deny that.

@dongjoon-hyun (Member) commented Jan 2, 2019:

Sure. What I asked is to keep the JVM options consistent between the mvn and sbt environments.
The OOM issues are not related to JDK 11 (as you said), because I also observed them on JDK 8.

@srowen (Member Author) commented Jan 2, 2019:

Oh right, I understand now. Well, I think I'm going to revert that change now anyway. When I made a change to avoid DirectByteBuffer without a Cleaner, it passed again with 3g on Java 11. I will merge this when it passes again and follow up with my fix separately.

@dongjoon-hyun (Member) commented Jan 2, 2019:

That's great! Yep, +1 for handling them separately.
BTW, I found that the SorterSuite flakiness issue was filed as https://issues.apache.org/jira/browse/SPARK-26306, and I added two recent Jenkins failure URLs there, too.

@srowen (Member Author) commented Jan 2, 2019:

OK, so you are suggesting increasing the heap size there just because it currently fails sometimes? That's fine too; I can also make that change separately.

@dongjoon-hyun (Member) left a comment:

+1 again. I also tested this PR once more with the last commit.

This time, I tested inside the openjdk:11 Docker image to be sure. The following is the result.

```
$ docker run -it --rm openjdk:11 /bin/bash
...
[info] ScalaTest
[info] Run completed in 23 minutes, 3 seconds.
[info] Total number of tests run: 2222
[info] Suites: completed 219, aborted 0
[info] Tests: succeeded 2222, failed 0, canceled 12, ignored 7, pending 0
[info] All tests passed.
[info] Passed: Total 2471, Failed 0, Errors 0, Passed 2471, Ignored 7, Canceled 12
[success] Total time: 1528 s, completed Jan 2, 2019, 3:53:53 AM

spark@310d9c4dbae0:~/spark$ java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Debian-2bpo91)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Debian-2bpo91, mixed mode, sharing)

spark@310d9c4dbae0:~/spark$ git log --oneline -n3
e4551bd63b Revert mem change to 3g
729a3c9a8a See if TimSort passes with 4g heap
d4b0652270 Additional Java 11 test fixes
spark@310d9c4dbae0:~/spark$
```

@SparkQA commented Jan 2, 2019:

Test build #100629 has finished for PR 23419 at commit e4551bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member Author) commented Jan 2, 2019:

Merged to master

@srowen closed this in 4bdfda9 on Jan 2, 2019.
@dongjoon-hyun (Member) commented:
Thank you for fixing this, @srowen!

@HyukjinKwon (Member) commented:
+1 as well.

@srowen deleted the Java11 branch on January 3, 2019 at 16:29.
srowen added a commit that referenced this pull request Jan 4, 2019
… if Cleaner can't be set

## What changes were proposed in this pull request?

In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was largely handled in #22993. However, I think the change there left a significant problem.

If a DirectByteBuffer is allocated using the reflective hack in Platform, we now can't set a Cleaner by default. But I believe this means the memory isn't freed promptly, or possibly at all. If a Cleaner can't be set, I think we need to use the normal APIs to allocate the direct ByteBuffer.

According to comments in the code, the downside is simply that the normal APIs will check and impose limits on how much off-heap memory can be allocated. Per the original review on #22993, this much seems fine, as either way in this case the user would have to add a JVM setting (increase the max, or allow the reflective access).

## How was this patch tested?

Existing tests. This resolved an OutOfMemoryError in Java 11 from the TimSort tests without increasing the test heap size (see #23419 (comment)). This suggests there is a problem and that this resolves it.

Closes #23424 from srowen/SPARK-24421.2.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
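
To make the behavior that commit describes concrete, a minimal sketch under stated assumptions; this is not Spark's actual Platform code, and the helper names are hypothetical:

```scala
import java.nio.ByteBuffer

object DirectBufferAllocation {
  // Hypothetical: Some(attach) when jdk.internal.ref.Cleaner is reflectively
  // accessible on this JVM, None otherwise (e.g. Java 9+ without --add-opens).
  def cleanerFactory: Option[ByteBuffer => Unit] = None

  def allocateDirect(size: Int): ByteBuffer = cleanerFactory match {
    case Some(attachCleaner) =>
      // Reflective path: allocate outside the JVM's direct-memory accounting
      // and attach a Cleaner so the memory is freed promptly.
      val buf = ByteBuffer.allocateDirect(size) // stand-in for the reflective ctor
      attachCleaner(buf)
      buf
    case None =>
      // Fallback per the commit: the public API enforces -XX:MaxDirectMemorySize,
      // but the JVM frees the memory reliably, avoiding the suspected leak.
      ByteBuffer.allocateDirect(size)
  }
}
```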
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

This should make tests in core modules pass for Java 11.

## How was this patch tested?

Existing tests, with modifications.

Closes apache#23419 from srowen/Java11.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… if Cleaner can't be set

## What changes were proposed in this pull request?

In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was largely handled in apache#22993. However, I think the change there left a significant problem.

If a DirectByteBuffer is allocated using the reflective hack in Platform, we now can't set a Cleaner by default. But I believe this means the memory isn't freed promptly, or possibly at all. If a Cleaner can't be set, I think we need to use the normal APIs to allocate the direct ByteBuffer.

According to comments in the code, the downside is simply that the normal APIs will check and impose limits on how much off-heap memory can be allocated. Per the original review on apache#22993, this much seems fine, as either way in this case the user would have to add a JVM setting (increase the max, or allow the reflective access).

## How was this patch tested?

Existing tests. This resolved an OutOfMemoryError in Java 11 from the TimSort tests without increasing the test heap size (see apache#23419 (comment)). This suggests there is a problem and that this resolves it.

Closes apache#23424 from srowen/SPARK-24421.2.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
7 participants