
Conversation

@kiszk (Member) commented Feb 19, 2018

What changes were proposed in this pull request?

This PR addresses two issues in BufferHolderSparkSubmitSuite.

  1. Although BufferHolderSparkSubmitSuite was meant to allocate a large object several times, it actually allocated the object only once and then reused it.
  2. BufferHolderSparkSubmitSuite may fail due to a timeout.

Assigning a small object before each large allocation solves issue 1 by preventing reuse (see the sketch below).
Increasing the heap size from 4g to 7g solves issue 2; it also avoids an OOM once issue 1 is fixed.
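
To see why reuse hides allocations, here is a minimal, self-contained Scala model of the pitfall (it only mimics the grow-or-reuse behavior; it does not use the real BufferHolder class):

  // Toy model of the reuse pitfall (not the real BufferHolder).
  class ToyHolder(var buffer: Array[Byte] = new Array[Byte](64)) {
    // Returns true only when a new array is actually allocated.
    def grow(neededSize: Int): Boolean =
      if (buffer.length < neededSize) { buffer = new Array[Byte](neededSize); true }
      else false
  }

  object ReuseDemo extends App {
    val holder = new ToyHolder()
    println(holder.grow(1000000))       // true: the first call allocates a large array
    println(holder.grow(500000))        // false: the large array is reused, nothing is allocated
    holder.buffer = new Array[Byte](64) // the fix: assign a small object before each grow
    println(holder.grow(500000))        // true: the allocation path is exercised again
  }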

How was this patch tested?

Updated the existing BufferHolderSparkSubmitSuite.

@SparkQA commented Feb 19, 2018

Test build #87541 has finished for PR 20636 at commit 39a715c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Feb 19, 2018

cc @dongjoon-hyun

@dongjoon-hyun (Member)

Thank you for pinging me and working on this issue, @kiszk.

Member

Ping, @liufengdb and @gatorsmile.

@kiszk (Member, Author) commented Mar 9, 2018

ping @liufengdb and @gatorsmile

@kiszk (Member, Author) commented Mar 19, 2018

ping @gatorsmile and @liufengdb

@kiszk (Member, Author) commented Mar 26, 2018

ping @gatorsmile and @hvanhovell

@kiszk (Member, Author) commented Apr 10, 2018

ping @hvanhovell

@kiszk (Member, Author) commented Apr 10, 2018

retest this please

Contributor

Do we still support this?

Member Author

Good question. Several tests still seem to use local-cluster. Would it be better to use local, even though it may require more memory?

@HyukjinKwon (Member) Aug 1, 2018

I think we support this for testing purposes since, IIRC, it creates separate processes for the workers.

@SparkQA commented Apr 10, 2018

Test build #89146 has finished for PR 20636 at commit 39a715c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Apr 17, 2018

retest this please

@SparkQA commented Apr 17, 2018

Test build #89447 has finished for PR 20636 at commit 39a715c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Apr 18, 2018

retest this please

@SparkQA commented Apr 18, 2018

Test build #89475 has finished for PR 20636 at commit 21b3708.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Apr 18, 2018

retest this please

@SparkQA commented Apr 18, 2018

Test build #89477 has finished for PR 20636 at commit 21b3708.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 18, 2018

Test build #89485 has finished for PR 20636 at commit 21b3708.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Apr 18, 2018

retest this please

@SparkQA commented Apr 18, 2018

Test build #89494 has finished for PR 20636 at commit 21b3708.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor)

@kiszk Why do we need to allocate a large array several times? I thought the objective of this test was to check whether we can safely grow to int max. I don't really see the need for resetting the array before every grow call. Am I missing something here?

@kiszk (Member, Author) commented Apr 19, 2018

Ah, I see. I thought the objective was to check whether we can safely allocate a buffer of each size. If I understand correctly, you consider reusing the buffer to be the intention of this test.

Although buffer.reset() existed before I submitted this PR, we do not need it since we do not update the cursor. I also removed setBuffer(holder, smallBuffer) to enable reuse of the buffer.

@kiszk (Member, Author) commented Apr 19, 2018

@hvanhovell BTW, could I ask your opinion on something?
I realized that grow(roundToWord(Integer.MAX_VALUE)) does nothing; there is an integer overflow here.

In holder.grow(roundToWord(Integer.MAX_VALUE)), the argument Integer.MAX_VALUE is passed as roundToWord(0x7FFF_FFFF) -> ByteArrayMethods.roundNumberOfBytesToNearestWord(0x7FFF_FFFF).

In roundNumberOfBytesToNearestWord, the return value is 0x7FFF_FFFF + (8 - (0x7FFF_FFFF & 7)) = 0x8000_0000, which wraps around to a negative int.

  public static int roundNumberOfBytesToNearestWord(int numBytes) {
    int remainder = numBytes & 0x07;  // This is equivalent to `numBytes % 8`
    if (remainder == 0) {
      return numBytes;
    } else {
      return numBytes + (8 - remainder);
    }
  }

Then we execute grow(0x8000_0000). Since neededSize is negative, neither of the two if blocks in grow() is executed.

  void grow(int neededSize) {
    if (neededSize > ARRAY_MAX - totalSize()) {
      throw new UnsupportedOperationException(
        "Cannot grow BufferHolder by size " + neededSize + " because the size after growing " +
          "exceeds size limitation " + ARRAY_MAX);
    }
    final int length = totalSize() + neededSize;
    if (buffer.length < length) {
      // This will not happen frequently, because the buffer is re-used.
      int newLength = length < ARRAY_MAX / 2 ? length * 2 : ARRAY_MAX;
      final byte[] tmp = new byte[newLength];
      Platform.copyMemory(
        buffer,
        Platform.BYTE_ARRAY_OFFSET,
        tmp,
        Platform.BYTE_ARRAY_OFFSET,
        totalSize());
      buffer = tmp;
      row.pointTo(buffer, buffer.length);
    }
  }

As a result, when roundToWord(Integer.MAX_VALUE) is passed to grow(), no allocation happens.

Should we keep this behavior, or should we add a check that neededSize is zero or positive? What do you think?
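
A self-contained way to confirm this overflow (it re-implements the quoted rounding logic locally rather than calling the Spark class):

  object OverflowCheck extends App {
    // Mirrors the quoted roundNumberOfBytesToNearestWord logic.
    def roundToWord(numBytes: Int): Int = {
      val remainder = numBytes & 0x07        // numBytes % 8 for non-negative values
      if (remainder == 0) numBytes else numBytes + (8 - remainder)
    }

    val rounded = roundToWord(Integer.MAX_VALUE)
    println(rounded)                         // -2147483648: the int addition wraps around
    println(rounded == Integer.MIN_VALUE)    // true, so grow() sees a negative neededSize
  }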

@SparkQA commented Apr 20, 2018

Test build #89653 has finished for PR 20636 at commit f946631.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Apr 21, 2018

@hvanhovell When I added new check code to reject a negative growth value, we see the following error: Integer.MAX_VALUE ends up being turned into a negative value.

How should we handle this? Should we pass Integer.MAX_VALUE - n (where n is 64 or something) instead of Integer.MAX_VALUE? WDYT?

 Exception in thread "main" java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -2147483648 because the size is nevative
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:65)
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolderSparkSubmitSuite$.main(BufferHolderSparkSubmitSuite.scala:69)
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolderSparkSubmitSuite.main(BufferHolderSparkSubmitSuite.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:838)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:913)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
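
For what it is worth, a rough arithmetic check of the Integer.MAX_VALUE - n idea (n = 64 is only an example, and ARRAY_MAX is assumed to be Integer.MAX_VALUE - 15 as in ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH):

  val candidate = Integer.MAX_VALUE - 64          // 2147483583; candidate & 7 is non-zero here
  val rounded = candidate + (8 - (candidate & 7)) // 2147483584: rounds up to a word boundary without overflowing
  assert(rounded > 0 && rounded <= Integer.MAX_VALUE - 15)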

@kiszk (Member, Author) commented May 2, 2018

ping @hvanhovell

2 similar comments
@kiszk (Member, Author) commented May 9, 2018

ping @hvanhovell

@kiszk (Member, Author) commented May 15, 2018

ping @hvanhovell

@kiszk (Member, Author) commented Jun 18, 2018

cc @cloud-fan

@HyukjinKwon (Member)

retest this please

1 similar comment
@kiszk (Member, Author) commented Jul 16, 2018

retest this please

@SparkQA commented Jul 16, 2018

Test build #93123 has finished for PR 20636 at commit a134091.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Jul 17, 2018

retest this please

@SparkQA commented Jul 17, 2018

Test build #93147 has finished for PR 20636 at commit a134091.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

retest this please

@SparkQA commented Jul 17, 2018

Test build #93153 has finished for PR 20636 at commit a134091.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Jul 17, 2018

retest this please

@SparkQA commented Jul 17, 2018

Test build #93158 has finished for PR 20636 at commit a134091.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Jul 17, 2018

cc @cloud-fan

1 similar comment
@kiszk (Member, Author) commented Jul 28, 2018

cc @cloud-fan

holder.grow(ARRAY_MAX + 1 - holder.totalSize())
assert(false)
} catch {
case _: UnsupportedOperationException => assert(true)
Member

Fix the indents here. assert(true) is a no-op, so just omit it. assert(false) is less useful than fail(...message...), above. Let an unexpected Throwable just fly out of the method to fail it rather than swallow it. But do you really just want to use intercept here?
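
For reference, an intercept-based version of the quoted snippet would look roughly like this (a sketch only, not the exact merged change):

  // ScalaTest's intercept fails the test if no exception is thrown and
  // returns the caught exception, replacing the try/catch + assert pattern.
  val e = intercept[UnsupportedOperationException] {
    holder.grow(ARRAY_MAX + 1 - holder.totalSize())
  }
  assert(e.getMessage.contains("exceeds size limitation"))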

"--master", "local-cluster[2,1,1024]",
"--driver-memory", "4g",
"--master", "local-cluster[1,1,7168]",
"--driver-memory", "7g",
Member

Hm, just wondering whether it will be problematic that the test now spawns a job that needs more than 7G of RAM. Maybe I misunderstand.

@SparkQA commented Aug 8, 2018

Test build #94426 has finished for PR 20636 at commit 84940ee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 8, 2018

Test build #94434 has finished for PR 20636 at commit 2dd1b82.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 8, 2018

Test build #94436 has finished for PR 20636 at commit 5e59aca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Aug 9, 2018

retest this please

@SparkQA commented Aug 9, 2018

Test build #94463 has finished for PR 20636 at commit 5e59aca.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 9, 2018

Test build #94465 has finished for PR 20636 at commit 81d6477.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Aug 9, 2018

retest this please

@SparkQA commented Aug 9, 2018

Test build #94472 has finished for PR 20636 at commit 81d6477.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Aug 9, 2018

retest this please

@cloud-fan (Contributor)

LGTM

@SparkQA commented Aug 9, 2018

Test build #94477 has finished for PR 20636 at commit 81d6477.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

thanks, merging to master!

@asfgit asfgit closed this in 386fbd3 Aug 9, 2018