[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. #17744

jinxing64 · 2017-04-24T12:53:48Z

What changes were proposed in this pull request?

When application contains large amount of shuffle blocks. NodeManager requires lots of memory to keep metadata(FileSegmentManagedBuffer) in StreamManager. When the number of shuffle blocks is big enough. NodeManager can run OOM. This pr proposes to do lazy initialization of FileSegmentManagedBuffer in shuffle service.

How was this patch tested?

Manually test.

jinxing64 · 2017-04-24T12:54:54Z

Spark jobs are running on yarn cluster in my warehouse. We enabled the external shuffle service(--conf spark.shuffle.service.enabled=true). Recently NodeManager runs OOM now and then. Dumping heap memory, we find that OneFroOneStreamManager's footprint is huge. NodeManager is configured with 5G heap memory. While OneForOneManager costs 2.5G and there are 5503233 FileSegmentManagedBuffer objects.

SparkQA · 2017-04-24T15:01:45Z

Test build #76105 has finished for PR 17744 at commit 5fb91bf.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

…ffle service.

SparkQA · 2017-04-25T11:25:27Z

Test build #76131 has finished for PR 17744 at commit e38f0e8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-25T11:39:02Z

Test build #76132 has finished for PR 17744 at commit a0a6594.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tgravescs · 2017-04-25T22:03:06Z

...work-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java

+          @Override
+          public ManagedBuffer next() {
+            final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId,
+              msg.blockIds[index]);


I need to look more to verify, but I don't think you can hang onto the msg here without duplicating it. TransportRequestHandler.processRpcRequest is going to release the request so I think it could get reused. @rxin can perhaps verify.

@tgravescs
Thanks a lot for taking time looking into this :)
In my understanding, the OpenBlocks will be kept in heap after initialization(https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java#L84).
Yes, TransportRequestHandler.processRpcRequest will release the ByteBuf, but the OpenBlocks will not be released.

You are right I missed that when I did a quick skim originally of this.

tgravescs · 2017-04-27T19:05:10Z

We see the same issue on some of our clusters. I was planning on doing 2 things. Something like this to reduce that memory usage and then on the other side you could change the shuffle fetcher to limit the # of blocks it fetches in one try. The second is a separate jira though and could have the affect of slowing down the shuffle and could also not be as effective if its a ton of different reducers fetching but still a good option to have.

I think this approach is good for now. I ran a bunch of manual tests on the cluster and memory has greatly improved.

+1. Thanks @jinxing64

…ffle service. ## What changes were proposed in this pull request? When application contains large amount of shuffle blocks. NodeManager requires lots of memory to keep metadata(`FileSegmentManagedBuffer`) in `StreamManager`. When the number of shuffle blocks is big enough. NodeManager can run OOM. This pr proposes to do lazy initialization of `FileSegmentManagedBuffer` in shuffle service. ## How was this patch tested? Manually test. Author: jinxing <[email protected]> Closes #17744 from jinxing64/SPARK-20426. (cherry picked from commit 85c6ce6) Signed-off-by: Tom Graves <[email protected]>

jinxing64 · 2017-05-03T10:22:46Z

@tgravescs
Thanks a lot for merging.
I proposed to resolve this by "Lazy initialization of FileSegmentManagedBuffer" and simplify the change. But after checking the code, could we remove OpenBlocks. In my understanding, OpenBlocks is just a handshake before fetching remote blocks and could be removed conceptually(did I miss something?). This is just a brainstorming and will be a bigger change.

tgravescs · 2017-05-03T13:02:49Z

Yes conceptually it could be removed but as you say is a bigger change. Are you still seeing memory issues after this change?

jinxing64 · 2017-05-03T13:26:24Z

Thanks again for help review this pr. Currently I'm not seeing memory issue on my nodemanagers. I'd report to community if there's new finding :)

jinxing64 force-pushed the SPARK-20426 branch from 5fb91bf to e38f0e8 Compare April 25, 2017 08:32

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shu…

a0a6594

…ffle service.

jinxing64 force-pushed the SPARK-20426 branch from e38f0e8 to a0a6594 Compare April 25, 2017 08:47

tgravescs reviewed Apr 25, 2017

View reviewed changes

asfgit closed this in 85c6ce6 Apr 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. #17744

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. #17744

Uh oh!

jinxing64 commented Apr 24, 2017

Uh oh!

jinxing64 commented Apr 24, 2017

Uh oh!

SparkQA commented Apr 24, 2017

Uh oh!

SparkQA commented Apr 25, 2017

Uh oh!

SparkQA commented Apr 25, 2017

Uh oh!

tgravescs Apr 25, 2017

Uh oh!

jinxing64 Apr 26, 2017

Uh oh!

tgravescs Apr 27, 2017

Uh oh!

tgravescs commented Apr 27, 2017

Uh oh!

jinxing64 commented May 3, 2017

Uh oh!

tgravescs commented May 3, 2017

Uh oh!

jinxing64 commented May 3, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. #17744

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. #17744

Uh oh!

Conversation

jinxing64 commented Apr 24, 2017

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

jinxing64 commented Apr 24, 2017

Uh oh!

SparkQA commented Apr 24, 2017

Uh oh!

SparkQA commented Apr 25, 2017

Uh oh!

SparkQA commented Apr 25, 2017

Uh oh!

tgravescs Apr 25, 2017

Choose a reason for hiding this comment

Uh oh!

jinxing64 Apr 26, 2017

Choose a reason for hiding this comment

Uh oh!

tgravescs Apr 27, 2017

Choose a reason for hiding this comment

Uh oh!

tgravescs commented Apr 27, 2017

Uh oh!

jinxing64 commented May 3, 2017

Uh oh!

tgravescs commented May 3, 2017

Uh oh!

jinxing64 commented May 3, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants