[SPARK-13850] Force the sorter to spill when the number of elements in the pointer array reaches a certain size #13107
Conversation
cc @srowen, @JoshRosen
(It seems the imports might need to be reordered; see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports.)
What is the root cause? Can you also add a regression test?
I am not 100% sure of the root cause, but I suspect this happens when the JVM tries to allocate a very large buffer for the pointer array. The JVM may not be able to allocate such a large buffer in a contiguous memory region on the heap, and since the unsafe operations assume that objects occupy contiguous memory, any unsafe operation on a large buffer can result in memory corruption, which manifests as the TimSort issue. Unfortunately, this issue is not consistently reproducible and I am not sure of the root cause, so I am not sure how we can write a regression test for it. Also, please note that this change itself is a no-op unless you override the default value.
Should we have a default value that's not Long.MAX_VALUE for this? What values do you guys typically set?
TimSort requires a temporary buffer to store the shorter run, which in the worst case can be half the size of the pointer array. Since this depends on the original order of the rows, it's pretty hard to reproduce. I hit it twice and have a patch, but can't reproduce it anymore (without the patch). The better solution would be to use only 2/3 of the pointer array and leave the remaining 1/3 as a temporary buffer for TimSort. We did something similar for RadixSort, which uses 1/2 of the pointer array for inserts and the other 1/2 as a temporary buffer.
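To make the 2/3-vs-1/3 idea above concrete, here is a minimal sketch (illustrative only, not the actual Spark sorter code; all names are made up) of how a sorter could cap the usable portion of its pointer array so the reserved third is always large enough to serve as TimSort's merge buffer:

```java
// Illustrative sketch, not the actual Spark implementation.
// TimSort's merge step may need scratch space of up to half the range being
// sorted; if only 2/3 of the pointer array is ever occupied, the remaining
// 1/3 is always enough to hold that scratch buffer.
final class PointerArrayBudget {
  private final long totalSlots; // capacity of the backing pointer array

  PointerArrayBudget(long totalSlots) {
    this.totalSlots = totalSlots;
  }

  /** Maximum number of record pointers to buffer before spilling. */
  long usableSlots() {
    return (totalSlots / 3) * 2; // keep ~1/3 free as TimSort scratch space
  }

  boolean shouldSpill(long numRecords) {
    return numRecords >= usableSlots();
  }
}
```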
@rxin - We have seen this issue go away when we limit the pointer array size to within 1G. I updated the PR to set that as the default value. @davies - There is another issue with the allocation of the temporary buffer you mentioned: that buffer is not managed by the MemoryManager, and we often see executor OOMs because of it. I have opened a JIRA (SPARK-15391) for this; it would be great to have that issue fixed.
@rxin After fixing those two, we still have some other limits (the number of elements should be less than 512 million), especially for on-heap mode. So we still need to check the number of elements and spill, or the job could fail in an unexpected way.
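As a rough illustration of the element-count check being discussed, here is a hedged sketch under the assumption that the sorter tracks its in-memory record count and can spill on insert; the class and method names are placeholders, not the actual patch:

```java
import java.io.IOException;

// Illustrative sketch of a count-based forced spill; names are placeholders
// for Spark's internal sorter classes, not the actual patch.
final class ForceSpillingSorter {
  private final InMemorySorter inMemSorter;          // hypothetical in-memory sorter
  private final long numElementsForceSpillThreshold; // from config; a huge value disables the check

  ForceSpillingSorter(InMemorySorter inMemSorter, long threshold) {
    this.inMemSorter = inMemSorter;
    this.numElementsForceSpillThreshold = threshold;
  }

  void insertRecord(long recordPointer) throws IOException {
    if (inMemSorter.numRecords() >= numElementsForceSpillThreshold) {
      // Sort and write the buffered records to disk before the pointer array
      // grows past the sizes where TimSort has been observed to fail.
      spill();
    }
    inMemSorter.insertRecord(recordPointer);
  }

  private void spill() throws IOException {
    // Sort the buffered pointers and write the run to disk, then reset.
  }

  interface InMemorySorter {
    long numRecords();
    void insertRecord(long recordPointer);
  }
}
```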
@rxin - In addition to @davies's point, this feature also controls the largest contiguous memory block allocated on the heap, which is very useful for avoiding OOM when operating on large data sets. We have been seeing executor OOMs caused by failures to allocate large contiguous buffers in memory due to heap fragmentation.
@sitalkedia I think it will trigger a full GC and eventually spilling in that case (see https://github.com/apache/spark/blob/v1.6.1/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java). Could you provide more information on that (a stack trace or logging)?
That bug is fixed; could you update the comment (or change 1G to 8G)?
Changed.
Branch updated from e25e8af to c5f5a69.
@davies - Can you take a look?
This one should be 1024 * 1024 * 1024 (8G)
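For readers checking the numbers: assuming each entry in the pointer array occupies a single 8-byte slot (an assumption made for this illustration), a cap of 1024 * 1024 * 1024 entries works out to 8 GiB of pointer-array memory:

```java
// Quick arithmetic check of the 1G-entries-to-8G relationship (illustrative;
// assumes one 8-byte slot per entry).
public class PointerArrayMath {
  public static void main(String[] args) {
    long maxEntries = 1024L * 1024L * 1024L;        // 2^30 entries
    long bytesPerEntry = Long.BYTES;                // assumed 8 bytes per entry
    System.out.println(maxEntries * bytesPerEntry); // 8589934592 bytes = 8 GiB
  }
}
```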
I see, fixed that, thanks
Test build #3133 has finished for PR 13107 at commit
Test build #61372 has finished for PR 13107 at commit
Fixed test cases.
Test build #61416 has finished for PR 13107 at commit
Test build #61423 has finished for PR 13107 at commit
Force the sorter to spill when the number of elements in the pointer array reaches a certain size. This is to work around the issue of TimSort failing on large buffer sizes.
Test build #61484 has finished for PR 13107 at commit
jenkins retest this please.
Test build #61549 has finished for PR 13107 at commit
@davies - Addressed all comments and fixed test cases.

LGTM,
Force the sorter to spill when the number of elements in the pointer array reaches a certain size. This is to work around the issue of TimSort failing on large buffer sizes. Tested by running a job which was failing without this change due to the TimSort bug.

Author: Sital Kedia <[email protected]>

Closes #13107 from sitalkedia/fix_TimSort.

(cherry picked from commit 07f46af)

Signed-off-by: Davies Liu <[email protected]>
What changes were proposed in this pull request?
Force the sorter to spill when the number of elements in the pointer array reaches a certain size. This is to work around the issue of TimSort failing on large buffer sizes.
How was this patch tested?
Tested by running a job which was failing without this change due to the TimSort bug.
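As a hedged usage sketch: later Spark releases expose this threshold under the configuration key shown below, but verify the key and its default against your Spark version; the value used here is purely illustrative, not a recommendation.

```java
import org.apache.spark.SparkConf;

public class ForceSpillConfigExample {
  public static void main(String[] args) {
    // Verify the config key against your Spark version; the value below is
    // illustrative only.
    SparkConf conf = new SparkConf()
        .setAppName("timsort-workaround-example")
        // Force the unsafe sorters to spill once this many records are buffered,
        // keeping the pointer array well below the sizes where TimSort failed.
        .set("spark.shuffle.spill.numElementsForceSpillThreshold", "134217728"); // 128M records
  }
}
```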