Skip to content

Conversation

@sitalkedia
Copy link

What changes were proposed in this pull request?

Force the sorter to Spill when number of elements in the pointer array reach a certain size. This is to workaround the issue of timSort failing on large buffer size.

How was this patch tested?

Tested by running a job which was failing without this change due to TimSort bug.

@sitalkedia
Copy link
Author

cc- @srowen, @JoshRosen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai
Copy link
Contributor

yhuai commented May 17, 2016

What is the root cause? Can you also add a regression test?

@sitalkedia
Copy link
Author

I am not 100% sure of the root cause, but I suspect this is happening when JVM is trying to allocate very large size buffer for pointer array. The issue might be because the JVM is not able to allocate large buffer in contiguous memory location on heap and since the unsafe operations assume contiguous memory location of the objects, any unsafe operation on large buffer results in memory corruption which manifests as TimSort issue.

Unfortunately, this issue is not reproducible consistently and I am not sure of the root cause. So I am not sure how can we have a regression test for it.

Also, please note that this change itself is a no-op unless you override the default value of numElementsForSpillThreshold, which is Long.MAX_VALUE.

@rxin
Copy link
Contributor

rxin commented May 24, 2016

Should we have a default value that's not Long.MAX_VALUE for this? What values do you guys typically set?

@davies
Copy link
Contributor

davies commented May 25, 2016

TimSort require a temporary buffer to store the shorter part, which could be half of the size of pointer array in worst case. This depends on the original order of rows, it's pretty hard to reproduce. I hit that twice and have a patch, but can't reproduce it anymore (without the patch).

The better solution should be only use 2/3 of the pointer array, left 1/3 as temporary buffer for TimSort. We had done similar things for RadixSort, that will use 1/2 for pointer array for insert, another 1/2 will be used as temporary buffer.

@sitalkedia
Copy link
Author

@rxin - We have seen this issue go away when we limit the pointer array size within 1G. Updated the PR to set that as a default value.

@davies - There is another issue with the allocation of temporary buffer you mentioned. That buffer is not being managed by the MemoryManager and we often see executor OOM because of that. I have opened a JIRA (SPARK-15391) for this. Would be great to have that issue fixed.

@rxin
Copy link
Contributor

rxin commented May 26, 2016

Do we still need this pull request given the other two patches?

#13336

#13318

@davies
Copy link
Contributor

davies commented May 26, 2016

@rxin After fixing those two, we still have some other limits (the number of elements should be less than 512 mm), especially for on-heap mode. There are:

  1. the largest memory block is 8G in on-heap mode, so the number of elements should be less than 512M.
  2. for both Radix sort and time sort, the underlying array could not 2G (or overflow), the number of records should be less than 1G
  3. For sorted iterator, the underlying array could not be larger than 2G.

So we still need to check the number of elements, then do spilling, or the job could fail in unexpected way.

@sitalkedia
Copy link
Author

@rxin - In addition to @davies's point. This feature also controls largest contiguous memory block allocated on heap which is very useful to avoid OOM when operating on large data set. We have been seeing this issue of executor OOM due to failure to allocate large amount of contiguous buffer in memory due to defragmentation.

@davies
Copy link
Contributor

davies commented Jun 6, 2016

@sitalkedia I think it will trigger Full GC and eventually spilling in that case (see https://github.com/apache/spark/blob/v1.6.1/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java) , could you provide more information on that (stacktrace or logging)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That bug is fixed, could you update the comment (or changing 1G to 8G)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@sitalkedia sitalkedia force-pushed the fix_TimSort branch 2 times, most recently from e25e8af to c5f5a69 Compare June 16, 2016 17:43
@sitalkedia
Copy link
Author

@davies - Can you take a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be 1024 * 1024 * 1024 (8G)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, fixed that, thanks

@SparkQA
Copy link

SparkQA commented Jun 27, 2016

Test build #3133 has finished for PR 13107 at commit c5f5a69.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 28, 2016

Test build #61372 has finished for PR 13107 at commit c5f5a69.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

Fixed test cases.

@SparkQA
Copy link

SparkQA commented Jun 28, 2016

Test build #61416 has finished for PR 13107 at commit ed657a3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61423 has finished for PR 13107 at commit cab4357.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…e pointer array reach a certain size. This is to workaround the issue of timSort failing on large buffer size.
@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61484 has finished for PR 13107 at commit 11dc238.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

jenkins retest this please.

@SparkQA
Copy link

SparkQA commented Jun 30, 2016

Test build #61549 has finished for PR 13107 at commit 11dc238.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

@davies - Addressed all comments and fixed test cases.

@davies
Copy link
Contributor

davies commented Jun 30, 2016

LGTM,
Merging this into master and 2.0, thanks!

asfgit pushed a commit that referenced this pull request Jun 30, 2016
Force the sorter to Spill when number of elements in the pointer array reach a certain size. This is to workaround the issue of timSort failing on large buffer size.

Tested by running a job which was failing without this change due to TimSort bug.

Author: Sital Kedia <[email protected]>

Closes #13107 from sitalkedia/fix_TimSort.

(cherry picked from commit 07f46af)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in 07f46af Jun 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants