[SPARK-7913][Core] Increase the maximum capacity of PartitionedPairBuffer, PartitionedSerializedPairBuffer and AppendOnlyMap #6456
…dSerializedPairBuffer
|
cc @JoshRosen |
|
Test build #33658 has finished for PR 6456 at commit
|
If we have too many records, we'll end up failing when we try to do a put, right? Can we make this fail with a more explicit message?
Updated. How about now?
|
Test build #33692 has finished for PR 6456 at commit
|
Increase the maximum capacity of AppendOnlyMap from 0.7 * (2 ^ 29) to (2 ^ 29)
|
Test build #33729 has finished for PR 6456 at commit
|
I think we prefer min(capacity * 2, MAXIMUM_CAPACITY) over this kind of Scala syntax.
I think you mean (capacity * 2).min(MAXIMUM_CAPACITY), right? Updated.
|
This LGTM. @JoshRosen do you want to take a look before I merge? |
can you add a note on how you arrived at this number?
|
LGTM2 |
|
Test build #33781 has finished for PR 6456 at commit
|
|
ping @andrewor14 |
|
Sorry, this slipped a little. I'm merging it into master now, thanks Ryan! |
|
@zsxwing @andrewor14 this is very related to another recent change: These classes do similar things but with different approaches. For example, why is 2^29 the max size here instead of 2^30? They check args and fail on growth differently too. If Spark is maintaining these custom collection classes I think they need to at least be more consistent. |
|
@srowen I will make For
|
…uffe, PartitionedSerializedPairBuffer and AppendOnlyMap

The previous growth strategy always doubled the capacity. This PR adjusts the strategy: double the capacity, but if that overflows, use the maximum capacity as the new capacity. It increases the maximum capacity of PartitionedPairBuffer from `2 ^ 29` to `2 ^ 30 - 1`, the maximum capacity of PartitionedSerializedPairBuffer from `2 ^ 28` to `(2 ^ 29) - 1`, and the maximum capacity of AppendOnlyMap from `0.7 * (2 ^ 29)` to `(2 ^ 29)`.

Author: zsxwing <[email protected]>

Closes apache#6456 from zsxwing/SPARK-7913 and squashes the following commits:

abcb932 [zsxwing] Address comments
e30b61b [zsxwing] Increase the maximum capacity of AppendOnlyMap
05b6420 [zsxwing] Update the exception message
64fe227 [zsxwing] Increase the maximum capacity of PartitionedPairBuffer and PartitionedSerializedPairBuffer
…f OpenHashSet and consistent exception message

This is a follow up PR for #6456 to make AppendOnlyMap consistent with OpenHashSet.

/cc srowen andrewor14

Author: zsxwing <[email protected]>

Closes #6879 from zsxwing/append-only-map and squashes the following commits:

912c0ad [zsxwing] Fix the doc
dd4385b [zsxwing] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
The previous growth strategy always doubled the capacity.
This PR adjusts the strategy: double the capacity, but if that overflows, use the maximum capacity as the new capacity. It increases the maximum capacity of PartitionedPairBuffer from `2 ^ 29` to `2 ^ 30 - 1`, the maximum capacity of PartitionedSerializedPairBuffer from `2 ^ 28` to `(2 ^ 29) - 1`, and the maximum capacity of AppendOnlyMap from `0.7 * (2 ^ 29)` to `(2 ^ 29)`.
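
As a rough illustration of the strategy described above (not the code from this PR), the sketch below doubles a capacity and clamps it to a maximum, failing with an explicit message once the maximum has been reached. The object name `GrowthSketch`, the helper `nextCapacity`, and the exception text are hypothetical; the cap of `2 ^ 30 - 1` is taken from the PartitionedPairBuffer figure in the description.

```scala
object GrowthSketch {
  // Assumed cap for illustration: 2^30 - 1, the new PartitionedPairBuffer
  // limit quoted in the description above.
  val MaximumCapacity: Int = (1 << 30) - 1

  /** Hypothetical helper: returns the capacity to grow to from `capacity`. */
  def nextCapacity(capacity: Int): Int = {
    if (capacity >= MaximumCapacity) {
      // Already at the cap: fail explicitly rather than on a later put.
      throw new IllegalStateException(
        s"Can't grow beyond $MaximumCapacity elements")
    }
    // Double in Long arithmetic so the multiplication itself cannot wrap
    // around, then clamp to the cap before narrowing back to Int.
    math.min(capacity.toLong * 2, MaximumCapacity.toLong).toInt
  }
}
```

With these assumptions, `GrowthSketch.nextCapacity(64)` doubles normally, while a capacity of `1 << 29` is clamped to `MaximumCapacity` because doubling it would exceed the cap by one.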