Conversation

@zsxwing
Member

@zsxwing zsxwing commented May 28, 2015

The previous growing strategy always doubles the capacity.

This PR adjusts the growing strategy: double the capacity, but if doubling would overflow, use the maximum capacity as the new capacity. It increases the maximum capacity of PartitionedPairBuffer from 2^29 to 2^30 - 1, the maximum capacity of PartitionedSerializedPairBuffer from 2^28 to (2^29) - 1, and the maximum capacity of AppendOnlyMap from 0.7 * (2^29) to (2^29).
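The overflow-aware doubling described above can be sketched roughly as follows; `GrowthSketch`, `MaximumCapacity`, and `nextCapacity` are illustrative names, not Spark's exact code, and the limit shown is PartitionedPairBuffer's new 2^30 - 1 (each collection has its own):

```scala
object GrowthSketch {
  // Illustrative limit: PartitionedPairBuffer's new maximum of 2^30 - 1.
  val MaximumCapacity: Int = (1 << 30) - 1

  def nextCapacity(capacity: Int): Int = {
    val doubled = capacity * 2
    // capacity * 2 overflows Int (goes negative) once capacity exceeds
    // 2^30 - 1; in that case, or when doubling merely passes the limit,
    // fall back to the maximum capacity instead of failing.
    if (doubled < 0 || doubled > MaximumCapacity) MaximumCapacity else doubled
  }
}
```

With the old strategy, a collection at half the maximum could never grow again; here it still gets one final growth step to the maximum.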

@zsxwing
Member Author

zsxwing commented May 28, 2015

cc @JoshRosen

@SparkQA

SparkQA commented May 28, 2015

Test build #33658 has finished for PR 6456 at commit 64fe227.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor


If we have too many records, we'll end up failing when we try to do a put, right? Can we make this fail with a more explicit message?
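The explicit failure being requested might look something like this; `ExplicitGrowError`, `MAXIMUM_CAPACITY`, and `growArray` are hypothetical names for illustration, not the code the PR actually added:

```scala
object ExplicitGrowError {
  val MAXIMUM_CAPACITY: Int = (1 << 29) - 1  // illustrative limit

  def growArray(capacity: Int): Int = {
    if (capacity >= MAXIMUM_CAPACITY) {
      // Fail with a clear message instead of a confusing downstream error
      // (e.g. a NegativeArraySizeException from an overflowed size).
      throw new IllegalStateException(
        s"Can't grow buffer beyond $MAXIMUM_CAPACITY elements")
    }
    math.min(capacity * 2, MAXIMUM_CAPACITY)
  }
}
```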

Member Author


Updated. How about now?

@SparkQA

SparkQA commented May 29, 2015

Test build #33692 has finished for PR 6456 at commit 05b6420.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Increase the maximum capacity of AppendOnlyMap from 0.7 * (2 ^ 29) to (2 ^ 29)
@zsxwing zsxwing changed the title [SPARK-7913][Core]Increase the maximum capacity of PartitionedPairBuffer and PartitionedSerializedPairBuffer [SPARK-7913][Core]Increase the maximum capacity of PartitionedPairBuffe, PartitionedSerializedPairBuffer and AppendOnlyMap May 29, 2015
@SparkQA

SparkQA commented May 29, 2015

Test build #33729 has finished for PR 6456 at commit e30b61b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor


I think we prefer `min(capacity * 2, MAXIMUM_CAPACITY)` over this kind of Scala syntax.

Member Author


I think you mean `(capacity * 2).min(MAXIMUM_CAPACITY)`, right? Updated.
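For the record, the infix `.min` form is Scala's `RichInt.min`, equivalent to `math.min`; a minimal illustration (`MinIdiom` and the limit are illustrative, not Spark's code):

```scala
object MinIdiom {
  val MAXIMUM_CAPACITY: Int = (1 << 29) - 1  // illustrative limit

  // Assumes capacity never exceeds 2^29, so capacity * 2 cannot overflow Int.
  def grow(capacity: Int): Int = (capacity * 2).min(MAXIMUM_CAPACITY)
}
```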

@sryza
Contributor

sryza commented May 29, 2015

This LGTM. @JoshRosen do you want to take a look before I merge?

Contributor


can you add a note on how you arrived at this number?

@andrewor14
Contributor

LGTM2

@SparkQA

SparkQA commented May 30, 2015

Test build #33781 has finished for PR 6456 at commit abcb932.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Jun 11, 2015

ping @andrewor14

@andrewor14
Contributor

Sorry, I slipped on this a little. I'm merging it into master now. Thanks Ryan!

@asfgit asfgit closed this in a411a40 Jun 17, 2015
@zsxwing zsxwing deleted the SPARK-7913 branch June 18, 2015 02:29
@srowen
Member

srowen commented Jun 18, 2015

@zsxwing @andrewor14 this is very related to another recent change:
c13da20

These classes do similar things but with different approaches. For example, why is 2^29 the max size here instead of 2^30? They check args and fail on growth differently too. If Spark is maintaining these custom collection classes I think they need to at least be more consistent.

@zsxwing
Member Author

zsxwing commented Jun 18, 2015

@srowen I will make AppendOnlyMap consistent with OpenHashMap later.

As for 2^29: unlike OpenHashMap, AppendOnlyMap uses a single array to store both keys and values: `private var data = new Array[AnyRef](2 * capacity)`. Since the maximum length of an array is 2^31 - 1, `2 * capacity` can be at most 2^31 - 1, and because the capacity must be a power of two, the maximum capacity is 2^29.

PartitionedPairBuffer and PartitionedSerializedPairBuffer are array-like; their growth strategy is to grow the buffer when it reaches capacity.
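The bound explained above can be sketched as a quick check; `CapacityBound` and `fits` are illustrative names, not Spark code:

```scala
object CapacityBound {
  // JVM array length is capped at Int.MaxValue = 2^31 - 1. AppendOnlyMap
  // interleaves keys and values (data(2*i) = key, data(2*i + 1) = value),
  // so its backing array needs length 2 * capacity.
  val MaxArrayLength: Long = Int.MaxValue.toLong

  // Widen to Long before doubling so the check itself cannot overflow.
  def fits(capacity: Int): Boolean = 2L * capacity <= MaxArrayLength
}
```

2^30 already fails this check (2 * 2^30 = 2^31 exceeds the array limit), so the largest power-of-two capacity that fits is 2^29.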

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…uffe, PartitionedSerializedPairBuffer and AppendOnlyMap

The previous growing strategy always doubles the capacity.

This PR adjusts the growing strategy: double the capacity, but if doubling would overflow, use the maximum capacity as the new capacity. It increases the maximum capacity of PartitionedPairBuffer from `2 ^ 29` to `2 ^ 30 - 1`, the maximum capacity of PartitionedSerializedPairBuffer from `2 ^ 28` to `(2 ^ 29) - 1`, and the maximum capacity of AppendOnlyMap from `0.7 * (2 ^ 29)` to `(2 ^ 29)`.

Author: zsxwing <[email protected]>

Closes apache#6456 from zsxwing/SPARK-7913 and squashes the following commits:

abcb932 [zsxwing] Address comments
e30b61b [zsxwing] Increase the maximum capacity of AppendOnlyMap
05b6420 [zsxwing] Update the exception message
64fe227 [zsxwing] Increase the maximum capacity of PartitionedPairBuffer and PartitionedSerializedPairBuffer
asfgit pushed a commit that referenced this pull request Jun 19, 2015
…f OpenHashSet and consistent exception message

This is a follow up PR for #6456 to make AppendOnlyMap consistent with OpenHashSet.

/cc srowen andrewor14

Author: zsxwing <[email protected]>

Closes #6879 from zsxwing/append-only-map and squashes the following commits:

912c0ad [zsxwing] Fix the doc
dd4385b [zsxwing] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
