[SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method #4002
Changes from 1 commit
```diff
@@ -130,13 +130,14 @@ private object ParallelCollectionRDD {
     case r: Range => {
       // 1 to Int.MaxValue and (-2 to Int.MinValue by -1) can trigger exclusive range int overflow
       val needsInclusiveRange = r.isInclusive && (r.end == Int.MaxValue || r.end == Int.MinValue)
-      positions(r.length, numSlices).zipWithIndex.map({
-        // we need an inclusive range to avoid int overflow and setting the last range to be
-        // inclusive is suffice
-        case ((start, end), index) if needsInclusiveRange && index == numSlices - 1 =>
+      positions(r.length, numSlices).zipWithIndex.map({ case ((start, end), index) =>
+        // If the range needs to be inclusive, include the last element in the last slice
+        if (needsInclusiveRange && index == numSlices - 1) {
           new Range.Inclusive(r.start + start * r.step, r.end, r.step)
-        case ((start, end), _) =>
+        }
+        else {
           new Range(r.start + start * r.step, r.start + end * r.step, r.step)
```
Contributor: No need to match multiple cases here if we just ignore the index for the non-inclusive case. I think it's sufficient to do
Contributor (Author): No doubt your version is more straightforward than mine. When I wrote my code, I didn't consider splitting a normal inclusive range using an inclusive sub-range. However, the benefit of my implementation is that the splitting result for normal inclusive ranges stays the same as in master, and I wonder whether some Spark code relies on the exclusive range output. And of course, I think we should update the corresponding documentation for this kind of change. I will convert the pattern match to a single case and update the implementation once we decide which approach fits better.
Contributor: I think as long as we don't change the behavior, it's preferable to rewrite it in a readable manner. Here it's pretty clear to me that if the range is inclusive we should include the last element in the last slice, regardless of whether the range ends in a special value like
```diff
+        }
       }).toSeq.asInstanceOf[Seq[Seq[T]]]
     }
     case nr: NumericRange[_] => {
```
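Taken together, the patched logic can be sketched as a small standalone program. This is a simplified illustration, not the Spark source: `positions` here is a hypothetical stand-in for Spark's internal helper of the same name, and only the `Range` branch of `slice` is reproduced.

```scala
// Simplified sketch of the patched slicing logic (not the Spark source).
object SliceSketch {
  // Stand-in for Spark's `positions` helper: split [0, length) into
  // numSlices contiguous (start, end) index pairs.
  def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map { i =>
      (((i * length) / numSlices).toInt, (((i + 1) * length) / numSlices).toInt)
    }

  def slice(r: Range, numSlices: Int): Seq[Range] = {
    // 1 to Int.MaxValue and (-2 to Int.MinValue by -1) would overflow
    // if the last slice were built as an exclusive range.
    val needsInclusiveRange =
      r.isInclusive && (r.end == Int.MaxValue || r.end == Int.MinValue)
    positions(r.length, numSlices).zipWithIndex.map { case ((start, end), index) =>
      if (needsInclusiveRange && index == numSlices - 1) {
        // Keep the last slice inclusive so we never compute r.end + r.step.
        Range.inclusive(r.start + start * r.step, r.end, r.step)
      } else {
        Range(r.start + start * r.step, r.start + end * r.step, r.step)
      }
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    val slices = slice(1 to Int.MaxValue, 3)
    // The last slice ends exactly at Int.MaxValue, inclusively.
    assert(slices.last.isInclusive && slices.last.end == Int.MaxValue)
    // Slicing starts at the range's first element.
    assert(slices.head.start == 1)
  }
}
```

Note that ranges are lazy in Scala, so constructing slices of a two-billion-element range is cheap; only the endpoints and step are stored.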
This need not be specific to `Int.MaxValue` and `Int.MinValue`. We can just do a special case for inclusive ranges in general, and I think the code will be significantly more straightforward to read. I'll explain below.
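For context on why the guard exists at all: an inclusive range ending at `Int.MaxValue` has no exclusive equivalent, because the exclusive end would have to be `Int.MaxValue + 1`, which wraps around. A standalone illustration (not code from the PR):

```scala
object OverflowDemo {
  def main(args: Array[String]): Unit = {
    // Int arithmetic wraps around silently.
    assert(Int.MaxValue + 1 == Int.MinValue)
    // The inclusive range is well-defined...
    assert(Range.inclusive(1, Int.MaxValue).length == Int.MaxValue)
    // ...but its exclusive "equivalent" Range(1, Int.MaxValue + 1) is really
    // Range(1, Int.MinValue): an empty range that silently drops every element.
    assert(Range(1, Int.MaxValue + 1).isEmpty)
  }
}
```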