[SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method #4002
… inclusive and the end of the range is (Int.MaxValue or Int.MinValue), we should use inclusive range instead of exclusive
@@ -127,18 +127,12 @@ private object ParallelCollectionRDD {
       })
     }
     seq match {
-      case r: Range.Inclusive => {
-        val sign = if (r.step < 0) {
-          -1
-        } else {
-          1
-        }
-        slice(new Range(
-          r.start, r.end + sign, r.step).asInstanceOf[Seq[T]], numSlices)
-      }
       case r: Range => {
-        positions(r.length, numSlices).map({
-          case (start, end) =>
+        val sign = r.isInclusive && (r.end == Int.MaxValue || r.end == Int.MinValue)
+        positions(r.length, numSlices).zipWithIndex.map({
Contributor
Not your code, but style should be
+          case ((start, end), index) if sign && index == numSlices - 1 =>
+            new Range.Inclusive(r.start + start * r.step, r.end, r.step)
+          case ((start, end), _) =>
             new Range(r.start + start * r.step, r.start + end * r.step, r.step)
Contributor
No need to match multiple cases here if we just ignore the index for the non-inclusive case. I think it's sufficient to do
Contributor
Author
No doubt your version is more straightforward than mine. When I wrote my code, I didn't consider splitting a normal inclusive range into inclusive ranges. However, the benefit of my implementation is that the splitting result stays the same as in master for normal inclusive ranges. I wonder whether some Spark code relies on the exclusive range output. And of course, I think we should update the corresponding documentation for this kind of change. I will convert the pattern matching to one case and update the implementation once we decide which one fits better.
Contributor
I think as long as we don't change the behavior it's preferable to rewrite it in a readable manner. Here it's pretty clear to me that if the range is inclusive we should include the last element in the last slice, regardless of whether the range ends in a special value like
         }).toSeq.asInstanceOf[Seq[Seq[T]]]
       }
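
To make the single-case approach suggested in the review comments concrete, here is a minimal standalone sketch in Scala. It is not the code merged in this PR: `RangeSliceSketch` and `sliceRange` are hypothetical names, `positions` is reimplemented under the assumption that it yields (start, end) offsets with an exclusive end, and the generic `Seq[T]` casts are dropped for brevity. The idea it illustrates is that only the last slice of an inclusive range is built as an inclusive range, so `r.end` is never shifted by one.

```scala
object RangeSliceSketch {
  // Assumed helper: start (inclusive) and end (exclusive) element offsets
  // of each slice. Long arithmetic avoids overflow in i * length.
  def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  // Hypothetical simplified version of the Range branch of slice().
  def sliceRange(r: Range, numSlices: Int): Seq[Range] = {
    require(numSlices >= 1, "numSlices must be positive")
    positions(r.length, numSlices).zipWithIndex.map {
      case ((start, _), index) if r.isInclusive && index == numSlices - 1 =>
        // Last slice of an inclusive range: reuse r.end as an inclusive
        // bound instead of computing r.end + 1 (or r.end - 1).
        Range.inclusive(r.start + start * r.step, r.end, r.step)
      case ((start, end), _) =>
        Range(r.start + start * r.step, r.start + end * r.step, r.step)
    }.toVector
  }
}
```

For example, `RangeSliceSketch.sliceRange((Int.MaxValue - 3) to Int.MaxValue, 2)` should give two slices whose elements together cover the original range, including `Int.MaxValue` itself. The sketch uses the `Range.inclusive` factory rather than `new Range.Inclusive` only so that it is not tied to one Scala version.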
Maybe I'm missing it but why is `Int.MinValue` a special case here? Also the `({` on the next line is redundant. Just one is needed. `sign` isn't terribly descriptive here either.

Converting an inclusive range that ends at `Int.MinValue` (with a negative step) to an exclusive range requires computing `-1 + Int.MinValue`, which overflows.
As for `sign`, which name would you recommend? How about `inclusiveRangeWithIntBoundary`?
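
The overflow described in this reply can be reproduced in isolation. The sketch below mirrors the removed `r.end + sign` logic from the diff; the object and method names (`RangeBoundaryOverflow`, `naiveExclusiveEnd`) are made up for illustration and do not exist in Spark.

```scala
object RangeBoundaryOverflow {
  // Naive conversion of an inclusive end bound to an exclusive one by
  // moving it one unit further in the direction of travel.
  def naiveExclusiveEnd(r: Range): Int = {
    val sign = if (r.step < 0) -1 else 1
    r.end + sign // wraps around when r.end is Int.MaxValue or Int.MinValue
  }

  def main(args: Array[String]): Unit = {
    val up = (Int.MaxValue - 2) to Int.MaxValue
    val down = (Int.MinValue + 2) to Int.MinValue by -1

    // Both bounds wrap to the opposite end of the Int domain.
    println(naiveExclusiveEnd(up) == Int.MinValue)   // true
    println(naiveExclusiveEnd(down) == Int.MaxValue) // true

    // An exclusive range built from the wrapped bound is empty,
    // so every element of the original range would be lost.
    println(Range(up.start, naiveExclusiveEnd(up), up.step).isEmpty) // true
  }
}
```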

Ah right, the range can go backwards. Yeah, something like `needsInclusiveRange` or `exceptionalBoundary` or something.

OK, will change the name.
As for the redundant `({`, there is an infix operator `toSeq`, so I prefer the redundant one. Also, the previous code used `({`.
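
For reference, the two call styles debated here should behave identically when followed by a chained `.toSeq`. This is a small sketch with made-up data; the `MapBraceStyles` object and the `pairs` values are illustrative only and are not part of the PR.

```scala
object MapBraceStyles {
  // Illustrative (start, end) offsets, similar in shape to positions().
  val pairs: Seq[(Int, Int)] = Seq((0, 2), (2, 5))

  // Style kept in the diff: parentheses around the pattern-matching block.
  val withParens: Seq[Int] =
    pairs.map({ case (start, end) => end - start }).toSeq

  // Style suggested by the reviewer: braces only.
  val bracesOnly: Seq[Int] =
    pairs.map { case (start, end) => end - start }.toSeq
}
```

Both values hold the slice lengths `2` and `3`, so the choice between the two forms is a matter of style rather than behavior.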