-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16071][SQL] Checks size limit when doubling the array size in BufferHolder #13829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #60984 has finished for PR 13829 at commit
|
| int bitsetWidthInBytes = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()); | ||
| if (row.numFields() > (Integer.MAX_VALUE - initialSize) / 8) { | ||
| throw new UnsupportedOperationException( | ||
| "Cannot create BufferHolder from input UnsafeRow because it is too big."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...too big might be a bit to vague.... Can you use something like ...exceeds the maximum number of variables (268435455).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...BufferHolder from input UnsafeRow... -> ...BufferHolder for input UnsafeRow...
We only get the numFields from the unsafe row and allocate memory for it.
|
One small comment. LGTM otherwise. |
|
Test build #60993 has finished for PR 13829 at commit
|
| * Grows the buffer by at least neededSize and points the row to the buffer. | ||
| */ | ||
| public void grow(int neededSize) { | ||
| if (neededSize > Integer.MAX_VALUE / 2 - totalSize()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move this check into the if branch below? then we can just check length * 2 <= Integer.MAX_VALUE and others can understand it very easily as there is a final byte[] tmp = new byte[length * 2]; next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final int length = totalSize() + neededSize;, this can cause integer overflow, as well as length * 2
| public BufferHolder(UnsafeRow row, int initialSize) { | ||
| this.fixedSize = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()) + 8 * row.numFields(); | ||
| int bitsetWidthInBytes = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()); | ||
| if (row.numFields() > (Integer.MAX_VALUE - initialSize - bitsetWidthInBytes) / 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this, we are trying to avoid overflow of this.fixedSize = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()) + 8 * row.numFields(); right? Why we - initialSize here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.buffer = new byte[fixedSize + initialSize];
|
Test build #61428 has finished for PR 13829 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thought: Can we grow the buffer to Integer.MAX_VALUE if we can't double its size? Then we have another chance to continue the execution and finish it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the limit for neededSize + totalSize is Integer.MAX_VALUE / 2, I don't see there is a big difference to enlarge the limit to Integer.MAX_VALUE.
Integer.MAX_VALUE / 2 is about 1 GB, it is quite rare for a single row to exceed this limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to try our best to finish user's job instead of failing it. And it's not a lot of work, should be worth it, just grow the buffer to Integer.MAX_VALUE when neededSize + totalSize is between Integer.MAX_VALUE / 2 + 1 and Integer.MAX_VALUE
|
Test build #61452 has finished for PR 13829 at commit
|
3a831e0 to
4265771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Integer.MAX_VALUE /2 -> Integer.MAX_VALUE / 2, you missed a space...
|
LGTM except some style comments |
|
Test build #61514 has finished for PR 13829 at commit
|
|
Test build #61515 has finished for PR 13829 at commit
|
|
Test build #61517 has finished for PR 13829 at commit
|
…BufferHolder ## What changes were proposed in this pull request? This PR Checks the size limit when doubling the array size in BufferHolder to avoid integer overflow. ## How was this patch tested? Manual test. Author: Sean Zhong <[email protected]> Closes #13829 from clockfly/SPARK-16071_2. (cherry picked from commit 5320adc) Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, merging to master/2.0! |
| var e = intercept[UnsupportedOperationException] { | ||
| new BufferHolder(new UnsafeRow(Int.MaxValue / 8)) | ||
| } | ||
| assert(e.getMessage.contains("too many fields")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this string be defined in BufferHolder and referenced here so that the test wouldn't break if the exception message is modified ?
What changes were proposed in this pull request?
This PR Checks the size limit when doubling the array size in BufferHolder to avoid integer overflow.
How was this patch tested?
Manual test.