@clockfly
Contributor

What changes were proposed in this pull request?

This PR checks the size limit when doubling the array size in BufferHolder to avoid integer overflow.

How was this patch tested?

Manual test.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #60984 has finished for PR 13829 at commit 24dd723.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```java
int bitsetWidthInBytes = UnsafeRow.calculateBitSetWidthInBytes(row.numFields());
if (row.numFields() > (Integer.MAX_VALUE - initialSize) / 8) {
  throw new UnsupportedOperationException(
      "Cannot create BufferHolder from input UnsafeRow because it is too big.");
}
```
Contributor

`...too big` might be a bit too vague. Can you use something like `...exceeds the maximum number of variables (268435455)`?
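The number 268435455 in the suggested message is `Integer.MAX_VALUE / 8`: each field takes an 8-byte slot in the row's fixed-length region (the `8 * row.numFields()` term in the constructor), so that is the largest field count whose slots fit in an `int`-sized byte array. A minimal sketch of the first revision's check, assuming `initialSize` is non-negative; the class and method names are hypothetical and this is not Spark code.

```java
// Hypothetical standalone sketch; not Spark's BufferHolder.
public class FieldLimitSketch {
    // Each field uses one 8-byte slot, so at most Integer.MAX_VALUE / 8 slots fit in an int-sized buffer.
    static final int MAX_FIELDS = Integer.MAX_VALUE / 8;

    // Same shape as the first revision's check: reject rows whose 8-byte field slots
    // plus initialSize would exceed Integer.MAX_VALUE. Assumes 0 <= initialSize.
    static boolean tooManyFields(int numFields, int initialSize) {
        return numFields > (Integer.MAX_VALUE - initialSize) / 8;
    }

    public static void main(String[] args) {
        System.out.println(MAX_FIELDS);                        // 268435455
        System.out.println(tooManyFields(MAX_FIELDS, 0));      // false
        System.out.println(tooManyFields(MAX_FIELDS + 1, 0));  // true
        System.out.println(tooManyFields(MAX_FIELDS, 64));     // true: the initial bytes also need room
    }
}
```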

Contributor

`...BufferHolder from input UnsafeRow...` -> `...BufferHolder for input UnsafeRow...`

We only read `numFields` from the UnsafeRow and allocate memory for it.

@hvanhovell
Contributor

One small comment. LGTM otherwise.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #60993 has finished for PR 13829 at commit 6473e6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BufferHolderSuite extends SparkFunSuite

```java
/**
 * Grows the buffer by at least neededSize and points the row to the buffer.
 */
public void grow(int neededSize) {
  if (neededSize > Integer.MAX_VALUE / 2 - totalSize()) {
```
Contributor

Shall we move this check into the `if` branch below? Then we can just check `length * 2 <= Integer.MAX_VALUE`, which others can understand easily because the next line is `final byte[] tmp = new byte[length * 2];`.

Contributor Author

`final int length = totalSize() + neededSize;` can itself cause integer overflow, and so can `length * 2`.
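A small illustration of that point, using made-up sizes and hypothetical method names: Java `int` arithmetic wraps silently, so both the sum and the doubled length can turn negative, and a naive `length * 2 <= Integer.MAX_VALUE` check is always true because no `int` can exceed `Integer.MAX_VALUE`. Rearranging the comparison so that no intermediate value leaves the `int` range, as the `grow` check does, avoids this, assuming `totalSize` is non-negative.

```java
// Illustrative sketch with made-up sizes; method names are hypothetical.
public class OverflowSketch {
    // Naive sum from the review thread: wraps around on overflow.
    static int naiveLength(int totalSize, int neededSize) {
        return totalSize + neededSize;
    }

    // Naive check: an int product can never exceed Integer.MAX_VALUE, so this is always true.
    static boolean naiveDoublingFits(int length) {
        return length * 2 <= Integer.MAX_VALUE;
    }

    // Rearranged check in the same shape as grow(): with 0 <= totalSize,
    // Integer.MAX_VALUE / 2 - totalSize stays within the int range.
    static boolean exceedsHalfLimit(int totalSize, int neededSize) {
        return neededSize > Integer.MAX_VALUE / 2 - totalSize;
    }

    public static void main(String[] args) {
        System.out.println(naiveLength(1_500_000_000, 1_000_000_000));      // -1794967296
        System.out.println(naiveDoublingFits(1_200_000_000));               // true, although 2.4e9 does not fit
        System.out.println(exceedsHalfLimit(1_500_000_000, 1_000_000_000)); // true
        System.out.println(exceedsHalfLimit(1_000, 24));                    // false
    }
}
```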

```java
public BufferHolder(UnsafeRow row, int initialSize) {
  this.fixedSize = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()) + 8 * row.numFields();
  int bitsetWidthInBytes = UnsafeRow.calculateBitSetWidthInBytes(row.numFields());
  if (row.numFields() > (Integer.MAX_VALUE - initialSize - bitsetWidthInBytes) / 8) {
```
Contributor

I don't quite understand this. We are trying to avoid overflow of `this.fixedSize = UnsafeRow.calculateBitSetWidthInBytes(row.numFields()) + 8 * row.numFields();`, right? Why do we subtract `initialSize` here?
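The author's reply to this question is not preserved on this page. One reading consistent with the code is that the buffer is sized as `fixedSize + initialSize` bytes, so the check has to keep that whole sum within `int` range, not just `fixedSize`. The sketch below checks that reading. `bitSetWidthInBytes` is a hypothetical re-implementation that assumes `UnsafeRow.calculateBitSetWidthInBytes` rounds `numFields` up to whole 64-bit words, i.e. `((numFields + 63) / 64) * 8`; the other names are also hypothetical.

```java
// Hypothetical sketch of the revised constructor check; not Spark's code.
public class ConstructorCheckSketch {
    // Assumed to match UnsafeRow.calculateBitSetWidthInBytes: one 8-byte word per 64 fields.
    // (numFields + 63 itself overflows for numFields > Integer.MAX_VALUE - 63; ignored here.)
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Revised check: reject when bitset + 8 * numFields + initialSize would not fit in an int.
    static boolean tooManyFields(int numFields, int initialSize) {
        int bitset = bitSetWidthInBytes(numFields);
        return numFields > (Integer.MAX_VALUE - initialSize - bitset) / 8;
    }

    // The assumed buffer size, computed in long arithmetic so it cannot overflow.
    static long bufferSize(int numFields, int initialSize) {
        return (long) bitSetWidthInBytes(numFields) + 8L * numFields + initialSize;
    }

    public static void main(String[] args) {
        System.out.println(bufferSize(1000, 64));                             // 8192
        System.out.println(tooManyFields(Integer.MAX_VALUE / 8, 64));         // true
        System.out.println(tooManyFields(250_000_000, 64));                   // false
        System.out.println(bufferSize(250_000_000, 64) <= Integer.MAX_VALUE); // true
    }
}
```

Under this reading, any row that passes the check has a `bufferSize` that fits in an `int`, which is what subtracting `initialSize` buys.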

Contributor Author

@SparkQA

SparkQA commented Jun 29, 2016

Test build #61428 has finished for PR 13829 at commit b831e85.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@cloud-fan Jun 29, 2016

One more thought: Can we grow the buffer to Integer.MAX_VALUE if we can't double its size? Then we have another chance to continue the execution and finish it.

Contributor Author

@cloud-fan

Currently the limit for `neededSize + totalSize` is `Integer.MAX_VALUE / 2`. I don't see a big difference in raising the limit to `Integer.MAX_VALUE`.

`Integer.MAX_VALUE / 2` bytes is about 1 GB; it is quite rare for a single row to exceed this limit.

Contributor

I think it's good to try our best to finish the user's job instead of failing it. It's not a lot of work and should be worth it: just grow the buffer to `Integer.MAX_VALUE` when `neededSize + totalSize` is between `Integer.MAX_VALUE / 2 + 1` and `Integer.MAX_VALUE`.
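A minimal sketch of the suggested growth policy, written as a hypothetical pure helper `newCapacity` so the arithmetic can be checked without allocating large arrays (this is not Spark's `BufferHolder.grow`): reject requests whose total would exceed `Integer.MAX_VALUE`, double the length while doubling still fits, and otherwise grow straight to `Integer.MAX_VALUE`.

```java
// Hypothetical helper illustrating the proposed policy; not Spark's implementation.
public class GrowthPolicySketch {
    static int newCapacity(int totalSize, int neededSize) {
        if (totalSize < 0 || neededSize < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        // Overflow-safe form of totalSize + neededSize > Integer.MAX_VALUE.
        if (neededSize > Integer.MAX_VALUE - totalSize) {
            throw new UnsupportedOperationException(
                "Cannot grow buffer by " + neededSize + " bytes: total would exceed " + Integer.MAX_VALUE);
        }
        int length = totalSize + neededSize; // cannot overflow after the check above
        // Double while the result fits; otherwise take the largest int-sized length.
        return length < Integer.MAX_VALUE / 2 ? length * 2 : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(100, 28));                    // 256
        System.out.println(newCapacity(1_000_000_000, 100_000_000)); // 2147483647
        try {
            newCapacity(Integer.MAX_VALUE - 10, 11);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected");
        }
    }
}
```

One caveat on this design choice: some JVMs cannot allocate an array of exactly `Integer.MAX_VALUE` elements (JDK 8's `java.util.ArrayList` caps its growth at `Integer.MAX_VALUE - 8` for that reason), so the final fallback may still fail at allocation time.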

@SparkQA

SparkQA commented Jun 29, 2016

Test build #61452 has finished for PR 13829 at commit 336986d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly force-pushed the SPARK-16071_2 branch 2 times, most recently from 3a831e0 to 4265771, June 30, 2016 01:37
Contributor

nit: `Integer.MAX_VALUE /2` -> `Integer.MAX_VALUE / 2`, you missed a space...

@cloud-fan
Contributor

LGTM except for some style comments.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61514 has finished for PR 13829 at commit 3a831e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61515 has finished for PR 13829 at commit 4265771.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61517 has finished for PR 13829 at commit 943f7de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 30, 2016
…BufferHolder

Author: Sean Zhong <[email protected]>

Closes #13829 from clockfly/SPARK-16071_2.

(cherry picked from commit 5320adc)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Contributor

thanks, merging to master/2.0!

@asfgit closed this in 5320adc Jun 30, 2016
```scala
var e = intercept[UnsupportedOperationException] {
  new BufferHolder(new UnsafeRow(Int.MaxValue / 8))
}
assert(e.getMessage.contains("too many fields"))
```
Contributor

Should this string be defined in `BufferHolder` and referenced here, so that the test wouldn't break if the exception message is modified?
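A minimal sketch of that suggestion, with hypothetical names (`MessageConstantSketch`, `TOO_MANY_FIELDS_MSG`, `checkNumFields`): the production check builds its message from a shared constant, and the test matches on the same constant. The limit reuses `Integer.MAX_VALUE / 8` from this thread; the message wording is illustrative, not Spark's.

```java
// Hypothetical sketch; names and message wording are illustrative, not Spark's.
public class MessageConstantSketch {
    // Shared by the production check and its tests.
    public static final String TOO_MANY_FIELDS_MSG = "too many fields";

    static void checkNumFields(int numFields) {
        if (numFields > Integer.MAX_VALUE / 8) {
            throw new UnsupportedOperationException(
                "Cannot create buffer for a row with " + numFields + " fields: " + TOO_MANY_FIELDS_MSG);
        }
    }

    public static void main(String[] args) {
        try {
            checkNumFields(Integer.MAX_VALUE / 8 + 1);
            System.out.println("no exception");
        } catch (UnsupportedOperationException e) {
            // The test matches on the constant rather than on a copied string literal.
            System.out.println(e.getMessage().contains(TOO_MANY_FIELDS_MSG)); // true
        }
    }
}
```

The trade-off is that such a test no longer notices an accidental change to the wording; it only checks that the intended error path was taken.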
