Conversation

@kiszk
Member

@kiszk kiszk commented Apr 13, 2018

What changes were proposed in this pull request?

This PR detects length overflow when the total number of elements in the inputs cannot fit in the result array.

For example, when the three inputs have lengths 0x7FFF_FF00, 0x7FFF_FF00, and 0xE00, we should detect length overflow, since we cannot allocate such a large structure as a byte[].
The current algorithm, however, allocates the result structure with a mere 0xC00-byte length due to integer sum overflow.
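The overflow scenario above can be sketched as follows. This is a minimal illustration, not the actual Spark implementation; the class and method names (`ConcatOverflowCheck`, `totalLength`) are hypothetical. The idea is to accumulate the lengths in a 64-bit `long` so the sum cannot wrap before it is validated:

```java
// Minimal sketch (hypothetical names, not the actual Spark code) of
// detecting length overflow when concatenating byte arrays.
public class ConcatOverflowCheck {
    // Returns the total length, or throws if the sum does not fit in a
    // Java array length (a signed 32-bit int).
    static int totalLength(int... lengths) {
        long total = 0;                       // accumulate in 64 bits
        for (int len : lengths) {
            total += len;
        }
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Cannot allocate a byte[] of length " + total);
        }
        return (int) total;
    }

    public static void main(String[] args) {
        // The 32-bit sum of these wraps around to a small positive value
        // (0xC00), which would silently allocate a tiny result buffer.
        int a = 0x7FFF_FF00, b = 0x7FFF_FF00, c = 0xE00;
        System.out.println("int sum wraps to: 0x"
            + Integer.toHexString(a + b + c));
        try {
            totalLength(a, b, c);
        } catch (IllegalArgumentException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

An equivalent effect could be achieved with `Math.addExact`, which throws `ArithmeticException` on int overflow; the long accumulation above makes the check explicit.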

How was this patch tested?

Existing UTs.
Creating new UTs would require a large heap (6-8 GB), which may make the test environment unstable.
If UTs are deemed necessary, I will create them.

@kiszk
Member Author

kiszk commented Apr 13, 2018

cc @ueshi @hvanhovell

@SparkQA

SparkQA commented Apr 13, 2018

Test build #89338 has finished for PR 21064 at commit 6ba94f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Apr 17, 2018

ping @hvanhovell

Contributor

@hvanhovell hvanhovell left a comment


LGTM - merging to master.

@jiangxb1987
Contributor

@hvanhovell it seems this accidentally did not get merged?

@kiszk
Member Author

kiszk commented Apr 30, 2018

ping @hvanhovell

@kiszk
Member Author

kiszk commented May 2, 2018

ping @hvanhovell

Contributor

@hvanhovell hvanhovell left a comment


LGTM - merging to master. Thanks!

@hvanhovell
Contributor

@kiszk sorry about the delay.

@asfgit asfgit closed this in 9215ee7 May 2, 2018
@cloud-fan
Contributor

shall we backport it to 2.3?

zzcclp added a commit to zzcclp/spark that referenced this pull request Dec 6, 2018
…eArray.concat() apache#21064

zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 20, 2019
…)/ByteArray.concat() apache#21064

MaxGekk pushed a commit that referenced this pull request Apr 12, 2021
…erflow

### What changes were proposed in this pull request?

Add a check for whether the byte length overflows `int`.

### Why are the changes needed?

We encountered an extreme case with the expression `concat_ws`, and the error message is
```
Caused by: java.lang.NegativeArraySizeException
	at org.apache.spark.unsafe.types.UTF8String.concatWs
```
`UTF8String.concat` already performs this length check (added in [#21064](#21064)), so it is better to add the same check to `concatWs`.
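For illustration, the wraparound that produces `NegativeArraySizeException` can be reproduced in isolation. This is a hypothetical standalone example, not the Spark code: when int lengths sum past `Integer.MAX_VALUE`, the result can wrap to a negative int, and `new byte[negative]` throws.

```java
// Hypothetical demonstration of int overflow producing a negative array
// size, the same failure mode reported for concatWs.
public class NegativeSizeDemo {
    public static void main(String[] args) {
        int total = Integer.MAX_VALUE + Integer.MAX_VALUE; // wraps to -2
        System.out.println("wrapped total: " + total);
        try {
            byte[] buf = new byte[total];                  // negative size
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: "
                + e.getMessage());
        }
    }
}
```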

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

It is too resource-heavy to add a test.

Closes #32106 from ulysses-you/SPARK-35005.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Apr 12, 2021
…erflow
