[SPARK-23976][Core] Detect length overflow in UTF8String.concat()/ByteArray.concat() #21064

kiszk · 2018-04-13T12:06:32Z

What changes were proposed in this pull request?

This PR detects length overflow when the total length of the inputs is too large for the result to be allocated.

For example, when three inputs have lengths 0x7FFF_FF00, 0x7FFF_FF00, and 0xE00, we should detect length overflow, since a byte[] cannot hold that many elements.
The current algorithm, however, allocates a result structure of only 0xC00 bytes, because the integer sum overflows.
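
To make the arithmetic concrete, here is a minimal sketch of the problem and of one way to detect it. It is not the code in this patch: the class `ConcatLengthCheck` and both helper methods are hypothetical names used only for illustration. The idea is to accumulate the lengths in a `long` and reject totals that do not fit in an `int`.

```java
// Hypothetical illustration; not the actual UTF8String/ByteArray code.
class ConcatLengthCheck {

    // Sums lengths in an int, so the total silently wraps around on overflow.
    static int naiveTotalLength(int... lengths) {
        int total = 0;
        for (int len : lengths) {
            total += len;
        }
        return total;
    }

    // Sums lengths in a long and fails if the total cannot be a byte[] length.
    static int checkedTotalLength(int... lengths) {
        long total = 0L;
        for (int len : lengths) {
            total += len;
        }
        if (total > Integer.MAX_VALUE) {
            throw new ArithmeticException("total length overflows int: " + total);
        }
        return (int) total;
    }

    public static void main(String[] args) {
        int[] lengths = {0x7FFF_FF00, 0x7FFF_FF00, 0xE00};

        // The int sum wraps around to a small positive value.
        System.out.println("naive total: 0x"
            + Integer.toHexString(naiveTotalLength(lengths)));

        try {
            checkedTotalLength(lengths);
        } catch (ArithmeticException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Because the wrapped sum is positive, the naive version would allocate a small buffer without any error, which is why an explicit check is needed rather than relying on the allocation to fail.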

How was this patch tested?

Existing UTs.
New UTs for this case would need a large heap (6-8 GB), which may make the test environment unstable.
If UTs are considered necessary, I will add them.

kiszk · 2018-04-13T12:06:51Z

cc @ueshi @hvanhovell

SparkQA · 2018-04-13T15:57:14Z

Test build #89338 has finished for PR 21064 at commit 6ba94f3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-04-17T10:28:31Z

ping @hvanhovell

hvanhovell

LGTM - merging to master.

jiangxb1987 · 2018-04-25T03:47:35Z

@hvanhovell it seems this accidentally did not get merged?

kiszk · 2018-04-30T04:10:09Z

ping @hvanhovell

kiszk · 2018-05-02T02:03:27Z

ping @hvanhovell

hvanhovell

LGTM - merging to master. Thanks!

hvanhovell · 2018-05-02T08:42:25Z

@kiszk sorry about the delay.

cloud-fan · 2018-05-09T05:28:00Z

shall we backport it to 2.3?


…erflow

### What changes were proposed in this pull request?

Add a check for whether the byte length overflows `int`.

### Why are the changes needed?

We encountered a very extreme case with the expression `concat_ws`, and the error message is

```
Caused by: java.lang.NegativeArraySizeException
	at org.apache.spark.unsafe.types.UTF8String.concatWs
```

It seems `UTF8String.concat` already does the length check as of [#21064](#21064), so it is better to add the same check in `concatWs`.

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

It's too heavy to add the test.

Closes #32106 from ulysses-you/SPARK-35005.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
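
As a rough illustration of how a `NegativeArraySizeException` like the one quoted above can arise, and of one way to guard against it, the sketch below computes the byte length of a separator-joined result with overflow-checked arithmetic. It is an assumption-laden example, not the Spark implementation: `ConcatWsLengthSketch` and `joinedLength` are hypothetical names, and the length formula (sum of inputs plus one separator between each pair) is assumed for illustration.

```java
// Hypothetical illustration; not the actual UTF8String.concatWs code.
class ConcatWsLengthSketch {

    // Assumed formula: input bytes plus one separator between each pair of inputs.
    // Math.addExact, Math.multiplyExact and Math.toIntExact throw
    // ArithmeticException instead of silently wrapping around.
    static int joinedLength(int separatorLen, int... inputLens) {
        long total = 0L;
        for (int len : inputLens) {
            total = Math.addExact(total, (long) len);
        }
        if (inputLens.length > 1) {
            long separators =
                Math.multiplyExact((long) separatorLen, (long) (inputLens.length - 1));
            total = Math.addExact(total, separators);
        }
        return Math.toIntExact(total);
    }

    public static void main(String[] args) {
        // Two large inputs: the int sum wraps to a negative value,
        // and allocating an array of negative size fails at once.
        int wrapped = 0x7FFF_FF00 + 0x7FFF_FF00;
        try {
            byte[] unused = new byte[wrapped];
        } catch (NegativeArraySizeException e) {
            System.out.println("unchecked sum gave size " + wrapped);
        }

        // The checked version reports the overflow before any allocation.
        try {
            joinedLength(1, 0x7FFF_FF00, 0x7FFF_FF00);
        } catch (ArithmeticException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Failing in the length computation lets the error message describe the overflow directly, instead of surfacing later as an allocation failure.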


initial commit

6ba94f3

hvanhovell approved these changes Apr 19, 2018


hvanhovell approved these changes May 2, 2018


asfgit closed this in 9215ee7 May 2, 2018

ulysses-you mentioned this pull request Apr 9, 2021

[SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow #32106

Closed


5 participants
