Conversation

@cloud-fan
Contributor

@cloud-fan cloud-fan commented Jul 7, 2020

What changes were proposed in this pull request?

This is a followup of #27627 to fix the remaining issues. There are 2 issues fixed in this PR:

  1. UnsafeRow.setDecimal can set an overflowed decimal, which causes an error when reading it. The expected behavior is to return null.
  2. The update/merge expression for decimal type in Sum is wrong. We shouldn't turn the sum value back to 0 after it becomes null due to overflow. This issue was hidden because:
    2.1 for hash aggregate, the buffer is unsafe row. Due to the first bug, we fail when overflow happens, so there is no chance to mistakenly turn null back to 0.
    2.2 for sort-based aggregate, the buffer is generic row. The decimal can overflow (the Decimal class has unlimited precision) and we don't have the null problem.

If we only fix the first bug, then the second bug is exposed and test fails. If we only fix the second bug, there is no way to test it. This PR fixes these 2 bugs together.
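The fixed update semantics for the decimal sum buffer can be illustrated with a minimal, self-contained sketch (plain BigDecimal/Optional rather than Spark's Decimal and Catalyst expressions; all names here are illustrative, not Spark's API): once the buffer becomes null due to overflow, it must stay null instead of being reset to 0.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Illustrative sketch of the fixed Sum update semantics for decimals.
// An empty Optional plays the role of a null buffer slot.
public class DecimalSumSketch {
    static final int MAX_PRECISION = 38;

    // Empty result means the running sum no longer fits the precision.
    static Optional<BigDecimal> fit(BigDecimal d) {
        return d.precision() <= MAX_PRECISION ? Optional.of(d) : Optional.empty();
    }

    // An empty buffer means the sum already overflowed: it must stay
    // empty forever, which flatMap guarantees here. The bug was turning
    // this state back into 0.
    static Optional<BigDecimal> update(Optional<BigDecimal> buffer, BigDecimal input) {
        return buffer.flatMap(sum -> fit(sum.add(input)));
    }
}
```

Modeling the buffer as an Optional makes the key property explicit: update is absorbing on the null state, which is exactly the invariant the broken update/merge expressions violated.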

Why are the changes needed?

Fix issues during decimal sum when overflow happens

Does this PR introduce any user-facing change?

Yes. Now decimal sum can return null correctly for overflow under non-ansi mode.

How was this patch tested?

A new test and an updated test.

@cloud-fan
Contributor Author

cc @skambha @rednaxelafx @viirya

```diff
   Platform.putLong(baseObject, baseOffset + cursor + 8, 0L);

-  if (value == null) {
+  if (value == null || !value.changePrecision(precision, value.scale())) {
```
Contributor Author

Thanks to @allisonwang-db for catching this bug!
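The one-line change in the writer can be paraphrased with a small sketch (illustrative names, not Spark's actual API; the real Decimal.changePrecision also rounds to the target scale, which is elided here): a value that cannot be represented at the target precision is treated the same as null when writing.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Simplified sketch of the fixed write path: overflow is folded into
// the existing null handling instead of storing a bad value.
public class DecimalWriteSketch {
    // Mimics a changePrecision-style check: true iff the value fits.
    static boolean fitsPrecision(BigDecimal d, int precision) {
        return d.precision() <= precision;
    }

    // The value to actually store: empty (null) for null input OR overflow.
    static Optional<BigDecimal> toStore(Optional<BigDecimal> value, int precision) {
        return value.filter(d -> fitsPrecision(d, precision));
    }
}
```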

Member

This looks valid for branch-3.0 and branch-2.4. Do you think we need to backport to branch-2.4?

Contributor Author

Yes, we should.

@dongjoon-hyun
Member

Hi, @cloud-fan . SPARK-28067 is merged for 3.1.0 only. This also aims 3.1.0 only?

@SparkQA

SparkQA commented Jul 7, 2020

Test build #125235 has finished for PR 29026 at commit 3717fc6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Jul 7, 2020

retest this please

@viirya viirya left a comment

Good catch!

@HyukjinKwon HyukjinKwon changed the title [SPARK-28067][SPARK-32018] fix decimal overflow issues [SPARK-28067][SPARK-32018] Fix decimal overflow issues Jul 8, 2020
Contributor

@rednaxelafx rednaxelafx left a comment

LGTM, thanks for fixing this!

@skambha
Contributor

skambha commented Jul 8, 2020

@cloud-fan, thanks for looking into covering some more cases. If we change the UnsafeRow.setDecimal overflow logic, I agree the implementation of the sum needs to change. This comes back to the discussion we had earlier (skambha#1): the sum overflow logic depends heavily on the underlying overflow handling in UnsafeRowWriter, UnsafeRow, etc.

I have a few comments related to the fix in UnsafeRow.

  1. So now UnsafeRow.setDecimal silently stores null for an overflowed decimal value, but getDecimal throws an error. There is an inconsistency here. Why is that OK? Also, they don't honor the ANSI mode.

  2. In this scenario, we are now more aggressive in checking for overflow: we have moved the overflow check further down to return null. Earlier, I think the decision was not to do the check per row, but now don't we end up doing that in some cases?

```scala
assert(unsafeRow.getDecimal(0, 38, 18) === d1)
val d2 = (d1 * Decimal(10)).toPrecision(39, 18)
unsafeRow.setDecimal(0, d2, 38)
assert(unsafeRow.getDecimal(0, 38, 18) === null)
```
Contributor

What happens with ansi mode true?

Contributor Author

UnsafeRow is a low-level entity and doesn't respect the ANSI flag.

@cloud-fan
Contributor Author

So now UnsafeRow.setDecimal silently stores null for an overflowed decimal value, but getDecimal throws an error. There is an inconsistency here. Why is that OK?

Correction: UnsafeRow.setDecimal returns void. This PR fixes UnsafeRow.setDecimal so that getDecimal can return null if the value is overflowed.

Earlier, I think the decision was not to do the check per row, but now don't we end up doing that in some cases

Under ANSI mode, you have to check overflow per row, as is done by the Add expression. This is not changed in this PR.
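The difference between the two modes can be sketched as follows (illustrative only, not Spark's actual CheckOverflow expression or its signature): ANSI mode throws on a per-row overflow, while non-ANSI mode degrades the value to null.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Illustrative per-row overflow check, mirroring the described behavior:
// ANSI mode throws, non-ANSI mode returns null (empty).
public class OverflowCheckSketch {
    static Optional<BigDecimal> check(BigDecimal d, int precision, boolean ansi) {
        if (d.precision() <= precision) {
            return Optional.of(d);
        }
        if (ansi) {
            throw new ArithmeticException(
                "Decimal precision " + d.precision() + " exceeds max precision " + precision);
        }
        return Optional.empty();  // non-ANSI: overflow becomes null
    }
}
```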

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125254 has finished for PR 29026 at commit 3717fc6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125301 has finished for PR 29026 at commit 3717fc6.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125302 has finished for PR 29026 at commit 3717fc6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125292 has finished for PR 29026 at commit 3717fc6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125315 has started for PR 29026 at commit 3717fc6.

@dongjoon-hyun
Member

It's a build time out during PySpark test.

Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125383 has finished for PR 29026 at commit 3717fc6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 9, 2020

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125422 has finished for PR 29026 at commit 3717fc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

dongjoon-hyun pushed a commit that referenced this pull request Jul 16, 2020
…rflowed value

partially backport #29026

Closes #29125 from cloud-fan/backport.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Jul 17, 2020
…rflowed value

partially backport apache#29026

Closes apache#29125 from cloud-fan/backport.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
gengliangwang added a commit that referenced this pull request Aug 19, 2020
…erflow in sum aggregation

### What changes were proposed in this pull request?

Add migration guide for decimal value overflow behavior in sum aggregation, introduced in #29026

### Why are the changes needed?

Add migration guide for the behavior changes from 3.0 to 3.1.
See also: #29450 (comment)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build docs and preview:
![image](https://user-images.githubusercontent.com/1097932/90589256-8b7e3380-e192-11ea-8ff1-05a447c20722.png)

Closes #29458 from gengliangwang/migrationGuideDecimalOverflow.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
@anuragmantri

@cloud-fan @dongjoon-hyun - This PR (at least part of it) seems relevant to branch-2.4. Has there been a backport PR for this? If not, I would like to start a backport PR.

@maropu
Member

maropu commented Oct 7, 2020

@anuragmantri

@maropu - Thanks for pointing me to the discussion thread. Sorry I got lost a bit with so many threads. I will park this idea for now since SPARK-28067 is not in any release branches yet.

@cloud-fan
Contributor Author

cloud-fan commented Oct 8, 2020

There was an attempt to backport this fix, but we decided not to, as it breaks streaming backward compatibility. We can't backport part of the fix either; see #29448 (comment).

jlfsdtc added a commit to Kyligence/spark that referenced this pull request Jul 23, 2021
* KE-24858
[SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

* [SPARK-28067][SPARK-32018] Fix decimal overflow issues

### What changes were proposed in this pull request?

This is a followup of apache#27627 to fix the remaining issues. There are 2 issues fixed in this PR:
1. `UnsafeRow.setDecimal` can set an overflowed decimal and causes an error when reading it. The expected behavior is to return null.
2. The update/merge expression for decimal type in `Sum` is wrong. We shouldn't turn the `sum` value back to 0 after it becomes null due to overflow. This issue was hidden because:
2.1 for hash aggregate, the buffer is unsafe row. Due to the first bug, we fail when overflow happens, so there is no chance to mistakenly turn null back to 0.
2.2 for sort-based aggregate, the buffer is generic row. The decimal can overflow (the Decimal class has unlimited precision) and we don't have the null problem.

If we only fix the first bug, then the second bug is exposed and test fails. If we only fix the second bug, there is no way to test it. This PR fixes these 2 bugs together.

### Why are the changes needed?

Fix issues during decimal sum when overflow happens

### Does this PR introduce _any_ user-facing change?

Yes. Now decimal sum can return null correctly for overflow under non-ansi mode.

### How was this patch tested?

new test and updated test

Closes apache#29026 from cloud-fan/decimal.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

* KE-24858
fix error: java.lang.IllegalArgumentException: Can not interpolate java.lang.Boolean into code block.

* KE-24858
fix ci error

* KE-24858 update pom version

Co-authored-by: Sunitha Kambhampati <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: longfei.jiang <[email protected]>