[SPARK-28067][SPARK-32018] Fix decimal overflow issues #29026
  Platform.putLong(baseObject, baseOffset + cursor + 8, 0L);

- if (value == null) {
+ if (value == null || !value.changePrecision(precision, value.scale())) {
Thanks to @allisonwang-db for catching this bug!
This looks valid for branch-3.0 and branch-2.4. Do you think we need to backport to branch-2.4?
Yes we should
|
Hi, @cloud-fan. SPARK-28067 is merged for 3.1.0 only. Does this PR also target 3.1.0 only?
|
Test build #125235 has finished for PR 29026 at commit
|
|
retest this please |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
viirya
left a comment
Good catch!
rednaxelafx
left a comment
LGTM, thanks for fixing this!
|
@cloud-fan, thanks for looking into covering some more cases. If we change the UnsafeRow.setDecimal overflow logic, I agree the implementation of the sum needs to change. This comes back to the discussion we had earlier (skambha#1): the sum overflow logic depends heavily on the underlying overflow handling in UnsafeRowWriter, UnsafeRow, and related classes. I have a few comments related to the fix in UnsafeRow.
|
  assert(unsafeRow.getDecimal(0, 38, 18) === d1)
  val d2 = (d1 * Decimal(10)).toPrecision(39, 18)
  unsafeRow.setDecimal(0, d2, 38)
  assert(unsafeRow.getDecimal(0, 38, 18) === null)
What happens with ansi mode true?
UnsafeRow is a low-level entity and doesn't respect ansi flag.
Correction:
Under ansi mode, you have to check overflow per-row, as it's done by the |
|
retest this please |
|
Test build #125254 has finished for PR 29026 at commit
|
|
retest this please |
|
Test build #125301 has finished for PR 29026 at commit
|
|
Test build #125302 has finished for PR 29026 at commit
|
|
Test build #125292 has finished for PR 29026 at commit
|
|
retest this please |
|
Test build #125315 has started for PR 29026 at commit |
|
It's a build timeout during the PySpark tests.
|
Retest this please. |
|
Test build #125383 has finished for PR 29026 at commit
|
|
retest this please |
|
Test build #125422 has finished for PR 29026 at commit
|
|
Merged to master. |
…rflowed value partially backport #29026 Closes #29125 from cloud-fan/backport. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…erflow in sum aggregation

### What changes were proposed in this pull request?
Add migration guide for decimal value overflow behavior in sum aggregation, introduced in #29026

### Why are the changes needed?
Add migration guide for the behavior changes from 3.0 to 3.1. See also: #29450 (comment)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Build docs and preview:

Closes #29458 from gengliangwang/migrationGuideDecimalOverflow.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
|
@cloud-fan @dongjoon-hyun - This PR (at least part of it) seems relevant to branch-2.4. Has there been a backport PR for this? If not, I would like to start a backport PR. |
|
Hi, @anuragmantri, which part? Have you checked that discussion in the jira? e.g., https://issues.apache.org/jira/browse/SPARK-32018?focusedCommentId=17179002&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17179002 |
|
@maropu - Thanks for pointing me to the discussion thread. Sorry I got lost a bit with so many threads. I will park this idea for now since SPARK-28067 is not in any release branches yet. |
|
There was an attempt to backport this fix but we decided not to as it breaks streaming backward compatibility. We can't backport part of the fix either, see #29448 (comment) |
* KE-24858 [SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow
* [SPARK-28067][SPARK-32018] Fix decimal overflow issues

### What changes were proposed in this pull request?
This is a followup of apache#27627 to fix the remaining issues. There are 2 issues fixed in this PR:
1. `UnsafeRow.setDecimal` can set an overflowed decimal and causes an error when reading it. The expected behavior is to return null.
2. The update/merge expression for decimal type in `Sum` is wrong. We shouldn't turn the `sum` value back to 0 after it becomes null due to overflow. This issue was hidden because:
2.1 for hash aggregate, the buffer is unsafe row. Due to the first bug, we fail when overflow happens, so there is no chance to mistakenly turn null back to 0.
2.2 for sort-based aggregate, the buffer is generic row. The decimal can overflow (the Decimal class has unlimited precision) and we don't have the null problem.

If we only fix the first bug, then the second bug is exposed and test fails. If we only fix the second bug, there is no way to test it. This PR fixes these 2 bugs together.

### Why are the changes needed?
Fix issues during decimal sum when overflow happens

### Does this PR introduce _any_ user-facing change?
Yes. Now decimal sum can return null correctly for overflow under non-ansi mode.

### How was this patch tested?
new test and updated test

Closes apache#29026 from cloud-fan/decimal.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

* KE-24858 fix error: java.lang.IllegalArgumentException: Can not interpolate java.lang.Boolean into code block.
* KE-24858 fix ci error
* KE-24858 update pom version

Co-authored-by: Sunitha Kambhampati <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: longfei.jiang <[email protected]>
What changes were proposed in this pull request?
This is a followup of #27627 to fix the remaining issues. There are 2 issues fixed in this PR:
1. `UnsafeRow.setDecimal` can set an overflowed decimal and causes an error when reading it. The expected behavior is to return null.
2. The update/merge expression for decimal type in `Sum` is wrong. We shouldn't turn the `sum` value back to 0 after it becomes null due to overflow. This issue was hidden because:
2.1 for hash aggregate, the buffer is unsafe row. Due to the first bug, we fail when overflow happens, so there is no chance to mistakenly turn null back to 0.
2.2 for sort-based aggregate, the buffer is generic row. The decimal can overflow (the Decimal class has unlimited precision) and we don't have the null problem.
If we only fix the first bug, then the second bug is exposed and test fails. If we only fix the second bug, there is no way to test it. This PR fixes these 2 bugs together.
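To make the interaction between the two bugs concrete, here is a minimal standalone sketch in plain Java. It is not Spark code: `DecimalOverflowSketch`, `storeWithPrecision`, `buggyUpdate`, and `fixedUpdate` are hypothetical names, `java.math.BigDecimal` stands in for Spark's `Decimal`, and a plain nullable reference stands in for the aggregation buffer. The sketch only models the two rules described above: a fixed-precision slot yields null for a value that does not fit, and a sum that has become null must stay null. It does not model how Spark distinguishes an empty group from an overflowed one.

```java
import java.math.BigDecimal;

// Hypothetical model of the two fixes; none of these names exist in Spark.
final class DecimalOverflowSketch {
    // Maximum decimal precision assumed for the buffer slot in this sketch.
    static final int MAX_PRECISION = 38;

    // Issue 1 model: storing into a fixed-precision slot yields null
    // when the value has more digits than the slot allows.
    static BigDecimal storeWithPrecision(BigDecimal value, int precision) {
        if (value == null || value.precision() > precision) {
            return null;
        }
        return value;
    }

    // Issue 2 model (buggy): a null buffer is treated as 0, so an
    // overflowed sum silently restarts from zero on the next row.
    static BigDecimal buggyUpdate(BigDecimal buffer, BigDecimal input) {
        if (input == null) {
            return buffer; // null inputs do not change the sum
        }
        BigDecimal base = (buffer == null) ? BigDecimal.ZERO : buffer;
        return storeWithPrecision(base.add(input), MAX_PRECISION);
    }

    // Issue 2 model (fixed): once the buffer is null it stays null.
    static BigDecimal fixedUpdate(BigDecimal buffer, BigDecimal input) {
        if (input == null || buffer == null) {
            return buffer;
        }
        return storeWithPrecision(buffer.add(input), MAX_PRECISION);
    }

    public static void main(String[] args) {
        // 20 integer digits and 18 fractional digits: precision 38, scale 18.
        BigDecimal max = new BigDecimal("99999999999999999999.999999999999999999");

        // max + max needs 39 digits, so the stored sum becomes null.
        BigDecimal overflowed = fixedUpdate(max, max);
        System.out.println("after overflow: " + overflowed);

        // The buggy update turns null back into a number; the fixed one does not.
        System.out.println("buggy next row: " + buggyUpdate(overflowed, BigDecimal.ONE));
        System.out.println("fixed next row: " + fixedUpdate(overflowed, BigDecimal.ONE));
    }
}
```

In this model, fixing only `storeWithPrecision` while keeping `buggyUpdate` would produce a wrong but non-null sum after an overflow, which mirrors why the PR description says the two issues had to be fixed together.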
Why are the changes needed?
Fix issues during decimal sum when overflow happens
Does this PR introduce any user-facing change?
Yes. Now decimal sum can return null correctly for overflow under non-ansi mode.
How was this patch tested?
new test and updated test