[SPARK-25454][SQL] Avoid precision loss in division with decimal with negative scale #22450
Changes from all commits
7c4b454
520b64e
27a9ea6
4e240d9
dd19f7f
b01cbc3
97b9c56
```diff
@@ -40,10 +40,13 @@ import org.apache.spark.sql.types._
  * e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1    max(s1, s2)
  * e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1    max(s1, s2)
  * e1 * e2      p1 + p2 + 1                            s1 + s2
- * e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)     max(6, s1 + p2 + 1)
+ * e1 / e2      max(p1-s1+s2, 0) + max(6, s1+adjP2+1)  max(6, s1+adjP2+1)
  * e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)        max(s1, s2)
  * e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)        max(s1, s2)
  *
+ * Where adjP2 is p2 - s2 if s2 < 0, p2 otherwise. This adjustment is needed because Spark does
+ * not forbid decimals with negative scale, while MS SQL and Hive do.
+ *
  * When `spark.sql.decimalOperations.allowPrecisionLoss` is set to true, if the precision / scale
  * needed are out of the range of available values, the scale is reduced up to 6, in order to
  * prevent the truncation of the integer part of the decimals.
```
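The updated division rule above can be sketched outside Spark. The following is an illustrative Python rendering of the precision-loss branch of the formula (before the final `adjustPrecisionScale` clamping); `divide_result_type` is a hypothetical name, not a Spark API:

```python
MINIMUM_ADJUSTED_SCALE = 6  # mirrors DecimalType.MINIMUM_ADJUSTED_SCALE

def divide_result_type(p1, s1, p2, s2):
    """Result (precision, scale) of e1 / e2 under the updated rule,
    assuming spark.sql.decimalOperations.allowPrecisionLoss=true."""
    adj_p2 = p2 - s2 if s2 < 0 else p2  # compensate for a negative divisor scale
    int_dig = max(p1 - s1 + s2, 0)      # integer digits; p1-s1+s2 can be negative if s2 < 0
    scale = max(MINIMUM_ADJUSTED_SCALE, s1 + adj_p2 + 1)
    return int_dig + scale, scale

# decimal(10, 2) / decimal(5, 3): identical to the old formula
print(divide_result_type(10, 2, 5, 3))   # (19, 8)

# decimal(10, 2) / decimal(5, -3): adjP2 = 8 keeps enough fractional digits
print(divide_result_type(10, 2, 5, -3))  # (16, 11)
```

With a non-negative divisor scale the old and new formulas coincide; the adjustment only kicks in when `s2 < 0`.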
```diff
@@ -129,16 +132,17 @@ object DecimalPrecision extends TypeCoercionRule {
           resultType)

     case Divide(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
+      val adjP2 = if (s2 < 0) p2 - s2 else p2
```
**Contributor:** This rule was added a long time ago. Do you mean this is a long-standing bug?

**Author:** Yes. I think this is explained more clearly in the related JIRA description and comments. The problem is that we have never properly handled decimals with negative scale here. Before 2.3, this could happen only if someone created a specific literal from a BigDecimal, like … Another solution would be to avoid having decimals with a negative scale at all, but that is quite a breaking change, so I'd avoid it until at least a 3.0 release.

**Contributor:** Ah, I see. Can we add a test in …

**Contributor:** Can we update the documentation of this rule to reflect this change?

**Author:** Sure, but if you agree I'll try to find a better place than …

**Contributor:** SGTM
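For readers unfamiliar with negative scales: a decimal value is `unscaledValue * 10^(-scale)`, so a negative scale denotes a multiple of a power of ten whose precision counts only the significant digits. A small Python sketch of this interpretation (it mirrors `java.math.BigDecimal` semantics, which Spark's `Decimal` is assumed here to follow):

```python
from decimal import Decimal

def decimal_value(unscaled, scale):
    # value = unscaled * 10^(-scale), as in java.math.BigDecimal
    return Decimal(unscaled).scaleb(-scale)

# scale 2: unscaled 12345 represents 123.45 (precision 5)
assert decimal_value(12345, 2) == Decimal("123.45")

# scale -2: unscaled 123 represents 12300, yet only three
# significant digits are stored, so its precision is 3
assert decimal_value(123, -2) == 12300
```

This is why `p1 - s1 + s2` can go negative and why `s1 + p2 + 1` undercounts the needed scale when `s2 < 0`.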
```diff
       val resultType = if (SQLConf.get.decimalOperationsAllowPrecisionLoss) {
         // Precision: p1 - s1 + s2 + max(6, s1 + p2 + 1)
         // Scale: max(6, s1 + p2 + 1)
-        val intDig = p1 - s1 + s2
-        val scale = max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)
+        val intDig = max(p1 - s1 + s2, 0) // can be negative if s2 < 0
+        val scale = max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + adjP2 + 1)
         val prec = intDig + scale
         DecimalType.adjustPrecisionScale(prec, scale)
       } else {
-        var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2)
-        var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1))
+        var intDig = max(min(DecimalType.MAX_SCALE, p1 - s1 + s2), 0) // can be negative if s2 < 0
+        var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + adjP2 + 1))
         val diff = (intDig + decDig) - DecimalType.MAX_SCALE
         if (diff > 0) {
           decDig -= diff / 2 + 1
```
**Contributor:** This is very critical. Is there any other database using this formula?

**Author:** I don't think so. The other DBs whose formulas I know are Hive and MS SQL, and they don't allow negative scales, so they don't have this problem. The formula is actually not changed from before; it just handles a negative scale.
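The claim that the formula is unchanged for non-negative scales can be checked mechanically. A brute-force sketch over every valid type (illustrative Python, not Spark code; `old_div_type`/`new_div_type` are made-up names for the two versions of the precision-loss branch):

```python
def old_div_type(p1, s1, p2, s2):
    # pre-PR formula: p1 - s1 + s2 + max(6, s1 + p2 + 1), scale max(6, s1 + p2 + 1)
    scale = max(6, s1 + p2 + 1)
    return p1 - s1 + s2 + scale, scale

def new_div_type(p1, s1, p2, s2):
    # post-PR formula with adjP2 and the clamp at zero integer digits
    adj_p2 = p2 - s2 if s2 < 0 else p2
    scale = max(6, s1 + adj_p2 + 1)
    return max(p1 - s1 + s2, 0) + scale, scale

# For every type with 0 <= scale <= precision <= 38 the two formulas agree,
# so behavior changes only when a negative scale is involved.
for p1 in range(1, 39):
    for s1 in range(0, p1 + 1):
        for p2 in range(1, 39):
            for s2 in range(0, p2 + 1):
                assert old_div_type(p1, s1, p2, s2) == new_div_type(p1, s1, p2, s2)

# With a negative divisor scale the results diverge:
print(old_div_type(10, 2, 5, -3))  # (13, 8)
print(new_div_type(10, 2, 5, -3))  # (16, 11)
```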