[SPARK-45905][SQL] Least common type between decimal types should retain integral digits first #43781
@@ -240,6 +240,25 @@ The least common type resolution is used to:

- Derive the result type for expressions such as the case expression.
- Derive the element, key, or value types for array and map constructors.

Special rules are applied if the least common type resolves to FLOAT. With float type values, if any of the types is INT, BIGINT, or DECIMAL, the least common type is pushed to DOUBLE to avoid potential loss of digits.

Decimal type is a bit more complicated here, as it is not a simple type but has two parameters: precision and scale.
A `decimal(precision, scale)` value can have at most `precision - scale` digits in the integral part and `scale` digits in the fractional part.
A least common type between decimal types should have enough digits in both the integral and the fractional parts to represent all values.
More precisely, the least common type between `decimal(p1, s1)` and `decimal(p2, s2)` has a scale of `max(s1, s2)` and a precision of `max(s1, s2) + max(p1 - s1, p2 - s2)`.
However, decimal types in Spark have a maximum precision of 38. If the final decimal type needs more precision, we must truncate.
Since the digits in the integral part are more significant, Spark truncates the digits in the fractional part first. For example, `decimal(48, 20)` will be reduced to `decimal(38, 10)`.
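To make the rule concrete, here is a small standalone sketch (not part of this patch) of the widening and integral-first truncation described in this section. The object and method names are illustrative only, not Spark's internal API; the numbers follow directly from the formulas above.

```scala
object DecimalWideningSketch {
  private val MaxPrecision = 38

  // Least common type before truncation: keep all integral and all fractional digits.
  def widen(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
    val scale = math.max(s1, s2)
    val integralDigits = math.max(p1 - s1, p2 - s2)
    (integralDigits + scale, scale)
  }

  // If the precision exceeds 38, drop fractional digits first so the integral part survives.
  def truncatePreferIntegral(precision: Int, scale: Int): (Int, Int) = {
    if (precision <= MaxPrecision) (precision, scale)
    else (MaxPrecision, math.max(0, scale - (precision - MaxPrecision)))
  }

  def main(args: Array[String]): Unit = {
    // decimal(10, 2) vs decimal(12, 5): 8 integral digits + 5 fractional digits => decimal(13, 5)
    println(widen(10, 2, 12, 5))           // (13,5)
    // decimal(38, 10) vs decimal(30, 20) widens to decimal(48, 20), which exceeds precision 38:
    val (p, s) = widen(38, 10, 30, 20)     // (48,20)
    println(truncatePreferIntegral(p, s))  // (38,10), matching the example above
  }
}
```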
Note, arithmetic operations have special rules to calculate the least common type for decimal inputs:

| Operation | Result precision                        | Result scale        |
|-----------|-----------------------------------------|---------------------|
| e1 + e2   | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2)         |
| e1 - e2   | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2)         |
| e1 * e2   | p1 + p2 + 1                             | s1 + s2             |
| e1 / e2   | p1 - s1 + s2 + max(6, s1 + p2 + 1)      | max(6, s1 + p2 + 1) |
| e1 % e2   | min(p1 - s1, p2 - s2) + max(s1, s2)     | max(s1, s2)         |
Review discussion:

- Contributor: AFAIK, the arithmetic operations did not strictly follow this rule.
- Contributor (author): Which one does not follow? The final decimal type can be different as there is one more truncation step.
- Contributor: …
- Contributor (author): This is not the Spark SQL multiple. Please take a look at …
The truncation rule is also different for arithmetic operations: they retain at least 6 digits in the fractional part, which means the `scale` will not be reduced below 6.
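As an illustration of the arithmetic rules, the sketch below (again, not part of the patch) applies the multiplication row of the table together with the keep-at-least-6-fractional-digits truncation just described. It follows the documented formulas only, so it may differ from Spark's exact internal adjustment in edge cases.

```scala
object DecimalMultiplySketch {
  private val MaxPrecision = 38

  // Result type of e1 * e2 per the table: precision p1 + p2 + 1, scale s1 + s2,
  // then truncated so that at least min(scale, 6) fractional digits are kept.
  def multiplyResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
    val precision = p1 + p2 + 1
    val scale = s1 + s2
    if (precision <= MaxPrecision) {
      (precision, scale)
    } else {
      val integralDigits = precision - scale
      val minScale = math.min(scale, 6)
      val adjustedScale = math.max(MaxPrecision - integralDigits, minScale)
      (MaxPrecision, adjustedScale)
    }
  }

  def main(args: Array[String]): Unit = {
    // decimal(38, 10) * decimal(38, 10): the raw result type is decimal(77, 20),
    // truncated to decimal(38, 6) because only 6 fractional digits are protected.
    println(multiplyResultType(38, 10, 38, 10)) // (38,6)
  }
}
```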
```sql
-- The coalesce function accepts any set of argument types as long as they share a least common type.
```
@@ -146,6 +146,18 @@ object DecimalType extends AbstractDataType {

```scala
    DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
  }

  private[sql] def boundedPreferIntegralDigits(precision: Int, scale: Int): DecimalType = {
    if (precision <= MAX_PRECISION) {
      DecimalType(precision, scale)
    } else {
      // If we have to reduce the precision, we should retain the digits in the integral part first,
      // as they are more significant to the value. Here we reduce the scale as well to drop the
      // digits in the fractional part.
      val diff = precision - MAX_PRECISION
      DecimalType(MAX_PRECISION, math.max(0, scale - diff))
    }
  }

  private[sql] def checkNegativeScale(scale: Int): Unit = {
    if (scale < 0 && !SqlApiConf.get.allowNegativeScaleOfDecimalEnabled) {
      throw DataTypeErrors.negativeScaleNotAllowedError(scale)
```

Review discussion:

- Contributor: Looks good.
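For a quick comparison of the new method with the existing `bounded` shown above, here is a self-contained sketch using plain `(precision, scale)` tuples instead of `DecimalType`, so it runs without Spark on the classpath; the logic mirrors the two code paths.

```scala
object BoundedComparisonSketch {
  private val MaxPrecision = 38
  private val MaxScale = 38

  // Existing behaviour: clamp precision and scale independently.
  def bounded(precision: Int, scale: Int): (Int, Int) =
    (math.min(precision, MaxPrecision), math.min(scale, MaxScale))

  // New behaviour: give up fractional digits so the integral digits still fit.
  def boundedPreferIntegralDigits(precision: Int, scale: Int): (Int, Int) =
    if (precision <= MaxPrecision) (precision, scale)
    else (MaxPrecision, math.max(0, scale - (precision - MaxPrecision)))

  def main(args: Array[String]): Unit = {
    // A widened type of decimal(45, 20) needs 25 integral digits.
    println(bounded(45, 20))                     // (38,20): only 18 integral digits survive
    println(boundedPreferIntegralDigits(45, 20)) // (38,13): all 25 integral digits survive
  }
}
```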
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.analysis

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.Literal._
import org.apache.spark.sql.catalyst.types.DataTypeUtils
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
```

@@ -64,7 +65,11 @@ object DecimalPrecision extends TypeCoercionRule {

```scala
  def widerDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
    val scale = max(s1, s2)
    val range = max(p1 - s1, p2 - s2)
    if (conf.getConf(SQLConf.LEGACY_RETAIN_FRACTION_DIGITS_FIRST)) {
      DecimalType.bounded(range + scale, scale)
    } else {
      DecimalType.boundedPreferIntegralDigits(range + scale, scale)
    }
  }

  override def transform: PartialFunction[Expression, Expression] = {
```

Review discussion:

- Member: There are many usages of …
- Contributor (author): To limit the scope to type coercion only. Some arithmetic operations also call it to determine the result decimal type and I don't want to change that part.
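To illustrate why retaining integral digits matters for the coerced type, here is a hedged sketch using `java.math.BigDecimal`. The types `decimal(38, 18)` (legacy result) and `decimal(38, 15)` (new result) follow from applying the formulas above to the pair `decimal(38, 18)` and `decimal(25, 2)`; the `fits` helper is a simplified model of whether a value can be stored without overflowing the integral part.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalFitSketch {
  // True if `value` can be stored as decimal(precision, scale) without losing integral digits.
  def fits(value: JBigDecimal, precision: Int, scale: Int): Boolean = {
    val rescaled = value.setScale(scale, RoundingMode.HALF_UP)
    rescaled.precision() <= precision
  }

  def main(args: Array[String]): Unit = {
    // A decimal(25, 2) value with 23 integral digits.
    val v = new JBigDecimal("12345678901234567890123.45")
    // Legacy common type of decimal(38, 18) and decimal(25, 2) is decimal(38, 18): the value overflows.
    println(fits(v, 38, 18)) // false
    // The new common type decimal(38, 15) keeps room for all 23 integral digits.
    println(fits(v, 38, 15)) // true
  }
}
```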
@@ -4541,6 +4541,15 @@ object SQLConf {

```scala
    .booleanConf
    .createWithDefault(false)

  val LEGACY_RETAIN_FRACTION_DIGITS_FIRST =
    buildConf("spark.sql.legacy.decimalLeastCommonType.retainFractionDigitsFirst")
      .internal()
      .doc("When set to true, we will try to retain the fraction digits first rather than " +
        "integral digits, when getting a least common type between decimal types, and the " +
        "result decimal precision exceeds the max precision.")
```
| .doc("When set to true, we will try to retain the fraction digits first rather than " + | |
| "integral digits, when getting a least common type between decimal types, and the " + | |
| "result decimal precision exceeds the max precision.") | |
| .doc("When set to true, we will try to retain the fraction digits first rather than " + | |
| "integral digits as prior Spark 4.0, when getting a least common type between decimal types, and the " + | |
| "result decimal precision exceeds the max precision.") |
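If the old behaviour is needed, the internal flag can presumably be enabled like any other SQL conf; a minimal sketch, assuming Spark with this patch applied and a local session:

```scala
import org.apache.spark.sql.SparkSession

object LegacyDecimalCoercionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("legacy-decimal-coercion")
      // Restore the pre-patch behaviour: retain fraction digits first when truncating.
      .config("spark.sql.legacy.decimalLeastCommonType.retainFractionDigitsFirst", "true")
      .getOrCreate()

    // With the flag on, the common type keeps all 18 fractional digits (and may drop
    // integral digits); with it off (the default), integral digits are kept first.
    spark.sql(
      "SELECT typeof(coalesce(CAST(1 AS DECIMAL(38, 18)), CAST(1 AS DECIMAL(25, 2))))").show(false)

    spark.stop()
  }
}
```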
Review discussion:

- Reviewer: Noise? (Whitespace changes are best made in a non-bugfix PR.)
- Contributor (author): Since I touched this file, I just fixed the wrong indentation.