[SPARK-26538][SQL] Set default precision and scale for elements of postgres numeric array #23456
mgaido91
left a comment
just a minor comment, otherwise seems reasonable. cc @gatorsmile @srowen for triggering the build when they are comfortable with the change.
  case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
  case "date" => Some(DateType)
- case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
+ case "numeric" | "decimal" if precision != 0 => Some(DecimalType.bounded(precision, scale))
what about
case "numeric" | "decimal" => if (precision > 0) {
  Some(DecimalType.bounded(precision, scale))
} else {
  // Here a small comment explaining when this can happen and why we do this.
  Some(DecimalType.SYSTEM_DEFAULT)
}
Updated per your suggestion.
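The fallback suggested above can be illustrated with a minimal, self-contained sketch. It does not use Spark: `SketchDecimal`, `systemDefault`, and `numericElementType` are hypothetical stand-ins, and the (38, 18) default is assumed from the `DecimalType(38, 18)` value named in this PR's description.

```scala
object NumericFallbackSketch {
  // Hypothetical stand-in for Spark's DecimalType; not the real class.
  final case class SketchDecimal(precision: Int, scale: Int)

  // Assumption: mirrors the DecimalType(38, 18) default named in the PR description.
  val systemDefault: SketchDecimal = SketchDecimal(38, 18)

  // Map the (precision, scale) pair reported by the JDBC driver to an element type.
  def numericElementType(precision: Int, scale: Int): SketchDecimal =
    if (precision > 0) {
      SketchDecimal(precision, scale)
    } else {
      // The driver reported no explicit precision (0), so fall back to the default.
      systemDefault
    }
}
```

Under these assumptions, `numericElementType(0, 0)` yields the default type, while an explicit pair such as `(10, 2)` passes through unchanged.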
Would it make sense to add the tests from 74215de as well?
@a-shkarupin yes, I think so. cc @dongjoon-hyun who prepared that patch and can have other/better suggestions.
ok to test
@a-shkarupin Please add the @dongjoon-hyun 's test in
  case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
  case "date" => Some(DateType)
- case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
+ case "numeric" | "decimal" => if (precision > 0) {
I would like to confirm just in case; we don't check scale in this pr? Probably, this might be related to the discussion: #23458 (comment)
The Postgres docs say:
The precision must be positive, the scale zero or positive.
In the case of numeric, the Postgres JDBC driver returned 0 for both scale and precision.
The condition proposed in the linked ticket and currently used here was roughly precision > 0 || scale > 0, but I cannot come up with a valid case that has precision <= 0 together with scale > 0.
Is there another case where we would have a decimal with precision 0?
Could someone explain?
Yea, but I think it would be better to add the check scale > 0 too, just as a safeguard.
I think this is fine, actually. We do not support decimals with precision < 0, so this is most likely enough.
Actually, I still agree with @maropu (#23456 (comment)), but it looks okay because this is PostgresDialect.scala.
If we added || scale > 0, we'd allow a precision <= 0, which doesn't make any sense and is not supported by Spark's decimal. So I think this is fine.
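The difference between the two guards discussed in this thread can be checked with a small sketch. The names `precisionOnly`, `precisionOrScale`, and `disagreements` are hypothetical and exist only for this illustration; the code enumerates a small grid of non-negative inputs and collects the pairs on which the guards disagree.

```scala
object GuardComparisonSketch {
  // Guard used in this PR.
  def precisionOnly(precision: Int, scale: Int): Boolean = precision > 0

  // Guard with the extra scale check proposed in the discussion.
  def precisionOrScale(precision: Int, scale: Int): Boolean = precision > 0 || scale > 0

  // All (precision, scale) pairs in [0, maxValue] x [0, maxValue] where the guards differ.
  def disagreements(maxValue: Int): Seq[(Int, Int)] =
    for {
      p <- 0 to maxValue
      s <- 0 to maxValue
      if precisionOnly(p, s) != precisionOrScale(p, s)
    } yield (p, s)
}
```

On this grid the guards differ only when precision is 0 and scale is positive, which is the combination the thread describes as not meaningful for Spark's decimal.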
nit: Could you clean up the title (plz move
Test build #100956 has finished for PR 23456 at commit
Added the test. Ran tests as follows: Got following result:
Cleaned up.
Test build #100976 has finished for PR 23456 at commit
Can you update the PR description (
} else {
  // SPARK-26538: handle numeric without explicit precision and scale.
  Some(DecimalType.SYSTEM_DEFAULT)
}
Hi, @a-shkarupin . Thank you for your first contribution.
Could you follow the existing succinct style? What I mean is having two case "numeric" | "decimal"s.
Updated per your suggestion, kept the comment as suggested here.
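The succinct style requested here, with two `case "numeric" | "decimal"` branches where the first carries a guard, might look like the following sketch. It is not the merged code: `SketchDecimal`, `systemDefault`, and `elementType` are hypothetical stand-ins so the example runs without Spark, and the (38, 18) default is assumed from the PR description.

```scala
object GuardedCaseSketch {
  // Hypothetical stand-in for Spark's DecimalType; not the real class.
  final case class SketchDecimal(precision: Int, scale: Int)

  // Assumption: mirrors the DecimalType(38, 18) default named in the PR description.
  val systemDefault: SketchDecimal = SketchDecimal(38, 18)

  def elementType(typeName: String, precision: Int, scale: Int): Option[SketchDecimal] =
    typeName match {
      // The guard applies to both alternatives of the pattern.
      case "numeric" | "decimal" if precision > 0 => Some(SketchDecimal(precision, scale))
      // SPARK-26538: handle numeric without explicit precision and scale.
      case "numeric" | "decimal" => Some(systemDefault)
      // Other type names are out of scope for this sketch.
      case _ => None
    }
}
```

Because cases are tried in order, the unguarded second branch is reached only when the guard on the first one fails.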
Updated.
Test build #101030 has finished for PR 23456 at commit
LGTM
LGTM too, thanks.
dongjoon-hyun
left a comment
+1, LGTM
Thank you all! Merged to
…stgres numeric array ## What changes were proposed in this pull request? When determining CatalystType for postgres columns with type `numeric[]` set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`. ## How was this patch tested? Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`. Ran the `PostgresIntegrationSuite` manually. Closes #23456 from a-shkarupin/postgres_numeric_array. Lead-authored-by: Oleksii Shkarupin <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 5b37092) Signed-off-by: Dongjoon Hyun <[email protected]>
Hi, @a-shkarupin . What is your Apache JIRA id?
Hi @dongjoon-hyun . My Apache JIRA username is alsh. I reported SPARK-26538.
Hi, @a-shkarupin . Yep.
…stgres numeric array ## What changes were proposed in this pull request? When determining CatalystType for postgres columns with type `numeric[]` set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`. ## How was this patch tested? Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`. Ran the `PostgresIntegrationSuite` manually. Closes apache#23456 from a-shkarupin/postgres_numeric_array. Lead-authored-by: Oleksii Shkarupin <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…stgres numeric array ## What changes were proposed in this pull request? When determining CatalystType for postgres columns with type `numeric[]` set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`. ## How was this patch tested? Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`. Ran the `PostgresIntegrationSuite` manually. Closes apache#23456 from a-shkarupin/postgres_numeric_array. Lead-authored-by: Oleksii Shkarupin <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 5b37092) Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request?
When determining CatalystType for postgres columns with type `numeric[]`, set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`.
## How was this patch tested?
Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`. Ran the `PostgresIntegrationSuite` manually.