[SPARK-26540][SQL] Support PostgreSQL numeric arrays without precision/scale #23458
Conversation
cc @maropu

Thanks, @dongjoon-hyun! I’ll check tonight (away from a keyboard now)
maropu left a comment
Is this behavior the same as in v1.6?
Yes. Exactly. It uses

Also, did you check if PostgresIntegrationSuite passed in your env? It seems Jenkins does not run the integration tests.

Yes. I ran it inside IntelliJ manually.

Also, cc @gatorsmile and @mgaido91.
Test build #100764 has finished for PR 23458 at commit

Test build #100765 has finished for PR 23458 at commit

Retest this please.

Test build #100772 has finished for PR 23458 at commit

Retest this please.
| case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType) | ||
| case "date" => Some(DateType) | ||
| case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale)) | ||
| case "numeric" | "decimal" if precision != 0 || scale != 0 => |
Do we need to check scale != 0, too?
Yes. Please see the non-array-type logic here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L224-L226.
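For reference, the non-array logic at those lines is roughly the following. This is a paraphrased sketch, not a verbatim quote: `mapDecimal` is a made-up wrapper, and the real code uses the package-private `DecimalType.bounded`, which additionally caps precision and scale at 38.

```scala
import java.sql.Types
import org.apache.spark.sql.types.{DataType, DecimalType}

// Paraphrase of the non-array mapping in JdbcUtils.getCatalystType:
// a DECIMAL column with an explicit precision or scale maps to a bounded
// decimal, while a bare DECIMAL falls back to the system default decimal(38, 18).
def mapDecimal(sqlType: Int, precision: Int, scale: Int): Option[DataType] =
  sqlType match {
    case Types.DECIMAL if precision != 0 || scale != 0 =>
      Some(DecimalType(precision, scale)) // real code: DecimalType.bounded(precision, scale)
    case Types.DECIMAL =>
      Some(DecimalType.SYSTEM_DEFAULT) // decimal(38, 18)
    case _ => None
  }
```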
Aha, I see. Thanks.
Not a very important point, but I think we had better use precision > 0 || scale > 0. I mean, it may happen that scale is negative; in that case, precision must be > 0. In the code above, numeric(0, -10), which is an invalid combination, is parsed as DECIMAL(0, -10). We can probably rely on this case never happening, but I think it is better to make the code robust against error conditions as well.
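To make the difference concrete, here is a minimal sketch of the stricter guard; the `toDecimal` helper is illustrative, not the PR's actual code:

```scala
import org.apache.spark.sql.types.{DataType, DecimalType}

// With `precision != 0 || scale != 0`, the invalid pair (0, -10) slips through
// and would be parsed as DECIMAL(0, -10); the stricter `precision > 0 || scale > 0`
// makes it fall through, just like a bare numeric[] does.
def toDecimal(precision: Int, scale: Int): Option[DataType] =
  if (precision > 0 || scale > 0) Some(DecimalType(precision, scale))
  else None

toDecimal(38, 10) // Some(decimal(38,10)): explicit numeric(38,10)
toDecimal(0, 0)   // None: numeric[] declared without precision/scale
toDecimal(0, -10) // None: the invalid combination no longer becomes a decimal
```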
Good point. If we consider negative scale cases, we should change all of the existing code together to be consistent.
+1 on this. We can probably also do that in a follow-up PR.
LGTM except for one question.

Can you also add 'Closes #23456' in the description? I found the duplicate JIRA ticket.

Test build #100777 has finished for PR 23458 at commit

Oh, I didn't notice that. I see. Thanks.

Retest this please.

My bad, I didn't notice the existing JIRA ticket, sorry.

BTW, we need to backport this to older branches, right?

Yea, I think so. Since v1.6 accepts this query, this is a kind of regression?

Yep. I agree.
mgaido91 left a comment
I am just not sure why we have this PR duplicating #23456. Is that PR stale? It was opened only a few hours ago, so I wouldn't consider it stale. Probably I am missing something. Thanks.
| case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType) | ||
| case "date" => Some(DateType) | ||
| case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale)) | ||
| case "numeric" | "decimal" if precision != 0 || scale != 0 => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a very important point, but I think we best use precision > 0 || scale > 0. I mean, there may happen that scale is negative. In that case, precision must be > 0. In the above code a numeric(0, -10) which is an invalid combination is parsed as DECIMAL(0, -10). Probably, we can rely that this case never happens, but I think having code robust also for error conditions is better.
Actually, two JIRA issues were created in a short time, and I received a call on SPARK-26540. So, I didn't check the other JIRA issue or the other PR.

If @mgaido91 has an objection to this PR, I'll close this one. I'm okay with that. It's fair.

@maropu. I'll close this PR and the JIRA issue together~
mgaido91 left a comment
Mine was just a question, @dongjoon-hyun. I was wondering if the other PR had something wrong that I couldn't figure out. I'd really appreciate it if you could provide your feedback and help there, then. Thanks.
| case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType) | ||
| case "date" => Some(DateType) | ||
| case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale)) | ||
| case "numeric" | "decimal" if precision != 0 || scale != 0 => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this. We can probably also do that in a followup PR.
Test build #100783 has finished for PR 23458 at commit
What changes were proposed in this pull request?

Currently, Spark cannot handle the numeric[] type with decimal data because both precision and scale are considered to be 0.

BEFORE

AFTER
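The original BEFORE/AFTER snippets did not survive above. As an illustration only, a minimal reproduction could look like the following; the table name, JDBC URL, and credentials are made up, and a SparkSession `spark` is assumed to be in scope:

```scala
// Hypothetical reproduction: read a PostgreSQL table with a numeric[] column
// declared without precision/scale, e.g. CREATE TABLE t (c1 numeric[]).
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "t")
  .option("user", "postgres")
  .load()

// BEFORE: the array element type was mapped with precision = scale = 0, so the
// read failed. AFTER: such elements fall back to a usable default decimal type
// and the schema resolves.
df.printSchema()
```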
How was this patch tested?

Manual. Run the PostgresIntegrationSuite.

Closes #23456