Conversation

@dongjoon-hyun (Member) commented Jan 5, 2019:

What changes were proposed in this pull request?

Currently, Spark cannot handle the numeric[] type with decimal data because both precision and scale are reported as 0.

postgres=# CREATE TABLE t (v numeric[], d  numeric);
CREATE TABLE
postgres=# INSERT INTO t VALUES('{1111.222,2222.332}', 222.4555);
INSERT 0 1
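
The options value passed to spark.read.jdbc below is not shown in the description. A minimal sketch of one way to set it up (the connection properties here are assumptions for illustration, not taken from the PR):

scala> import java.util.Properties
scala> val options = new Properties()  // hypothetical credentials below
scala> options.setProperty("user", "postgres")
scala> options.setProperty("password", "postgres")
scala> options.setProperty("driver", "org.postgresql.Driver")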

BEFORE

scala> val pgTable = spark.read.jdbc("jdbc:postgresql:postgres", "t", options)
pgTable: org.apache.spark.sql.DataFrame = [v: array<decimal(0,0)>, d: decimal(38,18)]

scala> pgTable.show
19/01/04 18:07:38 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 0

AFTER

scala> val pgTable = spark.read.jdbc("jdbc:postgresql:postgres", "t", options)
pgTable: org.apache.spark.sql.DataFrame = [v: array<decimal(38,18)>, d: decimal(38,18)]

scala> pgTable.show()
+--------------------+--------------------+
|                   v|                   d|
+--------------------+--------------------+
|[1111.22200000000...|222.4555000000000...|
+--------------------+--------------------+
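
For context, the fix amounts to guarding the numeric/decimal mapping in the Postgres dialect so that a reported (precision, scale) of (0, 0) falls back to Spark's default decimal type. A minimal standalone sketch of that idea (the actual change uses Spark's internal DecimalType.bounded helper, as shown in the reviewed diff further down):

import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: bound a JDBC-reported precision/scale, falling back
// to decimal(38,18) when the driver reports (0, 0) for array elements.
def pgDecimalType(precision: Int, scale: Int): DecimalType =
  if (precision != 0 || scale != 0) {
    DecimalType(math.min(precision, DecimalType.MAX_PRECISION),
      math.min(scale, DecimalType.MAX_SCALE))
  } else {
    DecimalType.SYSTEM_DEFAULT // decimal(38,18), matching the AFTER schema
  }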

How was this patch tested?

Manual. Ran the PostgresIntegrationSuite.

Closes #23456

@dongjoon-hyun (Member Author):

cc @maropu

@maropu (Member) commented Jan 5, 2019:

Thanks, @dongjoon-hyun! I’ll check tonight (away from a keyboard now)

@maropu (Member) left a comment:

Is this behavior the same as in v1.6?

@dongjoon-hyun (Member Author):

Yes. Exactly. It uses decimal(38,18).

@maropu (Member) commented Jan 5, 2019:

Also, did you check that PostgresIntegrationSuite passed in your environment? It seems Jenkins does not run the integration tests.

@dongjoon-hyun (Member Author):

Yes. I ran it inside IntelliJ manually.

@dongjoon-hyun (Member Author):

Also, cc @gatorsmile and @mgaido91 .

@SparkQA commented Jan 5, 2019:

Test build #100764 has finished for PR 23458 at commit 9e1b8ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 5, 2019:

Test build #100765 has finished for PR 23458 at commit 74215de.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Retest this please.

@SparkQA commented Jan 5, 2019:

Test build #100772 has finished for PR 23458 at commit 74215de.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Retest this please.

case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
case "date" => Some(DateType)
case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
case "numeric" | "decimal" if precision != 0 || scale != 0 =>
Member:

Do we need to check scale != 0, too?

Member Author:

Member:

Aha, I see. Thanks.

Contributor:
Not a very important point, but I think we'd best use precision > 0 || scale > 0. I mean, it may happen that scale is negative; in that case, precision must be > 0. In the above code, a numeric(0, -10), which is an invalid combination, is parsed as DECIMAL(0, -10). We can probably rely on this case never happening, but I think having code that is also robust to error conditions is better.

@dongjoon-hyun (Member Author), Jan 5, 2019:

Good point. If we consider negative scale cases, we should change all of the existing code together to be consistent.

Contributor:

+1 on this. We can probably also do that in a follow-up PR.
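
To make the difference between the two guards concrete, a tiny standalone check (the values are hypothetical, not from the PR):

// With the != 0 guard, an invalid pair like (precision = 0, scale = -10)
// still matches and would build a decimal(0,-10); with > 0 it falls through
// to the default branch instead.
val guardNe = (p: Int, s: Int) => p != 0 || s != 0
val guardGt = (p: Int, s: Int) => p > 0 || s > 0
assert(guardNe(0, -10))   // matches under the guard in the diff
assert(!guardGt(0, -10))  // rejected under the stricter guard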

@maropu (Member) commented Jan 5, 2019:

LGTM except for one question.

@maropu (Member) commented Jan 5, 2019:

Can you also add 'Closes #23456' to the description? I found the duplicate JIRA ticket.

@SparkQA commented Jan 5, 2019:

Test build #100777 has finished for PR 23458 at commit 74215de.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author):

Oh, I didn't notice that. I see. Thanks.

@dongjoon-hyun (Member Author):

Retest this please

@maropu (Member) commented Jan 5, 2019:

My bad, I didn't notice the existing JIRA ticket. Sorry.

@dongjoon-hyun (Member Author):

BTW, we need to backport this to older branches, right?

@maropu (Member) commented Jan 5, 2019:

Yea, I think so. Since v1.6 accepts this query, this is a kind of regression?

@dongjoon-hyun (Member Author):

Yep. I agree.

@mgaido91 (Contributor) left a comment:

I am just not sure why we have this PR duplicating #23456. Is that PR stale? It was opened a few hours ago, so I wouldn't consider it stale. Probably I am missing something. Thanks.

case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
case "date" => Some(DateType)
case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
case "numeric" | "decimal" if precision != 0 || scale != 0 =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a very important point, but I think we best use precision > 0 || scale > 0. I mean, there may happen that scale is negative. In that case, precision must be > 0. In the above code a numeric(0, -10) which is an invalid combination is parsed as DECIMAL(0, -10). Probably, we can rely that this case never happens, but I think having code robust also for error conditions is better.

@dongjoon-hyun (Member Author):

Actually, the two JIRA issues were created within a short time, and I received a call on SPARK-26540. So I didn't check the other JIRA and the other PR.

@dongjoon-hyun (Member Author) commented Jan 5, 2019:

If @mgaido91 has an objection to this PR, I'll close this one. I'm okay with that; it's fair. Just tell him to pick up this code as well.

dongjoon-hyun deleted the SPARK-26540-DECIMAL-ARRAY branch, January 5, 2019 15:45
@dongjoon-hyun (Member Author):

@maropu, I closed this PR and the JIRA issue together~

@mgaido91 (Contributor) left a comment:

Mine was just a question, @dongjoon-hyun. I was wondering whether the other PR had something wrong that I couldn't figure out. I'd really appreciate it if you could provide your feedback and help there, then. Thanks.

case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
case "date" => Some(DateType)
case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
case "numeric" | "decimal" if precision != 0 || scale != 0 =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this. We can probably also do that in a followup PR.

@SparkQA commented Jan 5, 2019:

Test build #100783 has finished for PR 23458 at commit 74215de.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
