Conversation

Contributor

@a-shkarupin a-shkarupin commented Jan 4, 2019

What changes were proposed in this pull request?

When determining the Catalyst type for Postgres columns of type `numeric[]`, set the type of the array elements to `DecimalType(38, 18)` instead of `DecimalType(0, 0)`.
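The effect of the change can be sketched in isolation (a standalone sketch, not the actual PostgresDialect code; `Decimal`, `SystemDefault`, and `decimalFor` are hypothetical stand-ins, and `(38, 18)` mirrors `DecimalType.SYSTEM_DEFAULT`):

```scala
// Standalone sketch of the decision in this PR (hypothetical names, not Spark code).
// The Postgres JDBC driver reports precision = 0 and scale = 0 for an
// unconstrained NUMERIC column, so mapping that metadata directly would
// produce the invalid DecimalType(0, 0); fall back to the system default instead.
case class Decimal(precision: Int, scale: Int)

val SystemDefault = Decimal(38, 18) // mirrors DecimalType.SYSTEM_DEFAULT

def decimalFor(precision: Int, scale: Int): Decimal =
  if (precision > 0) Decimal(precision, scale) else SystemDefault
```

So `numeric(10, 2)` keeps its declared precision and scale, while a plain `numeric` array element falls back to `DecimalType(38, 18)`.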

How was this patch tested?

Tested with modified org.apache.spark.sql.jdbc.JDBCSuite.
Ran the PostgresIntegrationSuite manually.

Contributor

@mgaido91 mgaido91 left a comment

just a minor comment, otherwise seems reasonable. cc @gatorsmile @srowen for triggering the build when they are comfortable with the change.

```diff
 case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
 case "date" => Some(DateType)
-case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
+case "numeric" | "decimal" if precision != 0 => Some(DecimalType.bounded(precision, scale))
```
Contributor

what about

```scala
case "numeric" | "decimal" => if (precision > 0) {
    Some(DecimalType.bounded(precision, scale))
  } else {
    // Here a small comment explaining when this can happen and why we do this.
    Some(DecimalType.SYSTEM_DEFAULT)
  }
```

Contributor Author

Updated per your suggestion.

@a-shkarupin
Contributor Author

Would it make sense to add the tests from 74215de as well ?

@mgaido91
Contributor

mgaido91 commented Jan 8, 2019

@a-shkarupin yes, I think so. cc @dongjoon-hyun who prepared that patch and can have other/better suggestions.

@maropu
Member

maropu commented Jan 9, 2019

ok to test

@maropu
Member

maropu commented Jan 9, 2019

@a-shkarupin Could you add @dongjoon-hyun's test in PostgresIntegrationSuite.scala? Also, since Jenkins doesn't run that test, please check that it passes in your environment.

```diff
 case "timestamp" | "timestamptz" | "time" | "timetz" => Some(TimestampType)
 case "date" => Some(DateType)
-case "numeric" | "decimal" => Some(DecimalType.bounded(precision, scale))
+case "numeric" | "decimal" => if (precision > 0) {
```
Member

I would like to confirm just in case: we don't check `scale` in this PR? Probably this is related to the discussion in #23458 (comment).

Contributor Author

The Postgres docs say:

> The precision must be positive, the scale zero or positive.

What the Postgres JDBC driver returned for a plain `numeric` was 0 for both precision and scale.
The condition proposed in the linked ticket and currently used here was roughly `precision > 0 || scale > 0`, but I cannot come up with a valid case that has `precision <= 0` while `scale > 0`.
Is there another case where we would get a decimal with precision 0?
Could someone explain?

Member

Yeah, but I think we'd better add a `scale > 0` check too, just as a safeguard.

Contributor

I think this is fine, actually. We do not support decimals with precision < 0, so this is most likely enough.

Member

Actually, I still agree with @maropu (#23456 (comment)), but it looks okay because this is PostgresDialect.scala.

Contributor

If we added `|| scale > 0`, we'd allow a precision <= 0, which doesn't make any sense and is not supported by Spark's decimal. So I think this is fine.
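The argument above can be checked mechanically (a sketch under the assumption that Spark only accepts a positive decimal precision; both guard names are hypothetical):

```scala
// The adopted guard vs. the proposed "|| scale > 0" variant (hypothetical names).
// Spark's DecimalType requires a positive precision, so a pair like (0, 5)
// must be rejected: "precision > 0" already does that, while
// "precision > 0 || scale > 0" would wrongly accept it.
def adoptedGuard(precision: Int, scale: Int): Boolean = precision > 0
def guardWithScale(precision: Int, scale: Int): Boolean = precision > 0 || scale > 0
```

Both accept a well-formed `numeric(10, 2)`, but only the adopted guard rejects the impossible `(0, 5)` pair.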

@maropu
Member

maropu commented Jan 9, 2019

nit: Could you clean up the title (please move "…stgres numeric array" into the title)?

@SparkQA

SparkQA commented Jan 9, 2019

Test build #100956 has finished for PR 23456 at commit 31b0b04.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@a-shkarupin a-shkarupin changed the title [SPARK-26538][SQL] Set default precision and scale for elements of po… [SPARK-26538][SQL] Set default precision and scale for elements of postgres numeric array Jan 9, 2019
@a-shkarupin
Contributor Author

@a-shkarupin Could you add @dongjoon-hyun's test in PostgresIntegrationSuite.scala? Also, since Jenkins doesn't run that test, please check that it passes in your environment.

Added the test.

Ran tests as follows:

```
./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12
```

Got following result:

Run completed in 3 minutes, 36 seconds.
Total number of tests run: 21
Suites: completed 5, aborted 0
Tests: succeeded 21, failed 0, canceled 0, ignored 5, pending 0
All tests passed.

nit: Could you clean up the title (please move "…stgres numeric array" into the title)?

Cleaned up.

@SparkQA

SparkQA commented Jan 9, 2019

Test build #100976 has finished for PR 23456 at commit 77bbcb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jan 9, 2019

Can you update the PR description (How was this patch tested?) ?

```scala
} else {
  // SPARK-26538: handle numeric without explicit precision and scale.
  Some(DecimalType.SYSTEM_DEFAULT)
}
```
Member

@dongjoon-hyun dongjoon-hyun Jan 10, 2019

Hi, @a-shkarupin . Thank you for your first contribution.
Could you follow the existing succinct style? What I mean is having two `case "numeric" | "decimal"` branches.

Contributor Author

Updated per your suggestion, kept the comment as suggested here.

@a-shkarupin
Contributor Author

Can you update the PR description (How was this patch tested?) ?

Updated.

@SparkQA

SparkQA commented Jan 10, 2019

Test build #101030 has finished for PR 23456 at commit c72e214.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jan 10, 2019

LGTM

@mgaido91
Contributor

LGTM too, thanks.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 12, 2019

Thank you all! Merged to master/branch-2.4/branch-2.3.

dongjoon-hyun added a commit that referenced this pull request Jan 12, 2019
…stgres numeric array

## What changes were proposed in this pull request?

When determining CatalystType for postgres columns with type `numeric[]` set the type of array element to `DecimalType(38, 18)` instead of `DecimalType(0,0)`.

## How was this patch tested?

Tested with modified `org.apache.spark.sql.jdbc.JDBCSuite`.
Ran the `PostgresIntegrationSuite` manually.

Closes #23456 from a-shkarupin/postgres_numeric_array.

Lead-authored-by: Oleksii Shkarupin <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5b37092)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jan 12, 2019
…stgres numeric array

(cherry picked from commit 5b37092)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

Hi, @a-shkarupin . What is your Apache JIRA id?
I'm trying to add you to Apache Spark contributor group.

@a-shkarupin
Contributor Author

Hi @dongjoon-hyun . My Apache JIRA username is alsh. I reported SPARK-26538.
Thanks.

@dongjoon-hyun
Member

Hi, @a-shkarupin. Yep, alsh is added to the Spark contributor group and SPARK-26538 is assigned to you. We cannot assign an issue to someone outside the contributor group, but since you are added now, there was no problem assigning it.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…stgres numeric array

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…stgres numeric array

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…stgres numeric array
