
@gengliangwang
Member

@gengliangwang gengliangwang commented Jun 8, 2021

What changes were proposed in this pull request?

In this PR, I propose extending the Spark SQL API to accept java.time.LocalDateTime as an external type of the recently added Catalyst type TimestampWithoutTZ. The Java class java.time.LocalDateTime has semantics similar to the ANSI SQL timestamp without time zone type, which makes it the most suitable external type for TimestampWithoutTZType. In more detail:

  • Added TimestampWithoutTZConverter, which converts java.time.LocalDateTime instances to/from the internal representation of the Catalyst type TimestampWithoutTZType (a Long). The TimestampWithoutTZConverter object uses new methods of DateTimeUtils:
    • localDateTimeToMicros() converts the input local date-time to the number of microseconds since 1970-01-01T00:00:00.
    • microsToLocalDateTime() obtains a java.time.LocalDateTime from a number of microseconds since 1970-01-01T00:00:00.
  • Supported the new type TimestampWithoutTZType in RowEncoder via the methods createDeserializerForLocalDateTime() and createSerializerForLocalDateTime().
  • Extended the Literal API to construct literals from java.time.LocalDateTime instances.
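The conversion pair described above can be sketched in plain Scala with java.time. This is a minimal sketch, not Spark's code: the object name LocalDateTimeMicros and its constants are hypothetical, and it assumes, as in the expression reviewed in this PR (toEpochSecond(ZoneOffset.UTC)), that the local date-time is read at a UTC offset so that no session time zone is involved.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical standalone helpers; they mirror, but are not, Spark's DateTimeUtils methods.
object LocalDateTimeMicros {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicros = 1000L

  // Reads the wall-clock fields as if they were at UTC and counts microseconds
  // since 1970-01-01T00:00:00. Sub-microsecond nanos are truncated.
  // Exact arithmetic throws ArithmeticException instead of silently wrapping around.
  def localDateTimeToMicros(ldt: LocalDateTime): Long = {
    val seconds = ldt.toEpochSecond(ZoneOffset.UTC)
    Math.addExact(Math.multiplyExact(seconds, MicrosPerSecond), ldt.getNano / NanosPerMicros)
  }

  // Inverse conversion: floorDiv/floorMod keep the sub-second part non-negative,
  // which is what LocalDateTime.ofEpochSecond expects for pre-1970 values.
  def microsToLocalDateTime(micros: Long): LocalDateTime = {
    val seconds = Math.floorDiv(micros, MicrosPerSecond)
    val microsOfSecond = Math.floorMod(micros, MicrosPerSecond)
    LocalDateTime.ofEpochSecond(seconds, (microsOfSecond * NanosPerMicros).toInt, ZoneOffset.UTC)
  }
}
```

With this representation, 1970-01-01T00:00:00 maps to 0 and 1969-12-31T23:59:59.999999 maps to -1, and values round-trip at microsecond precision.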

Why are the changes needed?

To allow users to parallelize collections of java.time.LocalDateTime values into timestamp without time zone columns, and to collect such columns back to the driver side.

Does this PR introduce any user-facing change?

The PR extends existing functionality, so users can parallelize instances of the java.time.LocalDateTime class and collect them back:

scala> val ds = Seq(java.time.LocalDateTime.parse("1970-01-01T00:00:00")).toDS
ds: org.apache.spark.sql.Dataset[java.time.LocalDateTime] = [value: timestampwithouttz]

scala> ds.collect()
res0: Array[java.time.LocalDateTime] = Array(1970-01-01T00:00)

How was this patch tested?

New unit tests

@github-actions github-actions bot added the SQL label Jun 8, 2021
@gengliangwang gengliangwang changed the title [SPARK-35664][SQL] Support java.time. LocalDateTime as an external type of TimestampWithoutTZ type [SPARK-35664][SQL] Support java.time.LocalDateTime as an external type of TimestampWithoutTZ type Jun 8, 2021
@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43970/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43970/

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139447 has finished for PR 32814 at commit 1101f55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43990/

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139469 has finished for PR 32814 at commit ba44cb1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43992/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43992/

@MaxGekk
Member

MaxGekk commented Jun 8, 2021

@gengliangwang You made changes similar to those in #31729, correct? I just want to avoid having to compare this PR with the similar PRs for intervals.

@gengliangwang
Member Author

@MaxGekk Yes, I followed #31729.

Comment on lines 79 to 80
localDateTime.toEpochSecond(ZoneOffset.UTC) * MICROS_PER_SECOND +
localDateTime.getNano / NANOS_PER_MICROS

Potentially, * and + can overflow. Could you use Math.multiplyExact?

Also, please add tests that check for long overflow.
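To illustrate the overflow concern, here is a minimal sketch contrasting the unchecked expression from the snippet under review with a checked variant. The object OverflowDemo is hypothetical (not Spark code), and pairing Math.multiplyExact with Math.addExact for the + is an assumption of this sketch. On the JVM, Long arithmetic wraps around silently, so a far-future value such as year 300000 yields a wrong, negative count instead of an error.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical demo object; not part of Spark.
object OverflowDemo {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicros = 1000L

  // Same shape as the reviewed expression: on overflow the Long silently wraps around.
  def uncheckedMicros(ldt: LocalDateTime): Long =
    ldt.toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond + ldt.getNano / NanosPerMicros

  // Checked variant: throws java.lang.ArithmeticException("long overflow") instead of wrapping.
  def checkedMicros(ldt: LocalDateTime): Long =
    Math.addExact(
      Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), MicrosPerSecond),
      ldt.getNano / NanosPerMicros)
}
```

For in-range values both functions agree; they differ only when the result does not fit in a Long, which is exactly the case an overflow test should cover.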

BTW, I replaced * with multiplyExact and got:

java.lang.ArithmeticException: long overflow
	at java.lang.Math.multiplyExact(Math.java:892)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.localDateTimeToMicros(DateTimeUtils.scala:79)
	at org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite.$anonfun$new$51(DateTimeUtilsSuite.scala:660)

on your new test:

    assert(DateTimeUtils.localDateTimeToMicros(LocalDateTime.parse("-290308-12-21T19:59:05.224192"))
      == Long.MinValue)

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139487 has finished for PR 32814 at commit 0418133.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member

MaxGekk commented Jun 8, 2021

@gengliangwang I wonder why you removed the checks for Long.MaxValue/MinValue:
[Screenshot 2021-06-08 at 17.37.07]

Does instantToMicros overflow on the values?

@gengliangwang
Member Author

gengliangwang commented Jun 8, 2021

Does instantToMicros overflow on the values?

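@MaxGekk Yes. For Long.MinValue, the resulting Instant has negative seconds and positive nanoseconds, so multiplying the seconds by MICROS_PER_SECOND overflows when we convert it back, even though the final sum would fit in a Long. We could fix it, but it is too much of a corner case, and the fix would also make performance worse.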

scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
import org.apache.spark.sql.catalyst.util.DateTimeUtils._

scala> instantToMicros(microsToInstant(Long.MinValue))
java.lang.ArithmeticException: long overflow
  at java.lang.Math.multiplyExact(Math.java:892)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:389)
  ... 49 elided

scala> microsToInstant(Long.MinValue).getNano
res10: Int = 224192000
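The intermediate overflow described above can in principle be avoided by borrowing one second when the seconds are negative and the sub-second part is positive, so that the multiplication stays in range. The sketch below is one possible approach under that assumption; it is not the fix eventually made for SPARK-35679, and the object name BoundarySafeMicros is hypothetical. Its microsToInstant helper assumes a floorDiv/floorMod split, which is consistent with the getNano value of 224192000 shown in the REPL session.

```scala
import java.time.Instant

// Hypothetical helpers; not Spark's DateTimeUtils.
object BoundarySafeMicros {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicros = 1000L

  // Splits micros into whole seconds and a non-negative sub-second remainder.
  def microsToInstant(micros: Long): Instant = {
    val secs = Math.floorDiv(micros, MicrosPerSecond)
    val microsOfSecond = Math.floorMod(micros, MicrosPerSecond)
    Instant.ofEpochSecond(secs, microsOfSecond * NanosPerMicros)
  }

  // Naive form: multiplyExact fails for Long.MinValue even though the final sum fits.
  def instantToMicrosNaive(i: Instant): Long =
    Math.addExact(Math.multiplyExact(i.getEpochSecond, MicrosPerSecond), i.getNano / NanosPerMicros)

  // Borrow one second for negative seconds with a positive fraction, then add back
  // the (now negative) sub-second part. Genuine overflows still throw.
  def instantToMicros(i: Instant): Long = {
    val secs = i.getEpochSecond
    val micros = i.getNano / NanosPerMicros
    if (secs < 0 && micros > 0) {
      Math.addExact(Math.multiplyExact(secs + 1, MicrosPerSecond), micros - MicrosPerSecond)
    } else {
      Math.addExact(Math.multiplyExact(secs, MicrosPerSecond), micros)
    }
  }
}
```

The extra branch is the performance cost mentioned in the discussion: every conversion of a pre-1970 value with a non-zero fraction takes the borrowing path.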

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44012/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44010/

@MaxGekk
Member

MaxGekk commented Jun 8, 2021

Yes. The Long.MinValue case contains negative seconds and positive nanoseconds.

@gengliangwang Could you open a JIRA for the issue, so that we can fix it separately from this PR?

@gengliangwang
Member Author

@MaxGekk https://issues.apache.org/jira/browse/SPARK-35679

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44012/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44010/

gengliangwang added a commit that referenced this pull request Jun 9, 2021
…UserDefinedTypeSuite

### What changes were proposed in this pull request?

Refactor LocalDateTimeUDT as YearUDT in UserDefinedTypeSuite

### Why are the changes needed?

As we are going to support java.time.LocalDateTime as an external type of TimestampWithoutTZ type (#32814), registering java.time.LocalDateTime as a UDT would cause test failures: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139469/testReport/
This PR is to unblock #32814.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #32824 from gengliangwang/UDTFollowUp.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44045/

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44045/

@SparkQA

SparkQA commented Jun 9, 2021

Test build #139538 has finished for PR 32814 at commit a2326e0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44062/

@SparkQA

SparkQA commented Jun 9, 2021

Test build #139520 has finished for PR 32814 at commit 35b78f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

@cloud-fan @MaxGekk Thanks for the review.
Merging to master.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44062/
