[SPARK-31221][SQL] Rebase any date-times in conversions to/from Java types #27980
Test build #120166 has finished for PR 27980 at commit
Timestamp.valueOf("2018-10-28 03:00:00"),
Timestamp.valueOf("2018-10-28 03:30:00")))
ts("2018-10-28 01:30:00"),
ts("2018-10-28 02:00:00", noDST = true),
I had to change the test because of these 2 lines. After rebasing, I wasn't able to build the 2 timestamps via Timestamp.valueOf. The Timestamp class performs normalization underneath and replaces the milliseconds since the epoch; as a consequence, the check of the results fails, even though the textual representation and the year, month, ..., nanos fields are the same.
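One way to see how two timestamps can share the same text while differing in milliseconds since the epoch is a wall-clock time that falls into a DST overlap. The sketch below is only an illustration, not the code of this PR: the object name `TimestampOverlapDemo`, the helper `sameTextDifferentInstants`, and the choice of `Europe/Berlin` as the default zone are assumptions made for the example.

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Hypothetical demo object; not part of Spark.
object TimestampOverlapDemo {
  private val HourMillis = 60L * 60L * 1000L

  /**
   * Builds a Timestamp from `text` with `zoneId` as the JVM default zone, then
   * looks for a second instant (one hour earlier or later) that prints as the
   * same text. Returns (shared text, first millis, second millis).
   */
  def sameTextDifferentInstants(text: String, zoneId: String): (String, Long, Long) = {
    val saved = TimeZone.getDefault
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
    try {
      val first = Timestamp.valueOf(text)
      // Probe both neighbours, so the sketch does not depend on which
      // side of the overlap Timestamp.valueOf resolves to.
      val second = Seq(first.getTime - HourMillis, first.getTime + HourMillis)
        .map(ms => new Timestamp(ms))
        .find(_.toString == first.toString)
        .getOrElse(sys.error(s"$text is not ambiguous in $zoneId"))
      (first.toString, first.getTime, second.getTime)
    } finally {
      TimeZone.setDefault(saved) // always restore the original default zone
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: clocks in Europe/Berlin went back from 03:00 to 02:00 on 2018-10-28.
    val (text, m1, m2) = sameTextDifferentInstants("2018-10-28 02:30:00", "Europe/Berlin")
    println(s"'$text' maps to epoch millis $m1 and $m2")
  }
}
```

Since `Timestamp.equals` compares the underlying instant rather than the printed fields, two such values do not compare equal even though they print identically.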
// 2016-11-06 01:00:00 PST could be 2016-11-06 08:00:00 UTC or 2016-11-06 09:00:00 UTC
test("2016-11-06 01:00:00", "PST", "2016-11-06 09:00:00.0")
test("2016-11-06 01:59:59", "PST", "2016-11-06 09:59:59.0")
test("2016-11-06 01:00:00", "PST", "2016-11-06 08:00:00.0")
See the comment above. 08 or 09 depends on the implementation.
This is my only concern.
From what I know, Pacific Standard Time (PST) is UTC-8:00, and Pacific Daylight Time (PDT) is UTC-7:00. So 2016-11-06 09:00:00 UTC is the only correct answer.
I have probably found the reason for this behavior. When PST is resolved to a zone offset, this short name is converted to America/Los_Angeles by the map:
https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/java/time/ZoneId.java#L248
And as a consequence, PST is not a constant offset:
val pst = getZoneId("PST")
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 5, 23, 59, 59)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 0, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 1, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 2, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 3, 0, 0)))
-07:00
-07:00
-07:00
-08:00
-08:00
At the test local timestamp 2016-11-06 01:00:00, the offset is -07:00.
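To make the "08 or 09" ambiguity concrete, here is a small self-contained sketch using only `java.time`. It assumes that `PST` is resolved through the JDK's `ZoneId.SHORT_IDS` map, as described above; the object name `PstOverlapDemo` and its helpers are made up for the illustration and are not part of Spark.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset, ZonedDateTime}

// Hypothetical demo object; not part of Spark.
object PstOverlapDemo {
  // "PST" resolved via the JDK short-ID map, which points to America/Los_Angeles.
  val pst: ZoneId = ZoneId.of("PST", ZoneId.SHORT_IDS)

  // The local timestamp used in the test; it lies inside the DST overlap.
  val local: LocalDateTime = LocalDateTime.of(2016, 11, 6, 1, 0, 0)

  private def toUtc(zdt: ZonedDateTime): LocalDateTime =
    zdt.withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime

  /** UTC wall-clock times for the earlier and the later offset of the overlap. */
  def utcCandidates(): (LocalDateTime, LocalDateTime) = {
    val zdt = ZonedDateTime.of(local, pst)
    (toUtc(zdt.withEarlierOffsetAtOverlap()), toUtc(zdt.withLaterOffsetAtOverlap()))
  }

  def main(args: Array[String]): Unit = {
    println(s"PST resolves to ${pst.getId}")
    println(s"valid offsets at $local: ${pst.getRules.getValidOffsets(local)}")
    val (earlier, later) = utcCandidates()
    println(s"earlier offset -> $earlier UTC, later offset -> $later UTC")
  }
}
```

With the earlier offset (-07:00) the local time maps to 08:00 UTC, and with the later offset (-08:00) it maps to 09:00 UTC, so which result a test sees depends on how the implementation resolves the overlap.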
Is it possible to work around this JDK bug in the test, e.g. by not using PST?
@cloud-fan I got 2016-11-06 09:00:00.0 in the test by modifying the map SHORT_IDS:
def getZoneId(timeZoneId: String): ZoneId = {
import scala.collection.JavaConverters._
val m = Map("PST" -> "-08:00")
ZoneId.of(timeZoneId, m.asJava)
}
def getDefaultZoneId(): ZoneId = {
getZoneId(defaultTimeZone().getID)
}
The following works:
withDefaultTimeZone(TimeZone.getTimeZone("-08:00")) {
test("2016-11-06 01:00:00", "-08:00", "2016-11-06 09:00:00.0")
test("2016-11-06 01:59:59", "-08:00", "2016-11-06 09:59:59.0")
}
@cloud-fan Should I replace PST with -08:00 in the test?
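As a rough illustration of why a fixed offset avoids the ambiguity, the sketch below converts the two local timestamps from the test with `java.time` only. The object name `FixedOffsetDemo` and the helper `toUtc` are hypothetical, and the sketch does not go through Spark's `withDefaultTimeZone` or `test` helpers.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical demo object; not part of Spark.
object FixedOffsetDemo {
  // A fixed offset has no DST transitions, so every local time maps to exactly one instant.
  val fixed: ZoneId = ZoneId.of("-08:00")

  /** Interprets `local` at the fixed offset and returns the UTC wall-clock time. */
  def toUtc(local: LocalDateTime): LocalDateTime =
    local.atZone(fixed).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime

  def main(args: Array[String]): Unit = {
    println(s"fixed offset rules: ${fixed.getRules.isFixedOffset}")
    println(toUtc(LocalDateTime.of(2016, 11, 6, 1, 0, 0)))
    println(toUtc(LocalDateTime.of(2016, 11, 6, 1, 59, 59)))
  }
}
```

Both inputs shift by exactly eight hours, which matches the expected values 09:00:00 and 09:59:59 in the test lines above.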
yes please
done
def checkFromToJavaDate(d1: Date): Unit = {
  val d2 = toJavaDate(fromJavaDate(d1))
  assert(format(d2) === format(d1))
  assert(d2.toString === d1.toString)
Here, I restored the code of Spark 2.4: https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala#L88
Test build #120169 has finished for PR 27980 at commit
Test build #120170 has finished for PR 27980 at commit
@cloud-fan @HyukjinKwon Please, review this PR.
Test build #120197 has finished for PR 27980 at commit
Test build #120251 has finished for PR 27980 at commit
The last commit updated a scala test, which is not related to the pyspark tests. The pip is a known issue. Thanks, merging to master/3.0!
…types

In the PR, I propose to apply rebasing for all dates/timestamps in conversion functions `fromJavaDate()`, `toJavaDate()`, `toJavaTimestamp()` and `fromJavaTimestamp()`. The rebasing is performed via building a local date-time in an original calendar, extracting date-time fields from the result, and creating new local date-time in the target calendar.

The changes are need to be compatible with previous Spark version (2.4.5 and earlier versions) not only before the Gregorian cutover date `1582-10-15` but also for dates after the date. For instance, Gregorian calendar implementation in Java 7 `java.util.GregorianCalendar` is not accurate in resolving time zone offsets as Gregorian calendar introduced since Java 8.

Yes, this PR can introduce behavior changes for dates after `1582-10-15`, in particular conversions of zone ids to zone offsets will be much more accurate.

By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`, `CollectionExpressionsSuite`, `HiveOrcHadoopFsRelationSuite`, `ParquetIOSuite`.

Closes #27980 from MaxGekk/reuse-rebase-funcs-in-java-funcs.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1fd4607)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
In the PR, I propose to apply rebasing to all dates/timestamps in the conversion functions `fromJavaDate()`, `toJavaDate()`, `toJavaTimestamp()` and `fromJavaTimestamp()`. The rebasing is performed by building a local date-time in the original calendar, extracting the date-time fields from the result, and creating a new local date-time in the target calendar.

Why are the changes needed?
The changes are needed for compatibility with previous Spark versions (2.4.5 and earlier), not only for dates before the Gregorian cutover date `1582-10-15` but also for dates after it. For instance, the Gregorian calendar implementation in Java 7, `java.util.GregorianCalendar`, is not as accurate in resolving time zone offsets as the Gregorian calendar introduced in Java 8.

Does this PR introduce any user-facing change?
Yes, this PR can introduce behavior changes for dates after `1582-10-15`; in particular, conversions of zone ids to zone offsets will be much more accurate.

How was this patch tested?
By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`, `CollectionExpressionsSuite`, `HiveOrcHadoopFsRelationSuite`, `ParquetIOSuite`.
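
The rebasing steps in the description (build a local date in the original calendar, extract its fields, create a local date in the target calendar) can be sketched roughly as follows. This is a simplified illustration for days since the epoch only, not the implementation in this PR: the object `RebaseSketch` and the function `rebaseJulianToGregorianDays` are hypothetical names, and the sketch assumes UTC and ignores time zones entirely.

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical sketch; not Spark's DateTimeUtils.
object RebaseSketch {
  private val MillisPerDay = 24L * 60L * 60L * 1000L

  /**
   * Rebases days since 1970-01-01 counted in the hybrid Julian + Gregorian
   * calendar (java.util.GregorianCalendar) to days since 1970-01-01 counted
   * in the proleptic Gregorian calendar (java.time.LocalDate).
   *
   * Limitation of this sketch: a Julian date with no proleptic Gregorian
   * counterpart (for example 1500-02-29) makes LocalDate.of throw.
   */
  def rebaseJulianToGregorianDays(days: Int): Int = {
    // Step 1: build the local date in the original (hybrid) calendar.
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.setTimeInMillis(Math.multiplyExact(days.toLong, MillisPerDay))

    // Step 2: extract the date fields; BC year n corresponds to proleptic year 1 - n.
    val yearOfEra = cal.get(Calendar.YEAR)
    val year = if (cal.get(Calendar.ERA) == GregorianCalendar.BC) 1 - yearOfEra else yearOfEra
    val month = cal.get(Calendar.MONTH) + 1 // Calendar months are 0-based
    val day = cal.get(Calendar.DAY_OF_MONTH)

    // Step 3: create the same local date in the target (proleptic Gregorian) calendar.
    Math.toIntExact(LocalDate.of(year, month, day).toEpochDay)
  }

  def main(args: Array[String]): Unit = {
    val cutover = LocalDate.of(1582, 10, 15).toEpochDay.toInt
    println(s"cutover day:       $cutover -> ${rebaseJulianToGregorianDays(cutover)}")
    println(s"day before cutover: ${cutover - 1} -> ${rebaseJulianToGregorianDays(cutover - 1)}")
  }
}
```

In this date-only sketch, days on or after the cutover come back unchanged, while the day before the cutover is the hybrid calendar's 1582-10-04 and is mapped to the proleptic Gregorian 1582-10-04, which lies ten days earlier on the proleptic day count.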