[SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON #23196
Closed · +422 −335
32 commits (all by MaxGekk):

- `fb10b91` Adding DateTimeFormatter
- `a9b39ec` Support DateTimeFormatter by JacksonParser and JacksonGenerator
- `ff589f5` Make test independent from current time zone
- `4646ded` Fix a test by new fallback
- `1c838e0` Set time zone explicitly
- `142f301` Updating the migration guide
- `606da21` Fix the migration guide by replacing CSV by JSON
- `f326042` Inlining method's arguments
- `4120228` A test for roundtrip timestamp parsing
- `6689747` Merge remote-tracking branch 'origin/master' into json-time-parser
- `e575162` Set time zone to GMT to eliminate of situation when time zone offset …
- `a35d5bf` UTC -> GMT
- `2a2085d` Using floorDiv to take days from seconds
- `55f2eac` Removing unnecessary time zone settings
- `57600e2` Merge remote-tracking branch 'origin/master' into json-time-parser
- `07fcf46` Using legacy parser in HiveCompatibilitySuite
- `6b6ea8a` Enable new parser in HiveCompatibilitySuit
- `244654b` Remove saving legacy parser settings
- `015fdce` Updating migration guide
- `96529f5` Making date parser independent from time zones
- `07d6031` Test refactoring
- `d761dee` protected is added
- `24b1e3d` toInstant -> toInstantWithZoneId
- `9a11515` Set time zone in the test
- `4b01d05` GMT -> UTC
- `0c7b96b` DateTimeFormatter -> TimestampFormatter
- `bbaff09` timeParser -> timestampParser
- `8af9df9` Round trip tests
- `363482e` Renaming test suite
- `07e0bf8` Added withClue
- `c12da1f` Put test under legacy time parser
- `60ab5b1` TODO
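Commit `2a2085d` is titled "Using floorDiv to take days from seconds", which presumably replaces the truncating `seconds / DateTimeUtils.SECONDS_PER_DAY` in `Iso8601DateFormatter.parse` from the `fb10b91` diff. The two only disagree before 1970-01-01: `/` rounds toward zero, while `Math.floorDiv` rounds toward negative infinity. The standalone sketch below illustrates this; `SecondsPerDay` is a local stand-in assumed to equal `DateTimeUtils.SECONDS_PER_DAY`, and `javaTimeDays` is a hypothetical reference helper that asks `java.time` for the UTC calendar day.

```scala
import java.time.{Instant, ZoneOffset}

object DaysFromSeconds {
  // Local stand-in for DateTimeUtils.SECONDS_PER_DAY (assumed to be 86400).
  val SecondsPerDay: Long = 60L * 60 * 24

  // Truncating division rounds toward zero, like `seconds / SECONDS_PER_DAY`.
  def truncatingDays(seconds: Long): Int = (seconds / SecondsPerDay).toInt

  // Floor division rounds toward negative infinity, so a pre-epoch
  // instant maps to the day that actually contains it.
  def flooringDays(seconds: Long): Int = Math.floorDiv(seconds, SecondsPerDay).toInt

  // Hypothetical reference: the epoch day computed by java.time in UTC.
  def javaTimeDays(seconds: Long): Long =
    Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).toLocalDate.toEpochDay

  def main(args: Array[String]): Unit = {
    val secs = -43200L // 1969-12-31T12:00:00Z
    println(s"truncating: ${truncatingDays(secs)}") // 0, i.e. 1970-01-01 (wrong day)
    println(s"flooring:   ${flooringDays(secs)}")   // -1, i.e. 1969-12-31
    println(s"java.time:  ${javaTimeDays(secs)}")   // -1
  }
}
```

For non-negative inputs both functions agree, which is why the truncating version can pass tests that only use post-1970 dates.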
Changes from 1 commit: `fb10b91502b67b98f2904a06b017a6e56dd6e39f` Adding DateTimeFormatter
`sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatter.scala` (new file): 173 additions, 0 deletions
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.catalyst.util

import java.time._
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.{ChronoField, TemporalQueries}
import java.util.{Locale, TimeZone}

import scala.util.Try

import org.apache.commons.lang3.time.FastDateFormat

import org.apache.spark.sql.internal.SQLConf

sealed trait DateTimeFormatter {
  def parse(s: String): Long // returns microseconds since epoch
  def format(us: Long): String
}

class Iso8601DateTimeFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends DateTimeFormatter {
  val formatter = new DateTimeFormatterBuilder()
    .appendPattern(pattern)
    .parseDefaulting(ChronoField.YEAR_OF_ERA, 1970)
    .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
    .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
    .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
    .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
    .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
    .toFormatter(locale)

  def toInstant(s: String): Instant = {
    val temporalAccessor = formatter.parse(s)
    if (temporalAccessor.query(TemporalQueries.offset()) == null) {
      val localDateTime = LocalDateTime.from(temporalAccessor)
      val zonedDateTime = ZonedDateTime.of(localDateTime, timeZone.toZoneId)
      Instant.from(zonedDateTime)
    } else {
      Instant.from(temporalAccessor)
    }
  }

  private def instantToMicros(instant: Instant, secMul: Long, nanoDiv: Long): Long = {
    val sec = Math.multiplyExact(instant.getEpochSecond, secMul)
    val result = Math.addExact(sec, instant.getNano / nanoDiv)
    result
  }

  def parse(s: String): Long = {
    instantToMicros(toInstant(s), DateTimeUtils.MICROS_PER_SECOND, DateTimeUtils.NANOS_PER_MICROS)
  }

  def format(us: Long): String = {
    val secs = Math.floorDiv(us, DateTimeUtils.MICROS_PER_SECOND)
    val mos = Math.floorMod(us, DateTimeUtils.MICROS_PER_SECOND)
    val instant = Instant.ofEpochSecond(secs, mos * DateTimeUtils.NANOS_PER_MICROS)

    formatter.withZone(timeZone.toZoneId).format(instant)
  }
}

class LegacyDateTimeFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends DateTimeFormatter {
  val format = FastDateFormat.getInstance(pattern, timeZone, locale)

  protected def toMillis(s: String): Long = format.parse(s).getTime

  def parse(s: String): Long = toMillis(s) * DateTimeUtils.MICROS_PER_MILLIS

  def format(us: Long): String = {
    format.format(DateTimeUtils.toJavaTimestamp(us))
  }
}

class LegacyFallbackDateTimeFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends LegacyDateTimeFormatter(pattern, timeZone, locale) {
  override def toMillis(s: String): Long = {
    Try {super.toMillis(s)}.getOrElse(DateTimeUtils.stringToTime(s).getTime)
  }
}

object DateTimeFormatter {
  def apply(format: String, timeZone: TimeZone, locale: Locale): DateTimeFormatter = {
    if (SQLConf.get.legacyTimeParserEnabled) {
      new LegacyFallbackDateTimeFormatter(format, timeZone, locale)
    } else {
      new Iso8601DateTimeFormatter(format, timeZone, locale)
    }
  }
}

sealed trait DateFormatter {
  def parse(s: String): Int // returns days since epoch
  def format(days: Int): String
}

class Iso8601DateFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends DateFormatter {

  val dateTimeFormatter = new Iso8601DateTimeFormatter(pattern, timeZone, locale)

  override def parse(s: String): Int = {
    val seconds = dateTimeFormatter.toInstant(s).getEpochSecond
    (seconds / DateTimeUtils.SECONDS_PER_DAY).toInt
  }

  override def format(days: Int): String = {
    val instant = Instant.ofEpochSecond(days * DateTimeUtils.SECONDS_PER_DAY)
    dateTimeFormatter.formatter.withZone(timeZone.toZoneId).format(instant)
  }
}

class LegacyDateFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends DateFormatter {
  val format = FastDateFormat.getInstance(pattern, timeZone, locale)

  def parse(s: String): Int = {
    val milliseconds = format.parse(s).getTime
    DateTimeUtils.millisToDays(milliseconds)
  }

  def format(days: Int): String = {
    val date = DateTimeUtils.toJavaDate(days)
    format.format(date)
  }
}

class LegacyFallbackDateFormatter(
    pattern: String,
    timeZone: TimeZone,
    locale: Locale) extends LegacyDateFormatter(pattern, timeZone, locale) {
  override def parse(s: String): Int = {
    Try(super.parse(s)).getOrElse {
      DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(s).getTime)
    }
  }
}

object DateFormatter {
  def apply(format: String, timeZone: TimeZone, locale: Locale): DateFormatter = {
    if (SQLConf.get.legacyTimeParserEnabled) {
      new LegacyFallbackDateFormatter(format, timeZone, locale)
    } else {
      new Iso8601DateFormatter(format, timeZone, locale)
    }
  }
}
```
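The microsecond arithmetic in `Iso8601DateTimeFormatter` can be checked in isolation. The sketch below copies the `instantToMicros` logic and the `floorDiv`/`floorMod` split from `format` into standalone functions. `MicrosPerSecond` and `NanosPerMicros` are local stand-ins assumed to equal `DateTimeUtils.MICROS_PER_SECOND` (10^6) and `DateTimeUtils.NANOS_PER_MICROS` (10^3); `MicrosConversion` and `microsToInstant` are hypothetical names used only for illustration.

```scala
import java.time.Instant

object MicrosConversion {
  // Assumed values of DateTimeUtils.MICROS_PER_SECOND and DateTimeUtils.NANOS_PER_MICROS.
  val MicrosPerSecond: Long = 1000000L
  val NanosPerMicros: Long = 1000L

  // Same arithmetic as instantToMicros: overflow-checked seconds * 10^6,
  // plus the nanosecond part truncated to whole microseconds.
  def instantToMicros(instant: Instant): Long = {
    val sec = Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond)
    Math.addExact(sec, instant.getNano / NanosPerMicros)
  }

  // Same split as format(us): floorDiv/floorMod give whole seconds and a
  // non-negative microsecond remainder, e.g. -1 us -> (-1 s, 999999 us).
  def microsToInstant(us: Long): Instant = {
    val secs = Math.floorDiv(us, MicrosPerSecond)
    val mos = Math.floorMod(us, MicrosPerSecond)
    Instant.ofEpochSecond(secs, mos * NanosPerMicros)
  }

  def main(args: Array[String]): Unit = {
    val instant = microsToInstant(-1L) // one microsecond before the epoch
    println(instant)                  // 1969-12-31T23:59:59.999999Z
    println(instantToMicros(instant)) // -1
  }
}
```

Because `Instant.getNano` is always non-negative, adding it to the (possibly negative) scaled seconds reconstructs the original value exactly, so a parse/format round trip preserves microseconds on both sides of the epoch.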
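Review comment on the `temporalAccessor.query(TemporalQueries.offset()) == null` check in `toInstant`:
Sorry, I'm not very familiar with this API. What does this condition mean?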
Reply: it means the zone offset is unknown after parsing. For example, if you parse `13-12-2018 09:55:00`, it is unclear which time zone the value is in, so the code interprets it in the formatter's `timeZone`.
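To make the reply concrete, here is a standalone sketch of the same branch. It builds a formatter the way `Iso8601DateTimeFormatter` does, then either applies a caller-supplied zone (no offset in the text) or trusts the parsed offset. `OffsetQuery`, `formatterFor`, the fixed `Locale.US`, and the sample patterns are assumptions for illustration, not part of the PR.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.{ChronoField, TemporalQueries}
import java.util.Locale

object OffsetQuery {
  // Same builder shape as Iso8601DateTimeFormatter: fields missing from the
  // pattern default to 1970-01-01T00:00:00. Locale.US is an assumption.
  def formatterFor(pattern: String): DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern(pattern)
    .parseDefaulting(ChronoField.YEAR_OF_ERA, 1970)
    .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
    .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
    .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
    .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
    .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
    .toFormatter(Locale.US)

  // Mirrors toInstant: with no parsed offset, the local date-time is
  // interpreted in `zone`; otherwise the offset from the text wins.
  def toInstant(pattern: String, s: String, zone: ZoneId): Instant = {
    val parsed = formatterFor(pattern).parse(s)
    if (parsed.query(TemporalQueries.offset()) == null) {
      ZonedDateTime.of(LocalDateTime.from(parsed), zone).toInstant
    } else {
      Instant.from(parsed)
    }
  }

  def main(args: Array[String]): Unit = {
    val utc = ZoneId.of("UTC")
    val berlin = ZoneId.of("Europe/Berlin")
    // No offset in the text: the result depends on the zone passed in.
    println(toInstant("dd-MM-yyyy HH:mm:ss", "13-12-2018 09:55:00", utc))    // 2018-12-13T09:55:00Z
    println(toInstant("dd-MM-yyyy HH:mm:ss", "13-12-2018 09:55:00", berlin)) // 2018-12-13T08:55:00Z
    // Offset in the text: the zone argument is ignored.
    println(toInstant("dd-MM-yyyy HH:mm:ssXXX", "13-12-2018 09:55:00+02:00", utc)) // 2018-12-13T07:55:00Z
  }
}
```

The same wall-clock text yields different instants depending on the supplied zone, which is exactly the ambiguity the reply describes.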