[SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON #23196
Closed
Commits
32 commits
fb10b91 Adding DateTimeFormatter (MaxGekk)
a9b39ec Support DateTimeFormatter by JacksonParser and JacksonGenerator (MaxGekk)
ff589f5 Make test independent from current time zone (MaxGekk)
4646ded Fix a test by new fallback (MaxGekk)
1c838e0 Set time zone explicitly (MaxGekk)
142f301 Updating the migration guide (MaxGekk)
606da21 Fix the migration guide by replacing CSV by JSON (MaxGekk)
f326042 Inlining method's arguments (MaxGekk)
4120228 A test for roundtrip timestamp parsing (MaxGekk)
6689747 Merge remote-tracking branch 'origin/master' into json-time-parser (MaxGekk)
e575162 Set time zone to GMT to eliminate of situation when time zone offset … (MaxGekk)
a35d5bf UTC -> GMT (MaxGekk)
2a2085d Using floorDiv to take days from seconds (MaxGekk)
55f2eac Removing unnecessary time zone settings (MaxGekk)
57600e2 Merge remote-tracking branch 'origin/master' into json-time-parser (MaxGekk)
07fcf46 Using legacy parser in HiveCompatibilitySuite (MaxGekk)
6b6ea8a Enable new parser in HiveCompatibilitySuit (MaxGekk)
244654b Remove saving legacy parser settings (MaxGekk)
015fdce Updating migration guide (MaxGekk)
96529f5 Making date parser independent from time zones (MaxGekk)
07d6031 Test refactoring (MaxGekk)
d761dee protected is added (MaxGekk)
24b1e3d toInstant -> toInstantWithZoneId (MaxGekk)
9a11515 Set time zone in the test (MaxGekk)
4b01d05 GMT -> UTC (MaxGekk)
0c7b96b DateTimeFormatter -> TimestampFormatter (MaxGekk)
bbaff09 timeParser -> timestampParser (MaxGekk)
8af9df9 Round trip tests (MaxGekk)
363482e Renaming test suite (MaxGekk)
07e0bf8 Added withClue (MaxGekk)
c12da1f Put test under legacy time parser (MaxGekk)
60ab5b1 TODO (MaxGekk)
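Commit 2a2085d above is titled "Using floorDiv to take days from seconds". The diff itself is not shown on this page, so the following is only a sketch of why floor division matters when converting epoch seconds to epoch days. The class and method names (`EpochDays`, `daysTruncating`, `daysFloor`) are hypothetical and are not taken from the Spark code base.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Illustrative only: not the code changed in the PR.
final class EpochDays {
    static final long SECONDS_PER_DAY = 86_400L;

    // Plain integer division truncates toward zero, so instants
    // before 1970-01-01 are assigned to the following day.
    static long daysTruncating(long epochSeconds) {
        return epochSeconds / SECONDS_PER_DAY;
    }

    // Math.floorDiv rounds toward negative infinity, which matches
    // calendar days for negative epoch seconds as well.
    static long daysFloor(long epochSeconds) {
        return Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        long oneSecondBeforeEpoch = -1L; // 1969-12-31T23:59:59Z

        // Reference value computed through java.time in UTC.
        LocalDate reference = Instant.ofEpochSecond(oneSecondBeforeEpoch)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();

        System.out.println("truncating: " + daysTruncating(oneSecondBeforeEpoch));
        System.out.println("floorDiv:   " + daysFloor(oneSecondBeforeEpoch));
        System.out.println("java.time:  " + reference.toEpochDay());
    }
}
```

For one second before the epoch, truncating division yields day 0 while floor division and `LocalDate.toEpochDay()` both yield day -1. For non-negative inputs the two divisions agree.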
GMT -> UTC
commit 4b01d05e306906f20372f1b3a7c987a3f5ce1c89
I'm a little worried here. This is a round-trip test. Do you mean that if we write a date/timestamp out to JSON and read it back, the values will differ when the session time zone is not UTC?
It should be the same if the session local time zone doesn't change between the write and the read back.
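The point above can be illustrated with plain `java.time`, outside of Spark. This is a minimal sketch under stated assumptions: the pattern `yyyy-MM-dd'T'HH:mm:ss` (deliberately without a zone offset) and the names `RoundTripSketch`, `write`, and `read` are hypothetical and do not come from the PR. It shows only that a zone-less timestamp string round-trips when the same zone is used on both sides, and shifts when the zones differ. It does not reproduce the test failure discussed in this thread.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Illustrative only: not Spark's TimestampFormatter.
final class RoundTripSketch {
    // Assumed pattern; it carries no zone offset, so the zone must be
    // supplied separately both when writing and when reading.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    // Render the instant as local date-time text in the given zone.
    static String write(Instant instant, ZoneId zone) {
        return DateTimeFormatter.ofPattern(PATTERN).withZone(zone).format(instant);
    }

    // Interpret the text as local date-time in the given zone.
    static Instant read(String text, ZoneId zone) {
        LocalDateTime local = LocalDateTime.parse(text, DateTimeFormatter.ofPattern(PATTERN));
        return local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2018-12-02T10:15:30Z");
        ZoneId writerZone = ZoneId.of("America/Los_Angeles");

        String text = write(original, writerZone);

        // Same zone on both sides: the instant is recovered.
        System.out.println(read(text, writerZone).equals(original));

        // Different zone on the read side: the instant is shifted.
        System.out.println(read(text, ZoneId.of("UTC")).equals(original));
    }
}
```

The sample instant is chosen away from daylight-saving transitions, where a local date-time can be ambiguous or skipped and the round trip would need extra care.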
Not only is the JSON parser/formatter involved in the loop, but so is the conversion of milliseconds to Java's `Timestamp` and to something else.
These don't matter once the DataFrame is created.
The problem is this: if we have a DataFrame (no matter how it was generated), write it out, and read it back, and the result is different, we have a bug.
Let me look at it deeper.
Can we remove the time zone setting here? Then we can look at the Jenkins report, see which seed reproduces the bug, and debug it locally.
I ran it locally many times. It is almost 100% reproducible for any seed.
What about putting the test under the flag `spark.sql.legacy.timeParser.enabled` and creating a separate JIRA ticket? I believe the bug is somewhere in Spark's home-made date/time functions rather than in the Java 8 implementation of timestamp parsing.
SGTM. Can you create the ticket? And put a TODO here which refers to the ticket.
Here is the ticket: https://issues.apache.org/jira/browse/SPARK-26374, and I added the TODO.