[SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy #37147
Closed
20 commits
1193ce7 fix issue (sadikovi)
b714b7f update code (sadikovi)
8a10a68 fix issue in json (sadikovi)
9b65761 update code (sadikovi)
45011a0 add a config option to control legacy behavior (sadikovi)
40d07bd add a config for json (sadikovi)
55c5579 update docs and comments (sadikovi)
ef91606 fix scalastyle (sadikovi)
15c07f7 trigger ci (sadikovi)
a83288b update comments (sadikovi)
bf9351d trigger ci (sadikovi)
ac63b63 Merge remote-tracking branch 'upstream/master' into fix-csv-date-infe… (sadikovi)
a447b08 fix tests for SPARK-39469 (sadikovi)
8feb707 Merge remote-tracking branch 'upstream/master' into fix-csv-date-infe… (sadikovi)
8a01ece Merge remote-tracking branch 'upstream/master' into fix-csv-date-infe… (sadikovi)
fbdf9d8 update comments (sadikovi)
2962cd9 Revert "fix tests for SPARK-39469" (sadikovi)
10ca4a4 update the priority order for SPARK-39469 (sadikovi)
b2a3db2 trigger ci (sadikovi)
739e7db Merge remote-tracking branch 'upstream/master' into fix-csv-date-infe… (sadikovi)
update code
commit b714b7fdf29ff77920e2b10acad806548477bca9
Is this technically a breaking change for users who could previously specify an invalid pattern without LEGACY mode?
Before: ignore the invalid pattern and parse with DateTimeUtils.stringToTimestamp
Now: it throws an error
We don't support invalid patterns but as a user I would be unhappy to see my code break. I'm unsure if this is actually considered a breaking change because this is such an edge case and the user is already doing something invalid. I'm curious to hear your thoughts.
This is a good point. It would be a breaking change for users if they were relying on the compatibility fallback.
There could be an alternative fix: maybe we can look into updating DateTimeUtils.stringToDate, but I am not sure.
I can also add a feature flag to control this behaviour in the JSON and CSV connectors so users can always opt in to the legacy behaviour. For example, I can add a data source option "useLegacyParsing" or something similar. The option could be disabled by default, and the exception would contain a message saying that you can enable the option to maintain the previous behaviour. Maybe this could be a good solution.
Let me know if something like that could work, thanks.
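To make the proposal concrete, here is a minimal sketch in plain Python of a flag-controlled fallback. It is only a model of the idea, not Spark code: `parse_date`, `legacy_parse`, and the `use_legacy_parsing` parameter are hypothetical names (the last one mirrors the "useLegacyParsing" suggestion), and Python's `%Y%m%d` stands in for the "yyyyMMdd" pattern.

```python
from datetime import date, datetime


def legacy_parse(value: str) -> date:
    """Hypothetical stand-in for the lenient fallback parser.

    It only understands ISO-style yyyy-MM-dd input here.
    """
    return date.fromisoformat(value.strip())


def parse_date(value: str, fmt: str, use_legacy_parsing: bool = False) -> date:
    """Parse strictly with fmt; fall back only when the flag is enabled."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError as err:
        if use_legacy_parsing:
            # Opt-in path: keep the previous "try another parser" behaviour.
            return legacy_parse(value)
        # Default path: fail, and tell the user how to restore the old behaviour.
        raise ValueError(
            f"Cannot parse {value!r} with pattern {fmt!r}; "
            "set use_legacy_parsing=True to restore the previous fallback"
        ) from err
```

With the flag left at its default, a value that does not match the pattern raises an error that names the option; with the flag enabled, the same value goes through the fallback parser instead.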
I think this should work. It feels weird that users have to opt-in to the correct behavior but hopefully this is a small percentage of users. Maybe @kamcheungting-db or @cloud-fan can weigh in.
I personally wouldn't be confident updating DateTimeUtils.stringToDate because there are so many usages elsewhere. But if you are familiar with the other use cases of DateTimeUtils.stringToDate then this could work.
I'll loop back if I think of an alternative.
I think the safest option is to copy-paste the old code of stringToDate before #32959 and use it here, but that's really ugly and hard to maintain.
I'd like to understand more about the invalid pattern behavior. Will we trigger the fallback for every input row? That sounds like a big perf problem...
With the invalid pattern and before this PR, yes, the fallback code would be triggered on every pattern mismatch. With the change, we will just throw an exception, so those values are parsed as nulls. Yes, it does sound like a performance issue, but it has been there for some time.
I agree with the copy-paste of stringToDate; I proposed adding a data source config to keep the old behaviour. What do you think?
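The per-row cost discussed above can be illustrated with a small plain-Python model. This is a sketch under assumptions, not Spark's implementation: `parse_column` is a hypothetical helper, `date.fromisoformat` stands in for the fallback parser, and a failed strict parse is recorded as `None` to mimic values being read as nulls.

```python
from datetime import date, datetime
from typing import List, Optional, Tuple


def parse_column(
    values: List[str], fmt: str, fallback_enabled: bool
) -> Tuple[List[Optional[date]], int]:
    """Return the parsed values and how many times the fallback was attempted."""
    parsed: List[Optional[date]] = []
    fallback_attempts = 0
    for value in values:
        try:
            parsed.append(datetime.strptime(value, fmt).date())
        except ValueError:
            if not fallback_enabled:
                # Strict behaviour: the mismatch simply becomes a null.
                parsed.append(None)
                continue
            # Legacy behaviour: every mismatching row pays for a second parse.
            fallback_attempts += 1
            try:
                parsed.append(date.fromisoformat(value))
            except ValueError:
                parsed.append(None)
    return parsed, fallback_attempts
```

When the pattern does not match the data at all, the fallback count equals the number of rows, which is the overhead being asked about; with the fallback disabled the count stays at zero and the mismatching rows come back as `None`.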