-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34605][SQL] Support java.time.Duration as an external type of the day-time interval type
#31729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #135721 has finished for PR 31729 at commit
|
java.time.Duration as an external type of the day-time interval typejava.time.Duration as an external type of the day-time interval type
|
@cloud-fan @HyukjinKwon @dongjoon-hyun @viirya @maropu @yaooqinn Could you review this PR, please. |
|
I believe that |
@yaooqinn I think of java.time.Period (https://docs.oracle.com/javase/8/docs/api/java/time/Period.html) |
Does it mean that we only use the period with year and month filed and fail for the day part? BTW, what is the behavior of Dataframe.show(), will it be something like |
Yes, we will use only year and month fields.
Let's discuss this in https://issues.apache.org/jira/browse/SPARK-34615
No-no. It will be in the format that conforms to the ANSI SQL standard like: BTW, the format |
| Seq( | ||
| "PT20.123456S", | ||
| "P2DT3H4M", | ||
| "P2D").foreach { intervalStr => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many people are not familiar with this format. Can we add comments or use a different way to create Duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cam we add some negative cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I created durations using another Duration methods. Also added negative tests.
| } | ||
|
|
||
| test("SPARK-34605: construct literals from arrays of java.time.Duration") { | ||
| val duration0 = Duration.parse("P2DT3H4M") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
It's better we update the |
@yaooqinn Let's update docs separately. I opened the sub-task SPARK-34619 for that. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #135752 has finished for PR 31729 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
In the PR, I propose to extend Spark SQL API to accept
java.time.Durationas an external type of recently added new Catalyst type -DayTimeIntervalType(see #31614). The Java classjava.time.Durationhas similar semantic to ANSI SQL day-time interval type, and it is the most suitable to be an external type forDayTimeIntervalType. In more details:DurationConverterwhich convertsjava.time.Durationinstances to/from internal representation of the Catalyst typeDayTimeIntervalType(toLongtype). TheDurationConverterobject uses new methods ofIntervalUtils:durationToMicros()converts the input duration to the total length in microseconds. If this duration is too large to fitLong, the method throws the exceptionArithmeticException. Note: the input duration has nanosecond precision, the method casts the nanos part to microseconds by dividing by 1000.microsToDuration()obtains ajava.time.Durationrepresenting a number of microseconds.DayTimeIntervalTypeinRowEncodervia the methodscreateDeserializerForDuration()andcreateSerializerForJavaDuration().java.time.Durationinstances.Why are the changes needed?
java.time.Durationcollections, and construct day-time interval columns. Also to collect such columns back to the driver side.Does this PR introduce any user-facing change?
The PR extends existing functionality. So, users can parallelize instances of the
java.time.Durationclass and collect them back:How was this patch tested?
CatalystTypeConvertersSuiteto check conversion from/tojava.time.Duration.RowEncoderSuite.DayTimeIntervalTypeare tested inLiteralExpressionSuiteDatasetSuiteandJavaDatasetSuite.