[SPARK-34615][SQL] Support java.time.Period as an external type of the year-month interval type
#31765
Kubernetes integration test starting
Kubernetes integration test status success
@cloud-fan @yaooqinn @srielau @HyukjinKwon Could you review this PR, please?
 * @return The total number of months in the period, may be negative
 * @throws ArithmeticException If numeric overflow occurs
 */
def periodToMonths(period: Period): Int = {
Shall we fail if the day field is not 0? Or at least give a warning?
We don't fail when we convert:
- java.sql.Date, which has a time component with millisecond precision, but we ignore it when we convert to days at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 93 in 56e664c:
  val julianDays = Math.toIntExact(Math.floorDiv(millisLocal, MILLIS_PER_DAY))
- java.sql.Timestamp, which has nanosecond precision, at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 166 in 56e664c:
  val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
- java.time.Instant, which contains nanoseconds, and we don't fail when we convert it to microseconds at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 383 in 56e664c:
  val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano))
To be consistent with the current implementation for other types, I do believe we should not fail.
> Or at least give a warning?
This would just fill the logs with useless records, and it is again inconsistent with the current implementation.
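The behavior under discussion can be illustrated with a minimal sketch that uses only the JDK. It is not the code from this PR: the object name `PeriodToMonthsSketch` and the constant `MonthsPerYear` are made up for illustration, and only the signature `periodToMonths(period: Period): Int` comes from the diff above. The sketch assumes the method combines years and months with exact arithmetic and never reads the days field.

```scala
import java.time.Period

// Illustrative sketch only; names other than periodToMonths are hypothetical.
object PeriodToMonthsSketch {
  private val MonthsPerYear = 12

  // Total number of months in the period, may be negative.
  // The days field is never read, so it is silently dropped.
  // Throws ArithmeticException if the result does not fit in Int.
  def periodToMonths(period: Period): Int = {
    val monthsInYears = Math.multiplyExact(period.getYears, MonthsPerYear)
    Math.addExact(monthsInYears, period.getMonths)
  }

  def main(args: Array[String]): Unit = {
    println(periodToMonths(Period.of(1, 2, 0)))   // 14
    println(periodToMonths(Period.of(1, 2, 25)))  // 14, the 25 days are ignored
    // An overflowing period fails with ArithmeticException instead of wrapping around.
    val overflow = scala.util.Try(periodToMonths(Period.ofYears(Int.MaxValue)))
    println(overflow.isFailure)                   // true
  }
}
```

Under these assumptions, a non-zero day field neither fails nor logs anything, which matches the truncation applied to the sub-day parts of java.sql.Date and the sub-microsecond parts of java.sql.Timestamp and java.time.Instant listed above.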
thanks, merging to master!
Late LGTM!
What changes were proposed in this pull request?
In the PR, I propose to extend the Spark SQL API to accept java.time.Period as an external type of the recently added Catalyst type YearMonthIntervalType (see #31614). The Java class java.time.Period has similar semantics to the ANSI SQL year-month interval type, and it is the most suitable external type for YearMonthIntervalType. In more detail:
- PeriodConverter, which converts java.time.Period instances to/from the internal representation of the Catalyst type YearMonthIntervalType (to the Int type). The PeriodConverter object uses new methods of IntervalUtils:
  - periodToMonths() converts the input period to the total length in months. If this period is too large to fit in Int, the method throws ArithmeticException. Note: the input period has "days" precision, and the method just ignores the days unit.
  - monthToPeriod() obtains a java.time.Period representing a number of months.
- Support for YearMonthIntervalType in RowEncoder via the methods createDeserializerForPeriod() and createSerializerForJavaPeriod().
- [...] java.time.Period instances.

Why are the changes needed?
To allow users to parallelize java.time.Period collections and construct year-month interval columns, and also to collect such columns back to the driver side.

Does this PR introduce any user-facing change?
The PR extends existing functionality, so users can parallelize instances of the java.time.Period class and collect them back.

How was this patch tested?
- Tests in CatalystTypeConvertersSuite to check conversion from/to java.time.Period.
- Tests in RowEncoderSuite.
- Literals of YearMonthIntervalType are tested in LiteralExpressionSuite.
- Tests in DatasetSuite and JavaDatasetSuite.
- Tests in IntervalUtilsSuites to check conversions java.time.Period <-> months.
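
The reverse conversion described above can be sketched with the JDK alone. This is an assumption about how such a method might look, not the implementation merged in this PR: the object name `MonthToPeriodSketch` is hypothetical, and the use of `Period.ofMonths(...).normalized()` is one plausible way to split a month count back into years and months. The example also shows that a round trip through the month count loses any days the original period carried.

```scala
import java.time.Period

// Illustrative sketch only; not the code added by the PR.
object MonthToPeriodSketch {
  // Builds a period from a month count and normalizes it so that
  // the months field stays within -11..11 and the rest goes to years.
  def monthToPeriod(months: Int): Period =
    Period.ofMonths(months).normalized()

  def main(args: Array[String]): Unit = {
    val original = Period.of(1, 14, 20)
    // toTotalMonths is a JDK method: years * 12 + months, days are not included.
    val months = Math.toIntExact(original.toTotalMonths)
    val restored = monthToPeriod(months)
    println(months)    // 26
    println(restored)  // P2Y2M, the 20 days did not survive the round trip
  }
}
```

With this shape, two periods that differ only in their days map to the same internal Int value, so values collected back from a year-month interval column would always have a zero days field.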