[SPARK-30869][SQL] Convert dates to/from timestamps in microseconds precision #27618
Conversation
Test build #118631 has finished for PR 27618 at commit
Review threads (outdated, resolved) on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (two threads)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
cloud-fan left a comment: LGTM except some minor comments
Test build #118758 has finished for PR 27618 at commit
Review threads (resolved) on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (two threads)
Test build #118764 has finished for PR 27618 at commit
This is a refactor PR that should go to master only. But we need to fix the usages of utils like …
Here is the PR #27676
Merge branch '…lis-by-micros'
# Conflicts:
#   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
#   sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
Test build #118852 has finished for PR 27618 at commit
Test build #118854 has finished for PR 27618 at commit
```scala
}
// using milliseconds can cause precision loss with more than 8 digits
// we follow Hive's implementation which uses seconds
val secondsInDay1 = MILLISECONDS.toSeconds(millis1 - daysToMillis(date1, zoneId))
```
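As a hedged illustration of the "seconds, not milliseconds" comment above (illustrative values only, not Spark's actual `monthsBetween` code), the fractional term this value feeds stays well inside double precision when computed in seconds:

```java
import java.util.concurrent.TimeUnit;

public class SecondsInDaySketch {
    public static void main(String[] args) {
        // Hypothetical value: seconds elapsed since local midnight (01:01:01).
        long secondsInDay1 = 3661L;
        // Hive-style fraction of a 31-day month, computed in seconds; the
        // dividend stays small, avoiding the precision loss the comment
        // above warns about when milliseconds (more digits) are used.
        double fraction = secondsInDay1 / (double) TimeUnit.DAYS.toSeconds(31);
        System.out.println(TimeUnit.DAYS.toSeconds(31)); // 2678400
        System.out.println(fraction > 0 && fraction < 1); // true
    }
}
```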
Comment: shall we call Math.floorDiv?
Reply: Highly likely, yes. I will prepare a separate fix.
Reply: The rounding error is invisible when dividing by DAYS.toSeconds(31). At least, I haven't reproduced the issue yet.
Comment: ah, this is "seconds in day", so it's always positive.
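The point of the exchange above can be sketched in plain Java (illustrative, not Spark code): truncating division and `Math.floorDiv` agree on non-negative inputs like "seconds in day", and only diverge for negative values such as pre-epoch timestamps.

```java
public class FloorDivSketch {
    public static void main(String[] args) {
        // Non-negative input: both divisions give the same result,
        // which is why plain division is safe for "seconds in day".
        System.out.println(4500L / 1000L);               // 4
        System.out.println(Math.floorDiv(4500L, 1000L)); // 4
        // Negative input (e.g. a timestamp before the epoch): they differ.
        System.out.println(-4500L / 1000L);               // -4 (truncates toward zero)
        System.out.println(Math.floorDiv(-4500L, 1000L)); // -5 (floors toward -infinity)
    }
}
```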
```scala
level match {
  case TRUNC_TO_MICROSECOND => t
  case TRUNC_TO_MILLISECOND =>
    millisToMicros(microsToMillis(t))
```
Comment: is it faster than t - Math.floorMod(t, MICROS_PER_MILLI)?
Reply: I think it is almost the same:
- millisToMicros(microsToMillis(t)) ~= multiplyExact + floorDiv
- t - Math.floorMod(t, MICROS_PER_MILLI) = floorDiv + *
I can replace it by floorMod for consistency.
Comment: Let's pick t - Math.floorMod(t, MICROS_PER_MILLI) for consistency.
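As a sanity check on the equivalence claimed in this thread, a small sketch (plain Java, not Spark code) showing that both formulas truncate to the same millisecond boundary, including for timestamps before the epoch:

```java
public class TruncSketch {
    public static void main(String[] args) {
        final long MICROS_PER_MILLI = 1000L;
        // A timestamp before the epoch, in microseconds.
        long t = -1_234_567L;
        // Round-trip through milliseconds: floorDiv then multiplyExact,
        // mirroring the microsToMillis/millisToMicros pair described above.
        long viaRoundTrip =
            Math.multiplyExact(Math.floorDiv(t, MICROS_PER_MILLI), MICROS_PER_MILLI);
        // The formula the reviewers settled on:
        long viaFloorMod = t - Math.floorMod(t, MICROS_PER_MILLI);
        System.out.println(viaRoundTrip); // -1235000
        System.out.println(viaFloorMod);  // -1235000
    }
}
```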
jenkins, retest this, please
Test build #118867 has finished for PR 27618 at commit
|
|
Test build #118896 has finished for PR 27618 at commit
|
|
retest this please |
|
Test build #118913 has finished for PR 27618 at commit
|
|
thanks, merging to master! |
### What changes were proposed in this pull request?
In the PR, I propose to replace:
1. `millisToDays()` by `microsToDays()`, which accepts microseconds since the epoch and returns days since the epoch in the specified time zone. The latter is the internal representation of Catalyst's DateType.
2. `daysToMillis()` by `daysToMicros()`, which accepts days since the epoch in some time zone and returns the number of microseconds since the epoch. The latter is the internal representation of Catalyst's TimestampType.
3. `fromMillis()` by `millisToMicros()`
4. `toMillis()` by `microsToMillis()`

### Why are the changes needed?
Spark stores timestamps in microseconds precision, so there is no actual need to convert dates to milliseconds and then to microseconds. As examples, look at the DateTimeUtils functions `monthsBetween()` and `truncTimestamp()`.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing test suites: UnivocityParserSuite, DateExpressionsSuite, ComputeCurrentTimeSuite, DateTimeUtilsSuite, DateFunctionsSuite, JsonSuite, StreamSuite.

Closes apache#27618 from MaxGekk/replace-millis-by-micros.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
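The semantics of the renamed conversions can be sketched with java.time (a hedged illustration of what `microsToDays`/`daysToMicros` compute, using the names from the PR; this is not Spark's actual implementation):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class MicrosDaysSketch {
    private static final long MICROS_PER_SECOND = 1_000_000L;

    // Microseconds since the epoch -> local days since the epoch in `zone`.
    static long microsToDays(long micros, ZoneId zone) {
        Instant instant = Instant.ofEpochSecond(
            Math.floorDiv(micros, MICROS_PER_SECOND),
            Math.floorMod(micros, MICROS_PER_SECOND) * 1000L);
        return instant.atZone(zone).toLocalDate().toEpochDay();
    }

    // Days since the epoch in `zone` -> microseconds since the epoch
    // at that day's local midnight.
    static long daysToMicros(long days, ZoneId zone) {
        Instant midnight = LocalDate.ofEpochDay(days).atStartOfDay(zone).toInstant();
        return Math.addExact(
            Math.multiplyExact(midnight.getEpochSecond(), MICROS_PER_SECOND),
            midnight.getNano() / 1000L);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        long days = microsToDays(1_000_000L, zone); // one second after the epoch
        System.out.println(days);                   // 0
        System.out.println(daysToMicros(days, zone)); // 0
    }
}
```

Note the `floorDiv`/`floorMod` split of the microsecond count, which keeps pre-epoch timestamps on the correct day, echoing the review discussion above.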