Conversation

@MaxGekk MaxGekk commented Feb 18, 2020

What changes were proposed in this pull request?

In the PR, I propose to replace:

  1. millisToDays() by microsToDays(), which accepts microseconds since the epoch and returns days since the epoch in the specified time zone; days since the epoch is the internal representation of Catalyst's DateType.
  2. daysToMillis() by daysToMicros(), which accepts days since the epoch in some time zone and returns the number of microseconds since the epoch; microseconds since the epoch is the internal representation of Catalyst's TimestampType.
  3. fromMillis() by millisToMicros()
  4. toMillis() by microsToMillis()
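
Below is a minimal, self-contained sketch of what the proposed conversions could look like. For illustration only: the java.time-based bodies and the MICROS_PER_MILLIS constant are assumptions, not the PR's actual code.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.time.temporal.ChronoUnit

object MicrosConversionsSketch {
  val MICROS_PER_MILLIS = 1000L

  // Microseconds since the epoch -> days since the epoch in the given zone.
  // Days since the epoch is Catalyst's DateType representation.
  def microsToDays(micros: Long, zoneId: ZoneId): Int = {
    val instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
    Math.toIntExact(instant.atZone(zoneId).toLocalDate.toEpochDay)
  }

  // Days since the epoch in the given zone -> microseconds since the epoch.
  // Microseconds since the epoch is Catalyst's TimestampType representation.
  def daysToMicros(days: Int, zoneId: ZoneId): Long = {
    val startOfDay = LocalDate.ofEpochDay(days).atStartOfDay(zoneId).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, startOfDay)
  }

  // Widening conversion; multiplyExact fails loudly on overflow.
  def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, MICROS_PER_MILLIS)

  // Narrowing conversion; floorDiv keeps pre-1970 (negative) values correct.
  def microsToMillis(micros: Long): Long = Math.floorDiv(micros, MICROS_PER_MILLIS)
}
```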

Why are the changes needed?

Spark stores timestamps with microsecond precision, so there is no actual need to convert dates to milliseconds and then to microseconds. For examples, see the DateTimeUtils functions monthsBetween() and truncTimestamp().

Does this PR introduce any user-facing change?

No

How was this patch tested?

By the existing test suites UnivocityParserSuite, DateExpressionsSuite, ComputeCurrentTimeSuite, DateTimeUtilsSuite, DateFunctionsSuite, JsonSuite, and StreamSuite.

SparkQA commented Feb 18, 2020

Test build #118631 has finished for PR 27618 at commit 7d812a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SQL] Convert dates to/from timestamps in microseconds precision [SPARK-30869][SQL] Convert dates to/from timestamps in microseconds precision Feb 18, 2020
@cloud-fan cloud-fan left a comment

LGTM except some minor comments

SparkQA commented Feb 21, 2020

Test build #118758 has finished for PR 27618 at commit f78d773.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 21, 2020

Test build #118764 has finished for PR 27618 at commit 29caa4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

This is a refactoring PR that should go to master only. But we need to fix the usages of utils like MILLISECONDS.toMicros to make them overflow-safe. Since that fix should go to 3.0 as well, shall we merge the bug fix PRs before the refactoring ones? This would avoid code conflicts when working on the bug fix PRs.
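
For context on the overflow issue (an illustration, not code from this PR): java.util.concurrent.TimeUnit conversions saturate at Long.MinValue/Long.MaxValue instead of failing, so an overflow passes silently, whereas Math.multiplyExact throws:

```scala
import java.util.concurrent.TimeUnit.MILLISECONDS

// TimeUnit clamps the result to Long.MaxValue: the overflow is silent.
assert(MILLISECONDS.toMicros(Long.MaxValue) == Long.MaxValue)

// multiplyExact fails loudly instead of returning a wrong value.
try Math.multiplyExact(Long.MaxValue, 1000L)
catch { case _: ArithmeticException => println("overflow detected") }
```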

MaxGekk commented Feb 23, 2020

Since that fix should go to 3.0 as well, shall we merge the bug fix PRs before the refactoring ones? This would avoid code conflicts when working on the bug fix PRs.

Here is the PR #27676

…lis-by-micros

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala

SparkQA commented Feb 24, 2020

Test build #118852 has finished for PR 27618 at commit 9a4582e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 24, 2020

Test build #118854 has finished for PR 27618 at commit 15bedb3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

// using milliseconds can cause precision loss with more than 8 digits
// we follow Hive's implementation which uses seconds
val secondsInDay1 = MILLISECONDS.toSeconds(millis1 - daysToMillis(date1, zoneId))
Contributor

shall we call Math.floorDiv?

Member Author

Highly likely, yes. I will prepare a separate fix.

Member Author

The rounding error is invisible when dividing by DAYS.toSeconds(31). At least, I haven't reproduced the issue yet.

Contributor

ah, this is "seconds in day", so it's always positive.
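
(For reference, an illustration: TimeUnit division truncates toward zero while Math.floorDiv rounds toward negative infinity, so the two only differ for negative inputs.)

```scala
import java.util.concurrent.TimeUnit.MILLISECONDS

assert(MILLISECONDS.toSeconds(-1500L) == -1L) // truncates toward zero
assert(Math.floorDiv(-1500L, 1000L) == -2L)   // floors toward negative infinity
// A "seconds in day" value is non-negative, so both forms agree on it.
```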

level match {
case TRUNC_TO_MICROSECOND => t
case TRUNC_TO_MILLISECOND =>
millisToMicros(microsToMillis(t))
Contributor

is it faster than t - Math.floorMod(t, MICROS_PER_MILLI)?

Member Author

I think it is almost the same:

  • millisToMicros(microsToMillis(t)) ~= multiplyExact + floorDiv
  • t - Math.floorMod(t, MICROS_PER_MILLI) = floorDiv + *

I can replace it by floorMod for consistency.

Contributor

Let's pick t - Math.floorMod(t, MICROS_PER_MILLI) for consistency.
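
Both spellings floor-truncate a microsecond timestamp to millisecond precision, including for pre-1970 (negative) timestamps. A quick standalone check, assuming MICROS_PER_MILLI = 1000L:

```scala
val MICROS_PER_MILLI = 1000L

def viaRoundTrip(t: Long): Long =
  Math.multiplyExact(Math.floorDiv(t, MICROS_PER_MILLI), MICROS_PER_MILLI)

def viaFloorMod(t: Long): Long = t - Math.floorMod(t, MICROS_PER_MILLI)

// A negative (pre-1970) timestamp: both truncate downward to -1235000L.
assert(viaRoundTrip(-1234567L) == -1235000L)
assert(viaFloorMod(-1234567L) == -1235000L)
```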

MaxGekk commented Feb 24, 2020

jenkins, retest this, please

SparkQA commented Feb 24, 2020

Test build #118867 has finished for PR 27618 at commit 15bedb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 25, 2020

Test build #118896 has finished for PR 27618 at commit 17d9566.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

SparkQA commented Feb 25, 2020

Test build #118913 has finished for PR 27618 at commit 17d9566.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in ffc0935 Feb 25, 2020
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
[SPARK-30869][SQL] Convert dates to/from timestamps in microseconds precision

Closes apache#27618 from MaxGekk/replace-millis-by-micros.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@MaxGekk MaxGekk deleted the replace-millis-by-micros branch June 5, 2020 19:46