-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource #28047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@cloud-fan @HyukjinKwon @dongjoon-hyun Please, review this PR. |
|
So, like SPARK-31238 added test cases for |
@dongjoon-hyun Right. |
|
Test build #120473 has finished for PR 28047 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @MaxGekk and @cloud-fan .
Merged to master/3.0.
### What changes were proposed in this pull request?
In the PR, I propose 2 tests to check that rebasing of timestamps from/to the hybrid calendar (Julian + Gregorian) to/from Proleptic Gregorian calendar works correctly.
1. The test `compatibility with Spark 2.4 in reading timestamps` load ORC file saved by Spark 2.4.5 via:
```shell
$ export TZ="America/Los_Angeles"
```
```scala
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
scala> val df = Seq("1001-01-01 01:02:03.123456").toDF("tsS").select($"tsS".cast("timestamp").as("ts"))
df: org.apache.spark.sql.DataFrame = [ts: timestamp]
scala> df.write.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc")
scala> spark.read.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc").show(false)
+--------------------------+
|ts |
+--------------------------+
|1001-01-01 01:02:03.123456|
+--------------------------+
```
2. The test `rebasing timestamps in write` is round trip test. Since the previous test confirms correct rebasing of timestamps in read. This test should pass only if rebasing works correctly in write.
### Why are the changes needed?
To guarantee that rebasing works correctly for timestamps in ORC datasource.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running `OrcSourceSuite` for Hive 1.2 and 2.3 via the commands:
```
$ build/sbt -Phive-2.3 "test:testOnly *OrcSourceSuite"
```
and
```
$ build/sbt -Phive-1.2 "test:testOnly *OrcSourceSuite"
```
Closes #28047 from MaxGekk/rebase-ts-orc-test.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit fc2a974)
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
In the PR, I propose 2 tests to check that rebasing of timestamps from/to the hybrid calendar (Julian + Gregorian) to/from Proleptic Gregorian calendar works correctly.
1. The test `compatibility with Spark 2.4 in reading timestamps` load ORC file saved by Spark 2.4.5 via:
```shell
$ export TZ="America/Los_Angeles"
```
```scala
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
scala> val df = Seq("1001-01-01 01:02:03.123456").toDF("tsS").select($"tsS".cast("timestamp").as("ts"))
df: org.apache.spark.sql.DataFrame = [ts: timestamp]
scala> df.write.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc")
scala> spark.read.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc").show(false)
+--------------------------+
|ts |
+--------------------------+
|1001-01-01 01:02:03.123456|
+--------------------------+
```
2. The test `rebasing timestamps in write` is round trip test. Since the previous test confirms correct rebasing of timestamps in read. This test should pass only if rebasing works correctly in write.
### Why are the changes needed?
To guarantee that rebasing works correctly for timestamps in ORC datasource.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running `OrcSourceSuite` for Hive 1.2 and 2.3 via the commands:
```
$ build/sbt -Phive-2.3 "test:testOnly *OrcSourceSuite"
```
and
```
$ build/sbt -Phive-1.2 "test:testOnly *OrcSourceSuite"
```
Closes apache#28047 from MaxGekk/rebase-ts-orc-test.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
In the PR, I propose 2 tests to check that rebasing of timestamps from/to the hybrid calendar (Julian + Gregorian) to/from Proleptic Gregorian calendar works correctly.
compatibility with Spark 2.4 in reading timestampsload ORC file saved by Spark 2.4.5 via:rebasing timestamps in writeis round trip test. Since the previous test confirms correct rebasing of timestamps in read. This test should pass only if rebasing works correctly in write.Why are the changes needed?
To guarantee that rebasing works correctly for timestamps in ORC datasource.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running
OrcSourceSuitefor Hive 1.2 and 2.3 via the commands:and