-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31885][SQL][3.0] Fix filter push down for old millis timestamps to Parquet #28699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…Parquet
Fixed conversions of `java.sql.Timestamp` to milliseconds in `ParquetFilter` by using existing functions from `DateTimeUtils` `fromJavaTimestamp()` and `microsToMillis()`.
The changes fix the bug:
```scala
scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+---+
|ts |
+---+
+---+
```
Yes, after the changes (for the example above):
```scala
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+-----------------------+
|ts |
+-----------------------+
|1000-06-14 08:28:53.123|
+-----------------------+
```
Modified tests in `ParquetFilterSuite` to check old timestamps.
Closes apache#28693 from MaxGekk/parquet-ts-millis-filter.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9c0dc28)
Signed-off-by: Max Gekk <[email protected]>
| private def timestampToMillis(v: Any): JLong = { | ||
| val timestamp = v.asInstanceOf[Timestamp] | ||
| val micros = DateTimeUtils.fromJavaTimestamp(timestamp) | ||
| val millis = DateTimeUtils.toMillis(micros) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line causes the problem.
|
Test build #123390 has finished for PR 28699 at commit
|
|
jenkins, retest this, please |
|
retest this please |
|
Test build #123397 has finished for PR 28699 at commit
|
|
thanks, merging to 3.0! |
…s to Parquet
### What changes were proposed in this pull request?
Fixed conversions of `java.sql.Timestamp` to milliseconds in `ParquetFilter` by using existing functions from `DateTimeUtils` `fromJavaTimestamp()` and `microsToMillis()`.
### Why are the changes needed?
The changes fix the bug:
```scala
scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+---+
|ts |
+---+
+---+
```
### Does this PR introduce _any_ user-facing change?
Yes, after the changes (for the example above):
```scala
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+-----------------------+
|ts |
+-----------------------+
|1000-06-14 08:28:53.123|
+-----------------------+
```
### How was this patch tested?
Modified tests in `ParquetFilterSuite` to check old timestamps.
Closes #28699 from MaxGekk/parquet-ts-millis-filter-3.0.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
|
Test build #123411 has finished for PR 28699 at commit
|
What changes were proposed in this pull request?
Fixed conversions of
java.sql.Timestampto milliseconds inParquetFilterby using existing functions fromDateTimeUtilsfromJavaTimestamp()andmicrosToMillis().Why are the changes needed?
The changes fix the bug:
Does this PR introduce any user-facing change?
Yes, after the changes (for the example above):
How was this patch tested?
Modified tests in
ParquetFilterSuiteto check old timestamps.