Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jun 1, 2020

What changes were proposed in this pull request?

Fixed conversions of java.sql.Timestamp to milliseconds in ParquetFilter by using existing functions from DateTimeUtils fromJavaTimestamp() and microsToMillis().

Why are the changes needed?

The changes fix the bug:

scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+---+
|ts |
+---+
+---+

Does this PR introduce any user-facing change?

Yes, after the changes (for the example above):

scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+-----------------------+
|ts                     |
+-----------------------+
|1000-06-14 08:28:53.123|
+-----------------------+

How was this patch tested?

Modified tests in ParquetFilterSuite to check old timestamps.

@MaxGekk MaxGekk changed the title [WIP][SQL] Fix filter push down for old millis timestamps to Parquet [SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet Jun 1, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 1, 2020

@cloud-fan @HyukjinKwon Please, review this bug fix.

@SparkQA
Copy link

SparkQA commented Jun 1, 2020

Test build #123371 has finished for PR 28693 at commit c041ff7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 9c0dc28 Jun 1, 2020
cloud-fan pushed a commit that referenced this pull request Jun 1, 2020
…Parquet

### What changes were proposed in this pull request?
Fixed conversions of `java.sql.Timestamp` to milliseconds in `ParquetFilter` by using existing functions from `DateTimeUtils` `fromJavaTimestamp()` and `microsToMillis()`.

### Why are the changes needed?
The changes fix the bug:
```scala
scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+---+
|ts |
+---+
+---+
```

### Does this PR introduce _any_ user-facing change?
Yes, after the changes (for the example above):
```scala
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+-----------------------+
|ts                     |
+-----------------------+
|1000-06-14 08:28:53.123|
+-----------------------+
```

### How was this patch tested?
Modified tests in `ParquetFilterSuite` to check old timestamps.

Closes #28693 from MaxGekk/parquet-ts-millis-filter.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9c0dc28)
Signed-off-by: Wenchen Fan <[email protected]>
@jiangxb1987
Copy link
Contributor

This breaks branch-3.0, I've reverted the commit.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 1, 2020

@jiangxb1987 Thanks. I guess this #27618 causes the trouble again. I will open a separate PR for 3.0

@jiangxb1987
Copy link
Contributor

Thanks @MaxGekk !

MaxGekk added a commit to MaxGekk/spark that referenced this pull request Jun 1, 2020
…Parquet

Fixed conversions of `java.sql.Timestamp` to milliseconds in `ParquetFilter` by using existing functions from `DateTimeUtils` `fromJavaTimestamp()` and `microsToMillis()`.

The changes fix the bug:
```scala
scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+---+
|ts |
+---+
+---+
```

Yes, after the changes (for the example above):
```scala
scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
+-----------------------+
|ts                     |
+-----------------------+
|1000-06-14 08:28:53.123|
+-----------------------+
```

Modified tests in `ParquetFilterSuite` to check old timestamps.

Closes apache#28693 from MaxGekk/parquet-ts-millis-filter.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 9c0dc28)
Signed-off-by: Max Gekk <[email protected]>
@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 1, 2020

Here is the PR for 3.0: #28699

@MaxGekk MaxGekk deleted the parquet-ts-millis-filter branch June 5, 2020 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants