Commit 9d80735

rdblue and HyukjinKwon authored and committed
[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix.
## What changes were proposed in this pull request?

Update to Parquet Java 1.10.1.

## How was this patch tested?

Added a test from HyukjinKwon that validates the notEq case from SPARK-26677.

Closes apache#23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug.

Lead-authored-by: Ryan Blue <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Ryan Blue <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit f72d217)
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 82f6e61 commit 9d80735

4 files changed: +26 -11 lines changed


dev/deps/spark-deps-hadoop-2.7

Lines changed: 5 additions & 5 deletions
@@ -160,13 +160,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.7.jar
 pyrolite-4.13.jar

dev/deps/spark-deps-hadoop-3.1

Lines changed: 5 additions & 5 deletions
@@ -178,13 +178,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.7.jar
 pyrolite-4.13.jar

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@
     <!-- Version used for internal directory structure -->
     <hive.version.short>3.0.0.1</hive.version.short>
     <derby.version>10.12.1.1</derby.version>
-    <parquet.version>1.10.0</parquet.version>
+    <parquet.version>1.10.1</parquet.version>
     <orc.version>1.5.4</orc.version>
     <orc.classifier></orc.classifier>
     <hive.parquet.version>1.6.0</hive.parquet.version>

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala

Lines changed: 15 additions & 0 deletions
@@ -891,6 +891,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       }
     }
   }
+
+  test("SPARK-26677: negated null-safe equality comparison should not filter matched row groups") {
+    (true :: false :: Nil).foreach { vectorized =>
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+        withTempPath { path =>
+          // Repeated values for dictionary encoding.
+          Seq(Some("A"), Some("A"), None).toDF.repartition(1)
+            .write.parquet(path.getAbsolutePath)
+          val df = spark.read.parquet(path.getAbsolutePath)
+          checkAnswer(stripSparkFilter(df.where("NOT (value <=> 'A')")), df)
+        }
+      }
+    }
+  }
+
 }
 
 object TestingUDT {
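
Note on the added test: it writes a single row group holding "A", "A", and a null, then filters with `NOT (value <=> 'A')`. The following is a minimal, Spark-free sketch of why that row group must not be skipped. It is a simplified model, not Parquet's or Spark's actual filter code: `NotEqPushdownSketch`, `RowGroup`, `canDropIgnoringNulls`, and `canDropNullAware` are hypothetical names introduced for illustration, and the "dictionary" is assumed to hold only the distinct non-null values.

```scala
// Simplified model only: these names are illustrative and are not part of
// Parquet's or Spark's API.
object NotEqPushdownSketch {
  // A row group modelled as a column of optional strings (None = SQL NULL).
  type RowGroup = Seq[Option[String]]

  // Row-level meaning of NOT (value <=> literal): null-safe equality never
  // evaluates to NULL, so a null value is simply "not equal" to the literal.
  def notNullSafeEq(value: Option[String], literal: String): Boolean =
    !value.contains(literal)

  // Distinct non-null values, standing in for a dictionary page.
  def dictionary(group: RowGroup): Set[String] = group.flatten.toSet

  // Assumed faulty check: looks at the dictionary alone, so it overlooks
  // nulls, which are not dictionary entries.
  def canDropIgnoringNulls(group: RowGroup, literal: String): Boolean =
    dictionary(group) == Set(literal)

  // Assumed corrected check: the row group may be skipped only if every
  // value equals the literal AND the group contains no nulls.
  def canDropNullAware(group: RowGroup, literal: String): Boolean =
    dictionary(group) == Set(literal) && !group.contains(None)
}
```

With the test's data, `Seq(Some("A"), Some("A"), None)`, the null row satisfies `notNullSafeEq(_, "A")`, so a check shaped like `canDropIgnoringNulls` would skip a row group that still has a matching row, while `canDropNullAware` keeps it. This is consistent with the test expecting `stripSparkFilter(...)` to return every row of `df`: with Spark's own filter removed, only row-group skipping could have dropped rows.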
