[SPARK-23852][SQL] Upgrade to Parquet 1.8.3 #21302
|
Test build #90522 has finished for PR 21302 at commit
|
|
Test build #90521 has finished for PR 21302 at commit
|
|
@henryr, why not backport the test case in this commit? I don't think it makes sense to separate the two because that test verifies this commit. |
|
+1 when tests are passing. |
|
Sounds good, done. |
|
Apache Parquet 1.8.3 release only contains apache/parquet-java#465 and apache/parquet-java#468, right? |
|
LGTM pending tests. |
// parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
// The row-group statistics include null counts, but not min and max values, which
// triggers PARQUET-1217.
val df = readResourceParquetFile("test-data/parquet-1217.parquet")
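For readers unfamiliar with the failure mode, here is a rough sketch of why statistics that carry a null count but no min/max can make a pushed-down filter drop rows. It is a simplified model based on my understanding of PARQUET-1217, not parquet-mr's code; `RowGroupStats`, `canDropBuggy` and `canDropFixed` are hypothetical names invented for illustration.

```scala
// Simplified model of row-group skipping for the predicate `col > threshold`.
// All names are hypothetical; this is NOT parquet-mr's implementation.
final case class RowGroupStats(
    nullCount: Long,
    valueCount: Long,
    minMax: Option[(Int, Int)]) // None models "min and max were not written"

object RowGroupFilterSketch {
  // Flawed logic (assumed shape of the bug): missing min/max is read as
  // "the row group holds only nulls", so the row group is always dropped.
  def canDropBuggy(stats: RowGroupStats, threshold: Int): Boolean =
    stats.minMax match {
      case None           => true
      case Some((_, max)) => max <= threshold
    }

  // Safer logic: without min/max, drop only when the null count proves
  // that every value is null; otherwise keep the row group and scan it.
  def canDropFixed(stats: RowGroupStats, threshold: Int): Boolean =
    stats.minMax match {
      case None           => stats.nullCount == stats.valueCount
      case Some((_, max)) => max <= threshold
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the test file: values -1, 0, 1, 2 and a null, with no min/max.
    val stats = RowGroupStats(nullCount = 1, valueCount = 5, minMax = None)
    println(canDropBuggy(stats, 0)) // row group wrongly skipped for `col > 0`
    println(canDropFixed(stats, 0)) // row group kept, so 1 and 2 are returned
  }
}
```

Under this model, a query such as `col > 0` against the test file loses its matching rows with the flawed check and keeps them with the safer one, which is the kind of behavior the new test is meant to guard against.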
Since this test case assumes spark.sql.parquet.filterPushdown=true, let's use the following. Otherwise, this test case will fail if we change the default configuration value.
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
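To illustrate the point of pinning the setting inside the test, here is a minimal sketch of a scoped configuration override: set a value, run the body, then restore whatever was there before. It uses a plain in-memory map as a stand-in; `ScopedConfSketch`, `withConf` and `get` are hypothetical names, and this is not how Spark's `withSQLConf` is implemented.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session config store; NOT Spark's SQLConf.
object ScopedConfSketch {
  private val conf = mutable.Map[String, String]()

  def get(key: String): Option[String] = conf.get(key)

  // Set the given pairs, run the body, then restore the previous values,
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }

  def main(args: Array[String]): Unit = {
    val key = "spark.sql.parquet.filterPushdown"
    conf(key) = "false" // pretend the default was flipped
    val seen = withConf(key -> "true") { get(key) }
    println(seen)     // what the test body observed
    println(get(key)) // what remains after the block
  }
}
```

With the override in place, the body sees the value it depends on regardless of the surrounding default, and the surrounding state is left untouched afterwards.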
+1
That should be done in master (and backported to 2.3 if desired).
PR for master is #21323. My guess is there's no reason to block this backport and 2.3.1 by waiting for it to land, but happy to do whatever.
|
cc @liancheng @michal-databricks @cloud-fan Please double check and confirm the risk of these two Parquet PRs is low. |
|
Test build #90523 has finished for PR 21302 at commit
|
|
retest this please |
|
Test build #90527 has finished for PR 21302 at commit
|
|
Test build #90536 has finished for PR 21302 at commit
|
HyukjinKwon left a comment
LGTM too
|
Any remaining feedback here? Otherwise I'd like to get this in soon-ish. |
|
@henryr could you update the PR description (part about the test backport)? Thx |
|
Done. |
|
Merging to 2.3. In the unlikely event of issues, we can address them later. |
## What changes were proposed in this pull request?
Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217
## How was this patch tested?
Ran the included new test case.
Author: Henry Robinson <[email protected]>
Closes #21302 from henryr/branch-2.3.
|
Also, please close the PR manually (GitHub doesn't do that for branches). |
What changes were proposed in this pull request?
Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217
How was this patch tested?
Ran the included new test case.