Skip to content

Conversation

@henryr
Copy link
Contributor

@henryr henryr commented May 11, 2018

What changes were proposed in this pull request?

Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217

How was this patch tested?

Ran the included new test case.

@SparkQA
Copy link

SparkQA commented May 11, 2018

Test build #90522 has finished for PR 21302 at commit 8f4b3db.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 11, 2018

Test build #90521 has finished for PR 21302 at commit 35e2149.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rdblue
Copy link
Contributor

rdblue commented May 11, 2018

@henryr, why not backport the test case in this commit? I don't think it makes sense to separate the two because that test verifies this commit.

@rdblue
Copy link
Contributor

rdblue commented May 11, 2018

+1 when tests are passing.

@henryr
Copy link
Contributor Author

henryr commented May 11, 2018

Sounds good, done.

@gatorsmile
Copy link
Member

Apache Parquet 1.8.3 release only contains apache/parquet-java#465 and apache/parquet-java#468, right?

@rdblue
Copy link
Contributor

rdblue commented May 11, 2018

@gatorsmile, that is correct. https://github.com/apache/parquet-mr/commits/apache-parquet-1.8.3

@vanzin
Copy link
Contributor

vanzin commented May 11, 2018

LGTM pending tests.

// parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
// The row-group statistics include null counts, but not min and max values, which
// triggers PARQUET-1217.
val df = readResourceParquetFile("test-data/parquet-1217.parquet")
Copy link
Member

@dongjoon-hyun dongjoon-hyun May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test case assumes spark.sql.parquet.filterPushdown=true, let's use the followings. Otherwise, this test case will fail when we change the default configuration value.

withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be done in master (and backported to 2.3 if desired).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR for master is #21323. My guess is there's no reason to block this backport and 2.3.1 by waiting for it to land, but happy to do whatever.

@gatorsmile
Copy link
Member

cc @liancheng @michal-databricks @cloud-fan Please double check and confirm the risk of these two Parquet PRs is low.

@SparkQA
Copy link

SparkQA commented May 11, 2018

Test build #90523 has finished for PR 21302 at commit c681819.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented May 11, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 11, 2018

Test build #90527 has finished for PR 21302 at commit 8566ba1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 12, 2018

Test build #90536 has finished for PR 21302 at commit 8566ba1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@vanzin
Copy link
Contributor

vanzin commented May 14, 2018

Any remaining feedback here? Otherwise I'd like to get this in soon-ish.

@vanzin
Copy link
Contributor

vanzin commented May 14, 2018

@henryr could you update the PR description (part about the test backport)? Thx

@henryr
Copy link
Contributor Author

henryr commented May 14, 2018

Done.

@vanzin
Copy link
Contributor

vanzin commented May 14, 2018

Merging to 2.3. In the unlikely event of issues, we can address them later.

asfgit pushed a commit that referenced this pull request May 14, 2018
## What changes were proposed in this pull request?

Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217

## How was this patch tested?

Ran the included new test case.

Author: Henry Robinson <[email protected]>

Closes #21302 from henryr/branch-2.3.
@vanzin
Copy link
Contributor

vanzin commented May 14, 2018

Also, please close the PR manually (github doesn't do that for branches).

@henryr henryr closed this May 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants