@dongjoon-hyun
Member

dongjoon-hyun commented Jul 27, 2020

What changes were proposed in this pull request?

Currently, the GitHub Actions build is broken because the SparkR unit tests fail with the newly released Apache Arrow 1.0.0.

This PR updates the R code to follow the Apache Arrow 1.0.0 recommendation so that the R unit tests pass.

An alternative is to pin the Apache Arrow version at 0.17.1; I also created a PR for that approach for comparison.

Why are the changes needed?

Does this PR introduce any user-facing change?

No.

How was this patch tested?

dongjoon-hyun changed the title from "[SPARK-32451][R] Support Apache Arrow 1.0.0 in testing" to "[SPARK-32451][R] Support Apache Arrow 1.0.0" on Jul 27, 2020
@dongjoon-hyun
Member Author

dongjoon-hyun commented Jul 27, 2020

Hi, @HyukjinKwon .
Could you review this?
This will recover GitHub Action.

appveyor.yml Outdated
# This environment variable works around to test SparkR against a higher version.
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
# AppVeyor doesn't have python3 yet
PYSPARK_PYTHON: python
Member Author

I piggybacked this change because I want AppVeyor to succeed in this PR.

Member

Hm, so AppVeyor is broken now?

Member Author

Yes. It has been broken for a while, so the recent Python 3 change was not properly checked.

@SparkQA

SparkQA commented Jul 27, 2020

Test build #126593 has finished for PR 29252 at commit 2df5caf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Hi, @viirya . Could you review this PR?

@SparkQA

SparkQA commented Jul 27, 2020

Test build #126595 has finished for PR 29252 at commit caacdee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

 output <- tryCatch({
   doServerAuth(conn, authSecret)
-  arrowTable <- arrow::read_arrow(readRaw(conn))
+  arrowTable <- arrow::read_ipc_stream(readRaw(conn))
Member

This looks good.

Member

Quick question: when was this API added? The R side currently supports Arrow 0.15.1+.

Member Author

Jenkins has an old Arrow version, @HyukjinKwon, and this passed Jenkins.

Member Author

Please note that the failure didn't occur in the Jenkins environment. Only GitHub Actions and AppVeyor have failed so far.

Member Author

Oh.. Jenkins doesn't have arrow? Let me check then.

Member

Ah, right, this was added in 0.17.0.

Member Author

Oh.. It's not 0.15. Got it.

Member Author

For this part, if we don't bump up our minimum version, we need an if/else. Please make a follow-up. Thanks, guys.
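
The version-dependent branch discussed here could look roughly like the sketch below. It is not code from this PR: `arrow_reader_name` and `read_arrow_compat` are hypothetical helpers, and the 0.17.0 cutoff is taken from the comment in this thread rather than checked against the Arrow changelog.

```r
# Sketch only, not code from this PR. `arrow_reader_name` is a hypothetical
# helper that picks the reader function name for a given Arrow version.
# Assumption from the thread: read_ipc_stream() exists from Arrow 0.17.0 on.
arrow_reader_name <- function(arrow_version) {
  if (package_version(arrow_version) >= "0.17.0") {
    "read_ipc_stream"
  } else {
    "read_arrow"
  }
}

# Hypothetical wrapper: look up the chosen reader in the installed arrow
# package and apply it to a raw vector of Arrow stream bytes.
read_arrow_compat <- function(raw_bytes) {
  installed <- as.character(utils::packageVersion("arrow"))
  reader <- getExportedValue("arrow", arrow_reader_name(installed))
  reader(raw_bytes)
}
```

Under these assumptions, the call site in the diff would become `arrowTable <- read_arrow_compat(readRaw(conn))`. Bumping the minimum supported Arrow version, as proposed later in the thread, avoids the branch entirely.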

Member

Maybe increase the minimum Arrow version to 0.17.1?

Member

Let me just bump up the minimum version of Arrow in SparkR in #29253. That should be fine, since such minimum-version bumps are already documented.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jul 27, 2020

Thank you for the review, @viirya. AppVeyor seems to have another issue. I'm thinking of excluding AppVeyor from this PR.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  22:38 min
[INFO] Finished at: 2020-07-27T01:13:32Z
[INFO] ------------------------------------------------------------------------
.\bin\spark-submit2.cmd --driver-java-options "-Dlog4j.configuration=file:///%CD:\=/%/R/log4j.properties" --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R
"Presence of build for multiple Scala versions detected ( and )."
"Remove one of them or, set SPARK_SCALA_VERSION= in \spark-env.cmd."

@viirya
Member

viirya commented Jul 27, 2020

I think we could just deal with Arrow in this PR.

@dongjoon-hyun
Member Author

Thanks.

This reverts commit caacdee.
@dongjoon-hyun
Member Author

I reverted the AppVeyor part and updated the PR description. Can we proceed to merge, since it's already verified?

@dongjoon-hyun
Member Author

Oh, thank you so much, @viirya ! Merged to master.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126599/
Test FAILed.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-ARROW branch July 27, 2020 01:53
dongjoon-hyun pushed a commit that referenced this pull request Aug 19, 2020
### What changes were proposed in this pull request?

This PR ports back #29252 to support Arrow 1.0.0.

Currently, the SparkR tests with Arrow fail with the latest Arrow version in branch-3.0; see https://github.com/apache/spark/pull/29460/checks?check_run_id=996972267

### Why are the changes needed?

To support newer versions of the Arrow R package with SparkR.

### Does this PR introduce _any_ user-facing change?

Yes, users will be able to use SparkR with Arrow 1.0.0+.

### How was this patch tested?

Manually tested; GitHub Actions will also test it.

Closes #29462 from HyukjinKwon/SPARK-32451-3.0.

Authored-by: HyukjinKwon
Signed-off-by: Dongjoon Hyun