-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32451][R] Support Apache Arrow 1.0.0 #29252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @HyukjinKwon . |
appveyor.yml
Outdated
| # This environment variable works around to test SparkR against a higher version. | ||
| R_REMOTES_NO_ERRORS_FROM_WARNINGS: true | ||
| # AppVeyor doesn't have python3 yet | ||
| PYSPARK_PYTHON: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I piggy-backed this because I want to make AppVeyor success in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, so AppVeyor was broken now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It has been broken for a while, so the recent Python3 change is not discovered correctly.
|
Test build #126593 has finished for PR 29252 at commit
|
|
Hi, @viirya . Could you review this PR? |
|
Test build #126595 has finished for PR 29252 at commit
|
| output <- tryCatch({ | ||
| doServerAuth(conn, authSecret) | ||
| arrowTable <- arrow::read_arrow(readRaw(conn)) | ||
| arrowTable <- arrow::read_ipc_stream(readRaw(conn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question. When was this API added? R side currently supports Arrow 0.15.1+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jenkins has old Arrow, @HyukjinKwon , and this passed Jenkins~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that the failure didn't occurred in Jenkins environment. Only GitHub Action and AppVeyor failed so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh.. Jenkins doesn't have arrow? Let me check then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, this was added since 0.17.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh.. It's not 0.15. Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this part, if we are not bump up our minimum, we need to have if .. else. Please make a follow up. Thanks, guys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe increasing minimum Arrow version to 0.17.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me just bump up the minimal version of Arrow in SparkR at #29253. Should be fine since such minimal version bump-up is already documented.
|
Thank you for review, @viirya . For |
|
I think we could just deal with Arrow in this PR. |
|
Thanks. |
This reverts commit caacdee.
|
I reverted AppVeyor part and updated the PR description. Can we proceed to merge since it's already verified? |
|
Oh, thank you so much, @viirya ! Merged to master. |
|
Test FAILed. |
### What changes were proposed in this pull request? This PR ports back #29252 to support Arrow 1.0.0. Currently, SparkR with Arrow tests fails with the latest Arrow version in branch-3.0, see https://github.com/apache/spark/pull/29460/checks?check_run_id=996972267 ### Why are the changes needed? To support higher Arrow R version with SparkR. ### Does this PR introduce _any_ user-facing change? Yes, users will be able to use SparkR with Arrow 1.0.0+. ### How was this patch tested? Manually tested, GitHub Actions will test it. Closes #29462 from HyukjinKwon/SPARK-32451-3.0. Authored-by: HyukjinKwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Currently,
GitHub Actionis broken due toSparkR UT failurefrom new Apache Arrow 1.0.0.This PR aims to update R code according to Apache Arrow 1.0.0 recommendation to pass R unit tests.
An alternative is pinning Apache Arrow version at 0.17.1 and I also created a PR to compare with this.
Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?