[SPARK-18643][SPARKR] SparkR hangs at session start when installed as a package without Spark #16077
Conversation
Test build #69386 has finished for PR 16077 at commit
Does this mean that it will install the Spark package each time a new session starts in interactive mode? Thanks.
install.spark() checks whether SPARK_HOME is set or the Spark Jar can be found in the cache location; it only downloads if Spark isn't found.
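A minimal sketch of that check, for illustration only: `install.spark()` and `SPARK_HOME` are real, but the helper name and the cache path shown here are assumptions (the actual cache location SparkR uses is OS-dependent).

```r
# Illustrative sketch, not the actual SparkR source.
spark_already_available <- function() {
  if (nzchar(Sys.getenv("SPARK_HOME"))) {
    return(TRUE)                              # user-provided installation
  }
  # SparkR keeps downloaded distributions in a per-user cache directory;
  # the exact path is OS-dependent (e.g. ~/.cache/spark on Linux).
  cacheDir <- file.path(Sys.getenv("HOME"), ".cache", "spark")
  dir.exists(cacheDir) && length(list.files(cacheDir)) > 0
}

if (!spark_already_available()) {
  SparkR::install.spark()   # downloads only when nothing is found
}
```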
LGTM
Just to clarify, this limits the auto-install feature to SparkR running in an interactive shell -- is that right? I think that's mostly fine since it's the use case we were targeting, but it might be good to check our documentation and update it appropriately.
Right, I've been reviewing possible code paths for this over the last few days and I'm pretty confident that this change will not run install.spark in cluster modes (which would have Spark/JVM already running). Also, you are right: I just found that we didn't really talk about how install.spark would be called in sparkR.session() - I'll add that.
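A hypothetical sketch of that gating (the real change lives inside SparkR's session startup code; `maybeInstallSpark` is an invented name for illustration):

```r
# Hypothetical helper mirroring the behavior discussed above.
maybeInstallSpark <- function(sparkHome) {
  # Skip auto-install when SPARK_HOME is set (e.g. cluster modes, where
  # Spark/JVM is already provisioned) or when R is not interactive.
  if (!nzchar(sparkHome) && interactive()) {
    sparkHome <- install.spark()   # reuses the cache if Spark is already there
  }
  sparkHome
}
```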
Test build #69654 has finished for PR 16077 at commit
LGTM. Merging this to master, branch-2.1
What changes were proposed in this pull request?
If SparkR is running as a package and it has previously downloaded the Spark Jar, it should be able to run as before without having to set SPARK_HOME. With this bug, auto-installing Spark only works in the first session.
This seems to be a regression from the earlier behavior.
The fix is to always try to install, or check for, the cached Spark when running in an interactive session.
As discussed before, we should probably only install Spark iff running in an interactive session (R shell, RStudio, etc.)
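As a concrete illustration of the restored behavior (a sketch assuming SparkR is installed as a standalone package and SPARK_HOME is unset):

```r
library(SparkR)
# First interactive session: Spark is downloaded to the local cache.
# Every later session: the cached distribution is found and reused, so
# sparkR.session() starts without re-downloading or setting SPARK_HOME.
sparkR.session()
```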
How was this patch tested?
Manually