
Commit a3626ca

felixcheung authored and shivaram committed
[SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check
## What changes were proposed in this pull request?

- This is caused by the changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (the default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` is run. (`master` cannot default to "local", since that could be overridden by the spark-submit command line or Spark config.)
- As a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, because it launches SparkR via a non-interactive script.
- The fix is to add a check to the beginning of each test and of the vignettes. The same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but being more explicit is better.

## How was this patch tested?

Tested by reverting the version to 2.1, since the check needs to download the release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.

Tested manually as:

```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg

# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung <[email protected]>

Closes #16720 from felixcheung/rcranchecktest.
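For reference, the rejected alternative described above would have looked like this minimal sketch in each test file (the commit instead adds an explicit `install.spark()` check, shown in the diffs below):

```r
# Alternative fix considered in the description (not what this commit does):
# pin the master explicitly so the session never depends on a value injected
# later by the spark-submit command line or Spark config.
library(SparkR)
sparkR.session(master = "local")
```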
1 parent ab9872d · commit a3626ca

File tree

4 files changed (+21, -7 lines)

R/pkg/R/install.R

Lines changed: 13 additions & 3 deletions

```diff
@@ -21,9 +21,9 @@
 #' Download and Install Apache Spark to a Local Directory
 #'
 #' \code{install.spark} downloads and installs Spark to a local directory if
-#' it is not found. The Spark version we use is the same as the SparkR version.
-#' Users can specify a desired Hadoop version, the remote mirror site, and
-#' the directory where the package is installed locally.
+#' it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
+#' returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
+#' Hadoop version, the remote mirror site, and the directory where the package is installed locally.
 #'
 #' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
 #' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
@@ -68,6 +68,16 @@
 #' \href{http://spark.apache.org/downloads.html}{Apache Spark}
 install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
                           localDir = NULL, overwrite = FALSE) {
+  sparkHome <- Sys.getenv("SPARK_HOME")
+  if (isSparkRShell()) {
+    stopifnot(nchar(sparkHome) > 0)
+    message("Spark is already running in sparkR shell.")
+    return(invisible(sparkHome))
+  } else if (!is.na(file.info(sparkHome)$isdir)) {
+    message("Spark package found in SPARK_HOME: ", sparkHome)
+    return(invisible(sparkHome))
+  }
+
   version <- paste0("spark-", packageVersion("SparkR"))
   hadoopVersion <- tolower(hadoopVersion)
   hadoopVersionName <- hadoopVersionName(hadoopVersion)
```
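A minimal usage sketch of the new early-exit behavior, assuming `/usr/spark` is an existing Spark installation (the same hypothetical path as in the test commands above):

```r
library(SparkR)

# With SPARK_HOME pointing at an existing directory, install.spark() now
# returns that path (invisibly) instead of downloading a release tarball.
Sys.setenv(SPARK_HOME = "/usr/spark")
sparkHome <- install.spark()
# Messages: "Spark package found in SPARK_HOME: /usr/spark"
print(sparkHome)
```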

R/pkg/R/sparkR.R

Lines changed: 2 additions & 4 deletions

```diff
@@ -588,13 +588,11 @@ processSparkPackages <- function(packages) {
 sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
-      msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
-      message(msg)
+      message("Spark package found in SPARK_HOME: ", sparkHome)
       NULL
     } else {
       if (interactive() || isMasterLocal(master)) {
-        msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
-        message(msg)
+        message("Spark not found in SPARK_HOME: ", sparkHome)
         packageLocalDir <- install.spark()
         packageLocalDir
       } else if (isClientMode(master) || deployMode == "client") {
```
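This hunk is a pure simplification: `message()` already concatenates its arguments, so the intermediate `paste0()` and temporary `msg` variable were redundant. A quick equivalence check:

```r
sparkHome <- "/usr/spark"  # placeholder value for illustration
# Both calls emit the identical message text:
message(paste0("Spark package found in SPARK_HOME: ", sparkHome))
message("Spark package found in SPARK_HOME: ", sparkHome)
```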

R/pkg/tests/run-all.R

Lines changed: 3 additions & 0 deletions

```diff
@@ -21,4 +21,7 @@ library(SparkR)
 # Turn all warnings into errors
 options("warn" = 2)
 
+# Setup global test environment
+install.spark()
+
 test_package("SparkR")
```
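This is the piece that fixes the CRAN failure: `R CMD check --as-cran` runs this file non-interactively, and the `sparkCheckInstall()` path above only auto-installs when the session is interactive or the master is local. Calling `install.spark()` up front guarantees a Spark installation exists (or `SPARK_HOME` is honored, per the new early-exit) before `test_package("SparkR")` runs.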

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 3 additions & 0 deletions

````diff
@@ -44,6 +44,9 @@ library(SparkR)
 
 We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession).
 
+```{r, include=FALSE}
+install.spark()
+```
 ```{r, message=FALSE, results="hide"}
 sparkR.session()
 ```
````
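A note on the chunk options: `include=FALSE` makes knitr evaluate the chunk when the vignette is built but omit both the code and its output from the rendered document, so readers still see only the `sparkR.session()` call while the build gets a usable Spark installation first.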
