[SPARK-6797][SPARKR] Add support for YARN cluster mode.
Sun Rui committed Jul 13, 2015
commit cedfbe211070b09bc109823f3fb74ba20b70feb2
5 changes: 5 additions & 0 deletions R/install-dev.bat
@@ -25,3 +25,8 @@ set SPARK_HOME=%~dp0..
MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\

REM Zip the SparkR package so that it can be distributed to worker nodes on YARN
Contributor:
Let's use rem instead of REM to be consistent with the rest of the file.

pushd %SPARK_HOME%\R\lib
jar cfM "%SPARK_HOME%\R\lib\sparkr.zip" SparkR
Contributor:

@sun-rui -- this should be jar.exe instead of jar. The other issue is that jar.exe ships only with the JDK, not the JRE, so it may not be on the PATH. A couple of options for what we can do here:

  1. We can use %JAVA_HOME%\bin\jar.exe -- this is likely safer, since users need to set JAVA_HOME anyway for compilation to work correctly.
  2. Rtools [1] by default installs a zip utility [2] as zip.exe. At least on my machine, running zip.exe -r sparkr.zip SparkR works.

[1] http://cran.r-project.org/bin/windows/Rtools/
[2] http://www.info-zip.org/
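For context, a minimal sketch of what either command produces. Both `jar cfM sparkr.zip SparkR` and `zip -r sparkr.zip SparkR`, run from inside the lib directory, create a plain zip with the SparkR directory at the top level and no manifest entry. This Python stand-in uses made-up stub paths purely for illustration:

```python
import os
import tempfile
import zipfile

lib_dir = tempfile.mkdtemp()                      # stand-in for %SPARK_HOME%\R\lib
pkg_dir = os.path.join(lib_dir, "SparkR", "profile")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "general.R"), "w") as f:
    f.write("# profile stub\n")

# Equivalent of `jar cfM sparkr.zip SparkR`: archive the package with paths
# relative to the lib directory, and with no META-INF/MANIFEST.MF entry.
archive = os.path.join(lib_dir, "sparkr.zip")
with zipfile.ZipFile(archive, "w") as zf:
    for root, _, files in os.walk(os.path.join(lib_dir, "SparkR")):
        for name in files:
            full = os.path.join(root, name)
            zf.write(full, os.path.relpath(full, lib_dir))

print(zipfile.ZipFile(archive).namelist())  # ['SparkR/profile/general.R']
```

The key property either tool must preserve is the `SparkR/` prefix on every entry, since downstream code resolves paths under the extracted archive.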

popd
8 changes: 6 additions & 2 deletions R/install-dev.sh
@@ -34,12 +34,16 @@ LIB_DIR="$FWDIR/lib"

mkdir -p $LIB_DIR

pushd $FWDIR
pushd $FWDIR > /dev/null

# Generate Rd files if devtools is installed
Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'

# Install SparkR to $LIB_DIR
R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/

popd
# Zip the SparkR package so that it can be distributed to worker nodes on YARN
cd $LIB_DIR
jar cfM "$LIB_DIR/sparkr.zip" SparkR

popd > /dev/null
4 changes: 2 additions & 2 deletions R/pkg/inst/profile/general.R
@@ -16,7 +16,7 @@
#

.First <- function() {
home <- Sys.getenv("SPARK_HOME")
.libPaths(c(file.path(home, "R", "lib"), .libPaths()))
packageDir <- Sys.getenv("SPARKR_PACKAGE_DIR")
.libPaths(c(packageDir, .libPaths()))
Sys.setenv(NOAWT=1)
}
7 changes: 5 additions & 2 deletions core/src/main/scala/org/apache/spark/deploy/RRunner.scala
@@ -71,9 +71,12 @@ object RRunner {
val builder = new ProcessBuilder(Seq(rCommand, rFileNormalized) ++ otherArgs)
val env = builder.environment()
env.put("EXISTING_SPARKR_BACKEND_PORT", sparkRBackendPort.toString)
val sparkHome = System.getenv("SPARK_HOME")
// The SparkR package distributed as an archive resource should be pointed to
// by a symbolic link "sparkr" in the current directory.
val rPackageDir = new File("sparkr").getAbsolutePath
env.put("SPARKR_PACKAGE_DIR", rPackageDir)
env.put("R_PROFILE_USER",
Seq(sparkHome, "R", "lib", "SparkR", "profile", "general.R").mkString(File.separator))
Seq(rPackageDir, "SparkR", "profile", "general.R").mkString(File.separator))
builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize
val process = builder.start()

17 changes: 17 additions & 0 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -79,6 +79,7 @@ object SparkSubmit {
private val SPARK_SHELL = "spark-shell"
private val PYSPARK_SHELL = "pyspark-shell"
private val SPARKR_SHELL = "sparkr-shell"
private val SPARKR_PACKAGE_ARCHIVE = "sparkr.zip"

private val CLASS_NOT_FOUND_EXIT_STATUS = 101

@@ -347,6 +348,22 @@ object SparkSubmit
}
}

// In yarn mode for an R app, add the SparkR package archive to archives
Contributor:
nit: elsewhere we use capital YARN. Here and L349 and L353.

Contributor Author:
done

// that can be distributed with the job
if (args.isR && clusterManager == YARN) {
val sparkHome = sys.env.get("SPARK_HOME")
if (sparkHome.isEmpty) {
printErrorAndExit("SPARK_HOME does not exist for R application in yarn mode.")
}
val rPackagePath = Seq(sparkHome.get, "R", "lib").mkString(File.separator)
Contributor:
Is there a way to make this also go through RUtils.sparkRPackagePath? It seems weird to have two separate code paths here.

Contributor Author:
fixed.

val rPackageFile = new File(rPackagePath, SPARKR_PACKAGE_ARCHIVE)
if (!rPackageFile.exists()) {
printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in yarn mode.")
}
val localURI = Utils.resolveURI(rPackageFile.getAbsolutePath)
args.archives = mergeFileLists(args.archives, localURI.toString + "#sparkr")
Contributor:
Could you add a comment here as to why we have this '#sparkr'? I believe this is to get the archive to unzip to a symlink named sparkr?

}

// If we're running a R app, set the main class to our specific R runner
if (args.isR && deployMode == CLIENT) {
if (args.primaryResource == SPARKR_SHELL) {
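On the '#sparkr' question raised above: YARN localizes an archive URI of the form path#linkname by extracting the archive into the container's working area and exposing it through a symlink named after the URI fragment. A rough simulation of that behavior (Python, with illustrative stub paths; not YARN's actual implementation):

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# Build a minimal sparkr.zip with SparkR/ at its top level, as install-dev
# would produce.
archive = os.path.join(workdir, "sparkr.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("SparkR/profile/general.R", "# profile stub\n")

# YARN-style localization: extract the archive, then link the fragment name
# ("sparkr") to the extracted directory.
extracted = os.path.join(workdir, "filecache")
zipfile.ZipFile(archive).extractall(extracted)
link = os.path.join(workdir, "sparkr")
os.symlink(extracted, link)

# RRunner can now resolve the R profile relative to the "sparkr" link.
profile = os.path.join(link, "SparkR", "profile", "general.R")
print(os.path.isfile(profile))  # True
```

This is why the archive must keep SparkR/ at its top level: the path sparkr/SparkR/profile/general.R only resolves if extraction does not add or drop a directory layer.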
1 change: 1 addition & 0 deletions make-distribution.sh
@@ -219,6 +219,7 @@ cp -r "$SPARK_HOME/ec2" "$DISTDIR"
if [ -d "$SPARK_HOME"/R/lib/SparkR ]; then
mkdir -p "$DISTDIR"/R/lib
cp -r "$SPARK_HOME/R/lib/SparkR" "$DISTDIR"/R/lib
cp "$SPARK_HOME/R/lib/sparkr.zip" "$DISTDIR"/R/lib
fi

# Download and copy in tachyon, if requested