63 commits
250cb95
Do not ignore spark.driver.extra* for client mode
andrewor14 Aug 4, 2014
a2ab1b0
Parse spark.driver.extra* in bash
andrewor14 Aug 6, 2014
0025474
Revert SparkSubmit handling of --driver-* options for only cluster mode
andrewor14 Aug 6, 2014
63ed2e9
Merge branch 'master' of github.com:apache/spark into submit-driver-e…
andrewor14 Aug 6, 2014
75ee6b4
Remove accidentally added file
andrewor14 Aug 6, 2014
8843562
Fix compilation issues...
andrewor14 Aug 6, 2014
98dd8e3
Add warning if properties file does not exist
andrewor14 Aug 6, 2014
130f295
Handle spark.driver.memory too
andrewor14 Aug 6, 2014
4edcaa8
Redirect stdout to stderr for python
andrewor14 Aug 6, 2014
e5cfb46
Collapse duplicate code + fix potential whitespace issues
andrewor14 Aug 6, 2014
4ec22a1
Merge branch 'master' of github.com:apache/spark into submit-driver-e…
andrewor14 Aug 6, 2014
ef12f74
Minor formatting
andrewor14 Aug 6, 2014
fa2136e
Escape Java options + parse java properties files properly
andrewor14 Aug 7, 2014
dec2343
Only export variables if they exist
andrewor14 Aug 7, 2014
a4df3c4
Move parsing and escaping logic to utils.sh
andrewor14 Aug 7, 2014
de765c9
Print spark-class command properly
andrewor14 Aug 7, 2014
8e552b7
Include an example of spark.*.extraJavaOptions
andrewor14 Aug 7, 2014
c13a2cb
Merge branch 'master' of github.com:apache/spark into submit-driver-e…
andrewor14 Aug 7, 2014
c854859
Add small comment
andrewor14 Aug 7, 2014
1cdc6b1
Fix bug: escape escaped double quotes properly
andrewor14 Aug 7, 2014
45a1eb9
Fix bug: escape escaped backslashes and quotes properly...
andrewor14 Aug 7, 2014
aabfc7e
escape -> split (minor)
andrewor14 Aug 7, 2014
a992ae2
Escape spark.*.extraJavaOptions correctly
andrewor14 Aug 7, 2014
c7b9926
Minor changes to spark-defaults.conf.template
andrewor14 Aug 7, 2014
5d8f8c4
Merge branch 'master' of github.com:apache/spark into submit-driver-e…
andrewor14 Aug 7, 2014
e793e5f
Handle multi-line arguments
andrewor14 Aug 8, 2014
c2273fc
Fix typo (minor)
andrewor14 Aug 8, 2014
b3c4cd5
Fix bug: count the number of quotes instead of detecting presence
andrewor14 Aug 8, 2014
4ae24c3
Fix bug: escape properly in quote_java_property
andrewor14 Aug 8, 2014
8d26a5c
Add tests for bash/utils.sh
andrewor14 Aug 8, 2014
2732ac0
Integrate BASH tests into dev/run-tests + log error properly
andrewor14 Aug 9, 2014
aeb79c7
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 9, 2014
8d4614c
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 16, 2014
56ac247
Use eval and set to simplify splitting
andrewor14 Aug 16, 2014
bd0d468
Simplify parsing config file by ignoring multi-line arguments
andrewor14 Aug 16, 2014
be99eb3
Fix tests to not include multi-line configs
andrewor14 Aug 16, 2014
371cac4
Add function prefix (minor)
andrewor14 Aug 16, 2014
fa11ef8
Parse the properties file only if the special configs exist
andrewor14 Aug 16, 2014
7396be2
Explicitly comment that multi-line properties are not supported
andrewor14 Aug 16, 2014
7a4190a
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 16, 2014
c886568
Fix lines too long + a few comments / style (minor)
andrewor14 Aug 16, 2014
0effa1e
Add code in Scala that handles special configs
andrewor14 Aug 19, 2014
a396eda
Nullify my own hard work to simplify bash
andrewor14 Aug 19, 2014
c37e08d
Revert a few more changes
andrewor14 Aug 19, 2014
3a8235d
Only parse the properties file if special configs exist
andrewor14 Aug 19, 2014
7d94a8d
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 19, 2014
b71f52b
Revert a few more changes (minor)
andrewor14 Aug 19, 2014
c84f5c8
Remove debug print statement (minor)
andrewor14 Aug 19, 2014
158f813
Remove "client mode" boolean argument
andrewor14 Aug 19, 2014
a91ea19
Fix precedence of library paths, classpath, java opts and memory
andrewor14 Aug 19, 2014
1ea6bbe
SparkClassLauncher -> SparkSubmitDriverBootstrapper
andrewor14 Aug 19, 2014
d6488f9
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 19, 2014
19464ad
SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
andrewor14 Aug 19, 2014
8867a09
A few more naming things (minor)
andrewor14 Aug 19, 2014
9ba37e2
Don't barf when the properties file does not exist
andrewor14 Aug 19, 2014
a78cb26
Revert a few changes in utils.sh (minor)
andrewor14 Aug 19, 2014
d0f20db
Don't pass empty library paths, classpath, java opts etc.
andrewor14 Aug 19, 2014
9a778f6
Fix PySpark: actually kill driver on termination
andrewor14 Aug 20, 2014
51aeb01
Filter out JVM memory in Scala rather than Bash (minor)
andrewor14 Aug 20, 2014
ff34728
Minor comments
andrewor14 Aug 20, 2014
08fd788
Warn against external usages of SparkSubmitDriverBootstrapper
andrewor14 Aug 20, 2014
24dba60
Merge branch 'master' of github.com:apache/spark into handle-configs-…
andrewor14 Aug 20, 2014
bed4bdf
Change a few comments / messages (minor)
andrewor14 Aug 20, 2014
Add code in Scala that handles special configs
The eventual goal of this is to shift the current complex BASH
logic to Scala. The new class should be invoked from `spark-class`.
For simplicity, this currently does not handle SPARK-2914. It is
likely that this will be dealt with in a future PR instead.
andrewor14 committed Aug 19, 2014
commit 0effa1ee5ed302f0533522431db3620919dfbe61
25 changes: 0 additions & 25 deletions core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
@@ -40,28 +40,3 @@ private[spark] object PythonUtils {
    paths.filter(_ != "").mkString(File.pathSeparator)
  }
}


/**
 * A utility class to redirect the child process's stdout or stderr.
 */
private[spark] class RedirectThread(
    in: InputStream,
    out: OutputStream,
    name: String)
  extends Thread(name) {

  setDaemon(true)
  override def run() {
    scala.util.control.Exception.ignoring(classOf[IOException]) {
      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
      val buf = new Array[Byte](1024)
      var len = in.read(buf)
      while (len != -1) {
        out.write(buf, 0, len)
        out.flush()
        len = in.read(buf)
      }
    }
  }
}
@@ -17,15 +17,14 @@

package org.apache.spark.api.python

import java.lang.Runtime
import java.io.{DataOutputStream, DataInputStream, InputStream, OutputStreamWriter}
import java.net.{InetAddress, ServerSocket, Socket, SocketException}

import scala.collection.mutable
import scala.collection.JavaConversions._

import org.apache.spark._
import org.apache.spark.util.Utils
import org.apache.spark.util.{RedirectThread, Utils}

private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String, String])
  extends Logging {
@@ -22,8 +22,8 @@ import java.net.URI
import scala.collection.mutable.ArrayBuffer
import scala.collection.JavaConversions._

import org.apache.spark.api.python.{PythonUtils, RedirectThread}
import org.apache.spark.util.Utils
import org.apache.spark.api.python.PythonUtils
import org.apache.spark.util.{RedirectThread, Utils}

/**
 * A main class used by spark-submit to launch Python applications. It executes python as a
118 changes: 118 additions & 0 deletions core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala
@@ -0,0 +1,118 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.deploy

import java.io.File

import scala.collection.JavaConversions._

import org.apache.spark.util.{RedirectThread, Utils}

/**
 * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
 */
object SparkClassLauncher {

  /**
   * Launch a Spark class with the given class paths, library paths, java options and memory.
   * If we are launching an application through Spark submit in client mode, we must also
   * take into account special `spark.driver.*` properties needed to start the driver JVM.
   */
  def main(args: Array[String]): Unit = {
    if (args.size < 8) {
      System.err.println(
        """
        |Usage: org.apache.spark.deploy.SparkClassLauncher
Contributor:

To keep it simpler for now, rather than having a bunch of command line arguments, why not just directly read the environment variables set in spark-class? Then this script could take just the main class and its arguments, and directly read RUNNER, PROPERTIES_FILE, CLASSPATH, SPARK_SUBMIT_LIBRARY_PATH, JAVA_OPTS, and OUR_JAVA_MEM. That is one fewer level of interpretation/parsing to worry about for now and would make this patch smaller overall.
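A minimal sketch of what that suggestion might look like, assuming spark-class exports the variables named above (RUNNER, PROPERTIES_FILE, CLASSPATH, SPARK_SUBMIT_LIBRARY_PATH, JAVA_OPTS, OUR_JAVA_MEM) before starting the JVM; only the main class and its arguments would remain on the command line. The defaults below are placeholders, not values from the patch.

```scala
object SparkClassLauncher {
  def main(args: Array[String]): Unit = {
    // Read what bin/spark-class has already computed, instead of positional arguments.
    val javaRunner     = sys.env.getOrElse("RUNNER", "java")
    val propertiesFile = sys.env.getOrElse("PROPERTIES_FILE", "conf/spark-defaults.conf")
    val classPaths     = sys.env.getOrElse("CLASSPATH", "")
    val libraryPaths   = sys.env.getOrElse("SPARK_SUBMIT_LIBRARY_PATH", "")
    val javaOpts       = sys.env.getOrElse("JAVA_OPTS", "")
    val javaMemory     = sys.env.getOrElse("OUR_JAVA_MEM", "512m")

    // Only the main class and its arguments are passed on the command line.
    val mainClass = args.head
    val mainArgs  = args.tail

    // The child JVM command would then be assembled as in the patch below; this just echoes the inputs.
    println(s"runner=$javaRunner props=$propertiesFile cp=$classPaths libs=$libraryPaths " +
      s"opts=$javaOpts mem=$javaMemory main=$mainClass args=${mainArgs.mkString(" ")}")
  }
}
```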

        |
        | [properties file] - path to your Spark properties file
        | [java runner] - command to launch the child JVM
        | [java class paths] - class paths to pass to the child JVM
        | [java library paths] - library paths to pass to the child JVM
        | [java opts] - java options to pass to the child JVM
        | [java memory] - memory used to launch the child JVM
        | [client mode] - whether the child JVM will run the Spark driver
        | [main class] - main class to run in the child JVM
        | <main args> - arguments passed to this main class
        |
        |Example:
        | org.apache.spark.deploy.SparkClassLauncher
        | conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
        | "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
        | --master local --class org.apache.spark.examples.SparkPi 10
        """.stripMargin)
      System.exit(1)
    }
    val propertiesFile = args(0)
    val javaRunner = args(1)
    val clClassPaths = args(2)
    val clLibraryPaths = args(3)
    val clJavaOpts = args(4)
    val clJavaMemory = args(5)
    val clientMode = args(6) == "true"
Contributor:

I can't see a case where this would ever not be "true" - to keep it simple and understandable it might be good to just omit this command line argument and the associated logic.

Contributor Author:

This is anticipating a future where all usages of bin/spark-class are routed through this class. I was hesitant to remove this because it would technically break backward compatibility if we decide to add a new flag later, since this is not private[spark] or anything. Should I just add a warning saying this should only be called from bin/spark-class instead?
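One possible shape for that warning, sketched here as an assumption rather than quoted from the patch (a later commit in this PR, 08fd788, does add a warning against external usages):

```scala
package org.apache.spark.deploy

/**
 * WARNING: this class is intended to be invoked only from bin/spark-class. It is an
 * internal tool, not a stable interface, and may change or be removed in any release.
 */
private[spark] object SparkClassLauncher {
  def main(args: Array[String]): Unit = {
    // ... launch logic exactly as in the patch below ...
  }
}
```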

Contributor:

This is an internal tool - it doesn't need to have compatibility at all.

    val mainClass = args(7)

    // In client deploy mode, parse the properties file for certain `spark.driver.*` configs.
    // These configs encode java options, class paths, and library paths needed to launch the JVM.
    val properties =
      if (clientMode) {
        SparkSubmitArguments.getPropertiesFromFile(new File(propertiesFile)).toMap
      } else {
        Map[String, String]()
      }
    val confDriverMemory = properties.get("spark.driver.memory")
    val confClassPaths = properties.get("spark.driver.extraClassPath")
    val confLibraryPaths = properties.get("spark.driver.extraLibraryPath")
    val confJavaOpts = properties.get("spark.driver.extraJavaOptions")

    // Merge relevant command line values with the config equivalents, if any
    val javaMemory =
      if (clientMode) {
        confDriverMemory.getOrElse(clJavaMemory)
      } else {
        clJavaMemory
      }
    val pathSeparator = sys.props("path.separator")
    val classPaths = clClassPaths + confClassPaths.map(pathSeparator + _).getOrElse("")
    val libraryPaths = clLibraryPaths + confLibraryPaths.map(pathSeparator + _).getOrElse("")
Contributor:

rather than append one to the other - I think that users setting --driver-library-path should simply take precedence over any driver.library.path defined in the conf. That's the behavior we have in general for configs... it could cause some confusion to merge them here.
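A minimal sketch of the precedence this comment describes, using the names from the patch (`clLibraryPaths` from --driver-library-path, `confLibraryPaths` from spark.driver.extraLibraryPath); sample values are included so the snippet stands on its own:

```scala
// Sketch of flag-over-config precedence: the command-line value wins outright, no merging.
val clLibraryPaths = "/librarypath1:/librarypath2"            // from --driver-library-path, "" if absent
val confLibraryPaths: Option[String] = Some("/conf/libpath")  // from spark.driver.extraLibraryPath
val libraryPaths =
  if (clLibraryPaths.nonEmpty) clLibraryPaths                 // the flag takes precedence
  else confLibraryPaths.getOrElse("")                         // otherwise fall back to the conf
```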

    val javaOpts = Utils.splitCommandString(clJavaOpts) ++
      confJavaOpts.map(Utils.splitCommandString).getOrElse(Seq.empty)
    val filteredJavaOpts = javaOpts.filterNot { opt =>
      opt.startsWith("-Djava.library.path") || opt.startsWith("-Xms") || opt.startsWith("-Xmx")
Contributor:

does this mean that if I pass --driver-memory as a flag and also include spark.driver.memory in the config file, the config file will take precedence?
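For reference, a hedged sketch of flag-first precedence for memory, assuming the --driver-memory flag value were passed through separately (here as the hypothetical `clDriverMemory`) so it can be distinguished from the spark-class default; in the patch as written, the config value wins in client mode. Sample values are included so the snippet stands on its own:

```scala
// Sketch under the stated assumption: an explicit flag wins, the config is only a fallback.
val clDriverMemory = "4g"                          // hypothetical: the --driver-memory flag, "" if absent
val confDriverMemory: Option[String] = Some("2g")  // spark.driver.memory from the properties file
val clJavaMemory = "512m"                          // OUR_JAVA_MEM default from spark-class
val javaMemory =
  Option(clDriverMemory).filter(_.nonEmpty)        // an explicit flag wins
    .orElse(confDriverMemory)                      // else the config value
    .getOrElse(clJavaMemory)                       // else the spark-class default
```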

    }

    // Build up command
    val command: Seq[String] =
      Seq(javaRunner) ++
        { if (classPaths.nonEmpty) Seq("-cp", classPaths) else Seq.empty } ++
        { if (libraryPaths.nonEmpty) Seq(s"-Djava.library.path=$libraryPaths") else Seq.empty } ++
        filteredJavaOpts ++
        Seq(s"-Xms$javaMemory", s"-Xmx$javaMemory") ++
        Seq(mainClass) ++
        args.slice(8, args.size)

    command.foreach(println)

    val builder = new ProcessBuilder(command)
    val process = builder.start()
    new RedirectThread(System.in, process.getOutputStream, "redirect stdin").start()
    new RedirectThread(process.getInputStream, System.out, "redirect stdout").start()
    new RedirectThread(process.getErrorStream, System.err, "redirect stderr").start()
    System.exit(process.waitFor())
  }

}
21 changes: 21 additions & 0 deletions core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
  }

}

/**
 * A utility class to redirect the child process's stdout or stderr.
 */
private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
  extends Thread(name) {

  setDaemon(true)
  override def run() {
    scala.util.control.Exception.ignoring(classOf[IOException]) {
      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
Contributor:

Do you understand this comment? I don't.

Contributor Author:

shrug

Contributor:

I'm guessing the original author was reading Strings() before, which requires that you define how to interpret the bytes into strings (encodings like ASCII/UTF8, etc). There were probably some byte sequences coming through that weren't characters in UTF8 so exceptions were being thrown. Since this is just reading from an InputStream and writing to an OutputStream, copying bytes would be much more efficient than reading bytes, interpreting as characters, converting back to bytes, then sending those out the other side.

Conveniently, Apache has an IOUtils.copy(inputStream, outputStream, bufferSize) [method](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copy%28java.io.InputStream, java.io.OutputStream, int%29) that would do exactly this.

      val buf = new Array[Byte](1024)
      var len = in.read(buf)
      while (len != -1) {
        out.write(buf, 0, len)
        out.flush()
        len = in.read(buf)
      }
    }
  }
}
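Following the reviewer's IOUtils.copy suggestion above, a hedged sketch of RedirectThread with the manual byte-copy loop delegated to commons-io (assumed to be on the classpath; treat that as an assumption, not something the patch establishes). Note that IOUtils.copy does not flush after every chunk the way the loop above does.

```scala
package org.apache.spark.util

import java.io.{IOException, InputStream, OutputStream}

import org.apache.commons.io.IOUtils

// Sketch only: same daemon thread as the patch, with the copy loop replaced.
private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
  extends Thread(name) {

  setDaemon(true)
  override def run(): Unit = {
    try {
      // Copies raw bytes, so no character-encoding interpretation is involved.
      IOUtils.copy(in, out)
    } catch {
      case _: IOException => // swallow, matching the original ignoring(classOf[IOException])
    }
  }
}
```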