Commenting, doc, and import fixes from Patrick's comments
sryza committed Mar 26, 2014
commit d428d857971866aa5c8f75970a9655071bf016fd
@@ -19,14 +19,18 @@ package org.apache.spark.deploy

import java.io.File
import java.net.URL
import java.net.URLClassLoader

import org.apache.spark.executor.ExecutorURLClassLoader

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.HashMap
import scala.collection.mutable.Map

/**
* Scala code behind the spark-submit script. The script handles setting up the classpath with
* relevant Spark dependencies and provides a layer over the different cluster managers and deploy
* modes that Spark supports.
*/
object SparkSubmit {
Reviewer comment (Contributor): Do you mind adding a high level comment here? It can be very brief (1 line) - just something to make it clear to developers what this is if someone runs into this file.

val YARN = 1
val STANDALONE = 2
@@ -165,11 +165,11 @@ private[spark] class SparkSubmitArguments(args: Array[String]) {
| --executor-cores NUM Number of cores per executor (Default: 1).
| --executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
| --queue QUEUE_NAME The YARN queue to submit to (Default: 'default').
| --num-executors NUM Number of executors to start (Default: 2).
| --files FILES Comma separated list of files to be placed next to all
| executors.
| --archives ARCHIVES Comma separated list of archives to be extracted next to
| all executors.""".stripMargin
| --num-executors NUM Number of executors to start (Default: 2).
| --files FILES Comma separated list of files to be placed in the working dir
| of each executor.
| --archives ARCHIVES Comma separated list of archives to be extracted into the
| working dir of each executor.""".stripMargin
)
System.exit(exitCode)
}
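Taken together, the YARN flags in the help text above would be exercised by an invocation along the following lines. This is a hypothetical sketch, not from the patch: the jar name, application class, and file names are placeholder values, and running it requires a Spark build with YARN support.

```shell
# Hypothetical YARN submission. spark-submit sets up the classpath, and the
# --files/--archives arguments are placed in each executor's working dir.
./bin/spark-submit my-app.jar \
  --class com.example.MyApp \
  --master yarn \
  --queue default \
  --num-executors 4 \
  --executor-cores 1 \
  --executor-memory 2G \
  --files lookup.txt,settings.conf \
  --archives deps.zip
```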
43 changes: 26 additions & 17 deletions docs/cluster-overview.md
@@ -56,37 +56,40 @@ The recommended way to launch a compiled Spark application is through the spark-
bin directory), which takes care of setting up the classpath with Spark and its dependencies, as well as
provides a layer over the different cluster managers and deploy modes that Spark supports. Its usage is

spark-submit <jar> <options>
spark-submit `<jar>` `<options>`

Where options are any of:

- **--class** - The main class to run.
- **--master** - The URL of the cluster manager master, e.g. spark://host:port, mesos://host:port, yarn,
- **\--class** - The main class to run.
- **\--master** - The URL of the cluster manager master, e.g. spark://host:port, mesos://host:port, yarn,
or local.
- **--deploy-mode** - "client" to run the driver in the client process or "cluster" to run the driver in
- **\--deploy-mode** - "client" to run the driver in the client process or "cluster" to run the driver in
a process on the cluster. For Mesos, only "client" is supported.
- **--executor-memory** - Memory per executor (e.g. 1000M, 2G).
- **--executor-cores** - Number of cores per executor.
- **--driver-memory** - Memory for driver (e.g. 1000M, 2G)
- **--name** - Name of the application.
- **--arg** - Argument to be passed to the application's main class. This option can be specified
- **\--executor-memory** - Memory per executor (e.g. 1000M, 2G).
- **\--executor-cores** - Number of cores per executor (Default: 1).
- **\--driver-memory** - Memory for driver (e.g. 1000M, 2G).
- **\--name** - Name of the application.
- **\--arg** - Argument to be passed to the application's main class. This option can be specified
multiple times to pass multiple arguments.
- **--jars** - A comma-separated list of local jars to include on the driver classpath and that
- **\--jars** - A comma-separated list of local jars to include on the driver classpath and that
SparkContext.addJar will work with. Doesn't work on standalone with 'cluster' deploy mode.

The following currently only work for Spark standalone with cluster deploy mode:
- **--driver-cores** - Cores for driver (Default: 1).
- **--supervise** - If given, restarts the driver on failure.

- **\--driver-cores** - Cores for driver (Default: 1).
- **\--supervise** - If given, restarts the driver on failure.

The following works for Spark standalone and Mesos only:
- **--total-executor-cores** - Total cores for all executors.

- **\--total-executor-cores** - Total cores for all executors.

The following currently only work for YARN:

- **--queue** - The YARN queue to place the application in.
- **--files** - Comma separated list of files to be placed next to all executors
- **--archives** - Comma separated list of archives to be extracted next to all executors
- **--num-executors** - Number of executors to start.
- **\--queue** - The YARN queue to place the application in.
- **\--files** - Comma separated list of files to be placed in the working dir of each executor.
- **\--archives** - Comma separated list of archives to be extracted into the working dir of each
executor.
- **\--num-executors** - Number of executors to start (Default: 2).
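As a concrete illustration of the options above, a standalone cluster-mode submission might look like the following. This is a hedged sketch: the jar, class, master URL, and application arguments are placeholder values, not part of this patch.

```shell
# Hypothetical standalone cluster-mode submission. --supervise asks the
# standalone master to restart the driver on failure, and each --arg is
# forwarded to the application's main class.
./bin/spark-submit my-app.jar \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --driver-memory 1G \
  --driver-cores 1 \
  --supervise \
  --arg input.txt \
  --arg output.txt
```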

The master and deploy mode can also be set with the MASTER and DEPLOY_MODE environment variables.
Reviewer comment (Contributor): I think "deploy mode" is a new term that this PR introduces. Would you mind adding it to the glossary below? I think it's something like:

Deploy mode: Distinguishes who is responsible for launching the driver. In "cluster" mode the driver is launched inside of the cluster. In "client" mode, the driver is launched outside of the cluster.
Values for these options passed via command line will override the environment variables.
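The environment-variable fallback described above can be sketched as follows; the values shown are placeholders, and explicit command-line flags take precedence over them.

```shell
# MASTER and DEPLOY_MODE act as defaults for --master and --deploy-mode;
# any flags passed on the command line override them.
export MASTER=spark://master-host:7077
export DEPLOY_MODE=client
./bin/spark-submit my-app.jar --class com.example.MyApp
```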
@@ -143,6 +146,12 @@ The following table summarizes terms you'll see used to refer to cluster concepts
<td>Cluster manager</td>
<td>An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)</td>
</tr>
<tr>
<td>Deploy mode</td>
<td>Distinguishes where the driver process runs. In "cluster" mode, the framework launches
the driver inside of the cluster. In "client" mode, the submitter launches the driver
outside of the cluster.</td>
</tr>
<tr>
<td>Worker node</td>
<td>Any node that can run application code in the cluster</td>