Commenting, doc, and import fixes from Patrick's comments
sryza committed Mar 26, 2014
commit d428d857971866aa5c8f75970a9655071bf016fd
@@ -19,14 +19,18 @@ package org.apache.spark.deploy

import java.io.File
import java.net.URL
import java.net.URLClassLoader

import org.apache.spark.executor.ExecutorURLClassLoader

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.HashMap
import scala.collection.mutable.Map

/**
* Scala code behind the spark-submit script. The script handles setting up the classpath with
* relevant Spark dependencies and provides a layer over the different cluster managers and deploy
* modes that Spark supports.
*/
object SparkSubmit {
Reviewer comment (Contributor): Do you mind adding a high level comment here? It can be very brief (1 line) - just something to make it clear to developers what this is if someone runs into this file.

val YARN = 1
val STANDALONE = 2
@@ -165,11 +165,11 @@ private[spark] class SparkSubmitArguments(args: Array[String]) {
| --executor-cores NUM Number of cores per executor (Default: 1).
| --executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
| --queue QUEUE_NAME The YARN queue to submit to (Default: 'default').
| --num-executors NUM Number of executors to start (Default: 2).
| --files FILES Comma separated list of files to be placed next to all
| executors.
| --archives ARCHIVES Comma separated list of archives to be extracted next to
| all executors.""".stripMargin
| --num-executors NUM Number of executors to start (Default: 2).
| --files FILES Comma separated list of files to be placed in the working dir
| of each executor.
| --archives ARCHIVES Comma separated list of archives to be extracted into the
| working dir of each executor.""".stripMargin
)
System.exit(exitCode)
}
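Taken together, the YARN flags in the help text above would be exercised by an invocation along the following lines. This is a hypothetical sketch, not from the patch: the jar name, application class, and file names are placeholder values, and running it requires a Spark build with YARN support.

```shell
# Hypothetical YARN submission. spark-submit sets up the classpath, and the
# --files/--archives arguments are placed in each executor's working dir.
./bin/spark-submit my-app.jar \
  --class com.example.MyApp \
  --master yarn \
  --queue default \
  --num-executors 4 \
  --executor-cores 1 \
  --executor-memory 2G \
  --files lookup.txt,settings.conf \
  --archives deps.zip
```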
43 changes: 26 additions & 17 deletions docs/cluster-overview.md
@@ -56,37 +56,40 @@ The recommended way to launch a compiled Spark application is through the spark-
bin directory), which takes care of setting up the classpath with Spark and its dependencies, as well as
provides a layer over the different cluster managers and deploy modes that Spark supports. Its usage is

spark-submit <jar> <options>
spark-submit `<jar>` `<options>`

Where options are any of:

- **--class** - The main class to run.
- **--master** - The URL of the cluster manager master, e.g. spark://host:port, mesos://host:port, yarn,
- **\--class** - The main class to run.
- **\--master** - The URL of the cluster manager master, e.g. spark://host:port, mesos://host:port, yarn,
or local.
- **--deploy-mode** - "client" to run the driver in the client process or "cluster" to run the driver in
- **\--deploy-mode** - "client" to run the driver in the client process or "cluster" to run the driver in
a process on the cluster. For Mesos, only "client" is supported.
- **--executor-memory** - Memory per executor (e.g. 1000M, 2G).
- **--executor-cores** - Number of cores per executor.
- **--driver-memory** - Memory for driver (e.g. 1000M, 2G)
- **--name** - Name of the application.
- **--arg** - Argument to be passed to the application's main class. This option can be specified
- **\--executor-memory** - Memory per executor (e.g. 1000M, 2G).
- **\--executor-cores** - Number of cores per executor (Default: 1).
- **\--driver-memory** - Memory for driver (e.g. 1000M, 2G).
- **\--name** - Name of the application.
- **\--arg** - Argument to be passed to the application's main class. This option can be specified
multiple times to pass multiple arguments.
- **--jars** - A comma-separated list of local jars to include on the driver classpath and that
- **\--jars** - A comma-separated list of local jars to include on the driver classpath and that
SparkContext.addJar will work with. Doesn't work on standalone with 'cluster' deploy mode.

The following currently only work for Spark standalone with cluster deploy mode:
- **--driver-cores** - Cores for driver (Default: 1).
- **--supervise** - If given, restarts the driver on failure.

- **\--driver-cores** - Cores for driver (Default: 1).
- **\--supervise** - If given, restarts the driver on failure.

The following works for Spark standalone and Mesos only:
- **--total-executor-cores** - Total cores for all executors.

- **\--total-executor-cores** - Total cores for all executors.

The following currently only work for YARN:

- **--queue** - The YARN queue to place the application in.
- **--files** - Comma separated list of files to be placed next to all executors
- **--archives** - Comma separated list of archives to be extracted next to all executors
- **--num-executors** - Number of executors to start.
- **\--queue** - The YARN queue to place the application in.
- **\--files** - Comma separated list of files to be placed in the working dir of each executor.
- **\--archives** - Comma separated list of archives to be extracted into the working dir of each
executor.
- **\--num-executors** - Number of executors to start (Default: 2).
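As a concrete illustration of the options above, a standalone cluster-mode submission might look like the following. This is a hedged sketch: the jar, class, master URL, and application arguments are placeholder values, not part of this patch.

```shell
# Hypothetical standalone cluster-mode submission. --supervise asks the
# standalone master to restart the driver on failure, and each --arg is
# forwarded to the application's main class.
./bin/spark-submit my-app.jar \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --driver-memory 1G \
  --driver-cores 1 \
  --supervise \
  --arg input.txt \
  --arg output.txt
```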

The master and deploy mode can also be set with the MASTER and DEPLOY_MODE environment variables.
Reviewer comment (Contributor): I think "deploy mode" is a new term that this PR introduces. Would you mind adding it to the glossary below? I think it's something like:

Deploy mode: Distinguishes who is responsible for launching the driver. In "cluster" mode the driver is launched inside of the cluster. In "client" mode, the driver is launched outside of the cluster.
Values for these options passed via command line will override the environment variables.
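The environment-variable fallback described above can be sketched as follows; the values shown are placeholders, and explicit command-line flags take precedence over them.

```shell
# MASTER and DEPLOY_MODE act as defaults for --master and --deploy-mode;
# any flags passed on the command line override them.
export MASTER=spark://master-host:7077
export DEPLOY_MODE=client
./bin/spark-submit my-app.jar --class com.example.MyApp
```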
@@ -143,6 +146,12 @@ The following table summarizes terms you'll see used to refer to cluster concepts
<td>Cluster manager</td>
<td>An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)</td>
</tr>
<tr>
<td>Deploy mode</td>
<td>Distinguishes where the driver process runs. In "cluster" mode, the framework launches
the driver inside of the cluster. In "client" mode, the submitter launches the driver
outside of the cluster.</td>
</tr>
<tr>
<td>Worker node</td>
<td>Any node that can run application code in the cluster</td>