Response to Matei's review
pwendell committed May 28, 2014
commit e0c17289ec77c7a2b9c717fbe5939435e2e2bb9e
65 changes: 33 additions & 32 deletions docs/configuration.md
@@ -64,22 +64,22 @@ This is a useful place to check to make sure that your properties have been set
that only values explicitly specified through either `spark-defaults.conf` or SparkConf will
appear. For all other configuration properties, you can assume the default value is used.
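A minimal sketch of what "explicitly specified through SparkConf" looks like in practice (assuming the standard SparkConf API; the application name and master URL below are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Only properties set explicitly here (or in spark-defaults.conf) appear as
// "set" on that page; all other properties silently use their default values.
val conf = new SparkConf()
  .setAppName("ConfigExample")   // spark.app.name -- placeholder
  .setMaster("local[2]")         // spark.master -- placeholder for a local run
val sc = new SparkContext(conf)
```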

## All Configuration Properties
## Available Properties

Most of the properties that control internal settings have reasonable default values. Some
of the most common options to set are:

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><strong><code>spark.app.name</code></strong></td>
<td><code>spark.app.name</code></td>
<td>(none)</td>
<td>
The name of your application. This will appear in the UI and in log data.
</td>
</tr>
<tr>
<td><strong><code>spark.master</code></strong></td>
<td><code>spark.master</code></td>
<td>(none)</td>
<td>
The cluster manager to connect to. See the list of
@@ -244,15 +244,6 @@ Apart from these, the following properties are also available, and may be useful
reduce the number of disk seeks and system calls made in creating intermediate shuffle files.
</td>
</tr>
<tr>
<td><code>spark.storage.memoryMapThreshold</code></td>
<td>8192</td>
<td>
Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
This prevents Spark from memory mapping very small blocks. In general, memory
mapping has high overhead for blocks close to or below the page size of the operating system.
</td>
</tr>
<tr>
<td><code>spark.reducer.maxMbInFlight</code></td>
<td>48</td>
@@ -292,7 +283,7 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.eventLog.enabled</code></td>
<td>false</td>
<td>
Whether to log spark events, useful for reconstructing the Web UI after the application has
Whether to log Spark events, useful for reconstructing the Web UI after the application has
finished.
</td>
</tr>
@@ -307,7 +298,7 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.eventLog.dir</code></td>
<td>file:///tmp/spark-events</td>
Contributor:
The file:/// URI implies to me that I could put HDFS or S3 URIs here. Is that allowed?

<td>
Base directory in which spark events are logged, if <code>spark.eventLog.enabled</code> is true.
Base directory in which Spark events are logged, if <code>spark.eventLog.enabled</code> is true.
Within this base directory, Spark creates a sub-directory for each application, and logs the
events specific to the application in this directory.
</td>
Expand Down Expand Up @@ -457,13 +448,33 @@ Apart from these, the following properties are also available, and may be useful
directories on Tachyon file system.
</td>
</tr>
<tr>
<td><code>spark.storage.memoryMapThreshold</code></td>
<td>8192</td>
<td>
Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
This prevents Spark from memory mapping very small blocks. In general, memory
mapping has high overhead for blocks close to or below the page size of the operating system.
</td>
</tr>
<tr>
<td><code>spark.tachyonStore.url</code></td>
<td>tachyon://localhost:19998</td>
<td>
The URL of the underlying Tachyon file system in the TachyonStore.
</td>
</tr>
<tr>
<td><code>spark.cleaner.ttl</code></td>
<td>(infinite)</td>
<td>
Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks
generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be
forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in
case of Spark Streaming applications). Note that any RDD that persists in memory for more than
this duration will be cleared as well.
</td>
</tr>
</table>
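A minimal sketch (assuming the standard SparkConf API; the one-hour TTL is an arbitrary example value) of setting the event-logging and cleanup properties listed above:

```scala
import org.apache.spark.SparkConf

// Hypothetical long-running application: keep event logs for reconstructing the
// web UI later, and let Spark forget metadata older than one hour.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///tmp/spark-events") // the documented default
  .set("spark.cleaner.ttl", "3600")                      // seconds

// Explicitly set values can be read back; unset ones fall back to the defaults.
println(conf.get("spark.eventLog.dir"))
println(conf.getOption("spark.storage.memoryMapThreshold").getOrElse("8192 (default)"))
```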

#### Networking
@@ -539,7 +550,7 @@ Apart from these, the following properties are also available, and may be useful
`spark.akka.failure-detector.threshold` if you need to. The only positive use case for the
failure detector is that a sensitive failure detector can help evict rogue executors
quickly. However, this is usually not the case, as GC pauses and network lags are expected in a
real spark cluster. Apart from that enabling this leads to a lot of exchanges of heart beats
real Spark cluster. Apart from that enabling this leads to a lot of exchanges of heart beats
between nodes leading to flooding the network with those.
</td>
</tr>
@@ -677,16 +688,16 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.authenticate</code></td>
<td>false</td>
<td>
Whether spark authenticates its internal connections. See
<code>spark.authenticate.secret</code> if not running on Yarn.
Whether Spark authenticates its internal connections. See
<code>spark.authenticate.secret</code> if not running on YARN.
</td>
</tr>
<tr>
<td><code>spark.authenticate.secret</code></td>
<td>None</td>
<td>
Set the secret key used for Spark to authenticate between components. This needs to be set if
not running on Yarn and authentication is enabled.
not running on YARN and authentication is enabled.
</td>
</tr>
<tr>
@@ -702,7 +713,8 @@ Apart from these, the following properties are also available, and may be useful
<td>None</td>
<td>
Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
Contributor:
UI

standard javax servlet Filter. Parameters to each filter can also be specified by setting a
standard <a href="http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html">
javax servlet Filter</a>. Parameters to each filter can also be specified by setting a
java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
(e.g. -Dspark.ui.filters=com.test.filter1
-Dspark.com.test.filter1.params='param1=foo,param2=testing')
Contributor:
These look weird now, they might be better with a <br /> before each line

@@ -712,7 +724,7 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.ui.acls.enable</code></td>
<td>false</td>
<td>
Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
Whether Spark web ui acls are enabled. If enabled, this checks to see if the user has
access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
Also note this requires the user to be known; if the user comes across as null, no checks
are done. Filters can be used to authenticate and set the user.
Expand All @@ -722,7 +734,7 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.ui.view.acls</code></td>
<td>Empty</td>
<td>
Comma separated list of users that have view access to the spark web ui. By default only the
Comma separated list of users that have view access to the Spark web ui. By default only the
user that started the Spark job has view access.
</td>
</tr>
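A minimal sketch of how the security properties in this hunk could be combined; the secret and user names are invented placeholders, not recommendations:

```scala
import org.apache.spark.SparkConf

// Hypothetical secured deployment outside YARN: shared-secret authentication
// plus view ACLs for the web UI. Secret and user names are placeholders.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "replace-with-a-real-secret") // needed when not on YARN
  .set("spark.ui.acls.enable", "true")
  .set("spark.ui.view.acls", "alice,bob")                         // users allowed to view the UI
```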
@@ -731,17 +743,6 @@ Apart from these, the following properties are also available, and may be useful
#### Spark Streaming
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.cleaner.ttl</code></td>
<td>(infinite)</td>
<td>
Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks
generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be
forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in
case of Spark Streaming applications). Note that any RDD that persists in memory for more than
this duration will be cleared as well.
</td>
</tr>
<tr>
<td><code>spark.streaming.blockInterval</code></td>
<td>200</td>
12 changes: 6 additions & 6 deletions docs/spark-standalone.md
@@ -157,7 +157,7 @@ SPARK_MASTER_OPTS supports the following system properties:
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.deploy.spreadOut</td>
<td><code>spark.deploy.spreadOut</code></td>
<td>true</td>
<td>
Whether the standalone cluster manager should spread applications out across nodes or try
@@ -166,7 +166,7 @@ SPARK_MASTER_OPTS supports the following system properties:
</td>
</tr>
<tr>
<td>spark.deploy.defaultCores</td>
<td><code>spark.deploy.defaultCores</code></td>
<td>(infinite)</td>
<td>
Default number of cores to give to applications in Spark's standalone mode if they don't
@@ -177,7 +177,7 @@ SPARK_MASTER_OPTS supports the following system properties:
</td>
</tr>
<tr>
<td>spark.worker.timeout</td>
<td><code>spark.worker.timeout</code></td>
<td>60</td>
<td>
Number of seconds after which the standalone deploy master considers a worker lost if it
@@ -191,7 +191,7 @@ SPARK_WORKER_OPTS supports the following system properties:
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.worker.cleanup.enabled</td>
<td><code>spark.worker.cleanup.enabled</code></td>
<td>false</td>
<td>
Enable periodic cleanup of worker / application directories. Note that this only affects standalone
@@ -200,15 +200,15 @@ SPARK_WORKER_OPTS supports the following system properties:
</td>
</tr>
<tr>
<td>spark.worker.cleanup.interval</td>
<td><code>spark.worker.cleanup.interval</code></td>
<td>1800 (30 minutes)</td>
<td>
Controls the interval, in seconds, at which the worker cleans up old application work dirs
on the local machine.
</td>
</tr>
<tr>
<td>spark.worker.cleanup.appDataTtl</td>
<td><code>spark.worker.cleanup.appDataTtl</code></td>
<td>7 * 24 * 3600 (7 days)</td>
<td>
The number of seconds to retain application work directories on each worker. This is a Time To Live
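On the application side, `spark.deploy.defaultCores` only matters when an application does not cap itself; a hedged sketch, assuming `spark.cores.max` (not shown in this excerpt) is the per-application cap in standalone mode, with a placeholder master URL:

```scala
import org.apache.spark.SparkConf

// Hypothetical standalone-mode application that caps its own core usage, so the
// cluster-wide spark.deploy.defaultCores value is not what limits it.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077") // placeholder standalone master URL
  .setAppName("capped-cores-example")    // placeholder application name
  .set("spark.cores.max", "4")           // per-application core cap (assumption: see lead-in)
```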