
Commit e0c1728

Response to Matei's review
1 parent 27d57db commit e0c1728

2 files changed, +39 -38 lines changed

docs/configuration.md

Lines changed: 33 additions & 32 deletions
@@ -64,22 +64,22 @@ This is a useful place to check to make sure that your properties have been set
 that only values explicitly specified through either `spark-defaults.conf` or SparkConf will
 appear. For all other configuration properties, you can assume the default value is used.

-## All Configuration Properties
+## Available Properties

 Most of the properties that control internal settings have reasonable default values. Some
 of the most common options to set are:

 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
-  <td><strong><code>spark.app.name</code></strong></td>
+  <td><code>spark.app.name</code></td>
   <td>(none)</td>
   <td>
     The name of your application. This will appear in the UI and in log data.
   </td>
 </tr>
 <tr>
-  <td><strong><code>spark.master</code></strong></td>
+  <td><code>spark.master</code></td>
   <td>(none)</td>
   <td>
     The cluster manager to connect to. See the list of
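
For orientation only (not part of this patch), a minimal sketch of setting the two properties from the table above programmatically; the application name and master URL are placeholder values:

```scala
import org.apache.spark.SparkConf

// Placeholder values for illustration only.
val conf = new SparkConf()
  .set("spark.app.name", "MyApp")             // same effect as setAppName("MyApp")
  .set("spark.master", "spark://master:7077") // same effect as setMaster(...)
```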
@@ -244,15 +244,6 @@ Apart from these, the following properties are also available, and may be useful
     reduce the number of disk seeks and system calls made in creating intermediate shuffle files.
   </td>
 </tr>
-<tr>
-  <td><code>spark.storage.memoryMapThreshold</code></td>
-  <td>8192</td>
-  <td>
-    Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
-    This prevents Spark from memory mapping very small blocks. In general, memory
-    mapping has high overhead for blocks close to or below the page size of the operating system.
-  </td>
-</tr>
 <tr>
   <td><code>spark.reducer.maxMbInFlight</code></td>
   <td>48</td>
@@ -292,7 +283,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.eventLog.enabled</code></td>
   <td>false</td>
   <td>
-    Whether to log spark events, useful for reconstructing the Web UI after the application has
+    Whether to log Spark events, useful for reconstructing the Web UI after the application has
     finished.
   </td>
 </tr>
@@ -307,7 +298,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.eventLog.dir</code></td>
   <td>file:///tmp/spark-events</td>
   <td>
-    Base directory in which spark events are logged, if <code>spark.eventLog.enabled</code> is true.
+    Base directory in which Spark events are logged, if <code>spark.eventLog.enabled</code> is true.
     Within this base directory, Spark creates a sub-directory for each application, and logs the
     events specific to the application in this directory.
   </td>
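
As an aside (not part of this patch), a minimal sketch of turning on event logging with the default base directory listed above; the application name and master are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master; the eventLog keys and the default
// directory come from the table above.
val conf = new SparkConf()
  .setAppName("event-log-demo")
  .setMaster("local[2]")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///tmp/spark-events")

val sc = new SparkContext(conf)
```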
@@ -457,13 +448,33 @@ Apart from these, the following properties are also available, and may be useful
     directories on Tachyon file system.
   </td>
 </tr>
+<tr>
+  <td><code>spark.storage.memoryMapThreshold</code></td>
+  <td>8192</td>
+  <td>
+    Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
+    This prevents Spark from memory mapping very small blocks. In general, memory
+    mapping has high overhead for blocks close to or below the page size of the operating system.
+  </td>
+</tr>
 <tr>
   <td><code>spark.tachyonStore.url</code></td>
   <td>tachyon://localhost:19998</td>
   <td>
     The URL of the underlying Tachyon file system in the TachyonStore.
   </td>
 </tr>
+<tr>
+  <td><code>spark.cleaner.ttl</code></td>
+  <td>(infinite)</td>
+  <td>
+    Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks
+    generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be
+    forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in
+    case of Spark Streaming applications). Note that any RDD that persists in memory for more than
+    this duration will be cleared as well.
+  </td>
+</tr>
 </table>

 #### Networking
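
For illustration only (the values are assumed, not taken from the patch), a minimal sketch of tuning the two properties this hunk moves into the general table:

```scala
import org.apache.spark.SparkConf

// Assumed example values: keep metadata for one hour on a long-running job,
// and only memory-map blocks larger than 2 MB when reading from disk.
val conf = new SparkConf()
  .set("spark.cleaner.ttl", "3600")                    // seconds
  .set("spark.storage.memoryMapThreshold", "2097152")  // bytes
```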
@@ -539,7 +550,7 @@ Apart from these, the following properties are also available, and may be useful
     `spark.akka.failure-detector.threshold` if you need to. Only positive use case for using
     failure detector can be, a sensistive failure detector can help evict rogue executors really
     quick. However this is usually not the case as gc pauses and network lags are expected in a
-    real spark cluster. Apart from that enabling this leads to a lot of exchanges of heart beats
+    real Spark cluster. Apart from that enabling this leads to a lot of exchanges of heart beats
     between nodes leading to flooding the network with those.
   </td>
 </tr>
@@ -677,16 +688,16 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.authenticate</code></td>
   <td>false</td>
   <td>
-    Whether spark authenticates its internal connections. See
-    <code>spark.authenticate.secret</code> if not running on Yarn.
+    Whether Spark authenticates its internal connections. See
+    <code>spark.authenticate.secret</code> if not running on YARN.
   </td>
 </tr>
 <tr>
   <td><code>spark.authenticate.secret</code></td>
   <td>None</td>
   <td>
     Set the secret key used for Spark to authenticate between components. This needs to be set if
-    not running on Yarn and authentication is enabled.
+    not running on YARN and authentication is enabled.
   </td>
 </tr>
 <tr>
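
A minimal sketch (the secret is a placeholder, not from the patch) of enabling authentication when not running on YARN; every component must share the same secret:

```scala
import org.apache.spark.SparkConf

// Placeholder secret; set the same value on all Spark components.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "change-me")
```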
@@ -702,7 +713,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>None</td>
   <td>
     Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
-    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
+    standard <a href="http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html">
+    javax servlet Filter</a>. Parameters to each filter can also be specified by setting a
     java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
     (e.g. -Dspark.ui.filters=com.test.filter1
     -Dspark.com.test.filter1.params='param1=foo,param2=testing')
@@ -712,7 +724,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.ui.acls.enable</code></td>
   <td>false</td>
   <td>
-    Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
+    Whether Spark web ui acls should are enabled. If enabled, this checks to see if the user has
     access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
     Also note this requires the user to be known, if the user comes across as null no checks
     are done. Filters can be used to authenticate and set the user.
@@ -722,7 +734,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.ui.view.acls</code></td>
   <td>Empty</td>
   <td>
-    Comma separated list of users that have view access to the spark web ui. By default only the
+    Comma separated list of users that have view access to the Spark web ui. By default only the
     user that started the Spark job has view access.
   </td>
 </tr>
@@ -731,17 +743,6 @@ Apart from these, the following properties are also available, and may be useful
 #### Spark Streaming
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
-<tr>
-  <td><code>spark.cleaner.ttl</code></td>
-  <td>(infinite)</td>
-  <td>
-    Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks
-    generated, etc.). Periodic cleanups will ensure that metadata older than this duration will be
-    forgotten. This is useful for running Spark for many hours / days (for example, running 24/7 in
-    case of Spark Streaming applications). Note that any RDD that persists in memory for more than
-    this duration will be cleared as well.
-  </td>
-</tr>
 <tr>
   <td><code>spark.streaming.blockInterval</code></td>
   <td>200</td>

docs/spark-standalone.md

Lines changed: 6 additions & 6 deletions
@@ -157,7 +157,7 @@ SPARK_MASTER_OPTS supports the following system properties:
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
-  <td>spark.deploy.spreadOut</td>
+  <td><code>spark.deploy.spreadOut</code></td>
   <td>true</td>
   <td>
     Whether the standalone cluster manager should spread applications out across nodes or try
@@ -166,7 +166,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   </td>
 </tr>
 <tr>
-  <td>spark.deploy.defaultCores</td>
+  <td><code>spark.deploy.defaultCores</code></td>
   <td>(infinite)</td>
   <td>
     Default number of cores to give to applications in Spark's standalone mode if they don't
@@ -177,7 +177,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   </td>
 </tr>
 <tr>
-  <td>spark.worker.timeout</td>
+  <td><code>spark.worker.timeout</code></td>
   <td>60</td>
   <td>
     Number of seconds after which the standalone deploy master considers a worker lost if it
@@ -191,7 +191,7 @@ SPARK_WORKER_OPTS supports the following system properties:
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
-  <td>spark.worker.cleanup.enabled</td>
+  <td><code>spark.worker.cleanup.enabled</code></td>
   <td>false</td>
   <td>
     Enable periodic cleanup of worker / application directories. Note that this only affects standalone
@@ -200,15 +200,15 @@ SPARK_WORKER_OPTS supports the following system properties:
   </td>
 </tr>
 <tr>
-  <td>spark.worker.cleanup.interval</td>
+  <td><code>spark.worker.cleanup.interval</code></td>
   <td>1800 (30 minutes)</td>
   <td>
     Controls the interval, in seconds, at which the worker cleans up old application work dirs
     on the local machine.
   </td>
 </tr>
 <tr>
-  <td>spark.worker.cleanup.appDataTtl</td>
+  <td><code>spark.worker.cleanup.appDataTtl</code></td>
   <td>7 * 24 * 3600 (7 days)</td>
   <td>
     The number of seconds to retain application work directories on each worker. This is a Time To Live
