add it to the docs
  • Loading branch information
weixiuli committed Mar 15, 2019
commit 7e87f64c3ce878078728c7585d2590085c7440c8
14 changes: 13 additions & 1 deletion docs/spark-standalone.md
@@ -236,10 +236,11 @@ SPARK_WORKER_OPTS supports the following system properties:
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.worker.cleanup.enabled</code></td>
<td>false</td>
<td>true</td>
Contributor

Why have you changed this default value in the documentation? As I see it is still false.

val WORKER_CLEANUP_ENABLED = ConfigBuilder("spark.worker.cleanup.enabled")
.booleanConf
.createWithDefault(false)
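The compiled-in default shown in the snippet can be overridden per worker; a minimal sketch, assuming the standard `conf/spark-env.sh` mechanism (the value shown is illustrative):

```shell
# conf/spark-env.sh (sketch): pass the property to the worker JVM via
# SPARK_WORKER_OPTS to override the compiled-in default of false.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
```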

Contributor Author

Sorry, I missed it.

Contributor

I believe attila meant to revert your change to the docs, not to change the default.

<td>
Enable periodic cleanup of worker / application directories. Note that this only affects standalone
mode, as YARN works differently. Only the directories of stopped applications are cleaned up.
This must be enabled if <code>spark.shuffle.service.db.enabled</code> is "true".
</td>
</tr>
<tr>
@@ -260,6 +261,17 @@ SPARK_WORKER_OPTS supports the following system properties:
especially if you run jobs very frequently.
</td>
</tr>
<tr>
<td><code>spark.shuffle.service.db.enabled</code></td>
<td>true</td>
<td>
Enable recording RegisteredExecutors information in leveldb, so that it can be reloaded and
used again when the external shuffle service is restarted. Note that this only affects standalone
mode; it is always on for YARN. <code>spark.worker.cleanup.enabled</code> should also be enabled so
that WorkDirCleanup removes the leveldb entry (otherwise an entry is left in the DB forever when an
application is stopped while the external shuffle service is down). It may be removed in the future.
Contributor

Some minor rewordings:

Store External Shuffle service state on local disk so that when the external shuffle service is restarted, it will
automatically reload info on current executors.  This only affects standalone mode (yarn always has this behavior
enabled).  You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
eventually gets cleaned up.  This config may be removed in the future.

Contributor Author

Very clear description, thank you.
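The two properties work together as described in the thread; a hedged sketch of enabling both on a standalone worker, assuming the standard `conf/spark-env.sh` mechanism (the cleanup interval shown is illustrative):

```shell
# conf/spark-env.sh (sketch): persist external shuffle service state in
# leveldb, and let the worker's periodic cleanup remove stale entries.
export SPARK_WORKER_OPTS="-Dspark.shuffle.service.db.enabled=true \
  -Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800"
```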

</td>
</tr>
<tr>
<td><code>spark.storage.cleanupFilesAfterExecutorExit</code></td>
<td>true</td>