Changes from 1 commit

Commits (32)
ac2d65e
Change spark.local.dir -> SPARK_LOCAL_DIRS
pwendell Mar 31, 2014
0faa3b6
Stash of adding config options in submit script and YARN
pwendell Apr 1, 2014
6eaf7d0
executorJavaOpts
pwendell Apr 1, 2014
4982331
Remove SPARK_LIBRARY_PATH
pwendell Apr 1, 2014
1f75238
SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
pwendell Apr 1, 2014
84cc5e5
Small clean-up
pwendell Apr 1, 2014
5b0ba8e
Don't ship executor envs
pwendell Apr 2, 2014
7cc70e4
Clean up terminology inside of spark-env script
pwendell Apr 2, 2014
761ebcd
Library path and classpath for drivers
pwendell Apr 2, 2014
437aed1
Small fix
pwendell Apr 2, 2014
46555c1
Review feedback and import clean-ups
pwendell Apr 13, 2014
b72d183
Review feedback for spark env file
pwendell Apr 13, 2014
ace4ead
Responses to review feedback.
pwendell Apr 13, 2014
b08893b
Additional improvements.
pwendell Apr 13, 2014
afc9ed8
Cleaning up line limits and two compile errors.
pwendell Apr 14, 2014
4ee6f9d
Making YARN doc changes consistent
pwendell Apr 14, 2014
c2a2909
Test compile fixes
pwendell Apr 14, 2014
be42f35
Handle case where SPARK_HOME is not set
pwendell Apr 15, 2014
e83cd8f
Changes to allow re-use of test applications
pwendell Apr 15, 2014
308f1f6
Properly escape quotes and other clean-up for YARN
pwendell Apr 15, 2014
fda0301
Note
pwendell Apr 15, 2014
ffa00fe
Review feedback
pwendell Apr 18, 2014
a762901
Fixing test failures
pwendell Apr 18, 2014
d50c388
Merge remote-tracking branch 'apache/master' into config-cleanup
pwendell Apr 18, 2014
a56b125
Responses to Tom's review
pwendell Apr 18, 2014
af0adf7
Automatically add user jar
pwendell Apr 18, 2014
b16e6a2
Cleanup of spark-submit script and Scala quick start guide
pwendell Apr 20, 2014
af09e3e
Mention config file in docs and clean-up docs
pwendell Apr 21, 2014
0086939
Minor style fixes
pwendell Apr 21, 2014
b4b496c
spark-defaults.properties -> spark-defaults.conf
pwendell Apr 21, 2014
a006464
Moving properties file template.
pwendell Apr 21, 2014
127f301
Improvements to testing
pwendell Apr 21, 2014

Commit shown in this diff:
Change spark.local.dir -> SPARK_LOCAL_DIRS
pwendell committed Apr 13, 2014
commit ac2d65e9299109759ee9b46687acee2cac5b276c
2 changes: 2 additions & 0 deletions conf/spark-env.sh.template
@@ -5,6 +5,8 @@
 #
 # The following variables can be set in this file:
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
+# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
+# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
 # - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
 #   we recommend setting app-wide options in the application's driver program.
9 changes: 9 additions & 0 deletions core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -208,6 +208,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
     new SparkConf(false).setAll(settings)
   }
 
+  /** Print any necessary deprecation warnings based on the values set in this configuration. */
+  private[spark] def printDeprecationWarnings() {
+    if (settings.contains("spark.local.dir")) {
+      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
+        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
+      logWarning(msg)
+    }
+  }
+
   /**
    * Return a string listing all keys and values, one per line. This is useful to print the
    * configuration out for debugging.
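
To see the new deprecation path in isolation, here is a minimal, hedged sketch (not Spark's actual SparkConf): the MiniConf class and demo object are invented for illustration, and only the spark.local.dir key and the idea behind printDeprecationWarnings come from the hunk above.

import scala.collection.mutable

// Simplified stand-in for SparkConf, showing only the deprecation check added above.
class MiniConf {
  private val settings = mutable.HashMap[String, String]()

  def set(key: String, value: String): MiniConf = { settings(key) = value; this }

  // Warn when spark.local.dir is still set by hand, since the cluster manager's
  // directories (SPARK_LOCAL_DIRS / LOCAL_DIRS) now take precedence.
  def printDeprecationWarnings(): Unit = {
    if (settings.contains("spark.local.dir")) {
      println("WARN: In Spark 1.0 and later spark.local.dir will be overridden by the value " +
        "set by the cluster manager (SPARK_LOCAL_DIRS in mesos/standalone, LOCAL_DIRS in YARN).")
    }
  }
}

object DeprecationWarningDemo {
  def main(args: Array[String]): Unit = {
    val conf = new MiniConf().set("spark.local.dir", "/tmp/spark-scratch")
    conf.printDeprecationWarnings() // prints the warning because the key is set explicitly
  }
}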
5 changes: 3 additions & 2 deletions core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -64,9 +64,10 @@ private[spark] class Executor(
   // to what Yarn on this system said was available. This will be used later when SparkEnv
   // created.
   if (java.lang.Boolean.valueOf(
-      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE"))))
-  {
+      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))) {
     conf.set("spark.local.dir", getYarnLocalDirs())
+  } else if (sys.env.contains("SPARK_LOCAL_DIRS")) {
+    conf.set("spark.local.dir", sys.env("SPARK_LOCAL_DIRS"))

Contributor

@pwendell

If we're running on local mode, then SparkEnv will have already been created and DiskBlockManager will have already created the local dirs using the previous value of "spark.local.dir". When we change "spark.local.dir" here, the local Executor will attempt to use local directories that might not exist, causing problems for local jobs that use addFIle().

I discovered this issue when debugging some spark-perf tests in local mode on an EC2 node.

Contributor

Maybe the problem here lies with spark-ec2's default configuration setting SPARK_LOCAL_DIRS on the master when it should only really be used on workers, and in not setting spark.local.dir.

I think the current documentation for SPARK_LOCAL_DIRS sort of suggests that it acts as an override, without any caveats about whether it only should be used on workers, etc.

   }
 
   if (!isLocal) {
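
To make the precedence introduced in this hunk easier to follow, the stand-alone sketch below summarizes the same resolution order: YARN-provided directories when running in YARN mode, then SPARK_LOCAL_DIRS from the environment, then whatever spark.local.dir already held. Only the environment and property names come from the diff; the function and object names are illustrative.

// Stand-alone sketch of the executor's local-dir precedence; not Spark's actual code.
object LocalDirPrecedenceSketch {

  def resolveLocalDir(
      yarnMode: Boolean,
      yarnLocalDirs: Option[String],   // what getYarnLocalDirs() would supply under YARN
      env: Map[String, String],        // stands in for sys.env
      existingSetting: Option[String]  // a spark.local.dir value already on the conf
    ): Option[String] = {
    if (yarnMode) {
      yarnLocalDirs                                // 1. YARN decides the disk layout
    } else if (env.contains("SPARK_LOCAL_DIRS")) {
      Some(env("SPARK_LOCAL_DIRS"))                // 2. standalone/Mesos env var overrides
    } else {
      existingSetting                              // 3. fall back to any spark.local.dir
    }
  }

  def main(args: Array[String]): Unit = {
    val chosen = resolveLocalDir(
      yarnMode = false,
      yarnLocalDirs = None,
      env = Map("SPARK_LOCAL_DIRS" -> "/mnt/spark,/mnt2/spark"),
      existingSetting = Some("/tmp"))
    println(chosen) // Some(/mnt/spark,/mnt2/spark)
  }
}

As the inline comments above point out, in local mode DiskBlockManager may already have created directories from the earlier spark.local.dir value before this override runs, so the executor can end up pointing at directories that were never created.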
6 changes: 4 additions & 2 deletions docs/configuration.md
@@ -73,6 +73,9 @@ there are at least five properties that you will commonly want to control:
     Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored
     on disk. This should be on a fast, local disk in your system. It can also be a comma-separated
     list of multiple directories on different disks.
+
+    NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
+    LOCAL_DIRS (YARN) environment variables set by the cluster manager.
   </td>
 </tr>
 <tr>
@@ -671,8 +674,7 @@ The following variables can be set in `spark-env.sh`:
 Note that applications can also add dependencies for themselves through `SparkContext.addJar` -- we recommend
 doing that when possible.
 * `SPARK_JAVA_OPTS`, to add JVM options. This includes Java options like garbage collector settings and any system
-  properties that you'd like to pass with `-D`. One use case is to set some Spark properties differently on this
-  machine, e.g., `-Dspark.local.dir=/disk1,/disk2`.
+  properties that you'd like to pass with `-D`.
 * Options for the Spark [standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores
   to use on each machine and maximum memory.

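
As a rough usage note on the documented behavior (not part of the diff): both spark.local.dir and SPARK_LOCAL_DIRS may hold a comma-separated list of directories on different disks. The sketch below shows how such a value is typically split by a consumer of the setting; the fallback value and object name are invented for illustration.

// Illustrative only: splitting a comma-separated scratch-directory setting.
object LocalDirListSketch {
  def main(args: Array[String]): Unit = {
    // Prefer the cluster manager's environment variable, falling back to an example default.
    val setting = sys.env.getOrElse("SPARK_LOCAL_DIRS", "/disk1/spark,/disk2/spark")
    val dirs = setting.split(",").map(_.trim).filter(_.nonEmpty)
    dirs.foreach(dir => println(s"scratch dir: $dir"))
  }
}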