16 changes: 10 additions & 6 deletions README.md
@@ -39,17 +39,21 @@ And run the following command, which should also return 1000:
## Example Programs

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> <params>`. For example:
To run one of them, use the `./bin/spark-submit` script. For example:

./bin/run-example org.apache.spark.examples.SparkLR local[2]
./bin/spark-submit \
--class org.apache.spark.examples.SparkLR \
--master local[2] \
lib/spark-examples*.jar

will run the Logistic Regression example locally on 2 CPUs.

Each of the example programs prints usage help if no params are given.
Many of the example programs print usage help if no params are given.

All of the Spark samples take a `<master>` parameter that is the cluster URL
to connect to. This can be a mesos:// or spark:// URL, or "local" to run
locally with one thread, or "local[N]" to run locally with N threads.
When running Spark examples, you can pass the `--master` parameter to the submission
script. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client"
to run on YARN, "local" to run locally with one thread, or "local[N]" to
run locally with N threads.
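
For instance, a minimal sketch of the same SparkLR submission against different masters (the `spark://host:7077` URL below is a placeholder, not a real cluster):

    # local mode with 4 threads
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkLR \
      --master local[4] \
      lib/spark-examples*.jar

    # standalone cluster mode (hypothetical master URL)
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkLR \
      --master spark://host:7077 \
      lib/spark-examples*.jar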

## Running Tests

2 changes: 1 addition & 1 deletion bin/pyspark
@@ -31,7 +31,7 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*.jar >& /dev/null
if [[ $? != 0 ]]; then
echo "Failed to find Spark assembly in $FWDIR/assembly/target" >&2
echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
echo "You need to build Spark before running this program" >&2
exit 1
fi
fi
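
For reference, a hedged sketch of what "build Spark" means here, i.e. producing the assembly JAR this check looks for; the exact commands and flags vary by Spark version and are spelled out in the build docs:

    # sbt build
    sbt/sbt assembly
    # or the Maven build
    mvn -DskipTests clean package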
71 changes: 19 additions & 52 deletions bin/run-example
@@ -17,28 +17,10 @@
# limitations under the License.
#

cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac

SCALA_VERSION=2.10

# Figure out where the Scala framework is installed
FWDIR="$(cd `dirname $0`/..; pwd)"

# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

. $FWDIR/bin/load-spark-env.sh

if [ -z "$1" ]; then
echo "Usage: run-example <example-class> [<args>]" >&2
exit 1
fi

# Figure out the JAR file that our examples were packaged into. This includes a bit of a hack
# to avoid the -sources and -doc packages that are built by publish-local.
EXAMPLES_DIR="$FWDIR"/examples

if [ -f "$FWDIR/RELEASE" ]; then
@@ -49,46 +31,31 @@ fi

if [[ -z $SPARK_EXAMPLES_JAR ]]; then
echo "Failed to find Spark examples assembly in $FWDIR/lib or $FWDIR/examples/target" >&2
echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
echo "You need to build Spark before running this program" >&2
exit 1
fi

SPARK_EXAMPLES_JAR_REL=${SPARK_EXAMPLES_JAR#$FWDIR/}

# Since the examples JAR ideally shouldn't include spark-core (that dependency should be
# "provided"), also add our standard Spark classpath, built using compute-classpath.sh.
CLASSPATH=`$FWDIR/bin/compute-classpath.sh`
CLASSPATH="$SPARK_EXAMPLES_JAR:$CLASSPATH"

if $cygwin; then
CLASSPATH=`cygpath -wp $CLASSPATH`
export SPARK_EXAMPLES_JAR=`cygpath -w $SPARK_EXAMPLES_JAR`
fi

# Find java binary
if [ -n "${JAVA_HOME}" ]; then
RUNNER="${JAVA_HOME}/bin/java"
else
if [ `command -v java` ]; then
RUNNER="java"
else
echo "JAVA_HOME is not set" >&2
exit 1
fi
fi
EXAMPLE_CLASS="<example-class>"
EXAMPLE_ARGS="[<example args>]"
EXAMPLE_MASTER=${MASTER:-"<master>"}

# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$SPARK_JAVA_OPTS"
# Load extra JAVA_OPTS from conf/java-opts, if it exists
if [ -e "$FWDIR/conf/java-opts" ] ; then
JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
if [ -n "$1" ]; then
EXAMPLE_CLASS="$1"
shift
fi
export JAVA_OPTS

if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
echo -n "Spark Command: "
echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
echo "========================================"
echo
if [ -n "$1" ]; then
EXAMPLE_ARGS="$@"
fi

exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
echo "NOTE: This script has been replaced with ./bin/spark-submit. Please run:" >&2
echo
echo "./bin/spark-submit \\" >&2
Contributor Author:
I thought about this some more; I think maybe we should just call spark-submit with the supplied master instead of telling the user this stuff. Or we could call spark-submit and then print out for the user how to run this.

Contributor:
Yes, I completely agree. We don't want the user to have to type out this more complicated stuff with library path and all. Just

bin/run-example org.apache.spark.examples.SparkPi <example params>

In fact, now that all the examples are inside the org.apache.spark.examples package, we can try to make it even simpler. To run SparkPi, one should be able to just say

./bin/run-example SparkPi

That would be very simple!

Contributor Author:
Great idea!

Contributor:
Well, but then you have streaming examples and mllib examples. Do we expect the user to type in mllib.MovieLensALS then? I actually think org.apache.spark.examples.SparkPi is more consistent with the rest (i.e., SparkSubmit). Maybe we should accept both.

Contributor Author:
Yea so I think if it starts with org.apache.spark.examples we would pass it through. If not, we'll prepend org.apache.spark.examples.

echo " --master $EXAMPLE_MASTER \\" >&2
echo " --class $EXAMPLE_CLASS \\" >&2
echo " $SPARK_EXAMPLES_JAR_REL \\" >&2
echo " $EXAMPLE_ARGS" >&2
Contributor Author:
Note to self: if we call this directly we'll need to pass "$@"

echo
exit 1
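
Following the review thread above (pass fully-qualified class names through, otherwise prepend org.apache.spark.examples), a rough sketch of how run-example could forward to spark-submit instead of just printing a message — the default master and variable handling here are assumptions, not part of this patch:

    # Hypothetical forwarding logic for bin/run-example
    EXAMPLE_CLASS="$1"
    shift
    # Prepend the examples package unless the user already gave a fully-qualified name
    if [[ "$EXAMPLE_CLASS" != org.apache.spark.examples.* ]]; then
      EXAMPLE_CLASS="org.apache.spark.examples.$EXAMPLE_CLASS"
    fi
    # Hand everything else off to spark-submit (note: "$@" must be passed through)
    exec "$FWDIR"/bin/spark-submit \
      --master "${MASTER:-local[*]}" \
      --class "$EXAMPLE_CLASS" \
      "$SPARK_EXAMPLES_JAR" \
      "$@"

With logic like this, `./bin/run-example SparkPi` and `./bin/run-example mllib.MovieLensALS` would both resolve to classes under org.apache.spark.examples.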
2 changes: 1 addition & 1 deletion bin/spark-class
@@ -114,7 +114,7 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
if [ "$num_jars" -eq "0" ]; then
echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
echo "You need to build Spark before running this program." >&2
exit 1
fi
if [ "$num_jars" -gt "1" ]; then
2 changes: 1 addition & 1 deletion docs/running-on-yarn.md
@@ -53,7 +53,7 @@ For example:
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
lib/spark-examples*.jar \
yarn-cluster 5

The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Viewing Logs" section below for how to see driver and executor logs.
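
For comparison, a minimal hedged sketch of a yarn-client submission, where the driver runs in the local client process and only the executors run inside YARN containers; the trimmed option list and the trailing `10` (an illustrative SparkPi argument) are assumptions, not copied from the docs:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        lib/spark-examples*.jar \
        10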
2 changes: 2 additions & 0 deletions make-distribution.sh
@@ -40,6 +40,8 @@
#

set -o pipefail
set -e

# Figure out where the Spark framework is installed
FWDIR="$(cd `dirname $0`; pwd)"
DISTDIR="$FWDIR/dist"
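
As background on the two options added above, a small standalone sketch (not part of this patch) of how they change failure handling in a script:

    #!/usr/bin/env bash
    set -o pipefail   # a pipeline fails if any stage fails, not just the last one
    set -e            # abort the script as soon as a command exits non-zero

    # Without pipefail this pipeline reports success, because sort exits 0 even
    # though ls failed; with pipefail it exits non-zero and set -e stops the script.
    ls missing-directory | sort
    echo "only reached if the pipeline succeeded"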