README.md (4 changes: 2 additions & 2 deletions)
@@ -1,7 +1,7 @@
# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, and Python, and an optimized engine that
high-level APIs in Scala, Java, Python, and R, and an optimized engine that

Member:
I'm not clear whether R is advertised as stable enough for general use?


Author:
R is listed in the first paragraph on this page, so I figured it should be in the README.md as well.

supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
@@ -94,5 +94,5 @@ distribution.

## Configuration

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
docs/quick-start.md (20 changes: 10 additions & 10 deletions)
@@ -126,7 +126,7 @@ scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (w
wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
{% endhighlight %}
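
Spelled out step by step, the same pipeline looks like the sketch below; it assumes the `textFile` RDD created earlier in the shell, so treat it as illustrative rather than part of the guide's own listing:

{% highlight scala %}
// Each stage of the word-count pipeline from the shell session above.
// Assumes `textFile` is the RDD created earlier with sc.textFile(...).
val words = textFile.flatMap(line => line.split(" "))   // one record per word
val pairs = words.map(word => (word, 1))                // pair each word with a count of 1
val wordCounts = pairs.reduceByKey((a, b) => a + b)     // sum the counts for each word
{% endhighlight %}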

Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations) and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:

{% highlight scala %}
scala> wordCounts.collect()
@@ -163,7 +163,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
>>> wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
{% endhighlight %}

Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations) and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:

{% highlight python %}
>>> wordCounts.collect()
@@ -217,13 +217,13 @@ a cluster, as described in the [programming guide](programming-guide.html#initia
</div>

# Self-Contained Applications
Now say we wanted to write a self-contained application using the Spark API. We will walk through a
simple application in both Scala (with SBT), Java (with Maven), and Python.
Suppose we wish to write a self-contained application using the Spark API. We will walk through a
simple application in Scala (with SBT), Java (with Maven), and Python.

<div class="codetabs">
<div data-lang="scala" markdown="1">

We'll create a very simple Spark application in Scala. So simple, in fact, that it's
We'll create a very simple Spark application in Scala--so simple, in fact, that it's

Member:
Fair enough about R. This LGTM. I would have put spaces around this dash but it's too small to bother with.

named `SimpleApp.scala`:

{% highlight scala %}
@@ -258,8 +258,8 @@ We pass the SparkContext constructor a
object which contains information about our
application.
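
As a rough sketch of that wiring (the object name and application name here are illustrative placeholders, not part of the guide's listing):

{% highlight scala %}
// A SparkConf carrying the application's settings is passed to the
// SparkContext constructor; the app name shown is a placeholder.
import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // ... build and run RDD operations with sc ...
    sc.stop()
  }
}
{% endhighlight %}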

Our application depends on the Spark API, so we'll also include an sbt configuration file,
`simple.sbt` which explains that Spark is a dependency. This file also adds a repository that
Our application depends on the Spark API, so we'll also include an SBT configuration file,
`simple.sbt`, which explains that Spark is a dependency. This file also adds a repository that
Spark depends on:

{% highlight scala %}
@@ -272,7 +272,7 @@ scalaVersion := "{{site.SCALA_VERSION}}"
libraryDependencies += "org.apache.spark" %% "spark-core" % "{{site.SPARK_VERSION}}"
{% endhighlight %}
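
Put together, a complete `simple.sbt` follows roughly this shape (the project name and version below are illustrative placeholders; the Scala and Spark versions come from the same template variables used above):

{% highlight scala %}
// Illustrative sketch of the full simple.sbt; name and version are placeholders.
name := "Simple Project"

version := "1.0"

scalaVersion := "{{site.SCALA_VERSION}}"

libraryDependencies += "org.apache.spark" %% "spark-core" % "{{site.SPARK_VERSION}}"
{% endhighlight %}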

For sbt to work correctly, we'll need to layout `SimpleApp.scala` and `simple.sbt`
For SBT to work correctly, we'll need to layout `SimpleApp.scala` and `simple.sbt`

Member:
sbt is lower-case. jar seems equally upper and lower case, so, fine to change but on the whole I tend to leave things like that. I think the other changes are OK but not sure I'd bother with things that aren't obviously wrong, like an Oxford comma.


Author:
I've switched all usages of "SBT" back to "sbt". At one point in time, SBT stood for "Simple Build Tool," but I can't find any remaining evidence of that on their site, so I'm guessing they've moved away from that.

according to the typical directory structure. Once that is in place, we can create a JAR package
containing the application's code, then use the `spark-submit` script to run our program.

@@ -302,7 +302,7 @@ Lines with a: 46, Lines with b: 23

</div>
<div data-lang="java" markdown="1">
This example will use Maven to compile an application jar, but any similar build system will work.
This example will use Maven to compile an application JAR, but any similar build system will work.

We'll create a very simple Spark application, `SimpleApp.java`:

@@ -374,7 +374,7 @@ $ find .
Now, we can package the application using Maven and execute it with `./bin/spark-submit`.

{% highlight bash %}
# Package a jar containing your application
# Package a JAR containing your application
$ mvn package
...
[INFO] Building jar: {..}/{..}/target/simple-project-1.0.jar