---
post_title: Fault Tolerance
menu_order: 100
feature_maturity: stable
enterprise: 'yes'
---

Failures such as host, network, JVM, or application failures can
affect the behavior of three types of Spark components:
- DC/OS Spark Service
- Batch Jobs
- Streaming Jobs

# DC/OS Spark Service

The DC/OS Spark service runs in Marathon and includes the Mesos Cluster
Dispatcher and the Spark History Server. The Dispatcher manages jobs.
The Spark History Server reads event logs from HDFS. If the service
dies, Marathon will restart it, and it will reload data from these
highly available stores.

# Batch Jobs

Batch jobs are resilient to executor failures, but not driver
failures. The Dispatcher will restart a driver if you submit with
`--supervise`.

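As a sketch, a supervised batch submission might look like the following. The jar URL and `--class` value are illustrative placeholders; `dcos spark run` is the DC/OS CLI entry point for submitting jobs to the Dispatcher.

```bash
# Submit a batch job whose driver the Dispatcher will restart on failure.
# The jar URL and --class value are illustrative placeholders.
dcos spark run --submit-args="\
  --supervise \
  --class org.example.MyBatchJob \
  http://example.com/jars/my-batch-job.jar"
```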
## Driver

When the driver fails, executors are terminated, and the entire Spark
application fails. If you submitted your job with `--supervise`, then
the Dispatcher will restart the job.

## Executors

Batch jobs are resilient to executor failure. Upon failure, cached
data, shuffle files, and partially computed RDDs are lost. However,
Spark tracks the lineage of each RDD, which allows it to
recompute this data from the original data source, caches, or shuffle
files. There is a performance cost as data is recomputed, but an
executor failure will not cause a job to fail.

# Streaming Jobs

Whereas batch jobs run once and can usually be restarted upon failure,
streaming jobs often need to run constantly. The application must
therefore survive both driver and executor failures. To consume from
Kafka with these guarantees, you can use the Direct Kafka API.
For exactly once processing semantics, you must use the Direct Kafka
API. All other receivers provide at least once semantics.

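As a sketch, one way a job might pull in the direct Kafka integration is at submission time via `--packages`. The package coordinates and version below are assumptions to adapt to your Spark and Kafka versions, and the class name and jar URL are placeholders.

```bash
# Add the Spark Streaming Kafka integration so the job can use the
# direct (receiver-less) Kafka stream; coordinates are examples only.
dcos spark run --submit-args="\
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 \
  --class org.example.MyKafkaStreamingJob \
  http://example.com/jars/my-streaming-job.jar"
```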
## Failures

There are two types of failures:

- Driver
- Executor

## Job Features

There are a few variables that affect the reliability of your job:

- [WAL][1]
- [Receiver reliability][2]
- [Storage level][3]

## Reliability Features

The two reliability features of a job are data loss and processing
semantics. Data loss occurs when the source sends data, but the job
fails to process it. Processing semantics describe how many times a
received message is processed by the job. They can be either "at least
once" or "exactly once."

### Data loss

A Spark job loses data when delivered data does not get processed.
The following is a list of configurations with increasing data
preservation guarantees:
    executor failure => **no data loss**
    driver failure => **no data loss**

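The strongest of these configurations can be sketched as a single submission that combines driver supervision with the receiver write-ahead log, which Spark controls via the `spark.streaming.receiver.writeAheadLog.enable` property. The class name and jar URL below are placeholders.

```bash
# Survive both executor and driver failures without data loss:
# supervise the driver and persist received data to the write-ahead log.
# The jar URL and --class value are illustrative placeholders.
dcos spark run --submit-args="\
  --supervise \
  --conf spark.streaming.receiver.writeAheadLog.enable=true \
  --class org.example.MyStreamingJob \
  http://example.com/jars/my-streaming-job.jar"
```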
### Processing semantics

Processing semantics apply to how many times received messages get
processed. With Spark Streaming, this can be either "at least once"