This repository was archived by the owner on Jan 9, 2020. It is now read-only.
Merged
Provide limitations and instructions about running on GKE
foxish committed Mar 8, 2017
commit bcb779bdf9ccd57a0f7cb8e50a6ce57e42ca9348
25 changes: 25 additions & 0 deletions docs/running-on-kubernetes-cloud.md
@@ -0,0 +1,25 @@
---
layout: global
title: Running Spark on the cloud with Kubernetes
---

For general information about running Spark on Kubernetes, refer to [this section](running-on-kubernetes.md).

A Kubernetes cluster may be brought up on various cloud providers or on-premises. It is commonly provisioned through [Google Container Engine](https://cloud.google.com/container-engine/), with [kops](https://github.com/kubernetes/kops) on AWS, or on-premises using [kubeadm](https://kubernetes.io/docs/getting-started-guides/kubeadm/).
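As a rough sketch (tool flags change between releases, and the cluster name and DNS zone below are hypothetical placeholders), provisioning on each platform looks something like:

```shell
# GKE: create a container cluster (the name is a placeholder).
gcloud container clusters create spark-cluster

# AWS with kops: assumes a configured state store; the DNS name is a placeholder.
kops create cluster --name spark.example.com --zones us-east-1a

# On-premises with kubeadm: run on the designated master node.
kubeadm init
```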

## Running on Google Container Engine (GKE)

* Create a GKE [container cluster](https://cloud.google.com/container-engine/docs/clusters/operations).
* Find the address of the Kubernetes master associated with this cluster.

```shell
> kubectl cluster-info
Kubernetes master is running at https://x.y.z.w:443
```
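The URL printed above is what spark-submit needs, prefixed with the `k8s://` scheme. A minimal shell sketch, assuming the output format shown:

```shell
# Sample cluster-info line (the address is a placeholder).
line="Kubernetes master is running at https://x.y.z.w:443"

# Take the last whitespace-separated field (the URL) and add the k8s:// scheme.
master_url="k8s://${line##* }"
echo "$master_url"   # k8s://https://x.y.z.w:443
```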

nit: https://<master-ip>:443

* Run spark-submit with the master option set to `k8s://https://x.y.z.w:443`. The instructions for running spark-submit are provided in the [running on kubernetes](running-on-kubernetes.md) tutorial.
* Check that your driver pod, and subsequently your executor pods, are launched using `kubectl get pods`.
* Read the stdout and stderr of the driver pod using `kubectl get logs`.

is the logs resource a GKE-specific thing?

Member Author


It doesn't exist. I made a mistake, it's the same "kubectl logs" as it is elsewhere. Fixed.


nit: perhaps... kubectl logs <name-of-driver-pod> or a streaming version of it kubectl logs -f <name-of-driver-pod>
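Putting the steps above together, the submission flow can be sketched as follows. The jar path, example class, driver pod name, and master address are illustrative placeholders, and the exact spark-submit flags are the ones described in the [running on kubernetes](running-on-kubernetes.md) tutorial:

```shell
# Master URL taken from `kubectl cluster-info` (placeholder address).
MASTER="k8s://https://x.y.z.w:443"

# Submit the example SparkPi job in cluster mode.
spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master "$MASTER" \
  local:///opt/spark/examples/jars/spark-examples.jar

# Confirm the driver pod, then the executor pods, are running.
kubectl get pods

# Stream the driver's stdout/stderr (the pod name is a placeholder).
kubectl logs -f spark-pi-driver
```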


Known issues:
* If you face OAuth token expiry errors when you run spark-submit, it is likely because the token needs to be refreshed. The easiest way to fix this is to run any `kubectl` command, say `kubectl version`, and then retry your submission.



2 changes: 2 additions & 0 deletions docs/running-on-kubernetes.md
@@ -89,6 +89,8 @@ the submitting machine, and are uploaded to the driver running in Kubernetes bef

### Accessing Kubernetes Clusters

For details about running on public cloud environments, such as Google Container Engine (GKE), please refer to [our documentation](running-on-kubernetes-cloud.md).

make the anchor text for this link something like "running on kubernetes cloud" ("our documentation" is pretty vague)


Spark-submit also supports submission through the
[local kubectl proxy](https://kubernetes.io/docs/user-guide/accessing-the-cluster/#using-kubectl-proxy). One can use the
authenticating proxy to communicate with the API server directly without passing credentials to spark-submit.
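A minimal sketch of that proxy-based submission (the port, class, and jar path are illustrative placeholders):

```shell
# Start an authenticating local proxy to the Kubernetes API server.
kubectl proxy --port=8001 &

# Point spark-submit at the proxy; no credentials are passed on the command line.
spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://http://127.0.0.1:8001 \
  local:///opt/spark/examples/jars/spark-examples.jar
```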