
Start Docker Environment


Setup Secure Shell (SSH)

MacOS + Linux ONLY

  • Navigate your browser to the following URL to download the SSH key
http://advancedspark.com/keys/pipeline-training-gce.pem
  • Create the ~/.ssh/ directory if it doesn't already exist
mkdir -p ~/.ssh
  • Copy the downloaded file to your ~/.ssh directory
cp ~/Downloads/pipeline-training-gce.pem ~/.ssh
  • Change the permission on the .pem file so that ssh doesn't complain
chmod 600 ~/.ssh/pipeline-training-gce.pem
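If you prefer to stay in the terminal, the same setup can be done with curl (assuming curl is installed); the final ls -l should show -rw------- permissions on the key:
mkdir -p ~/.ssh
curl -o ~/.ssh/pipeline-training-gce.pem http://advancedspark.com/keys/pipeline-training-gce.pem
chmod 600 ~/.ssh/pipeline-training-gce.pem
ls -l ~/.ssh/pipeline-training-gce.pem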

Windows ONLY

  • Navigate your browser to the following URL to download the SSH key
http://advancedspark.com/keys/pipeline-training-gce.ppk
  • Download the .ppk file to a well-known directory

SSH Into Your Instance

Username: pipeline-training
Password: password9

MacOS + Linux ONLY

ssh -i ~/.ssh/pipeline-training-gce.pem pipeline-training@<your-cloud-ip>

Windows ONLY

  • Download PuTTY if you don't already have it
  • Enter pipeline-training@<your-cloud-ip> in the Host Name text box

[Screenshot: PuTTY Host Name field]

  • Select the location of the .ppk under Connection -> SSH -> Auth and click Open

[Screenshot: PuTTY .ppk file selection]

  • Select Yes to accept the scary security alert

[Screenshot: PuTTY security alert]

  • Type password9 for the passphrase to avoid the scary, confusing error message

[Screenshot: passphrase prompt]

  • You're in!

[Screenshot: logged-in terminal]
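If you prefer a Windows command line, PuTTY's companion tool plink accepts the same .ppk key (the path below is an assumption; point it at wherever you saved the file):
plink -i C:\Users\<username>\Downloads\pipeline-training-gce.ppk pipeline-training@<your-cloud-ip>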

Verify Docker Images

  • Run the following to verify that you have the latest fluxcapacitor/pipeline Docker image
sudo docker images

### EXAMPLE OUTPUT ###
...
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
fluxcapacitor/pipeline   latest              c392786d2afc        3 mins ago          13.17 GB
  • If you don't see the fluxcapacitor/pipeline Docker image listed, pull it with the following command
sudo docker pull fluxcapacitor/pipeline

Advanced notes for those trying this at home:

  • At this point, you should be SSH'd into your specific cloud instance
  • Verify that you see the pipeline-training@<something> prompt
  • The cloud instance should be a fresh instance with no external processes running or bound to any ports
  • The Docker container will bind to many ports including port 80, so make sure even apache2 is disabled
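One quick way to confirm that nothing is already listening on port 80 (a minimal check, assuming a stock Ubuntu image where apache2 is the usual culprit):
sudo netstat -tlnp | grep ':80 '    # should print nothing
sudo service apache2 stop           # only needed if apache2 shows up above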

Run Docker Container

  • Run the following command
sudo docker run -itd --privileged --name pipeline --net=host -m 50g -e "SPRING_PROFILES_ACTIVE=local" fluxcapacitor/pipeline

### EXAMPLE OUTPUT ###
WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
...
(Ignore This ^^^ WARNING ^^^)

Advanced notes for those trying this at home:

  • You may need to adjust the -m 50g memory if you're not on a cloud instance with 50+ GB of RAM
  • We highly recommend 50+ GB of RAM
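To size the -m flag to your machine, check the total RAM first and pass a value comfortably below it (the 16g below is just an illustration for a smaller machine):
free -g    # shows total RAM in GB
sudo docker run -itd --privileged --name pipeline --net=host -m 16g -e "SPRING_PROFILES_ACTIVE=local" fluxcapacitor/pipeline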

Shell into Your Docker Container

sudo docker exec -it pipeline bash
  • Verify that you see the root@<something>:~/pipeline# prompt
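If the exec command complains that the container doesn't exist or isn't running, list the running containers first; you should see one named pipeline:
sudo docker ps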

Configure and Start Pipeline Environment

  • Note: At this point, you are inside the Docker Container

  • Run the following command

cd $PIPELINE_HOME && git pull && source $CONFIG_HOME/bash/pipeline.bashrc && $SCRIPTS_HOME/setup/RUNME_ONCE.sh

Wait a few minutes for initialization to complete. You can ignore any errors that scroll by; this step may take some time.

Verify Pipeline Environment

  • Run jps -l and verify that most of these services are running
jps -l

### EXAMPLE OUTPUT ###
...
737 org.elasticsearch.bootstrap.Elasticsearch                   <-- ElasticSearch
738 org.jruby.Main                                              <-- Logstash
1987 org.apache.zeppelin.server.ZeppelinServer                  <-- Zeppelin
2243 org.apache.spark.deploy.worker.Worker                      <-- Spark Worker
2123 org.apache.spark.deploy.master.Master                      <-- Spark Master
3479 sun.tools.jps.Jps                                          <-- this (jps)
1529 org.apache.zookeeper.server.quorum.QuorumPeerMain          <-- ZooKeeper
1973 io.confluent.support.metrics.SupportedKafka                <-- Kafka
2555 io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain  <-- Kafka SchemaRegistry
3408 io.confluent.kafkarest.Main                                <-- Kafka REST API
6107 org.apache.flink.runtime.jobmanager.JobManager             <-- Flink Service
2547 org.apache.hadoop.util.RunJar                              <-- Hive Metastore Service (Uses MySQL as backing store)
2908 com.facebook.presto.server.PrestoServer                    <-- Presto Server
...
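If a particular service appears to be missing, it can be easier to filter the jps output for it than to scan the full list, for example:
jps -l | grep -i spark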
  • Run export and verify that most of these environment variables are set
export

### EXAMPLE OUTPUT ###
...
declare -x PIPELINE_HOME="/root/pipeline"
...
declare -x MYSQL_CONNECTOR_JAR="/usr/share/java/mysql-connector-java.jar"
...
declare -x SPRING_PROFILES_ACTIVE="local"
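You can also spot-check a single variable; given the output above, this should print /root/pipeline:
echo $PIPELINE_HOME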

Run the Demo Spark Streaming Apps

  • Run the start-spark-streaming.sh script from anywhere
start-spark-streaming.sh

### EXAMPLE OUTPUT ###
...Starting Spark Streaming App...
...logs available with "tail -f $PIPELINE_HOME/logs/spark/streaming/ratings-kafka-cassandra.log"
  • Verify that the Spark Streaming process is running using jps -l
jps -l

### EXAMPLE OUTPUT ###
25688 org.apache.spark.executor.CoarseGrainedExecutorBackend  <-- Spark Executor JVM
25566 org.apache.spark.deploy.SparkSubmit                     <-- Long-running Spark Streaming App
  • Monitor the live Spark Streaming log file
tail-spark-streaming.sh

### EXAMPLE OUTPUT ###
-------------------------------------------
Time: 1466368854000 ms
-------------------------------------------
...
  • Hit Ctrl-C to exit

Verify Your Environment Setup

  • Navigate your browser to the Demo Home Page
http://<your-cloud-ip>
  • Follow the steps detailed on the Demo Home Page
  • Keep an eye on the Spark Streaming Application log file from the previous step
  • You should see ratings flowing through the Spark Streaming Application log file
  • Click on the navigation links at the top to familiarize yourself with the tools of the environment

Troubleshooting

Cannot Connect to Demo Home Page or Navigation Links?

  • Your services are either not started, or you have not configured your cloud instance firewall rules (GCE) or security groups (AWS) properly (see the GCE example below)
  • Check out this Troubleshooting Guide if you're having problems
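For the GCE case, a rule opening port 80 can be created with gcloud along these lines (a sketch: the rule name pipeline-http is arbitrary, and the other tools in this environment listen on additional ports you may also need to open):
gcloud compute firewall-rules create pipeline-http --allow tcp:80 --source-ranges 0.0.0.0/0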