Start Docker Environment
- **MacOS + Linux ONLY:** Navigate your browser to http://advancedspark.com/keys/pipeline-training-gce.pem
- Create the `~/.ssh/` directory if it doesn't already exist
```
mkdir -p ~/.ssh
```
- Copy the downloaded file to the `~/.ssh` directory
```
cp ~/Downloads/pipeline-training-gce.pem ~/.ssh
```
- Change the permissions on the `.pem` file so that `ssh` doesn't complain
```
chmod 600 ~/.ssh/pipeline-training-gce.pem
```
- **Windows ONLY:** Navigate your browser to http://advancedspark.com/keys/pipeline-training-gce.ppk
- Download the `.ppk` file to a well-known directory

SSH into your cloud instance using the following credentials:

```
Username: pipeline-training
Password: password9
```

- **MacOS + Linux ONLY:** SSH in from a terminal
```
ssh -i ~/.ssh/pipeline-training-gce.pem pipeline-training@<your-cloud-ip>
```
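Optionally, you can add a host alias to `~/.ssh/config` so you don't have to pass the key path every time. This is just a convenience sketch, not part of the official setup; the alias name `pipeline` is made up here:

```
# ~/.ssh/config -- hypothetical alias; substitute your actual cloud IP
Host pipeline
    HostName <your-cloud-ip>
    User pipeline-training
    IdentityFile ~/.ssh/pipeline-training-gce.pem
```

With that in place, `ssh pipeline` is equivalent to the full command above.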
- **Windows ONLY:** SSH in using PuTTY
  - Download PuTTY if you don't already have it
  - Enter `pipeline-training@<your-cloud-ip>` in the `Host Name` text box
  - Select the location of the `.ppk` file under `Connection -> SSH -> Auth` and click `Open`
  - Select `Yes` to accept the scary security alert
  - Type `password9` for the passphrase to avoid the scary, confusing error message
- You're in!
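If you prefer a command line on Windows, the PuTTY suite also ships with `plink`. A minimal sketch, assuming the `.ppk` file is in the current directory:

```
plink -i pipeline-training-gce.ppk pipeline-training@<your-cloud-ip>
```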
- Run the following to verify that you have the latest `fluxcapacitor/pipeline` Docker image
```
sudo docker images

### EXAMPLE OUTPUT ###
...
REPOSITORY               TAG      IMAGE ID       CREATED      SIZE
fluxcapacitor/pipeline   latest   c392786d2afc   3 mins ago   13.17 GB
```
- If you don't see the `fluxcapacitor/pipeline` Docker image listed, you will need to run `sudo docker pull fluxcapacitor/pipeline`
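If you want to script that check, a small sketch (not part of the official walkthrough):

```
# Pull the image only if it isn't already present locally
if ! sudo docker images | grep -q '^fluxcapacitor/pipeline'; then
  sudo docker pull fluxcapacitor/pipeline
fi
```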
Advanced notes for those trying this at home:

- At this point, you should be SSH'd into your specific cloud instance
- Verify that you see the `pipeline-training@<something>` prompt
- The cloud instance should be a fresh instance with no external processes running or bound to any ports
- The Docker container will bind to many ports including port 80, so make sure even `apache2` is disabled (a quick check is sketched below)
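One way to check for port conflicts before starting the container. The `ss` utility ships with most modern Linux distributions; the `apache2` stop command assumes an Ubuntu-style init:

```
# List all listening TCP ports and the processes bound to them
sudo ss -ltnp

# If apache2 (or anything else) holds port 80, stop it
sudo service apache2 stop
```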
- Run the following command
```
sudo docker run -itd --privileged --name pipeline --net=host -m 50g -e "SPRING_PROFILES_ACTIVE=local" fluxcapacitor/pipeline

### EXAMPLE OUTPUT ###
WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
...
```
(Ignore this ^^^ WARNING ^^^)
Advanced notes for those trying this at home:

- You may need to adjust the `-m 50g` memory limit if you're not on a cloud instance with 50+ GB of RAM
- We highly recommend 50+ GB of RAM
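To see how much RAM your instance actually has before picking a value for `-m`, a minimal check:

```
# Show total, used, and free memory in gigabytes
free -g
```

On, say, a 20 GB instance you might pass `-m 16g` instead (an illustrative number, not a tested recommendation).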
- Run the following command to enter the Docker container
```
sudo docker exec -it pipeline bash
```
- Verify that you see the `root@<something>:~/pipeline#` prompt
- Note: At this point, you are inside the Docker container
- Run the following command
```
cd $PIPELINE_HOME && git pull && source $CONFIG_HOME/bash/pipeline.bashrc && $SCRIPTS_HOME/setup/RUNME_ONCE.sh
```
- Wait a few minutes for initialization to complete... Ignore all errors!! This may take some time.
- Run `jps -l` and verify that most of these services are running
```
jps -l

### EXAMPLE OUTPUT ###
...
737 org.elasticsearch.bootstrap.Elasticsearch <-- ElasticSearch
738 org.jruby.Main <-- Logstash
1987 org.apache.zeppelin.server.ZeppelinServer <-- Zeppelin
2243 org.apache.spark.deploy.worker.Worker <-- Spark Worker
2123 org.apache.spark.deploy.master.Master <-- Spark Master
3479 sun.tools.jps.Jps <-- this (jps)
1529 org.apache.zookeeper.server.quorum.QuorumPeerMain <-- ZooKeeper
1973 io.confluent.support.metrics.SupportedKafka <-- Kafka
2555 io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain <-- Kafka SchemaRegistry
3408 io.confluent.kafkarest.Main <-- Kafka REST API
6107 org.apache.flink.runtime.jobmanager.JobManager <-- Flink Service
2547 org.apache.hadoop.util.RunJar <-- Hive Metastore Service (Uses MySQL as backing store)
2908 com.facebook.presto.server.PrestoServer <-- Presto Server
...
```
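If the list is long, you can check just the handful of main classes you care about. A quick sketch; the class-name fragments below are taken from the example output above:

```
# Confirm the core services are up (case-insensitive match on the main class)
for svc in Elasticsearch ZeppelinServer QuorumPeerMain Kafka PrestoServer; do
  jps -l | grep -qi "$svc" && echo "OK: $svc" || echo "MISSING: $svc"
done
```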
- Run `export` and verify that most of these environment variables are set
```
export

### EXAMPLE OUTPUT ###
...
declare -x PIPELINE_HOME="/root/pipeline"
...
declare -x MYSQL_CONNECTOR_JAR="/usr/share/java/mysql-connector-java.jar"
...
declare -x SPRING_PROFILES_ACTIVE="local"
```
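To cut the output down to just the relevant variables, one option (the grep pattern is an assumption based on the variable names above):

```
# Show only the pipeline-related environment variables
export | grep -E 'PIPELINE|SPRING|MYSQL'
```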
- Run the `start-spark-streaming.sh` script from anywhere
```
start-spark-streaming.sh

### EXAMPLE OUTPUT ###
...Starting Spark Streaming App...
...logs available with "tail -f $PIPELINE_HOME/logs/spark/streaming/ratings-kafka-cassandra.log"
```
- Verify that the Spark Streaming process is running using `jps -l`
```
jps -l

### EXAMPLE OUTPUT ###
25688 org.apache.spark.executor.CoarseGrainedExecutorBackend <-- Spark Executor JVM
25566 org.apache.spark.deploy.SparkSubmit <-- Long-running Spark Streaming App
```
- Monitor the live Spark Streaming log file
```
tail-spark-streaming.sh

### EXAMPLE OUTPUT ###
-------------------------------------------
Time: 1466368854000 ms
-------------------------------------------
...
```
- Hit `Ctrl-C` to exit
- Navigate your browser to the Demo Home Page at http://<your-cloud-ip>
- Follow the steps detailed on the Demo Home Page
- Keep an eye on the Spark Streaming Application log file from the previous step
- You should see ratings flowing through the Spark Streaming Application log file
- Click on the navigation links at the top to familiarize yourself with the tools of the environment
- If you can't reach the Demo Home Page, your services are either not started or you have not configured your cloud instance firewall rules (GCE) or security groups (AWS) properly
- Check out the Troubleshooting Guide if you're having problems
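A quick way to tell a firewall problem from a service problem, run from your local machine (assuming `curl` is installed):

```
# Check whether port 80 on the cloud instance answers at all
curl -I --max-time 10 http://<your-cloud-ip>
# A timeout suggests firewall rules / security groups; any HTTP response
# (even an error status) means the port is open and a service is listening.
```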