
Commit 9cd33a3

committed
updated the license placement and included hands-on docker swarm
1 parent 982f580 commit 9cd33a3

18 files changed (+536 -69 lines)

README.md

Lines changed: 14 additions & 1 deletion
@@ -1,7 +1,20 @@
<!--
Copyright (C) 2017 Jordi Blasco
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

HPCNow!, hereby disclaims all copyright interest in this document
`snow-labs' written by Jordi Blasco.
-->
# sNow! Labs

This repository contains the material required to deliver training to [sNow!](https://hpcnow.github.io/snow-documentation/) users and administrators. The content of this repository is distributed under the [GPLv3 license](LICENSE). As part of the sNow! project and its community effort, we encourage you to contribute feedback, improve the quality, and/or add new training material to shorten the learning curve for end users.

* [Official documentation of sNow!](https://hpcnow.github.io/snow-documentation/)
* [Source code project](https://bitbucket.org/hpcnow/snow-tools)
* [HPCNow! website](http://hpcnow.com)
* [sNow! website](https://hpcnow.github.io/snow-web/) (under construction)

admin-training/00-overview-demo.md

Lines changed: 13 additions & 1 deletion
@@ -1,3 +1,15 @@
<!--
Copyright (C) 2017 Jordi Blasco
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

HPCNow!, hereby disclaims all copyright interest in this document
`snow-labs' written by Jordi Blasco.
-->
# Hands-On 00: Overview Demo
In this hands-on, we are going to install a standard High Performance Computing cluster based on a single sNow! server.

@@ -218,7 +230,7 @@ Example:
snow boot mycluster centos-7.4-minimal
```
## Modify Single System Image
The following command provides write access to a chroot environment inside a rootfs image. The prompt provided by this command also shows that the shell session is allocated inside a particular image chroot.

In order to exit from this environment, type ```exit``` or press ```Ctrl+d```.
```

admin-training/01-getting-access.md

Lines changed: 12 additions & 1 deletion
@@ -1,3 +1,15 @@
<!--
Copyright (C) 2017 Jordi Blasco
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

HPCNow!, hereby disclaims all copyright interest in this document
`snow-labs' written by Jordi Blasco.
-->
# Hands-On 01: Getting access
In this hands-on, we are going to deploy the required services to operate a standard High Performance Computing cluster.

@@ -46,4 +58,3 @@ source ~/.bashrc
```
git clone --recursive https://github.com/HPCNow/snow-labs
```
Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
<!--
Copyright (C) 2017 Jordi Blasco
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

HPCNow!, hereby disclaims all copyright interest in this document
`snow-labs' written by Jordi Blasco.
-->
# Hands-On 0: Docker Swarm Introduction
In this hands-on, we are going to learn how to interact with a Docker Swarm cluster provisioned with the sNow! cluster manager.

*Estimated time: ~1 hour*

## Requirements
The following notes describe how to interact with a Docker Swarm cluster provisioned with the sNow! cluster manager.

This guide assumes that:

1. You have at least one sNow! server. Ideally, for a production-ready environment, you have one sNow! server and three compute nodes.
2. The sNow! server also provides access to a shared file system via NFS (/home and /sNow); a quick way to verify the mounts is sketched below. Check the [sNow! documentation](https://hpcnow.github.io/snow-documentation) in order to integrate other cluster file systems such as BeeGFS, Lustre or IBM Spectrum Scale.
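If you want to confirm that the shared file systems are actually available on a node before continuing, a minimal check (assuming the NFS mounts are /home and /sNow, as above) could be:
```
# List NFS mounts and confirm that /home and /sNow are present
mount -t nfs,nfs4 | grep -E '/home|/sNow'
df -h /home /sNow
```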
## Installation
Docker Swarm manager nodes implement the Raft consensus algorithm to manage the global cluster state.
This is key for managing and scheduling tasks in the cluster, and also for keeping the same consistent state on every manager.

Raft tolerates up to (N-1)/2 manager failures and requires a majority, or quorum, of (N/2)+1 managers to agree on values proposed to the cluster. This means that the cluster should have at least 3 managers to tolerate one node failure, or 5 managers to tolerate two node failures.

This hands-on assumes that you have already deployed three VMs (domains) dedicated to the Docker Swarm cluster, or three compute nodes (production solution).

By default, manager nodes also act as worker nodes. For small systems or non-critical services, this is relatively low-risk.
However, because manager nodes use the Raft consensus algorithm to replicate data in a consistent way, they are sensitive to resource starvation. In a sNow! environment you can isolate the managers in VMs that run no other services and deploy a few bare-metal nodes as Docker Swarm workers. In order to do so, you can drain the manager nodes to make them unavailable as worker nodes:
```
docker node update --availability drain <NODEID>
```
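Should you later want a drained manager to accept tasks again, the same command can set its availability back to active:
```
docker node update --availability active <NODEID>
```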
<!--
### Option 1: Deploy Docker Swarm in VMs
Assuming that you have already defined three VMs (domains) dedicated to the Docker Swarm cluster:

```
snow add domain swarm01 --role swarm-manager
snow add domain swarm02 --role swarm-worker
snow add domain swarm03 --role swarm-worker
snow deploy swarm01
snow deploy swarm02
snow deploy swarm03
```
### Option 2: Deploy Docker Swarm in three compute nodes (production solution)
Assuming that you have already defined three nodes dedicated to the Docker Swarm cluster:

```
snow add node swarm01 --role swarm-manager
snow add node swarm02 --role swarm-worker
snow add node swarm03 --role swarm-worker
snow deploy swarm01
snow deploy swarm02
snow deploy swarm03
```
-->
## Swarm Interaction

1. Check the status of the Docker Swarm cluster

```
snow@swarm01:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol * swarm01 Ready Active Leader 18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl swarm02 Ready Active 18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y swarm03 Ready Active 18.03.1-ce
```

2. Deploy the first application component as a Docker service
The following example creates a very simple service that runs a one-hour sleep in a container.

```
snow@swarm01:~$ docker service create --name sleep_app alpine sleep 3600
8o9bicf4mkcpt2s0h23wwckn6
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
```
This pulls the alpine image and runs 'sleep 3600' in one container.

3. Verify that the service has been created in the Swarm cluster.

```
snow@swarm01:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
8o9bicf4mkcp sleep_app replicated 1/1 alpine:latest
```
If you have previous experience with Docker, it may not seem that we have done anything very different from a plain docker run. The key difference is that the container has been scheduled on the swarm cluster rather than on a single host.
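To see which node the scheduler actually placed that single replica on, you can list the service tasks (the placement will vary from cluster to cluster); the same command is used again below when scaling the service:
```
snow@swarm01:~$ docker service ps sleep_app
```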
4. Scale the application
Imagine a situation where this particular application is under high demand. Docker Swarm allows you to re-scale and re-balance the service across the three swarm nodes.

In the following example we will create 9 replicas of the example application.

```
snow@swarm01:~$ docker service update --replicas 9 sleep_app
sleep_app
overall progress: 9 out of 9 tasks
1/9: running [==================================================>]
2/9: running [==================================================>]
3/9: running [==================================================>]
4/9: running [==================================================>]
5/9: running [==================================================>]
6/9: running [==================================================>]
7/9: running [==================================================>]
8/9: running [==================================================>]
9/9: running [==================================================>]
verify: Service converged
```

The new replicas of the application will be scheduled evenly across the Swarm nodes.
```
snow@swarm01:~$ docker service ps sleep_app
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
zgmceedsgj3l sleep_app.1 alpine:latest swarm03 Running Running about a minute ago
m12an8ab0gx1 sleep_app.2 alpine:latest swarm03 Running Running 39 seconds ago
n1f8t2scpaqo sleep_app.3 alpine:latest swarm02 Running Running 40 seconds ago
f0leytx1fj3i sleep_app.4 alpine:latest swarm01 Running Running 35 seconds ago
add5r8ik6npz sleep_app.5 alpine:latest swarm03 Running Running 40 seconds ago
45md2xfryhqi sleep_app.6 alpine:latest swarm02 Running Running 39 seconds ago
26vn4t7cyuuo sleep_app.7 alpine:latest swarm01 Running Running 35 seconds ago
rtmfau8n152p sleep_app.8 alpine:latest swarm02 Running Running 39 seconds ago
3o9nev8wwtu6 sleep_app.9 alpine:latest swarm01 Running Running 36 seconds ago
```

5. Shrink the service
Docker Swarm also allows running the inverse operation. The following example will reduce the number of replicas to 3.

```
snow@swarm01:~$ docker service update --replicas 3 sleep_app
```
You can watch Docker Swarm reduce the replicas with the following command:
```
snow@swarm01:~$ watch docker service ps sleep_app
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
zgmceedsgj3l sleep_app.1 alpine:latest swarm03 Running Running 8 minutes ago
n1f8t2scpaqo sleep_app.3 alpine:latest swarm02 Running Running 6 minutes ago
f0leytx1fj3i sleep_app.4 alpine:latest swarm01 Running Running 6 minutes ago
```
6. Reschedule the containers after a node failure or node draining
The following section illustrates what happens when a node is drained or faulty.

Take a look at the status of your nodes again by running ``docker node ls``.
```
snow@swarm01:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol * swarm01 Ready Active Leader 18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl swarm02 Ready Active 18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y swarm03 Ready Active 18.03.1-ce
```
We will simulate a node failure with the command ``snow destroy swarm02``.

We can check the status of the cluster with the following command:
```
root@swarm01:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol * swarm01 Ready Active Leader 18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl swarm02 Down Active 18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y swarm03 Ready Active 18.03.1-ce
```

The following command shows how the services have been re-balanced and re-scheduled:
```
snow@swarm01:~$ docker service ps sleep_app
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
zgmceedsgj3l sleep_app.1 alpine:latest swarm03 Running Running 14 minutes ago
1yv65qqonvju sleep_app.3 alpine:latest swarm03 Running Running 2 minutes ago
n1f8t2scpaqo \_ sleep_app.3 alpine:latest swarm02 Shutdown Running 12 minutes ago
f0leytx1fj3i sleep_app.4 alpine:latest swarm01 Running Running 12 minutes ago
```

Finally, we can drain the Docker Swarm manager node (swarm01), which leaves only one node (swarm03) available to run tasks.

Use the command ``docker node update --availability drain <NODEID>``, where <NODEID> is taken from the output of ``docker node ls``.

```
snow@swarm01:~$ docker node update --availability drain hxakgodvxtz9ynsc0nniyz7ol
```
Check the status of the nodes:
```
snow@swarm01:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol * swarm01 Ready Drain Leader 18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl swarm02 Down Active 18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y swarm03 Ready Active 18.03.1-ce
```
Finally, as expected, all the services have been migrated to the node swarm03, the only node still accepting tasks, while swarm01 continues to act as the (drained) manager.
```
snow@swarm01:~$ docker service ps sleep_app
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
zgmceedsgj3l sleep_app.1 alpine:latest swarm03 Running Running 20 minutes ago
1yv65qqonvju sleep_app.3 alpine:latest swarm03 Running Running 8 minutes ago
n1f8t2scpaqo \_ sleep_app.3 alpine:latest swarm02 Shutdown Running 19 minutes ago
wu8hj9j9e9uh sleep_app.4 alpine:latest swarm03 Running Running 2 minutes ago
f0leytx1fj3i \_ sleep_app.4 alpine:latest swarm01 Shutdown Shutdown 2 minutes ago
```

7. Cleaning Up
The following example will remove the service.
```
docker service rm sleep_app
```
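To confirm that the service is gone, list the services again; sleep_app should no longer appear:
```
snow@swarm01:~$ docker service ls
```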
## Service Orchestration
The following example illustrates the benefits of adopting this technology much better.

### Defining a stack of services
Docker Swarm allows you to deploy a complete application stack to the swarm. The deploy command accepts a stack description in the form of a Compose file.
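As a minimal sketch of what such a Compose file looks like (an illustrative example, not the monitoring-stack.yml provided in the repository; the file name, service name and image are arbitrary), you could create and deploy a tiny stack like this:
```
# Write a minimal Compose v3 stack file and deploy it on a manager node
cat > hello-stack.yml <<'EOF'
version: "3"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    deploy:
      replicas: 3
EOF
docker stack deploy -c hello-stack.yml hello
```
You can remove this test stack with ``docker stack rm hello`` before deploying the real monitoring stack below.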
In the container folder there is a file called monitoring-stack.yml, which contains an example of a complete monitoring stack including an Elasticsearch cluster, Kibana, InfluxDB, Grafana, etc.

Download this file to swarm01 and execute the following command to orchestrate the monitoring stack example:

```
snow@swarm01:~$ docker stack deploy -c monitoring-stack.yml monitor
Creating service monitor_cadvisor
Creating service monitor_elasticsearch
Creating service monitor_elasticsearch2
Creating service monitor_elasticsearch3
Creating service monitor_kibana
Creating service monitor_headPlugin
Creating service monitor_influxdb
Creating service monitor_grafana
```
Now you can check the status of each service with the command:
```
root@swarm01:~# docker stack services monitor
ID NAME MODE REPLICAS IMAGE PORTS
058pildrvl8i monitor_elasticsearch3 replicated 1/1 docker.elastic.co/elasticsearch/elasticsearch:6.2.4
4gbk7a7zoqz8 monitor_kibana replicated 0/1 docker.elastic.co/kibana/kibana:6.3.2 *:5601->5601/tcp
aqsu06woh4wf monitor_headPlugin replicated 1/1 mobz/elasticsearch-head:5 *:9100->9100/tcp
bh0r4hsbx3s3 monitor_grafana replicated 1/1 grafana/grafana:latest *:80->3000/tcp
h0h8hhvwzd19 monitor_elasticsearch2 replicated 1/1 docker.elastic.co/elasticsearch/elasticsearch:6.2.4
juwnpqap4q1o monitor_influxdb replicated 1/1 influxdb:latest
m0isumy3f1fe monitor_elasticsearch replicated 1/1 docker.elastic.co/elasticsearch/elasticsearch:6.2.4 *:9200->9200/tcp
wdz7qvp39qwz monitor_cadvisor global 2/2 google/cadvisor:latest
```
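Once you are done exploring, you can inspect the individual tasks of the stack and remove everything in one go; both commands operate on the stack name used at deploy time (monitor):
```
snow@swarm01:~$ docker stack ps monitor
snow@swarm01:~$ docker stack rm monitor
```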
More information:
* [Admin guide](https://docs.docker.com/engine/swarm/admin_guide/)
* [Rolling update](https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update/)
