|
<!--
Copyright (C) 2017 Jordi Blasco
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

HPCNow!, hereby disclaims all copyright interest in this document
`snow-labs' written by Jordi Blasco.
-->
# Hands-On 0: Docker Swarm Introduction
In this hands-on, we are going to learn how to interact with a Docker Swarm cluster provisioned with the sNow! cluster manager.

*Estimated time: ~1 hour*

## Requirements
The following notes describe how to interact with a Docker Swarm cluster provisioned with the sNow! cluster manager.

This guide assumes that:

1. You have at least one sNow! server. Ideally, for a production-ready environment, you have one sNow! server and three compute nodes.
2. The sNow! server also provides access to shared file systems via NFS (/home and /sNow). Check the [sNow! documentation](https://hpcnow.github.io/snow-documentation) to integrate other cluster file systems such as BeeGFS, Lustre or IBM Spectrum Scale.

## Installation
Docker Swarm manager nodes implement the Raft consensus algorithm to manage the global cluster state.
This is key for managing and scheduling tasks in the cluster and for keeping the same consistent state across all managers.

Raft tolerates up to (N-1)/2 failures and requires a majority, or quorum, of (N/2)+1 members to agree on values proposed to the cluster. This means that the cluster needs at least 3 manager nodes to tolerate one manager failure, or 5 to tolerate two.

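For quick reference, the quorum arithmetic above works out as follows for the usual manager counts:
```
managers (N)    quorum (N/2 + 1)    failures tolerated ((N-1)/2)
3               2                   1
5               3                   2
7               4                   3
```
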
This hands-on assumes that you have already deployed three VMs (domains) dedicated to the Docker Swarm cluster, or three compute nodes (production solution).

By default, manager nodes also act as worker nodes. For small systems or non-critical services, this is relatively low-risk.
However, because manager nodes use the Raft consensus algorithm to replicate data in a consistent way, they are sensitive to resource starvation. In an sNow! environment you can isolate the managers in VMs that run no other services and deploy a few bare-metal nodes as Docker Swarm workers. To do so, you can drain the manager nodes to make them unavailable as worker nodes:
```
docker node update --availability drain <NODEID>
```
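
To return a drained manager to worker duty later, you can set its availability back to ``active``:
```
docker node update --availability active <NODEID>
```
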
<!--
### Option 1: Deploy Docker Swarm in VMs
Assuming that you have already defined three VMs (domains) dedicated for Docker Swarm cluster:

```
snow add domain swarm01 --role swarm-manager
snow add domain swarm02 --role swarm-worker
snow add domain swarm03 --role swarm-worker
snow deploy swarm01
snow deploy swarm02
snow deploy swarm03
```
### Option 2: Deploy Docker Swarm in three compute nodes (production solution)
Assuming that you have already defined three nodes dedicated for Docker Swarm cluster:

```
snow add node swarm01 --role swarm-manager
snow add node swarm02 --role swarm-worker
snow add node swarm03 --role swarm-worker
snow deploy swarm01
snow deploy swarm02
snow deploy swarm03
```
-->
## Swarm Interaction

1. Check the status of the Docker Swarm cluster

```
snow@swarm01:~$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol *   swarm01             Ready               Active              Leader              18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl     swarm02             Ready               Active                                  18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y     swarm03             Ready               Active                                  18.03.1-ce
```
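
If you need more detail about a particular node (role, availability, resources, labels), you can inspect it; any node ID or hostname from the listing above works:
```
snow@swarm01:~$ docker node inspect --pretty swarm02
```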

2. Deploy the first application component as a Docker service
The following example creates a very simple service: a single container that sleeps for one hour.

```
snow@swarm01:~$ docker service create --name sleep_app alpine sleep 3600
8o9bicf4mkcpt2s0h23wwckn6
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
```
This pulls the alpine image and runs ``sleep 3600`` in one container.

3. Verify that the service has been created in the Swarm cluster

```
snow@swarm01:~$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
8o9bicf4mkcp        sleep_app           replicated          1/1                 alpine:latest
```
If you have previous experience with Docker, it may not seem that we have done anything very different from a plain ``docker run``. The key difference is that the container has been scheduled on a Swarm cluster rather than on a single host.

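To see the full definition of the service (image, replica count, restart and update policies), you can inspect it:
```
snow@swarm01:~$ docker service inspect --pretty sleep_app
```
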
4. Scale the application
Imagine a situation where this particular application is under high demand. Docker Swarm allows you to re-scale and re-balance the service across the three Swarm nodes.

In the following example we will create 9 replicas of the example application.

```
snow@swarm01:~$ docker service update --replicas 9 sleep_app
sleep_app
overall progress: 9 out of 9 tasks
1/9: running [==================================================>]
2/9: running [==================================================>]
3/9: running [==================================================>]
4/9: running [==================================================>]
5/9: running [==================================================>]
6/9: running [==================================================>]
7/9: running [==================================================>]
8/9: running [==================================================>]
9/9: running [==================================================>]
verify: Service converged
```
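
The same operation can also be expressed with the ``docker service scale`` shorthand, which accepts one or more ``<service>=<replicas>`` pairs:
```
snow@swarm01:~$ docker service scale sleep_app=9
```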

The new replicas of the application will be scheduled evenly across the Swarm nodes.
```
snow@swarm01:~$ docker service ps sleep_app
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
zgmceedsgj3l        sleep_app.1         alpine:latest       swarm03             Running             Running about a minute ago
m12an8ab0gx1        sleep_app.2         alpine:latest       swarm03             Running             Running 39 seconds ago
n1f8t2scpaqo        sleep_app.3         alpine:latest       swarm02             Running             Running 40 seconds ago
f0leytx1fj3i        sleep_app.4         alpine:latest       swarm01             Running             Running 35 seconds ago
add5r8ik6npz        sleep_app.5         alpine:latest       swarm03             Running             Running 40 seconds ago
45md2xfryhqi        sleep_app.6         alpine:latest       swarm02             Running             Running 39 seconds ago
26vn4t7cyuuo        sleep_app.7         alpine:latest       swarm01             Running             Running 35 seconds ago
rtmfau8n152p        sleep_app.8         alpine:latest       swarm02             Running             Running 39 seconds ago
3o9nev8wwtu6        sleep_app.9         alpine:latest       swarm01             Running             Running 36 seconds ago
```

5. Shrink the service
Docker Swarm also supports the inverse operation. The following example reduces the number of replicas to 3.

```
snow@swarm01:~$ docker service update --replicas 3 sleep_app
```
You can see how Docker Swarm reduces the replicas with the following command:
```
snow@swarm01:~$ watch docker service ps sleep_app
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
zgmceedsgj3l        sleep_app.1         alpine:latest       swarm03             Running             Running 8 minutes ago
n1f8t2scpaqo        sleep_app.3         alpine:latest       swarm02             Running             Running 6 minutes ago
f0leytx1fj3i        sleep_app.4         alpine:latest       swarm01             Running             Running 6 minutes ago
```
6. Reschedule the containers after a node failure or node draining
The following section illustrates what happens when a node is drained or becomes faulty.

Take a look at the status of your nodes again by running ``docker node ls``.
```
snow@swarm01:~$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol *   swarm01             Ready               Active              Leader              18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl     swarm02             Ready               Active                                  18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y     swarm03             Ready               Active                                  18.03.1-ce
```
We will simulate a node failure with the command ``snow destroy swarm02``.

We can check the status of the cluster with the following command:
```
root@swarm01:~# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol *   swarm01             Ready               Active              Leader              18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl     swarm02             Down                Active                                  18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y     swarm03             Ready               Active                                  18.03.1-ce
```

The following command shows how the services have been re-balanced and re-scheduled:
```
snow@swarm01:~$ docker service ps sleep_app
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE             ERROR               PORTS
zgmceedsgj3l        sleep_app.1         alpine:latest       swarm03             Running             Running 14 minutes ago
1yv65qqonvju        sleep_app.3         alpine:latest       swarm03             Running             Running 2 minutes ago
n1f8t2scpaqo         \_ sleep_app.3     alpine:latest       swarm02             Shutdown            Running 12 minutes ago
f0leytx1fj3i        sleep_app.4         alpine:latest       swarm01             Running             Running 12 minutes ago
```

Finally, we can drain the Docker Swarm manager node (swarm01), which leaves swarm03 as the only node available to run tasks.

Use the command ``docker node update --availability drain <NODEID>``, where the <NODEID> is provided by the command ``docker node ls``.

```
snow@swarm01:~$ docker node update --availability drain hxakgodvxtz9ynsc0nniyz7ol
```
Check the status of the nodes:
```
snow@swarm01:~$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
hxakgodvxtz9ynsc0nniyz7ol *   swarm01             Ready               Drain               Leader              18.03.1-ce
t3z5ru9ssu20b7a9i9bj4p5sl     swarm02             Down                Active                                  18.03.1-ce
o4vbamp797i2yrvo7anqlgd8y     swarm03             Ready               Active                                  18.03.1-ce
```
Finally, as expected, all the services have been migrated to the node swarm03. Note that swarm01 is still listed as the manager (Leader): draining only stops it from running tasks, it does not demote it.
```
snow@swarm01:~$ docker service ps sleep_app
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE             ERROR               PORTS
zgmceedsgj3l        sleep_app.1         alpine:latest       swarm03             Running             Running 20 minutes ago
1yv65qqonvju        sleep_app.3         alpine:latest       swarm03             Running             Running 8 minutes ago
n1f8t2scpaqo         \_ sleep_app.3     alpine:latest       swarm02             Shutdown            Running 19 minutes ago
wu8hj9j9e9uh        sleep_app.4         alpine:latest       swarm03             Running             Running 2 minutes ago
f0leytx1fj3i         \_ sleep_app.4     alpine:latest       swarm01             Shutdown            Shutdown 2 minutes ago
```
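
If you later bring a drained node back with ``docker node update --availability active <NODEID>``, note that Swarm does not automatically move running tasks back onto it. A forced service update redistributes the replicas across the available nodes:
```
snow@swarm01:~$ docker service update --force sleep_app
```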

7. Cleaning up
The following command removes the service:
```
docker service rm sleep_app
```
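
If you destroyed swarm02 earlier, it will still be listed as ``Down`` in ``docker node ls``. Assuming you do not plan to bring it back, the stale entry can be removed as well (add ``--force`` if Docker complains that the node did not leave the swarm cleanly):
```
snow@swarm01:~$ docker node rm swarm02
```
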
## Service Orchestration
The following example better illustrates the benefits of adopting this technology.

### Defining a stack of services
Docker Swarm allows you to deploy a complete application stack to the swarm. The deploy command accepts a stack description in the form of a Compose file.

In the container folder there is a file called monitoring-stack.yml, which contains an example of a complete monitoring stack including an Elasticsearch cluster, Kibana, InfluxDB, Grafana, etc.

Download this file to your swarm01 node and execute the following command to orchestrate the monitoring stack example:

```
snow@swarm01:~$ docker stack deploy -c monitoring-stack.yml monitor
Creating service monitor_cadvisor
Creating service monitor_elasticsearch
Creating service monitor_elasticsearch2
Creating service monitor_elasticsearch3
Creating service monitor_kibana
Creating service monitor_headPlugin
Creating service monitor_influxdb
Creating service monitor_grafana
```
Now you can check the status of each service with the following command:
```
root@swarm01:~# docker stack services monitor
ID                  NAME                     MODE                REPLICAS            IMAGE                                                  PORTS
058pildrvl8i        monitor_elasticsearch3   replicated          1/1                 docker.elastic.co/elasticsearch/elasticsearch:6.2.4
4gbk7a7zoqz8        monitor_kibana           replicated          0/1                 docker.elastic.co/kibana/kibana:6.3.2                  *:5601->5601/tcp
aqsu06woh4wf        monitor_headPlugin       replicated          1/1                 mobz/elasticsearch-head:5                              *:9100->9100/tcp
bh0r4hsbx3s3        monitor_grafana          replicated          1/1                 grafana/grafana:latest                                 *:80->3000/tcp
h0h8hhvwzd19        monitor_elasticsearch2   replicated          1/1                 docker.elastic.co/elasticsearch/elasticsearch:6.2.4
juwnpqap4q1o        monitor_influxdb         replicated          1/1                 influxdb:latest
m0isumy3f1fe        monitor_elasticsearch    replicated          1/1                 docker.elastic.co/elasticsearch/elasticsearch:6.2.4   *:9200->9200/tcp
wdz7qvp39qwz        monitor_cadvisor         global              2/2                 google/cadvisor:latest
```
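
To see on which nodes the individual tasks of the stack have been scheduled, and to remove the whole stack once you are done, you can use:
```
snow@swarm01:~$ docker stack ps monitor
snow@swarm01:~$ docker stack rm monitor
```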

More information:
* [Admin guide](https://docs.docker.com/engine/swarm/admin_guide/)
* [Rolling update](https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update/)