I have an old computer cluster whose nodes lack virtualization extensions, so I am trying to use it to run Docker containers. I do not want to choose by hand the internal node on which each container runs, so I am using Docker Swarm and treating the cluster as a single Docker host: I call the main node to run a container, and the swarm decides on which host the container will run. So this time…
I learned how to create a simple Docker Swarm cluster with a single front-end and multiple internal nodes
The official Docker documentation includes a post that describes how to do it. Although the process is very easy, I prefer to describe my specific use case:
- 1 Master node with the public IP 220.127.116.11 and the private IP 10.100.0.1.
- 3 Nodes with the private IPs 10.100.0.2, 10.100.0.3 and 10.100.0.4
I want to call the master node from another computer (e.g. 18.104.22.168) to create a container, and let the master choose the internal node on which the container is hosted.
Preparing the master node
First of all, I will install Docker
$ curl -sSL https://get.docker.com/ | sh
Now we need to install consul, which is a backend for key-value storage. It will run as a container on the front-end (and the internal nodes will use it to synchronize with the master):
$ docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
Finally I will launch the swarm master
$ docker run -d -p 4000:4000 swarm manage -H :4000 --advertise 10.100.0.1:4000 consul://10.100.0.1:8500
(*) Remember that consul is installed on the front-end, but you could detach it and install it on another node if you want (or need) to.
Installing the internal nodes
Again, we should install Docker and expose it through the IP address of the node
$ curl -sSL https://get.docker.com/ | sh
Once it is running, we need to expose the Docker API through the IP address of the node. The easy way to test it is to launch the daemon with the following options:
$ docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
Now you should be able to issue commands such as
$ docker -H :2375 info
or even from other hosts
$ docker -H 10.100.0.2:2375 info
The underlying aim is to expose the local Docker daemon so that the swarm can use it remotely.
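To check all the internal nodes in one go, a minimal sketch could loop over their addresses. The function name check_nodes and the DOCKER override variable are illustrative choices of mine; the IPs and the port 2375 are the ones used in this post.

```shell
#!/bin/sh
# Sketch: report whether the Docker API of each node answers on port 2375.
# DOCKER can be overridden (e.g. for a dry run); it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

check_nodes() {
    # Arguments: the private IPs of the nodes to probe.
    for ip in "$@"; do
        if "$DOCKER" -H "$ip:2375" info >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip NOT reachable"
        fi
    done
}

# Usage (on the master or any host in the private network):
#   check_nodes 10.100.0.2 10.100.0.3 10.100.0.4
```

A node reported as NOT reachable usually means that its daemon is not listening on tcp://0.0.0.0:2375 yet.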
To make the changes persistent, you should set the parameters in the docker configuration file /etc/default/docker:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
It seems that Docker version 1.11 has a bug and does not properly use that file (at least on Ubuntu 16.04). So you can modify the file /lib/systemd/system/docker.service and set a new command line to launch the Docker daemon:
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock -H fd://
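Editing /lib/systemd/system/docker.service directly works, but a package upgrade may overwrite the file. An alternative (not the method used in this post) is a systemd drop-in file, which overrides ExecStart without touching the packaged unit. The following is only a sketch: the function name write_docker_dropin and the file name override.conf are illustrative, while the ExecStart line is the one shown above.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that replaces the ExecStart of docker.service.
# The target directory is a parameter so the file can be inspected before installing.
write_docker_dropin() {
    dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before setting the new one.
ExecStart=
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock -H fd://
EOF
}

# Usage (as root), followed by a reload and a restart:
#   write_docker_dropin
#   systemctl daemon-reload && systemctl restart docker
```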
Finally, we have to launch swarm on each node so that it joins the cluster
- On node 10.100.0.2
docker run --restart=always -d swarm join --advertise=10.100.0.2:2375 consul://10.100.0.1:8500
- On node 10.100.0.3
docker run --restart=always -d swarm join --advertise=10.100.0.3:2375 consul://10.100.0.1:8500
- On node 10.100.0.4
docker run --restart=always -d swarm join --advertise=10.100.0.4:2375 consul://10.100.0.1:8500
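Since the three commands differ only in the advertised IP, they can be generated in a loop. The sketch below only prints the command for a node (a dry run); join_cmd is an illustrative name, and how you deliver the command to each node (SSH, a provisioning tool or by hand) is left open.

```shell
#!/bin/sh
# Sketch: build the "swarm join" command for a node, given its private IP.
CONSUL="consul://10.100.0.1:8500"   # the consul container running on the master

join_cmd() {
    echo "docker run --restart=always -d swarm join --advertise=$1:2375 $CONSUL"
}

# Print the command for every internal node:
#   for ip in 10.100.0.2 10.100.0.3 10.100.0.4; do join_cmd "$ip"; done
#
# Once the nodes have joined, the master should list them:
#   docker -H 10.100.0.1:4000 info
```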
Next steps: communicating the containers
If you launch new containers as usual (i.e. docker run -it containerimage bash), you will get containers with overlapping IPs. This is because you are using the default network scheme in the individual docker servers.
If you want to have a common network, you need to create an overlay network that spans across the different docker daemons.
But in order to do so, you need to change the way the Docker daemons are started. You need a system to coordinate the network, and it can be the same consul that we are already using.
So you have to append the following flags to the command line that starts docker:
--cluster-advertise eth1:2376 --cluster-store consul://10.100.0.1:8500
You can add the parameters to the docker configuration file /etc/default/docker. For the internal nodes, the result will be the following (according to our previous modifications):
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise eth1:2376 --cluster-store consul://10.100.0.1:8500"
As stated before, Docker version 1.11 has a bug and does not properly use that file. In the meantime, you can modify the file /lib/systemd/system/docker.service and set a new command line to launch the Docker daemon:
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-advertise eth1:2376 --cluster-store consul://10.100.0.1:8500
(*) We are using eth1 because it is the device that holds our internal IP address. You should use the device to which the 10.100.0.x address is assigned.
Now you must restart the Docker daemons on ALL the nodes in the swarm.
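On hosts that use systemd (as Ubuntu 16.04 does), the restart could be scripted roughly as follows. This is a sketch under assumptions that are not part of the setup described above: root SSH access from the master to the nodes, and the illustrative names restart_docker_on and SSH. Keep in mind that the consul and swarm manager containers on the master were started without a restart policy, so after restarting the daemon on the master they have to be started again with docker start.

```shell
#!/bin/sh
# Sketch: reload systemd units and restart the Docker daemon on remote nodes.
# SSH can be overridden (e.g. SSH=echo for a dry run); root SSH access is assumed.
SSH="${SSH:-ssh}"

restart_docker_on() {
    for ip in "$@"; do
        "$SSH" "root@$ip" "systemctl daemon-reload && systemctl restart docker" \
            || echo "restart failed on $ip" >&2
    done
}

# Usage:
#   restart_docker_on 10.100.0.2 10.100.0.3 10.100.0.4
```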
Once they have been restarted, you can create a new network for the swarm:
$ docker -H 10.100.0.1:4000 network create swarm-network
And then you can use it for the creation of the containers:
$ docker -H 10.100.0.1:4000 run -it --net=swarm-network ubuntu:latest bash
Now the IPs will be assigned in a coordinated way, and each container will have several IPs (its IP in the swarm network and its IP in the local Docker server).
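To avoid repeating the -H option in every command, a small wrapper can point the docker client at the swarm master. This is only a sketch: the function name swarm_docker, the variables and the container name c1 are illustrative and not part of the setup above, and the inspect template assumes a Docker version that reports per-network settings.

```shell
#!/bin/sh
# Sketch: send docker commands to the swarm master instead of the local daemon.
DOCKER="${DOCKER:-docker}"                         # overridable for dry runs
SWARM_MANAGER="${SWARM_MANAGER:-10.100.0.1:4000}"  # address of the swarm master

swarm_docker() {
    "$DOCKER" -H "$SWARM_MANAGER" "$@"
}

# Usage:
#   swarm_docker network create swarm-network
#   swarm_docker run -d --name c1 --net=swarm-network ubuntu:latest sleep 3600
#   # List the IP that c1 received on each network it is attached to:
#   swarm_docker inspect \
#     --format '{{range $n, $c := .NetworkSettings.Networks}}{{$n}}={{$c.IPAddress}} {{end}}' c1
```

Setting the environment variable DOCKER_HOST=tcp://10.100.0.1:4000 has a similar effect for a whole shell session.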
Some more words on this
This post was written in May 2016. Both Docker and swarm are evolving, so this post may soon be outdated.
Some things that bother me about this installation…
- While using the overlay network, if you expose a port with the flag -p, the port is exposed on the IP of the internal Docker host. I think that you should be able to specify the IP on which you want to expose the port, or to use the IP of the main server.
- I work around this issue with IPFloater, a tool that I developed: once I create the container, I get the internal IP on which the port is exposed and I create a redirection in IPFloater, so that I can access the container through a specific IP.
- Consul fails A LOT. If I leave the swarm running for hours (e.g. 8 hours), consul will probably fail. If I run a command like "docker run --rm=true swarm list consul://10.100.0.1:8500", it reports a failure. Then I have to delete the container and create a new one.
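The manual recovery described in the last item could be automated with a small watchdog. The sketch below is an assumption-laden example, not something that is part of this installation: it probes the consul HTTP endpoint /v1/status/leader on port 8500, and the function and variable names are illustrative. Anything that is stored only in consul (such as the definition of the overlay network) may need to be created again after the container is replaced.

```shell
#!/bin/sh
# Sketch: recreate the consul container when it stops answering.
DOCKER="${DOCKER:-docker}"   # overridable for dry runs
CURL="${CURL:-curl}"
CONSUL_HTTP="${CONSUL_HTTP:-http://10.100.0.1:8500}"

consul_healthy() {
    # /v1/status/leader returns the address of the current leader as a JSON
    # string; an empty string ("") means that there is no leader.
    leader="$("$CURL" -sf --max-time 5 "$CONSUL_HTTP/v1/status/leader")" || return 1
    [ -n "$leader" ] && [ "$leader" != '""' ]
}

recreate_consul() {
    # Same steps as the manual fix: delete the container and create a new one.
    "$DOCKER" rm -f consul
    "$DOCKER" run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
}

check_consul() {
    consul_healthy || recreate_consul
}

# Usage, e.g. from cron on the master every few minutes:
#   check_consul
```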