In part one of this series, we looked at getting a MariaDB cluster up and running on top of a multi-master docker swarm with Consul running on the swarm itself – in under 15 minutes. The accompanying GitHub repository includes several helper scripts geared towards getting the environment up and running rapidly in IBM Softlayer. In this second part of the series, we’ll drill down and take a closer look at the process itself for those who use a different cloud provider, want to adapt things to suit their orchestration process, or just prefer to learn by doing things from scratch.
Build Goals, Revisited
- Multi-master, highly-available docker swarm cluster.
- HA Consul key-value store running on the swarm itself.
- Containerized MariaDB Galera cluster running on the swarm, natch.
- Use the btrfs storage driver (or alternately, device-mapper with LVM).
- Overlay network between MariaDB nodes for cluster communication etc.
- Percona Xtrabackup instead of rsync to reduce locking during state transfers.
- Create the multi-master Swarm and Consul cluster
- Deploy MariaDB Galera cluster on the Swarm
Provision the instances
To provision the swarm nodes we’re going to use the generic
driver for docker-machine. To get ready to use the generic driver, we’ll need to use our provider’s toolset to pre-provision the instances so we can then feed them over to machine for creation as swarm nodes. Yes, there is a Softlayer driver for machine, but we’re not going to use it. More on that later on, but in the meantime let’s use the provisioning script to create our initial Softlayer instances.
In order to keep things self-contained, the helper script
generates an SSH key to use with the provisioned instances, adds that key to your softlayer account and then loops through and provisions the nodes specified in your swarm.local
file. The generated ssh key is stored in the ssh directory. To provision manually, you will need to generate an ssh key or use an existing key in your account and then pass the name of the key to slcli
when ordering:
# Vars taken from swarm.local
node="sw1" # Node Name
sl_sshkey_name="mysshkey" # SSH Key to use from Control Panel
sl_domain="" # Softlayer Domain
sl_cpu=1 # Number of Cores
sl_memory=1 # Amount of RAM
sl_region="tor01" # Softlayer Datacenter
sl_billing="hourly" # Hourly or Monthly Billing
sl_os_disk_size=25 # OS volume Size
sl_docker_disk_size=25 # Docker volume size
sl_public_vlan_id=123456 # Public iface VLAN ID
sl_private_vlan_id=654321 # Private iface VLAN ID
# Order the instance
slcli vs create -H ${node} -D ${sl_domain} \
-k $(slcli sshkey list | grep ${sl_sshkey_name} | awk '{print $1}') \
-c ${sl_cpu} -m ${sl_memory} \
-d ${sl_region} -o CENTOS_LATEST \
--billing ${sl_billing} --public \
--disk ${sl_os_disk_size} \
--disk ${sl_docker_disk_size} -n 100 \
--vlan-public ${sl_public_vlan_id} \
--vlan-private ${sl_private_vlan_id} \
--tag dockerhost
To check when the provisioning process is completed, run:
slcli vs detail ${node} | grep state
Once the API reports back a state of RUNNING
, your node is ready.
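If you'd rather not re-run that check by hand, a small polling loop can do the waiting. This is only a sketch: wait_for_state is a hypothetical helper (it is not one of the repository's scripts), and the retry count and delay in the usage line are arbitrary values to tune for your provider.

```shell
# Poll a status command until its output contains the wanted state.
# Usage: wait_for_state <state> <max_tries> <delay_seconds> <command...>
# (hypothetical helper, not part of the repository's scripts)
wait_for_state() (
    want="$1"; tries="$2"; delay="$3"
    shift 3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # A failed status call simply counts as "not ready yet".
        if "$@" 2>/dev/null | grep -q "$want"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for state ${want}" >&2
    return 1
)

# Example (needs a configured slcli): check every 30s, up to 40 times.
# wait_for_state RUNNING 40 30 slcli vs detail "${node}"
```

Because the status command is passed in as arguments, the same loop should work with another provider's CLI.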
Note: The
script needs expect installed on your system to get around slcli not having a -y option to force through orders/cancellations without any input. To perform the raw API calls to order and cancel without any prompting, or for more information on using the Softlayer ordering API programmatically, refer to the documentation.
Run post-provisioning scripts
Once the instances are provisioned, we’re going to want to prep them a bit before adding them with machine.
copies and executes
from the instance
subdirectory over to the newly-provisioned instances and runs it once they are accessible.
Since we’re using docker’s btrfs storage driver, we’ll run the following actions to prep the system and build our btrfs docker volume via
- Update the system
- Install the necessary packages to support LVM and btrfs
- Install the net-tools package to workaround docker-machine issue #2481
- Partition the secondary volume on the instance
- Add the secondary volume to LVM and format it as a btrfs filesystem
- Mount the btrfs filesystem at
Tip: Depending on your environment, you may want to also run your ansible playbooks, install additional software or further tune the system to fit your standards.
First, we’ll update the system and install the necessary packages:
yum -y update && yum clean all && yum makecache fast
yum -y install lvm2 lvm2-libs btrfs-progs git net-tools
Next, we partition and format the secondary volume we’ll be using for the /var/lib/docker
cat << EOF > /tmp/xvdc.layout
# partition table of /dev/xvdc
unit: sectors
/dev/xvdc1 : start= 2048, size= 26213376, Id=8e
/dev/xvdc2 : start= 0, size= 0, Id= 0
/dev/xvdc3 : start= 0, size= 0, Id= 0
/dev/xvdc4 : start= 0, size= 0, Id= 0
EOF
… and convert to a logical volume under LVM:
sfdisk --force /dev/xvdc < /tmp/xvdc.layout
pvcreate /dev/xvdc1
vgcreate docker_vg /dev/xvdc1
lvcreate -l 100%FREE -n docker_lv1 docker_vg
You could stop here and just set up the device mapper driver when deploying docker to use the LVM group for the docker volume and metadata. You can find more information on configuring the device mapper driver in the official Docker Documentation.
Building with
, we go one step further and format the LVM volume with btrfs. To configure the LVM partition as a btrfs filesystem, simply run the following on the provisioned instance:
mkfs.btrfs /dev/docker_vg/docker_lv1
Note: In going the btrfs route, there is also no need to manually configure a storage driver – when docker is started it will detect that the
directory is on a btrfs partition and automatically load the btrfs storage driver. There’s some discussion still on whether btrfs is production-ready or not, but for noncritical environments I like the convenience of working with the btrfs feature-set. With Red Hat (and thus CentOS) 7.1, btrfs is still considered a technology preview – so while I’ve not had any issues with it day-to-day, YMMV. Caveat emptor.
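The last prep step, mounting the new filesystem, could look something like the sketch below. It assumes the mount point is /var/lib/docker (docker's default data directory) and that plain "defaults" mount options are good enough. Both fstab_line and mount_docker_volume are hypothetical helper names, not something from the repository.

```shell
# Print an /etc/fstab line for a btrfs volume.
# $1 = block device, $2 = mount point
fstab_line() {
    printf '%s %s btrfs defaults 0 0\n' "$1" "$2"
}

# Create the mount point, persist the mount in fstab and mount it.
# Must be run as root on the instance.
mount_docker_volume() (
    dev="$1"; dir="$2"
    mkdir -p "$dir"
    # Only append the fstab entry if the device is not already listed.
    grep -q "^${dev} " /etc/fstab || fstab_line "$dev" "$dir" >> /etc/fstab
    mount "$dir"
)

# Example, using the logical volume created above (mount point is an assumption):
# mount_docker_volume /dev/docker_vg/docker_lv1 /var/lib/docker
```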
Once the post-provisioning process is complete, we’re ready to move on to building the actual swarm nodes using docker-machine.
Build the Swarm nodes
Once the instances we’ll be using have been provisioned, the next helper script,
will create the swarm using the generic
machine driver and deploy consul across each of the swarm masters.
So why not just use machine’s built-in Softlayer driver to deploy the nodes? Why the extra step?
The primary reason we’re giving it a skip is because we need to pass IP address information to docker-machine as part of the swarm creation – right now, there’s no way to do that with the standard drivers. By pre-provisioning with our provider’s CLI, we can use it to grab the provisioned instance’s private and public IPs to pass to machine when creating the swarm. While this is definitely a shortcoming of the current machine drivers, provisioning with
(or your provider’s tool of choice) offers some more flexibility with provisioning the instance with non-standard images or configurations.
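As a rough illustration of grabbing those IPs, the sketch below pulls a single field out of the instance detail output. It assumes the output has one "name value" pair per line and that the fields are called public_ip and private_ip; check what your CLI version actually prints before relying on it. detail_field is a hypothetical helper name.

```shell
# Print the value of one field from "name value" style detail output on stdin.
# Table borders (":" and "|") are stripped first, so this is only suitable
# for values that contain no colons themselves, such as IPv4 addresses.
detail_field() {
    tr -d ':|' | awk -v key="$1" '$1 == key { print $2; exit }'
}

# Example (field names are an assumption about the slcli output):
# node_private_ip=$(slcli vs detail "${node}" | detail_field private_ip)
# node_public_ip=$(slcli vs detail "${node}" | detail_field public_ip)
```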
For each node in the cluster, we’ll run docker-machine create
and tell it to create a master node with the --swarm-master
switch, and tell it to replicate to the other masters via the --swarm-opt="replication=true"
switch. We’re then telling the swarm master to find the other masters by using the key-value store located at consul://consul.service.consul
. Since we’re also using the node’s local IP address for our primary DNS (via --engine-opt="dns ${node_private_ip}"
), each node will also find all of the other participating consul instances. You’ll need to have an ssh key in ssh/swarm.rsa
(or configured elsewhere) that has its public key added to the authorized_keys
file of root or another user on the nodes you will be provisioning, as specified with the --generic-ssh-key
and --generic-ssh-user
# Vars taken from swarm.local
node_ssh_ip= # Public or Private IP machine will use to connect
docker-machine create \
--driver generic \
--generic-ip-address ${node_ssh_ip} \
--generic-ssh-key ${__root}/ssh/swarm.rsa \
--generic-ssh-user root \
--engine-storage-driver btrfs \
--swarm --swarm-master \
--swarm-opt="replication=true" \
--swarm-opt="advertise=${node_private_ip}:3376" \
--swarm-discovery="consul://${node_consul}:8500" \
--engine-opt="cluster-store consul://${node_consul}:8500" \
--engine-opt="cluster-advertise=eth0:2376" \
--engine-opt="dns ${node_private_ip}" \
--engine-opt="dns ${dns_primary}" \
--engine-opt="dns ${dns_secondary}" \
--engine-opt="log-driver json-file" \
--engine-opt="log-opt max-file=10" \
--engine-opt="log-opt max-size=10m" \
--engine-opt="dns-search=${dns_search_domain}" \
--engine-label="dc=${datacenter}" \
--engine-label="instance_type=public_cloud" \
--tls-san ${node} \
--tls-san ${node_private_ip} \
--tls-san ${node_public_ip} \
${node}
Deploy Consul cluster to the Swarm
In Jacob Blain Christen’s article Toward a Production-Ready Docker Swarm with Consul, he made a very useful discovery – if you configure machine to point swarm at a non-existent consul address, it will still create the node successfully and just continue to retry its connection rather than giving up.
Since we haven’t deployed the actual consul containers yet, the swarm nodes can’t yet form a multi-master cluster. The
script and compose file borrow his technique to buy some time to deploy the consul containers after we’ve got the individual swarm nodes started in a semi-functional state.
After the swarm masters have been provisioned with machine,
will then run docker-compose
to deploy Consul across all nodes.
Note: As this is a self-contained cluster, we’re using consul server images on each node – larger swarms would be better served using the consul agent image instead on any swarm members that aren’t part of the initial HA control nodes:
# Vars taken from swarm.local
docker-machine ssh ${node} "printf 'nameserver ${node_cluster_ip}\nnameserver ${dns_primary}\nnameserver ${dns_secondary}\ndomain ${dns_search_domain}\n' > /etc/resolv.conf"
docker-machine scp -r ./compose/consul/config ${node}:/tmp/consul
docker-machine ssh ${node} "mv /tmp/consul /etc"
eval $(docker-machine env ${node})
docker-compose -f ./compose/consul/consul.yml up -d consul
docker-machine ssh ${node} "systemctl restart docker"
This copies over the consul configuration file, changes the dockerhost’s resolv.conf to use the local consul for its primary DNS and runs docker-compose
to build the consul container using some variables we’ll either pull from the swarm.local
config or figure out automatically. After building consul, we then bounce the docker daemon so it can successfully connect to the consul.service.consul
service in DNS. Our compose file looks like this:
consul:
  command: -dc ${datacenter} -server -node ${node} -client -bootstrap-expect ${swarm_total_nodes} -advertise ${node_cluster_ip} -retry-interval 10s -recursor ${dns_primary} -recursor ${dns_secondary} -retry-join ${othernode0_cluster_ip} -retry-join ${othernode1_cluster_ip}
  container_name: consul
  net: host
  image: gliderlabs/consul-server:latest
  ports:
    - ${node_cluster_ip}:53:53
    - ${node_cluster_ip}:53:53/udp
    - ${node_cluster_ip}:8300-8302:8300-8302
    - ${node_cluster_ip}:8300-8302:8300-8302/udp
    - ${node_cluster_ip}:8400:8400
    - ${node_cluster_ip}:8500:8500
  restart: always
  volumes:
    - "consul-data:/data"
    - "/etc/consul/consul.json:/config/consul.json:ro"
    - "/etc/docker/ca.pem:/certs/ca.pem:ro"
    - "/etc/docker/server.pem:/certs/server.pem:ro"
    - "/etc/docker/server-key.pem:/certs/server-key.pem:ro"
    - "/var/run/docker.sock:/var/run/docker.sock"
… and this consul.json
{
  "ca_file": "/certs/ca.pem",
  "cert_file": "/certs/server.pem",
  "key_file": "/certs/server-key.pem",
  "ports": {
    "dns": 53
  },
  "verify_incoming": true,
  "verify_outgoing": true
}
This exposes consul’s DNS on port 53 locally on the swarm node, adds our upstream DNS as recursors for consul and tells it to join the cluster with the other consul containers living on the other swarm masters. For more information on configuring and using consul, refer to the consul documentation. If you’re looking to perform automated service discovery, at this point you could also deploy registrator to automatically populate consul with any publicly exposed services running across the swarm.
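One quick way to tell whether the consul servers have actually formed a cluster is to ask consul's HTTP API who the current leader is. The /v1/status/leader endpoint returns a JSON string such as "10.0.0.5:8300", or an empty string while no leader has been elected. The sketch below assumes the HTTP API is reachable on port 8500 of the node's cluster IP; has_leader is a hypothetical helper name.

```shell
# Succeeds when the body returned by /v1/status/leader names a leader.
# $1 = raw response body
has_leader() {
    case "$1" in
        '""'|'') return 1 ;;   # empty JSON string, or no response at all
        *)       return 0 ;;
    esac
}

# Example (address and port are assumptions about your setup):
# body=$(curl -s "http://${node_cluster_ip}:8500/v1/status/leader")
# has_leader "$body" && echo "consul has elected a leader"
```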
Verify the Swarm
By now,
should have exited. Within a few minutes, the restarted docker daemons should start finding the consul cluster and begin the process of starting the swarm and connecting to the other swarm masters. To check on the status, load the environment variables for the swarm using “eval $(docker-machine env --swarm ${node})” and issue a “docker info” command. If the swarm is operational, you should see output similar to below – with each swarm member listed in the swarm node list. If the number of Nodes is less than what you deployed or is empty, wait a few seconds and try again:
Containers: 9
Running: 9
Paused: 0
Stopped: 0
Images: 9
Server Version: swarm/1.1.3
Role: replica
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
└ Status: Healthy
└ Containers: 4
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.013 GiB
└ Labels: dc=tor01, executiondriver=native-0.2, instance_type=public_cloud,
kernelversion=3.10.0-327.10.1.el7.x86_64, operatingsystem=CentOS Linux 7 (Core),
provider=generic, storagedriver=btrfs
└ Error: (none)
└ UpdatedAt: 2016-03-28T14:09:03Z
└ Status: Healthy
└ Containers: 4
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.013 GiB
└ Labels: dc=tor01, executiondriver=native-0.2, instance_type=public_cloud,
kernelversion=3.10.0-327.10.1.el7.x86_64, operatingsystem=CentOS Linux 7 (Core),
provider=generic, storagedriver=btrfs
└ Error: (none)
└ UpdatedAt: 2016-03-28T14:09:27Z
└ Status: Healthy
└ Containers: 4
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.013 GiB
└ Labels: dc=tor01, executiondriver=native-0.2, instance_type=public_cloud,
kernelversion=3.10.0-327.10.1.el7.x86_64, operatingsystem=CentOS Linux 7 (Core),
provider=generic, storagedriver=btrfs
└ Error: (none)
└ UpdatedAt: 2016-03-28T14:09:37Z
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: linux
Architecture: amd64
CPUs: 3
Total Memory: 3.038 GiB
Name: sw1
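To script that check rather than eyeballing the output, you could compare the reported node count against the number of nodes you deployed. A minimal sketch, assuming the swarm's docker info output contains a "Nodes: N" line like the one above; swarm_node_count and swarm_ready are hypothetical helper names.

```shell
# Extract the node count from "docker info" output on stdin.
swarm_node_count() {
    awk '/^ *Nodes:/ { print $2; exit }'
}

# Succeeds when the node count read from stdin equals the expected count.
# $1 = expected number of swarm nodes
swarm_ready() {
    count=$(swarm_node_count)
    [ "${count:-0}" -eq "$1" ]
}

# Example, with the swarm environment loaded:
# docker info | swarm_ready 3 && echo "all swarm members have joined"
```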
Standing up MariaDB
So now we’ve got a multi-master swarm up and running using a high-availability consul cluster running directly on the swarm masters. Sweet! Next, we’ll move on to getting MariaDB up and running on the swarm. To accomplish this, the
script will create an overlay network on top of the swarm and then bootstrap the MariaDB cluster using docker-compose
Note: Keeping with the CentOS 7 theme, we’ll be using my centos7-mariadb-10.1-galera image from Docker Hub to deploy the MariaDB Galera cluster across the swarm nodes. This image is based upon the official MariaDB 10.1 image, except it uses CentOS instead of Ubuntu and has both Galera cluster and Percona Xtrabackup support added in. More info on customizing/configuring the image can be found at the previous link or at the source repository on GitHub.
Before we start deploying the MariaDB image though, we’ll create our overlay network. The overlay network will be used for all inter-cluster communications.
will also expose port 3306 on each instance to the internal network of the dockerhost. Assuming you’re running your app servers across the same swarm, you could skip exposing the port at all, and run your application traffic against the overlay network as well.
Create the overlay network
Once swarm is up and running, creating the overlay network is just a matter of issuing a couple of commands and optionally picking out a subnet for your network if you need to specify it (i.e. to avoid conflicts with other local subnets). Make sure you have machine pointed at the swarm by issuing “eval $(docker-machine env --swarm ${node})”, and then create the network with:
docker network create -d overlay --subnet= mariadb
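If you expect to re-run your build more than once, it can help to make the network creation idempotent. The sketch below only creates the network when "docker network inspect" can't find it. ensure_overlay_network is a hypothetical helper, and the subnet in the usage line is a made-up example value, so substitute one that suits your environment.

```shell
# Create an overlay network only if it does not exist yet.
# $1 = network name, $2 = subnet in CIDR notation
ensure_overlay_network() {
    if docker network inspect "$1" >/dev/null 2>&1; then
        echo "network $1 already exists"
    else
        docker network create -d overlay --subnet="$2" "$1"
    fi
}

# Example, with the swarm environment loaded (hypothetical subnet):
# ensure_overlay_network mariadb 172.20.0.0/24
```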
Bootstrap the MariaDB Galera cluster
Now that we have our overlay network sorted, we can start getting our MariaDB Galera cluster up and running using the overlay network for cluster communications. After creating the overlay network,
will dynamically generate compose files for each swarm node and bring the cluster members up. The first cluster member is brought up with a special value for the $cluster_members
environment variable – by setting this to BOOTSTRAP
the first node will know to initialize a new cluster.
Tip: You can customize the
compose file to not only use a volume on the dockerhost for a persistent database, but also add any scripts you want executed on start (such as to create databases or import data) by mounting a directory containing your scripts as a container volume called /docker-entrypoint-initdb.d. For a full list of configurable options, please refer to the image description on Docker Hub. Don’t forget, since this is a cluster you’ll only need to run your scripts on the first node!
As there are no nodes running, the first node will detect it is not operational and perform the bootstrap process. Running the script a second time with the cluster active will restart the first node as a regular cluster member. Passing some variables to compose, the script bootstraps the first node:
# Vars taken from swarm.local
# cluster_members: array of swarm nodes taken from swarm.local
# mariadb_cluster_name: a unique name to allow for multiple clusters on the same network
eval $(docker-machine env ${node})
export cluster_members=BOOTSTRAP # First node - set the cluster mode to bootstrap
export node
sed "s/%%DBNODE%%/db-${node}/g" mariadb.yml > ${node}.yml
docker-compose -f ${__root}/compose/mariadb/${node}.yml up -d --no-recreate
…using the generated compose file ${node}.yml. This places the MariaDB container on the overlay network we created earlier, exposes the necessary ports to the overlay network, and forwards port 3306 on the dockerhost to port 3306 of the MariaDB container:
version: '2'
image: dayreiner/centos7-mariadb-10.1-galera
container_name: db-sw1
hostname: db-sw1
restart: always
- mariadb
- "3306"
- "4567"
- "4444"
- ${mariadb_data_path}:/var/lib/mysql
- common.env
# This is set by the build script
- CLUSTER=${cluster_members}
# These are configured in swarm.conf
- CLUSTER_NAME=${mariadb_cluster_name}
- MYSQL_ROOT_PASSWORD=${mysql_root_password}
- SST_USER=sst
- SST_PASS=${sst_password}
name: mariadb
Before moving on to the secondary cluster members, the script first waits for the bootstrap node to report that it is operational by looking for the log entry “Synchronized with group, ready for connections” via the “docker logs” command. The process is then repeated for the remaining nodes, except the secondary nodes are started with a list of nodes to connect to.
By deploying the DB cluster into swarm’s overlay network, the built-in DNS service will automatically allow the containers to find each other by name. This allows for relatively simple scaling of the database cluster, adding new members with just a couple of commands. While not exactly true “service discovery” per-se, as long as we stick to a standard naming scheme then adding new cluster members is a trivial process that can be easily automated.
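With a predictable naming scheme, building the list of members for a secondary node is simple string work. The sketch below assumes containers are named db-<node>, as in the compose file above, and that the list is a comma-separated set of the other members. The exact format the image expects is an assumption here, so check the image description before using it. peer_list is a hypothetical helper name.

```shell
# Print the peer list for one node: every "db-<name>" except its own.
# $1 = this node's name, remaining arguments = all swarm node names
peer_list() (
    self="$1"; shift
    list=""
    for n in "$@"; do
        # Skip the node we are building the list for.
        [ "$n" = "$self" ] && continue
        # Append with a comma separator once the list is non-empty.
        list="${list:+${list},}db-${n}"
    done
    printf '%s\n' "$list"
)

# Example: members that db-sw2 should connect to in a three-node swarm.
# export cluster_members=$(peer_list sw2 sw1 sw2 sw3)
```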
Verify Galera cluster membership
Once all of the cluster members have started, you can confirm the Galera cluster is operational by setting your environment via “eval $(docker-machine env --swarm ${node})” and running a few verification commands:
docker exec -ti db-sw1 mysql -psecret -e "show status like 'wsrep_local_state_comment';"
…each cluster member should return Synced
docker exec -ti db-sw1 mysql -psecret -e "show status like 'wsrep_cluster_size';"
…this value should be the same as the number of nodes. So for a three-node cluster, this should return 3
docker exec -ti db-sw1 mysql -psecret -e "show status like 'wsrep_local_state_uuid';"
…all members should report back the same UUID.
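These checks can also be rolled into a small script that loops over every member. The sketch below is one way to do it, assuming the containers follow the db-<node> naming from the compose file and that you pass in the MariaDB root password. galera_status and cluster_healthy are hypothetical helper names. The mysql client's -N and -B flags drop the column headers and table borders so the value is easy to pick out.

```shell
# Read one Galera status value from a container.
# $1 = container name, $2 = status variable, $3 = MariaDB root password
galera_status() {
    # No -t here: a TTY would add carriage returns to the captured output.
    docker exec "$1" mysql -N -B -p"$3" -e "show status like '$2';" |
        awk '{ print $2; exit }'
}

# Succeeds when every container reports Synced and the expected cluster size.
# $1 = expected size, $2 = root password, remaining arguments = container names
cluster_healthy() (
    size="$1"; pass="$2"; shift 2
    for c in "$@"; do
        [ "$(galera_status "$c" wsrep_local_state_comment "$pass")" = "Synced" ] || return 1
        [ "$(galera_status "$c" wsrep_cluster_size "$pass")" = "$size" ] || return 1
    done
)

# Example, with the swarm environment loaded:
# cluster_healthy 3 "${mysql_root_password}" db-sw1 db-sw2 db-sw3 && echo "cluster OK"
```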
Once you have confirmed the cluster is operational, you can optionally re-run
to redeploy the bootstrap node as a standard galera cluster member.
Destroying the Swarm
At this point, you may just want to clear everything and start over. To destroy the swarm (without cancelling any instances), run the
If you provisioned your nodes in IBM Softlayer, you can use the
script to cancel and destroy the instances you provisioned using
You can also tear down and rebuild the swarm using the
script (it just calls
followed by
), which has been included for convenience when experimenting.
Note: The
script (just like the ordering script) needs expect installed on your system to get around slcli not having a -y option to force through orders/cancellations without any input.
Happy Swarming!
That’s it! Hopefully this has given you a better understanding of the overall process. Dig through the repository, play with the scripts and compose files, and adapt the process to your own orchestration tools, providers and processes… You’ll be swarming in no time flat!