Docker (Swarm), TrueNAS, and ZFS

Problem

As soon as I got TrueNAS relatively stable and configured, I tried to migrate my Docker services over to the platform as part of my grand plan to re-home the majority of my infrastructure onto my R630. But my containers weren’t networking properly; in particular, they didn’t have external connectivity. Some quick Googling surfaced the answer:

  • the default configuration sets iptables=false, so containers have to route through a Kubernetes ingress object (like a LoadBalancer), which I neither had nor wanted
  • the default configuration assumes that you’re managing everything through the Web UI (which doesn’t satisfy my use cases) and running on Kubernetes (which is overkill for my use cases)

TrueNAS actively enforces system configuration at reboot by overwriting local changes from the system database, so some modifications are required to run vanilla Docker. Some quick cleanup and scripting to update the Docker configuration and I should be all set, right? Well…

Solution

Turns out there were two key pieces of information required to ultimately fix the issue:

  1. I needed a custom Docker daemon configuration to be applied at reboot that disabled the system configuration and replaced it with my user configuration, enabling a “vanilla” Docker experience
  2. I needed a custom Docker volume (more on this below) to be mounted at reboot to avoid performance and stability issues running Docker over ZFS with the default zfs storage driver

Since I had been playing around with the built-in Docker/k8s capability in the Web UI, the first thing I had to do was disable that capability by unassigning the pool. I didn’t have anything valuable stored at that location, so I also deleted the ix-applications dataset as part of the cleanup.
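For reference, the dataset cleanup can also be done from a shell instead of the Web UI. A sketch, assuming the pool is named “data00” (this is destructive and assumes the pool has already been unassigned from Apps):

```
# verify what lives under the built-in apps dataset first
zfs list -r data00/ix-applications

# destructive: recursively destroy the dataset and its children/snapshots
zfs destroy -r data00/ix-applications
```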

Now I needed a way to reliably configure the system at startup. After doing some research, I elected not to use the Docker zfs storage driver due to the snapshot issue and instead created a new zvol — a block storage device in ZFS — formatted as ext4 and mounted at /var/lib/docker. Since this appears as an ext4 volume to the system, I can use the default overlay2 storage driver for Docker as well as the default location. To make this persistent across restarts, I assembled the following script, which runs at the PREINIT stage as an “Init” script in TrueNAS:

#!/usr/bin/env bash

####################################################
### pre()
####################################################
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

# Write activities to local log file
exec > "${SCRIPT_DIR}/docker-output.log" 2>&1

####################################################
### main()
####################################################
echo "Stopping daemon..."
systemctl stop docker

### mount user volume for Docker storage
echo "Checking mounts..."
NUM_MOUNTS=$(mount -l | grep "/var/lib/docker " | wc -l)
if [ "$NUM_MOUNTS" = "1" ]; then
  # expected re-run behavior
  # something is already mounted; skip this block
  MOUNTPOINT=$(mount -l | grep "/var/lib/docker ")
  echo "Found 1 mount: $MOUNTPOINT"
elif [ "$NUM_MOUNTS" = "0" ]; then
  # expected first run behavior
  # remove any system data and mount user volume
  echo "No mounts found. Removing system data..."
  rm -rdf /var/lib/docker/*
  echo "Mounting user Docker volume..."
  mount /dev/data00/docker /var/lib/docker
  echo "User Docker volume mounted!"
  # exit 2
else
  # this is an illegal condition and should never trigger
  echo "Found $NUM_MOUNTS -- exiting..."
  exit 1
fi

### install user configuration for Docker daemon
echo "Replacing daemon.json config. file..."
if ! cmp --silent "${SCRIPT_DIR}/docker-config.json" /etc/docker/daemon.json; then
  echo "Updating Docker config..."
  cp -f "${SCRIPT_DIR}/docker-config.json" /etc/docker/daemon.json
fi
chown root:root /etc/docker/daemon.json
echo "User daemon config. file installed!"

echo "Starting daemon..."
systemctl start docker

####################################################
### post()
####################################################
echo "Done!"
exit 0

The script is a little verbose (and could definitely be hardened) but basically does the following:

  1. stop the docker daemon
  2. remove any lingering data at /var/lib/docker
  3. mount my custom /dev/data00/docker volume
  4. install my custom docker-config.json as /etc/docker/daemon.json
  5. start the docker daemon
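Step 4 uses cmp --silent to avoid rewriting an unchanged file. That pattern deserves care: chaining it on one line as `cmp || echo ... && cp` copies unconditionally, because `||` and `&&` have equal precedence in the shell and evaluate left to right. A standalone sketch of the safe form, using placeholder paths under /tmp:

```shell
#!/usr/bin/env bash
# copy src over dst only when the contents differ (placeholder paths)
install_if_changed() {
  local src="$1" dst="$2"
  if ! cmp --silent "$src" "$dst"; then
    echo "Updating $dst..."
    cp -f "$src" "$dst"
  fi
}

printf 'new config\n' > /tmp/docker-config.json
printf 'old config\n' > /tmp/daemon.json
install_if_changed /tmp/docker-config.json /tmp/daemon.json
```

Because the copy only happens on a content mismatch, re-running the init script on an already-configured system is a no-op for this step.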

The script and the associated configuration file are stored in my user home directory. The custom configuration file is pretty close to the factory default for Docker:

{
  "data-root": "/var/lib/docker",
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "storage-driver": "overlay2",
  "iptables": true,
  "bridge": "",
  "dns": ["1.1.1.1"]
}

with the key field being { "iptables": true } to enable vanilla Docker networking for Swarm and Compose.
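Once the daemon restarts with the new configuration, a couple of quick checks on the TrueNAS host confirm that the storage driver, data root, and outbound networking took effect (the image and ping target here are just examples):

```
docker info --format 'driver={{.Driver}} root={{.DockerRootDir}}'
# should report overlay2 and /var/lib/docker with the config above

docker run --rm alpine ping -c 1 1.1.1.1
# succeeds only if iptables-based NAT is working
```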

Creating the zvol itself was straightforward with the following caveats:

  • the zvol will appear under /dev/{ pool_name }/{ zvol_name }
  • don’t bother with fdisk; just format the raw volume via mkfs.ext4 /dev/{ pool_name }/{ zvol_name }
  • don’t bother with adding the mount to /etc/fstab, which is overwritten at startup from the TrueNAS database
  • if in doubt, make the zvol smaller; it’s trivial to grow it from the Web UI and almost impossible to shrink it
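Putting those caveats together, the setup amounts to a few commands. A sketch, assuming a pool named “data00”, a zvol named “docker”, and the 128 GiB size discussed below:

```
zfs create -V 128G data00/docker   # preallocated 128 GiB zvol
mkfs.ext4 /dev/data00/docker       # format the raw device; no partition table needed
mkdir -p /var/lib/docker
mount /dev/data00/docker /var/lib/docker
```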

For reference, I settled on making my “docker” zvol 128 GiB in size and located on my NVMe-backed “data00” pool. I would have made it larger except the storage is preallocated and my pool is only 512 GB, so I wanted to keep utilization <50%.

To mirror previous deployments, I also created a dataset for my Docker applications, which I called “stacks” and placed on the same “data00” pool. This is native ZFS and used for bind-mounts into containers; if I run into issues, I’ll likely convert this to a zvol and mount it similarly to my “docker” zvol.
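Creating the dataset and wiring it into a container are one-liners; the names and paths below mirror my setup (TrueNAS mounts pools under /mnt) but are otherwise arbitrary:

```
zfs create data00/stacks

# example bind-mount into a container; image tag and paths are illustrative
docker run -d --name traefik \
  -v /mnt/data00/stacks/traefik:/etc/traefik \
  traefik:v2.10
```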

I have some test services running — namely Traefik and Portainer — so the real test will come when I migrate some of my real workloads over. Stay tuned!

References:

  1. TrueNAS Scale – Use Vanilla Docker
  2. Using Docker on TrueNAS SCALE (no Kubernetes)
  3. Setup TrueNAS Scale to work with portainer
  4. Home Server Setup (Ubuntu / Docker / ZFS) Sanity Check; in particular, this comment by u/myerscarpenter
  5. docker ZFS driver creates hundreds of datasets and doesn’t clean them #41055; in particular, this issue comment by kraduk

Published by jonbackhaus

