Running kubernetes etcd in-memory

After setting up my kubernetes cluster at home back in June 2021, one of the first things I noticed was a lot more noise from the server. It turned out to be caused by heavy disk I/O from kubernetes, and in particular from etcd, so I decided to fix it.

Before we proceed it is important to note that my cluster is running on a single server, so high availability (apart from restarting failed containers) is not my aim. Of course, I will experiment with replicated storage such as longhorn and perhaps ceph/rook in the future, but at the end of the day it is still a single server with a single kubernetes cluster and a single controller node. This means that running etcd from a ramdisk is an option. I am using a kubeadm setup of kubernetes so etcd is running as a pod inside my cluster.

Running etcd from a ramdisk requires the following:

  • regular backups of etcd state
  • an additional backup just after the kubelet has stopped but while containers (such as etcd) are still running
  • an additional restore of etcd before starting the kubelet
  • mounting the storage directory /var/lib/etcd as a ramdisk (tmpfs)
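
The backup and restore steps in this list come down to taking and restoring an etcd snapshot. The following is a minimal sketch, not the actual scripts: it only builds the etcdctl command lines. The function names are hypothetical, and the endpoint and certificate paths are assumed to be the usual kubeadm defaults.

```python
def snapshot_save_argv(snapshot_path,
                       endpoint="https://127.0.0.1:2379",
                       pki_dir="/etc/kubernetes/pki/etcd"):
    """Build the etcdctl command that saves a snapshot of a running etcd.

    Assumption: endpoint and pki_dir are kubeadm defaults; adjust as needed.
    Older etcdctl versions also need ETCDCTL_API=3 in the environment.
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", snapshot_path,
    ]


def snapshot_restore_argv(snapshot_path, data_dir="/var/lib/etcd"):
    """Build the etcdctl command that restores a snapshot into data_dir.

    Restore works offline: etcd must not be running at this point.
    """
    return ["etcdctl", "snapshot", "restore", snapshot_path,
            f"--data-dir={data_dir}"]
```

Either list can be passed to subprocess.run, or appended to a container run command so that etcdctl comes from the etcd image itself. How the restore treats an already existing data directory differs between etcd versions, so that is worth checking before relying on it.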

Systemd is a replacement for init that has been in use for several years now. The good thing about systemd is that it allows the behavior of an existing service to be modified without editing its unit file. I use this for a pre-start hook and a post-stop hook on the kubelet. These hooks are created by placing .conf files in the extension directory /usr/lib/systemd/system/kubelet.service.d.

The pre startup hook is as follows:

[Unit]
After=containerd.service

[Service]
ExecStartPre=-/opt/wamblee/etcd/bin/etcd-restore-to-tmpfs

In this configuration, I have added an ordering dependency on containerd, since the pre-startup hook requires the container runtime to be started. The pre-startup script is then triggered before the kubelet starts. Because the script path is prefixed with '-', systemd ignores a failure of the script, so a failed restore does not prevent the kubelet from starting.

The post stop hook is similar:

[Service]
ExecStop=/opt/wamblee/etcd/bin/etcd-cron

The scripts contain all the logic for backing up and restoring etcd data. In particular, the backup script requires a running etcd and runs the backup command in a container. It keeps the last 10 backups taken, plus one backup per day for the last 31 days. It also records the name of the docker image for etcd, since that may vary depending on the kubernetes version. That same image is then used by the restore script, so we are certain that backup and restore always use the same version of etcd as the kubernetes cluster.
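
The retention rule (last 10 backups, plus one per day for 31 days) can be illustrated as follows. This is a sketch of one possible reading of that rule, not the code from the actual scripts: the function name is hypothetical, and it assumes that "one backup per day" means the newest backup of each calendar day.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now, keep_last=10, keep_days=31):
    """Return the set of backup timestamps that the retention rule keeps."""
    ordered = sorted(timestamps, reverse=True)  # newest first
    keep = set(ordered[:keep_last])             # the most recent backups

    # One backup per calendar day inside the window: the newest of each day.
    cutoff = now - timedelta(days=keep_days)
    seen_days = set()
    for ts in ordered:
        if ts < cutoff:
            break  # everything after this point is older than the window
        if ts.date() not in seen_days:
            seen_days.add(ts.date())
            keep.add(ts)
    return keep
```

Everything not in the returned set can be deleted. With a backup every 15 minutes, this keeps the storage used by backups bounded at roughly 40 snapshots.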

To mount the storage directory /var/lib/etcd as a ramdisk requires adding a single line to /etc/fstab:

tmpfs  /var/lib/etcd   tmpfs   defaults,noatime,size=2g  0 0
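
Since a restore into a directory that is not actually the ramdisk would silently bring the disk IO back, it can be useful to verify the mount first. The following is a small sketch of such a check, assuming the standard /proc/mounts layout; the function name is hypothetical and the check is not part of the original scripts.

```python
def is_tmpfs_mounted(mounts_text, mount_point="/var/lib/etcd"):
    """Return True if mount_point is listed as a tmpfs mount.

    mounts_text is expected in /proc/mounts format, one mount per line.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "tmpfs":
            return True
    return False
```

On the server itself, this would be called with the contents of /proc/mounts, for example is_tmpfs_mounted(open("/proc/mounts").read()).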

The final step is to run the periodic backup using a cron script placed in /etc/cron.d:

*/15 * * * * root /opt/wamblee/etcd/bin/etcd-cron > /var/log/wamblee-etcd-backup 2>&1
30 0 * * * root /opt/wamblee/etcd/bin/etcdctl defrag --cluster > /var/log/wamblee-etcd-defrag 2>&1 

Note that the second entry is a nightly defragmentation task. I added it because, shortly after setting up monitoring with prometheus, I got messages about etcd fragmentation.
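
Fragmentation here means that the etcd database file is larger than the data that is actually in use. As an illustration only, the ratio can be computed from two values that etcd exposes as the metrics etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes. The function name is hypothetical, and which threshold triggers a message depends on the alert rules in use.

```python
def fragmentation_ratio(total_bytes, in_use_bytes):
    """Return the fraction of the etcd database file that is not in use.

    0.0 means no fragmentation; values close to 1.0 mean that most of
    the file is free space that a defrag could reclaim.
    """
    if total_bytes <= 0:
        return 0.0
    return 1.0 - in_use_bytes / total_bytes
```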

Initially, the scripts were based on docker, but since then kubernetes has stopped using docker by default, and I have switched to containerd. To make this transparent to the scripts, I added a docker script that simply delegates to containerd using nerdctl. In the same way, the backup solution can be adapted to other container runtimes.
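
The idea behind such a delegating docker script can be sketched as follows. This is an illustration rather than the script from the repository: the function name is hypothetical, and the use of the k8s.io containerd namespace (where the kubelet keeps its images) is an assumption.

```python
def nerdctl_argv(docker_args, namespace="k8s.io"):
    """Translate a docker command line into the equivalent nerdctl call.

    docker_args are the arguments after the word "docker", for example
    ["run", "--rm", "some-image"]. Assumption: the namespace is k8s.io.
    """
    return ["nerdctl", "--namespace", namespace, *docker_args]
```

A wrapper installed as docker would then replace itself with nerdctl, for example through os.execvp("nerdctl", nerdctl_argv(sys.argv[1:])). This works because nerdctl accepts largely the same command line as docker.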

See the full source code together with setup instructions here.
