Are you worried about losing all your data in Kubernetes? Don’t worry, I’ve got you covered! In this blog post, I will guide you through how to back up and restore etcd in Kubernetes.
Let’s first see what etcd is.
What is Etcd?
Etcd is a distributed key-value store that Kubernetes uses to store all of its cluster data. It is the backbone of the Kubernetes cluster, holding the state of every object, such as pods, services, and deployments.
Why Backup Etcd?
Say you have a production cluster with a set of applications running on it, and you decide to upgrade the master node without taking any backup, because you don’t anticipate anything going wrong.
After upgrading and rebooting the master node, you notice that none of your applications are accessible. And because you didn’t take any backup, you can’t restore the original state of the cluster.
So backing up etcd is crucial: it ensures that your data is safe and recoverable in case of any disaster. Without a backup, you can lose all your data, and rebuilding the cluster from scratch is time-consuming and challenging.
Prerequisites before taking the backup
First we need to check that etcdctl is installed, since it is the tool we will use to take the backup:
#etcdctl --version
etcdctl version: 3.3.13
API version: 2
Etcdctl is the command-line client for etcd. To use etcdctl for tasks such as backup and restore, you have to make sure that ETCDCTL_API is set to 3.
#ETCDCTL_API=3 etcdctl version
etcdctl version: 3.3.13
API version: 3.3
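To avoid typing the prefix on every command, you can export the variable once per shell session. A minimal sketch:

```shell
# Export once per shell session so every subsequent etcdctl call uses the
# v3 API without needing an ETCDCTL_API=3 prefix on each command.
export ETCDCTL_API=3

# Quick confirmation of what the rest of this post assumes is set.
echo "ETCDCTL_API is ${ETCDCTL_API}"
```

If you want it to survive across sessions, you could also add the export line to your shell profile.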
Next, we will need the paths of the following etcd certificates so we can authenticate to the etcd server: the CA certificate, the client certificate, and the client key.
How do we get those certificates?
Easy, first check the names of the etcd pods:
#kubectl get pods -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-3d79dcq8ap-62kl3               1/1     Running   0          5m18s
coredns-3d79dcq8ap-cv2km               1/1     Running   0          5m18s
etcd-controlplane                      1/1     Running   0          5m28s
kube-apiserver-controlplane            1/1     Running   0          5m31s
kube-controller-manager-controlplane   1/1     Running   0          5m31s
kube-proxy-35tyg                       1/1     Running   0          5m19s
kube-scheduler-controlplane            1/1     Running   0          5m28s
Then check the etcd configuration by running the following command:
#kubectl describe pods etcd-controlplane -n kube-system
.
.
.
    Host Port:    <none>
    Command:
      etcd
      --advertise-client-urls=https://220.127.116.11:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --experimental-initial-corrupt-check=true
      --experimental-watch-progress-notify-interval=5s
      --initial-advertise-peer-urls=https://18.104.22.168:2380
      --initial-cluster=controlplane=https://22.214.171.124:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://126.96.36.199:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://188.8.131.52:2380
      --name=controlplane
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
So here are our certificates:
- Server certificate: /etc/kubernetes/pki/etcd/server.crt
- Server key: /etc/kubernetes/pki/etcd/server.key
- CA certificate: /etc/kubernetes/pki/etcd/ca.crt
Alternatively, you can just check the etcd definition file:
#cat /etc/kubernetes/manifests/etcd.yaml | grep file
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
Now we have the three files needed to authenticate to the etcd server.
Next, we need information about the endpoints. How can we get that?
Easy, run the following command:
#cat /etc/kubernetes/manifests/etcd.yaml | grep listen-client-urls
    - --listen-client-urls=https://127.0.0.1:2379,https://184.108.40.206:2379
As you may have noticed, since etcd runs on the master node itself, the IP address is 127.0.0.1. If etcd is running on another server, we need to replace the localhost IP with the IP of that server.
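With the certificate paths and the endpoint in hand, it can help to collect them into shell variables so later commands stay short. The variable names below are my own choice; the paths are the kubeadm defaults we found above:

```shell
# Connection details for etcdctl, taken from the etcd static-pod manifest.
# Variable names are arbitrary; adjust the paths if your certificates
# live somewhere other than the kubeadm defaults.
ENDPOINTS="https://127.0.0.1:2379"
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"

echo "will connect to ${ENDPOINTS}"
```

Later commands can then use --endpoints="$ENDPOINTS" --cacert="$CACERT" --cert="$CERT" --key="$KEY" instead of repeating the literal paths.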
Now that we have everything we need, let’s start backing up the etcd cluster.
Steps to backup ETCD Cluster
Take a snapshot of the ETCD database using the built-in snapshot functionality :
#ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=127.0.0.1:2379 \
  --cert="/etc/kubernetes/pki/etcd/server.crt" \
  --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
  --key="/etc/kubernetes/pki/etcd/server.key"
Snapshot saved at /opt/etcd-backup.db
Here I have stored the backup file at /opt/:
#ls /opt/etcd-backup.db
/opt/etcd-backup.db
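Before relying on a backup, it is worth checking that the file actually exists and is non-empty. Here is a tiny helper of my own (not part of etcdctl) that fails fast if the snapshot file is missing:

```shell
# A small sanity-check helper (hypothetical, not an etcdctl feature):
# succeeds only when the given snapshot file exists and is non-empty.
check_backup() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Check the path used in this post; '|| true' keeps a script going on
# machines where the backup has not been taken yet.
check_backup /opt/etcd-backup.db || true
```

For a deeper check, etcdctl also provides a snapshot status subcommand that reads the file’s hash, revision, and key count.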
As I mentioned before, once ETCDCTL_API=3 is exported in your environment, you don’t have to prefix each command with it:

#etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=127.0.0.1:2379 \
  --cert="/etc/kubernetes/pki/etcd/server.crt" \
  --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
  --key="/etc/kubernetes/pki/etcd/server.key"
Snapshot saved at /opt/etcd-backup.db
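In practice you would not take a single one-off snapshot but schedule backups. The sketch below is a hypothetical wrapper (the /opt/etcd-backups directory and the naming scheme are my own choices, not an etcdctl convention) that gives each snapshot a timestamped name so older backups are never overwritten:

```shell
# Hypothetical backup wrapper: timestamped snapshot filenames so a cron
# job never overwrites a previous backup. Paths assume the kubeadm
# defaults used throughout this post.
BACKUP_DIR="/opt/etcd-backups"
STAMP=$(date +%Y%m%d-%H%M%S)
TARGET="${BACKUP_DIR}/etcd-${STAMP}.db"

# Only attempt the snapshot if etcdctl is actually available on this host.
if command -v etcdctl >/dev/null 2>&1; then
    mkdir -p "$BACKUP_DIR"
    ETCDCTL_API=3 etcdctl snapshot save "$TARGET" \
        --endpoints=127.0.0.1:2379 \
        --cert="/etc/kubernetes/pki/etcd/server.crt" \
        --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
        --key="/etc/kubernetes/pki/etcd/server.key" \
        || echo "snapshot failed; check endpoint and certificates" >&2
else
    echo "etcdctl not found; would have saved to ${TARGET}"
fi
```

You could run this from cron or a systemd timer, and prune old files with a retention policy that fits your environment.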
Steps to restore ETCD Cluster
Say that after a reboot the master node came back online, but all applications are now inaccessible:
#kubectl get pods
No resources found in default namespace.
Luckily we took a backup, so now let’s restore our etcd cluster using the backup file:
#etcdctl snapshot restore --data-dir /var/lib/etcd-backup /opt/etcd-backup.db
2023-06-21 18:50:32.631891 I | mvcc: restore compact to 1908
2023-06-21 18:50:32.639073 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster mff852141e1a6j32
Here I have specified a new etcd directory, /var/lib/etcd-backup (it will be created automatically), into which the etcd data will be restored.
Note that I don’t have to specify endpoint or certificate details here, because restoring a snapshot works directly on the backup file and does not communicate with the etcd server.
One last thing: you need to point the etcd manifest file at the new data directory:
# vi /etc/kubernetes/manifests/etcd.yaml
Change the following parts :
  - command:
    - etcd
    - --advertise-client-urls=https://220.127.116.11:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd-backup

    volumeMounts:
    - mountPath: /var/lib/etcd-backup
      name: etcd-data

  - hostPath:
      path: /var/lib/etcd-backup
      type: DirectoryOrCreate
    name: etcd-data
The etcd pod will be recreated automatically: any change to a static-pod manifest causes the kubelet to recreate the pod, and the recreated pod will pick up the new etcd data directory.
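The restore steps above can be collected into one hedged sketch, to be run on the control-plane node. The paths match the defaults used in this post, and the sed edit is my own shortcut for the manual manifest change; it is only safe to run once, since a second run would rewrite the already-updated paths:

```shell
# Sketch of the full restore flow (assumes the kubeadm default paths).
SNAPSHOT="/opt/etcd-backup.db"
NEW_DATA_DIR="/var/lib/etcd-backup"
MANIFEST="/etc/kubernetes/manifests/etcd.yaml"

if command -v etcdctl >/dev/null 2>&1 && [ -s "$SNAPSHOT" ]; then
    # Restore the snapshot into a fresh data directory (no server contact).
    ETCDCTL_API=3 etcdctl snapshot restore "$SNAPSHOT" --data-dir "$NEW_DATA_DIR"
    # Point every /var/lib/etcd reference in the static-pod manifest at the
    # new directory; the kubelet notices the change and recreates the pod.
    sed -i "s|/var/lib/etcd|${NEW_DATA_DIR}|g" "$MANIFEST"
else
    echo "etcdctl or ${SNAPSHOT} not available; nothing to restore"
fi
```

After the etcd pod comes back up, kubectl get pods should show your applications again.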
Backing up and restoring Etcd is crucial for the safety and security of your Kubernetes cluster. By following the steps I mentioned above, you can ensure that your data is safe and recoverable in case of any disaster. Remember to always keep a backup of your data and store it in a secure location.