
etcd cluster: backup and restore on Kubernetes

Are you worried about losing all your data in Kubernetes? Don’t worry, I’ve got you covered! In this blog post, I will guide you through how to back up and restore etcd in Kubernetes.

Let’s first see what etcd is.


What is etcd?

etcd is a distributed key-value store that Kubernetes uses to store all of its cluster data. It is the backbone of the Kubernetes cluster, as it holds the state of the cluster: pods, services, deployments, and so on.

etcd is a CNCF project.
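To see this for yourself, you can list the keys Kubernetes keeps in etcd (everything lives under the /registry prefix). This is just an illustrative peek, and it assumes the certificate paths we will locate later in this post:

#ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only \
 --endpoints=127.0.0.1:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/server.crt \
 --key=/etc/kubernetes/pki/etcd/server.key | head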

Why back up etcd?

Say you have a production cluster with a set of applications running on it, and you decide to upgrade the master node without taking any backup, because you did not anticipate anything going wrong.


After upgrading and rebooting the master node, you notice that none of your applications are accessible. And because you didn’t take a backup, you can’t restore the original state of the cluster.

This is why backing up etcd is crucial: it ensures that your data is safe and recoverable in case of disaster. Without a backup you can lose all your data, and rebuilding the cluster by hand is time-consuming and challenging.

Prerequisites before taking the backup

First, check that etcdctl is installed, because it is the tool we will use to take the backup:

#etcdctl --version
etcdctl version: 3.3.13
API version: 2

etcdctl is the command-line tool for etcd. To use it for tasks such as backup and restore, you have to make sure that ETCDCTL_API is set to 3:

#export ETCDCTL_API=3
#etcdctl version
etcdctl version: 3.3.13
API version: 3.3
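
If you do not want to repeat this export in every new shell, you can persist it in your shell profile (assuming a Bash shell here):

#echo 'export ETCDCTL_API=3' >> ~/.bashrc
#source ~/.bashrc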

Next, we need the paths of the following files so we can authenticate to the etcd server: the CA certificate, the client certificate, and the client key.

How do we get those certificates?

Easy. First, check the name of the etcd pod:

#kubectl get pods -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-3d79dcq8ap-62kl3               1/1     Running   0          5m18s
coredns-3d79dcq8ap-cv2km               1/1     Running   0          5m18s
etcd-controlplane                      1/1     Running   0          5m28s
kube-apiserver-controlplane            1/1     Running   0          5m31s
kube-controller-manager-controlplane   1/1     Running   0          5m31s
kube-proxy-35tyg                       1/1     Running   0          5m19s
kube-scheduler-controlplane            1/1     Running   0          5m28s

Then check the etcd configuration by running the following command:

#kubectl describe pods etcd-controlplane -n kube-system
    .
    .
    .
    Host Port:     <none>
    Command:
      etcd
      --advertise-client-urls=https://192.37.133.9:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --experimental-initial-corrupt-check=true
      --experimental-watch-progress-notify-interval=5s
      --initial-advertise-peer-urls=https://192.37.133.9:2380
      --initial-cluster=controlplane=https://192.37.133.9:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.37.133.9:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.37.133.9:2380
      --name=controlplane
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

So here are our files:

- Server certificate: /etc/kubernetes/pki/etcd/server.crt

- Server key: /etc/kubernetes/pki/etcd/server.key

- CA certificate: /etc/kubernetes/pki/etcd/ca.crt

Alternatively, just check the etcd definition file:

#cat /etc/kubernetes/manifests/etcd.yaml | grep file
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Now we have the three files we need to authenticate to the etcd server.
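
Optionally, you can put these paths in environment variables to shorten the commands that follow. This is purely a convenience; the commands below spell the paths out explicitly, so they work either way:

#export ETCD_CACERT=/etc/kubernetes/pki/etcd/ca.crt
#export ETCD_CERT=/etc/kubernetes/pki/etcd/server.crt
#export ETCD_KEY=/etc/kubernetes/pki/etcd/server.key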

Next, we need the endpoint information. How can we get that?

Easy, run the following command:

#cat /etc/kubernetes/manifests/etcd.yaml | grep listen-client-urls
    - --listen-client-urls=https://127.0.0.1:2379,https://192.34.131.9:2379

As you may have noticed, since etcd runs on the master node itself, the IP address is 127.0.0.1. If etcd is running on another server, replace the localhost IP with the IP of that server.
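
Before taking the snapshot, it is worth confirming that etcdctl can actually reach the server with these certificates. A quick sanity check, which should report the endpoint as healthy:

#ETCDCTL_API=3 etcdctl endpoint health \
 --endpoints=127.0.0.1:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/server.crt \
 --key=/etc/kubernetes/pki/etcd/server.key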

Now that we have everything we need, let’s start backing up the etcd cluster.

Steps to back up the etcd cluster

Take a snapshot of the etcd database using the built-in snapshot functionality:

#ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
 --endpoints=127.0.0.1:2379 \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key"
Snapshot saved at /opt/etcd-backup.db

Here I have stored the backup file under /opt/:

#ls /opt/etcd-backup.db
/opt/etcd-backup.db

As I mentioned before, to avoid having to specify ETCDCTL_API=3 on each command, export it once:

#export ETCDCTL_API=3
#etcdctl snapshot save /opt/etcd-backup.db \
 --endpoints=127.0.0.1:2379 \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key"
Snapshot saved at /opt/etcd-backup.db
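
You can also verify the integrity of the snapshot without contacting the etcd server; etcdctl reads the file directly and prints its hash, revision, key count, and size:

#etcdctl snapshot status /opt/etcd-backup.db --write-out=table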

Steps to restore the etcd cluster

Say that after a reboot the master node came back online, but all applications are now inaccessible:

#kubectl get pods
No resources found in default namespace.

Luckily we took a backup, so let’s restore our etcd cluster using the backup file:

# etcdctl snapshot restore --data-dir /var/lib/etcd-backup /opt/etcd-backup.db
2023-06-21 18:50:32.631891 I | mvcc: restore compact to 1908
2023-06-21 18:50:32.639073 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster mff852141e1a6j32

Here I have specified a new etcd data directory, /var/lib/etcd-backup (it will be created automatically), into which the cluster will be restored.

Notice that I did not have to specify the endpoint or certificate details: restoring a snapshot does not communicate with the etcd server, it operates directly on the backup file.

One last thing: you need to point the etcd manifest file at the new data directory:

# vi /etc/kubernetes/manifests/etcd.yaml

Change the following parts:

- command:
    - etcd
    - --advertise-client-urls=https://192.0.52.6:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd-backup

  volumeMounts:
  - mountPath: /var/lib/etcd-backup
    name: etcd-data

- hostPath:
    path: /var/lib/etcd-backup
    type: DirectoryOrCreate
  name: etcd-data
Since etcd runs as a static pod, any change to this manifest file forces the kubelet to recreate the pod, and the recreated pod will use the new data directory containing the restored data.
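
Once the kubelet has recreated the etcd and kube-apiserver pods (this can take a minute or two), confirm that the cluster state is back:

#kubectl get pods -n kube-system
#kubectl get pods --all-namespaces

Your applications should reappear in the output of the second command.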

Conclusion

Backing up and restoring etcd is crucial for the safety and security of your Kubernetes cluster. By following the steps above, you can ensure that your data is safe and recoverable in case of disaster. Remember to always keep a backup of your data and store it in a secure location.

More information about etcd.
