So, how do we handle persistent data in our k8s cluster?

Yiğit İrez
4 min read · Apr 11, 2021

Consider the case where you need data in a pod to persist. What do you do? You create a volume, and there are a few kinds to choose from;

  • Container storage — Data lives only as long as the container is alive
  • Pod Volume — Usable as long as the pod stays on the same node
  • External Storage Volume — Same as above, but backed by external storage so multiple pods on different nodes can use it
  • Persistent Volume — Usable from any pod on any node, with no hard dependencies.

If you are testing something and you can control which node your pod runs on, a quick pod volume is the way to go.

In your pod .yml, the YAML below states that the /opt path in the container is linked to the /data path on our node. So even if the pod dies, the data will still be available for reuse, as long as the new pod starts with the same volume details and on the same node.

apiVersion: v1
kind: Pod
metadata:
  labels:
    label: some-label
  name: volume-test
spec:
  containers:
  - image: nginx
    name: some-container
    volumeMounts:
    - mountPath: /opt
      name: data-vol
  volumes:
  - name: data-vol
    hostPath:
      path: /data
      type: Directory

Let’s check the pod to see if the mount is done right.

kubectl exec -it volume-test -- /bin/sh

Inside the pod, at the mounted /opt folder.
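
To be sure the link really works, a quick check: write a file from inside the pod and look for it on the node (the file name here is just an illustration, not part of the original setup).

# inside the kubectl exec session in the pod
echo "hello from the pod" > /opt/test.txt
exit

# then on the node itself (or via minikube ssh)
cat /data/test.txt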

For Windows, I tried this out with minikube, but you have to mount a folder first to use as the volume path. Run the commands below to check the current mounts first.

minikube ssh
df -hl

If you don’t see the Users path mounted, mount it using the command below from the host’s command prompt. It will mount the host’s Users path to /Users inside minikube.

minikube mount "C://Users:/Users"
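
Keep in mind that minikube mount is a foreground process; the mount only exists while that command keeps running, so leave that window open. From another prompt you can confirm it the same way as before:

minikube ssh
ls /Users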

Then you can use your directory like so;

volumes:
- name: data-vol
  hostPath:
    path: /Users/voltest

So, coming back to the main topic: a pod volume is fine if you have a single node or know for certain which node your pod will run on, but what if you have multiple nodes? With the above setup you would end up with a volume on every node, each possibly holding different data.

Persistent volumes claim to solve this problem, but first we need some shared storage behind them, so let’s deploy some form of NFS server;

kind: Service
apiVersion: v1
metadata:
  name: nfs-service
spec:
  selector:
    role: nfs
  ports:
  - name: tcp-2049
    port: 2049
    protocol: TCP
  - name: udp-111
    port: 111
    protocol: UDP
---
kind: Pod
apiVersion: v1
metadata:
  name: nfs-server-pod
  labels:
    role: nfs
spec:
  containers:
  - name: nfs-server-container
    image: cpuguy83/nfs-server
    securityContext:
      privileged: true
    args:
    - /exports
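
The pod spec below needs the NFS server’s address for the NFSIP placeholder. Since there is a Service in front of the NFS pod, one option (an assumption on my part, not spelled out above) is to use its cluster IP; the mount is performed by the kubelet on the node, where cluster DNS names may not resolve, so the raw IP is the safer bet:

kubectl get svc nfs-service
# copy the CLUSTER-IP value and use it as NFSIP in the pod below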

Then we can mount it in our pod as below, filling in NFSIP with the address we just looked up;

apiVersion: v1
kind: Pod
metadata:
  labels:
    label: some-label
  name: volume-test
spec:
  containers:
  - image: nginx
    name: some-container
    volumeMounts:
    - mountPath: /opt
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      server: NFSIP # your NFS server IP/DNS
      path: /voltest
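
Once this pod is running, you can confirm that /opt really is the NFS share and not local disk (assuming the Debian-based nginx image, which ships df):

kubectl exec -it volume-test -- df -h /opt
# the Filesystem column should read NFSIP:/voltest rather than a local device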

So, OK, we finally have a storage solution that survives pods and nodes, but;

  • We have to know the NFS server details
  • We would be creating a hard dependency on an IP and a physical path
  • We have to remember which physical path is mounted where. Mount mistakes are inevitable

We’re getting there. To solve these new problems we create Persistent Volumes and match them to pods via requests in the form of Persistent Volume Claims. Instead of giving pods physical locations and mounting storage into them directly, we keep whatever data we want in Persistent Volumes that are matched with the pods’ claims. In an ideal environment, Persistent Volumes would be managed by the sysadmins and would probably have a YAML like the one below.

apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    somelabel: maybe-use-this-in-selector
  name: persistent-volume
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  nfs:
    server: NFSIP # your NFS server IP/DNS
    path: /voltest

Now sysadmins can control the presented NFS server, the access and storage modes, and the resources they want to allocate. Also, since claims and volumes have a 1–1 relationship, sysadmins might want to make sure large volumes don’t get grabbed by small claims, for example by adding labels that claims must select (see the sketch below). If no matching volume is available, a claim simply stays in the Pending state until one appears.
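
As a sketch of that label idea (my own illustration, not from the original setup), a claim can carry a selector so it only binds to volumes the admins have marked with a matching label; here it reuses the somelabel value from the PV above, and labelled-claim is just a hypothetical name. The plain claim we actually use follows below.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: labelled-claim # hypothetical name for illustration
spec:
  accessModes:
  - ReadWriteOnce
  selector:
    matchLabels:
      somelabel: maybe-use-this-in-selector
  resources:
    requests:
      storage: 600Mi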

On the app side we first create a claim,

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 600Mi

And then use the claim in our pods like so;

volumes:
- name: persistent-volume-claim
  persistentVolumeClaim:
    claimName: some-claim

In the case above, our claim requests 600Mi of space to use as persistent storage and looks for an available volume. It matches the one named persistent-volume (1Gi is enough for 600Mi and the access modes agree), the claim binds to it, and the volume is mounted into the pod. From then on, our pod keeps its data there.
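
You can watch the binding from the command line; once the claim is matched, both objects report a Bound status:

kubectl get pv,pvc
# STATUS should show Bound for both persistent-volume and some-claim,
# and the PV's CLAIM column should point at the claim (default/some-claim in the default namespace)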

What happens when NFS (or whatever storage you use) is down? There are multiple HA solutions for this specific case, and it must be kept in mind when planning a production system.

The question that remains for another time:

  • What happens when a claim is deleted? Will the pod data be recoverable? The volume doesn’t get deleted if its ReclaimPolicy is set to Retain, but how do we attach a new claim back to the old volume?


Yiğit İrez

Let’s talk devops, automation and architectures, everyday, all day long. https://www.linkedin.com/in/yigitirez/