So, how do we handle persistent data in our k8s cluster?

Yiğit İrez
4 min read · Apr 11, 2021

Consider the case where you need data in a pod to persist. What do you do? You create a volume, and there are a few kinds to choose from;

  • Container storage — Data lives only as long as the container is alive
  • Pod Volume — Usable as long as the pod stays on the same node
  • External Storage Volume — Same as above, but backed by external storage so multiple pods on different nodes can use it
  • Persistent Volume — Usable from any pod on any node, with no hard dependencies.

If you are testing something and you can control which node your pod runs on, a quick pod volume is the way to go.

In your pod .yml, the YAML below states that the /opt path in the container is linked to the /data path on our node. So even if the pod dies, the data will still be available for reuse, as long as the new pod starts with the same volume details and on the same node.

apiVersion: v1
kind: Pod
metadata:
  labels:
    label: some-label
  name: volume-test
spec:
  containers:
  - image: nginx
    name: some-container
    volumeMounts:
    - mountPath: /opt
      name: data-vol
  volumes:
  - name: data-vol
    hostPath:
      path: /data
      type: Directory

Let’s check the pod to see if the mount is done right.

kubectl exec -it volume-test -- /bin/sh

Inside the pod, at the mounted /opt folder.
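
To be sure the link really works, a quick check: write a file from inside the pod and look for it on the node (the file name here is just an illustration, not part of the original setup).

# inside the kubectl exec session in the pod
echo "hello from the pod" > /opt/test.txt
exit

# then on the node itself (or via minikube ssh)
cat /data/test.txt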

For Windows, I tried this out with minikube, but you have to mount a folder first to use as the volume path. Run the commands below to check the current mounts first.

minikube ssh
df -hl

If you don’t see the Users path mounted, mount it using the command below from the host’s command prompt. It will mount the host’s Users path to /Users inside minikube.

minikube mount "C://Users:/Users"
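
Keep in mind that minikube mount is a foreground process; the mount only exists while that command keeps running, so leave that window open. From another prompt you can confirm it the same way as before:

minikube ssh
ls /Users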

Then you can use your directory like so;

volumes:
- name: data-vol
  hostPath:
    path: /Users/voltest

So, coming back to the main topic: a pod volume is fine if you have a single node or know for certain which node your pod will run on, but what if you have multiple nodes? With the above setup you would end up with a volume on every node, each possibly holding different data.

Persistent volumes claim to solve this problem, but first we need some shared storage behind them, so let’s deploy some form of NFS server;

kind: Service
apiVersion: v1
metadata:
  name: nfs-service
spec:
  selector:
    role: nfs
  ports:
  - name: tcp-2049
    port: 2049
    protocol: TCP
  - name: udp-111
    port: 111
    protocol: UDP
---
kind: Pod
apiVersion: v1
metadata:
  name: nfs-server-pod
  labels:
    role: nfs
spec:
  containers:
  - name: nfs-server-container
    image: cpuguy83/nfs-server
    securityContext:
      privileged: true
    args:
    - /exports
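
The pod spec below needs the NFS server’s address for the NFSIP placeholder. Since there is a Service in front of the NFS pod, one option (an assumption on my part, not spelled out above) is to use its cluster IP; the mount is performed by the kubelet on the node, where cluster DNS names may not resolve, so the raw IP is the safer bet:

kubectl get svc nfs-service
# copy the CLUSTER-IP value and use it as NFSIP in the pod below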

Then we can mount it in our pod as below, filling in NFSIP with the address we just looked up;

apiVersion: v1
kind: Pod
metadata:
  labels:
    label: some-label
  name: volume-test
spec:
  containers:
  - image: nginx
    name: some-container
    volumeMounts:
    - mountPath: /opt
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      server: NFSIP # your NFS server IP/DNS
      path: /voltest
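
Once this pod is running, you can confirm that /opt really is the NFS share and not local disk (assuming the Debian-based nginx image, which ships df):

kubectl exec -it volume-test -- df -h /opt
# the Filesystem column should read NFSIP:/voltest rather than a local device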

So, OK, we finally have a storage solution that survives pods and nodes, but;

  • We have to know the NFS server details
  • We would be creating a hard dependency on an IP and a physical path
  • We have to remember which physical path is mounted where. Mount mistakes are inevitable

We’re getting there. To solve these new problems we create Persistent Volumes and match them to pods via requests in the form of Persistent Volume Claims. Instead of giving pods physical locations and mounting storage into them directly, we keep whatever data we want in Persistent Volumes that are matched with the pods’ claims. In an ideal environment, Persistent Volumes would be managed by the sysadmins and would probably have a YAML like the one below.

apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    somelabel: maybe-use-this-in-selector
  name: persistent-volume
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  nfs:
    server: NFSIP # your NFS server IP/DNS
    path: /voltest

Now sysadmins can control the presented NFS server, the access and storage modes, and the resources they want to allocate. Also, since claims and volumes have a 1–1 relationship, sysadmins might want to make sure large volumes don’t get grabbed by small claims, for example by adding labels that claims must select (see the sketch below). If no matching volume is available, a claim simply stays in the Pending state until one appears.
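
As a sketch of that label idea (my own illustration, not from the original setup), a claim can carry a selector so it only binds to volumes the admins have marked with a matching label; here it reuses the somelabel value from the PV above, and labelled-claim is just a hypothetical name. The plain claim we actually use follows below.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: labelled-claim # hypothetical name for illustration
spec:
  accessModes:
  - ReadWriteOnce
  selector:
    matchLabels:
      somelabel: maybe-use-this-in-selector
  resources:
    requests:
      storage: 600Mi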

On the app side we first create a claim,

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 600Mi

And then use the claim in our pods like so;

volumes:
- name: persistent-volume-claim
  persistentVolumeClaim:
    claimName: some-claim

In the case above, our claim requests 600Mi of space to use as persistent storage and looks for an available volume. It matches the one named persistent-volume (1Gi is enough for 600Mi and the access modes agree), the claim binds to it, and the volume is mounted into the pod. From then on, our pod keeps its data there.
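
You can watch the binding from the command line; once the claim is matched, both objects report a Bound status:

kubectl get pv,pvc
# STATUS should show Bound for both persistent-volume and some-claim,
# and the PV's CLAIM column should point at the claim (default/some-claim in the default namespace)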

What happens when NFS (or whatever storage you use) is down? There are multiple HA solutions for this specific case, and it must be kept in mind when planning a production system.

The question that remains for another time:

  • What happens when a claim is deleted? Will the pod data be recoverable? The volume doesn’t get deleted if its ReclaimPolicy is set to Retain, but how do we attach a new claim back to the old volume?


Yiğit İrez

Let’s talk devops, automation and architectures, everyday, all day long. https://www.linkedin.com/in/yigitirez/